# Nanopb: Basic concepts

The things outlined here are the underlying concepts of the nanopb
design.

## Proto files

All Protocol Buffers implementations use .proto files to describe the
message format. The point of these files is to be a portable interface
description language.

### Compiling .proto files for nanopb

Nanopb comes with a Python script to generate `.pb.c` and
`.pb.h` files from the `.proto` definition:

    user@host:~$ nanopb/generator/nanopb_generator.py message.proto
    Writing to message.pb.h and message.pb.c

Internally this script uses Google `protoc` to parse the
input file. If you do not have it available, you may receive an error
message. You can install either the `grpcio-tools` Python
package using `pip`, or the `protoc` compiler
itself from the `protobuf-compiler` distribution package.
Generally the Python package is recommended, because nanopb requires
protoc version 3.6 or newer to support all features, and some
distributions come with an older version.

### Modifying generator behaviour

Using generator options, you can set maximum sizes for fields in order
to allocate them statically. The preferred way to do this is to create
an .options file with the same name as your .proto file:

    # Foo.proto
    message Foo {
        required string name = 1;
    }

    # Foo.options
    Foo.name max_size:16

For more information on this, see the [Proto file
options](reference.html#proto-file-options) section in the reference
manual.

## Streams

Nanopb uses streams for accessing the data in encoded format. The stream
abstraction is very lightweight, and consists of a structure
(`pb_ostream_t` or `pb_istream_t`) which contains a pointer to a
callback function.

There are a few generic rules for callback functions:

1)  Return false on IO errors. The encoding or decoding process will
    abort immediately.
2)  Use state to store your own data, such as a file descriptor.
3)  `bytes_written` and `bytes_left` are updated by `pb_write` and
    `pb_read`.
4)  Your callback may be used with substreams. In this case
    `bytes_left`, `bytes_written` and `max_size` have smaller values
    than the original stream. Don't use these values to calculate
    pointers.
5)  Always read or write the full requested length of data. For example,
    POSIX `recv()` needs the `MSG_WAITALL` parameter to accomplish
    this.

### Output streams

    struct _pb_ostream_t
    {
       bool (*callback)(pb_ostream_t *stream, const uint8_t *buf, size_t count);
       void *state;
       size_t max_size;
       size_t bytes_written;
    };

The `callback` for an output stream may be NULL, in which case the stream
simply counts the number of bytes written. In this case, `max_size` is
ignored.

Otherwise, if `bytes_written` + bytes_to_be_written is larger than
`max_size`, `pb_write` returns false before doing anything else. If you
don't want to limit the size of the stream, pass `SIZE_MAX`.

**Example 1:**

This is the way to get the size of the message without storing it
anywhere:

    Person myperson = ...;
    pb_ostream_t sizestream = {0};
    pb_encode(&sizestream, Person_fields, &myperson);
    printf("Encoded size is %zu\n", sizestream.bytes_written);

**Example 2:**

Writing to stdout:

    bool callback(pb_ostream_t *stream, const uint8_t *buf, size_t count)
    {
       FILE *file = (FILE*) stream->state;
       return fwrite(buf, 1, count, file) == count;
    }

    pb_ostream_t stdoutstream = {&callback, stdout, SIZE_MAX, 0};

### Input streams

For input streams, there is one extra rule:

6)  You don't need to know the length of the message in advance. After
    getting an EOF error when reading, set `bytes_left` to 0 and return
    `false`. `pb_decode()` will detect this and if the EOF was in a proper
    position, it will return true.

Here is the structure:

    struct _pb_istream_t
    {
       bool (*callback)(pb_istream_t *stream, uint8_t *buf, size_t count);
       void *state;
       size_t bytes_left;
    };

The `callback` must always be a function pointer. `bytes_left` is an
upper limit on the number of bytes that will be read. You can use
`SIZE_MAX` if your callback handles EOF as described above.

**Example:**

This function binds an input stream to stdin:

    bool callback(pb_istream_t *stream, uint8_t *buf, size_t count)
    {
       FILE *file = (FILE*)stream->state;
       bool status;

       if (buf == NULL)
       {
           /* No destination buffer: skip `count` bytes of input. */
           while (count > 0 && fgetc(file) != EOF)
               count--;
           return count == 0;
       }

       status = (fread(buf, 1, count, file) == count);

       if (feof(file))
           stream->bytes_left = 0;

       return status;
    }

    pb_istream_t stdinstream = {&callback, stdin, SIZE_MAX};

## Data types

Most Protocol Buffers datatypes have directly corresponding C datatypes:
for example `int32` maps to `int32_t`, `float` to `float` and `bool` to
`bool`. However, the variable-length datatypes are more complex:

1)  Strings, bytes and repeated fields of any type map to callback
    functions by default.
2)  If there is a special option `(nanopb).max_size` specified in the
    .proto file, string maps to a null-terminated char array and bytes map
    to a structure containing a char array and a size field.
3)  If `(nanopb).fixed_length` is set to `true` and
    `(nanopb).max_size` is also set, then bytes map to an inline byte
    array of fixed size.
4)  If there is a special option `(nanopb).max_count` specified on a
    repeated field, it maps to an array of whatever type is being
    repeated. Another field will be created for the actual number of
    entries stored.
5)  If `(nanopb).fixed_count` is set to `true` and
    `(nanopb).max_count` is also set, the field for the actual number
    of entries will not be created, as the count is always assumed to be
    the maximum count.

### Examples of .proto specifications vs. generated structure

**Simple integer field:**\
.proto: `int32 age = 1;`\
.pb.h: `int32_t age;`

**String with unknown length:**\
.proto: `string name = 1;`\
.pb.h: `pb_callback_t name;`

**String with known maximum length:**\
.proto: `string name = 1 [(nanopb).max_length = 40];`\
.pb.h: `char name[41];`

**Repeated string with unknown count:**\
.proto: `repeated string names = 1;`\
.pb.h: `pb_callback_t names;`

**Repeated string with known maximum count and size:**\
.proto: `repeated string names = 1 [(nanopb).max_length = 40, (nanopb).max_count = 5];`\
.pb.h: `pb_size_t names_count;` `char names[5][41];`

**Bytes field with known maximum size:**\
.proto: `bytes data = 1 [(nanopb).max_size = 16];`\
.pb.h: `PB_BYTES_ARRAY_T(16) data;`, where the struct contains `{pb_size_t size; pb_byte_t bytes[n];}`

**Bytes field with fixed length:**\
.proto: `bytes data = 1 [(nanopb).max_size = 16, (nanopb).fixed_length = true];`\
.pb.h: `pb_byte_t data[16];`

**Repeated integer array with known maximum size:**\
.proto: `repeated int32 numbers = 1 [(nanopb).max_count = 5];`\
.pb.h: `pb_size_t numbers_count;` `int32_t numbers[5];`

**Repeated integer array with fixed count:**\
.proto: `repeated int32 numbers = 1 [(nanopb).max_count = 5, (nanopb).fixed_count = true];`\
.pb.h: `int32_t numbers[5];`

The maximum lengths are checked at runtime. If a string, bytes field or
array exceeds the allocated length, `pb_decode()` will return false.

> **Note:** For the `bytes` datatype, the field length checking may not be
exact.
The compiler may add some padding to the `pb_bytes_array_t`
structure, and the nanopb runtime doesn't know how much of the
structure size is padding. Therefore it uses the whole length of the
structure for storing data, which is not very smart but shouldn't cause
problems. In practice, this means that if you specify
`(nanopb).max_size=5` on a `bytes` field, you may be able to store 6
bytes there. For the `string` field type, the length limit is exact.

> **Note:** The decoder only keeps track of one `fixed_count` repeated field at a time. Usually this is not an issue because all elements of a repeated field occur end-to-end. Interleaved array elements of several `fixed_count` repeated fields would be a valid protobuf message, but would get rejected by the nanopb decoder with error `"wrong size for fixed count field"`.

## Field callbacks

The easiest way to handle repeated fields is to specify a maximum size for
them, as shown in the previous section. However, sometimes you need to be
able to handle arrays with unlimited length, possibly larger than the
available RAM.

For these cases, nanopb provides a callback interface. Nanopb core invokes
the callback function when it gets to the specific field in the message.
Your code can then handle the field in custom ways, for example decode
the data piece-by-piece and store it to a filesystem.

The [pb_callback_t](reference.html#pb-callback-t) structure contains a
function pointer and a `void` pointer called `arg` you can use for
passing data to the callback. If the function pointer is NULL, the field
will be skipped. A pointer to the `arg` is passed to the function, so
that it can modify it and retrieve the value.

The actual behavior of the callback function is different in encoding
and decoding modes. In encoding mode, the callback is called once and
should write out everything, including field tags.
In decoding mode, the
callback is called repeatedly for every data item.

To write more complex field callbacks, it is recommended to read the
[Google Protobuf Encoding Specification](https://developers.google.com/protocol-buffers/docs/encoding).

### Encoding callbacks

    bool (*encode)(pb_ostream_t *stream, const pb_field_iter_t *field, void * const *arg);

|            |                                                                    |
| ---------- | ------------------------------------------------------------------ |
| `stream`   | Output stream to write to                                          |
| `field`    | Iterator for the field currently being encoded or decoded.         |
| `arg`      | Pointer to the `arg` field in the `pb_callback_t` structure.       |

When encoding, the callback should write out complete fields, including
the wire type and field number tag. It can write as many or as few
fields as it likes. For example, if you want to write out an array as a
`repeated` field, you should do it all in a single call.

Usually you can use [pb_encode_tag_for_field](reference.html#pb-encode-tag-for-field) to
encode the wire type and tag number of the field. However, if you want
to encode a repeated field as a packed array, you must call
[pb_encode_tag](reference.html#pb-encode-tag) instead to specify a
wire type of `PB_WT_STRING`.

If the callback is used in a submessage, it will be called multiple
times during a single call to [pb_encode](reference.html#pb-encode). In
this case, it must produce the same amount of data every time. If the
callback is directly in the main message, it is called only once.
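To make "complete fields, including the wire type and field number tag" more concrete, here is a rough, self-contained sketch of the byte layout for a repeated varint field, written both unpacked (one tag per element) and packed (one tag with wire type 2, a length, then the values). It does not use nanopb at all. The helpers `put_varint`, `put_tag`, `encode_unpacked` and `encode_packed` and the `WT_*` constants are hypothetical names made up for this illustration. In a real callback the equivalent work would be done through the nanopb encode functions mentioned above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical constants mirroring the protobuf wire types
 * (nanopb names these PB_WT_VARINT and PB_WT_STRING). */
enum { WT_VARINT = 0, WT_LEN = 2 };

/* Hypothetical helper: write a base-128 varint, return bytes written. */
static size_t put_varint(uint8_t *out, uint64_t value)
{
    size_t n = 0;
    while (value >= 0x80) {
        out[n++] = (uint8_t)((value & 0x7F) | 0x80); /* more bytes follow */
        value >>= 7;
    }
    out[n++] = (uint8_t)value;
    return n;
}

/* Hypothetical helper: a tag is the varint of (field_number << 3) | wire_type. */
static size_t put_tag(uint8_t *out, uint32_t field_number, uint32_t wire_type)
{
    return put_varint(out, ((uint64_t)field_number << 3) | wire_type);
}

/* Unpacked form: every element is a complete field with its own tag.
 * The caller must provide a large enough buffer. */
static size_t encode_unpacked(uint8_t *out, uint32_t field_number,
                              const uint32_t *values, size_t count)
{
    size_t n = 0, i;
    for (i = 0; i < count; i++) {
        n += put_tag(out + n, field_number, WT_VARINT);
        n += put_varint(out + n, values[i]);
    }
    return n;
}

/* Packed form: one length-delimited field containing all the varints. */
static size_t encode_packed(uint8_t *out, uint32_t field_number,
                            const uint32_t *values, size_t count)
{
    uint8_t scratch[10];
    size_t payload = 0, n, i;

    /* First pass: measure the payload so the length prefix can be written. */
    for (i = 0; i < count; i++)
        payload += put_varint(scratch, values[i]);

    n = put_tag(out, field_number, WT_LEN);
    n += put_varint(out + n, payload);
    for (i = 0; i < count; i++)
        n += put_varint(out + n, values[i]);
    return n;
}
```

For field number 4 and the values 3, 270 and 86942, `encode_packed` produces the bytes `22 06 03 8E 02 9E A7 05`, while `encode_unpacked` repeats the tag byte `20` before each value.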

This callback writes out a dynamically sized string:

    bool write_string(pb_ostream_t *stream, const pb_field_iter_t *field, void * const *arg)
    {
        const char *str = get_string_from_somewhere();

        /* Write the field tag first, then the length-prefixed string data. */
        if (!pb_encode_tag_for_field(stream, field))
            return false;

        return pb_encode_string(stream, (const uint8_t*)str, strlen(str));
    }

### Decoding callbacks

    bool (*decode)(pb_istream_t *stream, const pb_field_iter_t *field, void **arg);

|            |                                                                    |
| ---------- | ------------------------------------------------------------------ |
| `stream`   | Input stream to read from                                          |
| `field`    | Iterator for the field currently being encoded or decoded.         |
| `arg`      | Pointer to the `arg` field in the `pb_callback_t` structure.       |

When decoding, the callback receives a length-limited substream that
reads the contents of a single field. The field tag has already been
read. For `string` and `bytes`, the length value has already been
parsed, and is available at `stream->bytes_left`.

The callback will be called multiple times for repeated fields. For
packed fields, you can either read multiple values until the stream
ends, or leave it to [pb_decode](reference.html#pb-decode) to call your
function over and over until all values have been read.

This callback reads multiple integers and prints them:

    bool read_ints(pb_istream_t *stream, const pb_field_iter_t *field, void **arg)
    {
        /* Keep reading until the substream for this field is exhausted. */
        while (stream->bytes_left)
        {
            uint64_t value;
            if (!pb_decode_varint(stream, &value))
                return false;
            printf("%llu\n", (unsigned long long)value);
        }
        return true;
    }

### Function name bound callbacks

    bool MyMessage_callback(pb_istream_t *istream, pb_ostream_t *ostream, const pb_field_iter_t *field);

|            |                                                                    |
| ---------- | ------------------------------------------------------------------ |
| `istream`  | Input stream to read from, or NULL if called in encoding context.  |
| `ostream`  | Output stream to write to, or NULL if called in decoding context.  |
| `field`    | Iterator for the field currently being encoded or decoded.         |

Storing function pointers in `pb_callback_t` fields inside
the message requires extra storage space and is often cumbersome. As an
alternative, the generator options `callback_function` and
`callback_datatype` can be used to bind a callback function
based on its name.

Typically this feature is used by setting `callback_datatype` to e.g. `void*` or even a struct type used to store encoded or decoded data.
The generator will automatically set `callback_function` to `MessageName_callback` and produce a prototype for it in the generated `.pb.h`.
By implementing this function in your own code, you will receive callbacks for fields without having to separately set function pointers.

If you want to use function name bound callbacks for some fields and
`pb_callback_t` for other fields, you can call
`pb_default_field_callback` from the message-level
callback. It will then read a function pointer from
`pb_callback_t` and call it.

## Message descriptor

To use the `pb_encode()` and `pb_decode()` functions, you need a
description of all the fields contained in a message. This description
is usually autogenerated from the .proto file.

For example, take this submessage in the Person.proto file:

~~~~ protobuf
message Person {
    message PhoneNumber {
        required string number = 1 [(nanopb).max_size = 40];
        optional PhoneType type = 2 [default = HOME];
    }
}
~~~~

This in turn generates a macro list in the `.pb.h` file:

    #define Person_PhoneNumber_FIELDLIST(X, a) \
    X(a, STATIC,   REQUIRED, STRING,   number,            1) \
    X(a, STATIC,   OPTIONAL, UENUM,    type,              2)

Inside the `.pb.c` file there is a macro call to
`PB_BIND`:

    PB_BIND(Person_PhoneNumber, Person_PhoneNumber, AUTO)

These macros will in combination generate a `pb_msgdesc_t`
structure and associated lists:

    const uint32_t Person_PhoneNumber_field_info[] = { ... };
    const pb_msgdesc_t * const Person_PhoneNumber_submsg_info[] = { ... };
    const pb_msgdesc_t Person_PhoneNumber_msg = {
      2,
      Person_PhoneNumber_field_info,
      Person_PhoneNumber_submsg_info,
      Person_PhoneNumber_DEFAULT,
      NULL,
    };

The encoding and decoding functions take a pointer to this structure and
use it to process each field in the message.

## Oneof

Protocol Buffers supports
[oneof](https://developers.google.com/protocol-buffers/docs/reference/proto2-spec#oneof_and_oneof_field)
sections, where only one of the fields contained within can be present.
Here is an example of `oneof` usage:

~~~~ protobuf
message MsgType1 {
    required int32 value = 1;
}

message MsgType2 {
    required bool value = 1;
}

message MsgType3 {
    required int32 value1 = 1;
    required int32 value2 = 2;
}

message MyMessage {
    required uint32 uid = 1;
    required uint32 pid = 2;
    required uint32 utime = 3;

    oneof payload {
        MsgType1 msg1 = 4;
        MsgType2 msg2 = 5;
        MsgType3 msg3 = 6;
    }
}
~~~~

Nanopb will generate `payload` as a C union and add an additional field
`which_payload`:

    typedef struct _MyMessage {
      uint32_t uid;
      uint32_t pid;
      uint32_t utime;
      pb_size_t which_payload;
      union {
          MsgType1 msg1;
          MsgType2 msg2;
          MsgType3 msg3;
      } payload;
    } MyMessage;

`which_payload` indicates which of the `oneof` fields is actually set.
The user is expected to set the field manually using the correct field
tag:

    MyMessage msg = MyMessage_init_zero;
    msg.payload.msg2.value = true;
    msg.which_payload = MyMessage_msg2_tag;

Notice that neither the `which_payload` field nor the unused fields in
`payload` will consume any space in the resulting encoded message.

When a field inside `oneof` contains `pb_callback_t`
fields, the callback values cannot be set before decoding. This is
because the different fields share the same storage space in the C
`union`. Instead, either function name bound callbacks or a
separate message level callback can be used. See
[tests/oneof_callback](https://github.com/nanopb/nanopb/tree/master/tests/oneof_callback)
for an example of this.

## Extension fields

Protocol Buffers supports a concept of [extension
fields](https://developers.google.com/protocol-buffers/docs/proto#extensions),
which are additional fields to a message, but defined outside the actual
message. The definition can even be in a completely separate .proto
file.

The base message is declared as extensible by the keyword `extensions` in
the .proto file:

~~~~ protobuf
message MyMessage {
    .. fields ..
    extensions 100 to 199;
}
~~~~

For each extensible message, `nanopb_generator.py` declares an
additional callback field called `extensions`. The field and associated
datatype `pb_extension_t` form a linked list of handlers. When an
unknown field is encountered, the decoder calls each handler in turn
until either one of them handles the field, or the list is exhausted.

The actual extensions are declared using the `extend` keyword in the
.proto, and are in the global namespace:

~~~~ protobuf
extend MyMessage {
    optional int32 myextension = 100;
}
~~~~

For each extension, `nanopb_generator.py` creates a constant of type
`pb_extension_type_t`. To link together the base message and the
extension, you have to:

1.  Allocate storage for your field, matching the datatype in the
    .proto. For example, for an `int32` field, you need an `int32_t`
    variable to store the value.
2.  Create a `pb_extension_t` constant, with pointers to your variable
    and to the generated `pb_extension_type_t`.
3.  Set the `message.extensions` pointer to point to the
    `pb_extension_t`.

An example of this is available in `tests/test_encode_extensions.c`
and `tests/test_decode_extensions.c`.

## Default values

Protobuf has two syntax variants, proto2 and proto3. Of these, proto2 has
user-definable default values that can be given in the .proto file:

~~~~ protobuf
message MyMessage {
    optional bytes foo = 1 [default = "ABC\x01\x02\x03"];
    optional string bar = 2 [default = "åäö"];
}
~~~~

Nanopb will generate both static and runtime initialization for the
default values.
In `myproto.pb.h` there will be a
`#define MyMessage_init_default {...}` that can be used to initialize
the whole message to its default values:

    MyMessage msg = MyMessage_init_default;

In addition to this, `pb_decode()` will initialize message
fields to defaults at runtime. If this is not desired,
`pb_decode_ex()` with the `PB_DECODE_NOINIT` flag can be used instead.

## Message framing

Protocol Buffers does not specify a method of framing the messages for
transmission. This is something that must be provided by the library
user, as there is no one-size-fits-all solution. Typical needs for a
framing format are to:

1.  Encode the message length.
2.  Encode the message type.
3.  Perform any synchronization and error checking that may be needed
    depending on application.

For example, UDP packets already fulfill all the requirements, and TCP
streams typically only need a way to identify the message length and
type. Lower level interfaces such as serial ports may need a more robust
frame format, such as HDLC (high-level data link control).

Nanopb provides a few helpers to facilitate implementing framing
formats:

1.  Functions `pb_encode_ex` and `pb_decode_ex` can write and read a
    varint-encoded length prefix in front of the message data (flags
    `PB_ENCODE_DELIMITED` and `PB_DECODE_DELIMITED`).
2.  Union messages and oneofs are supported in order to implement
    top-level container messages.
3.  Message IDs can be specified using the `(nanopb_msgopt).msgid`
    option and can then be accessed from the header.

## Return values and error handling

Most functions in nanopb return bool: `true` means success, `false`
means failure. There is also support for error messages for
debugging purposes: the error messages go in `stream->errmsg`.

The error messages help in guessing the underlying cause of the
error. The most common error conditions are:

1)  Invalid protocol buffers binary message.
2)  Mismatch between binary message and .proto message type.
3)  Unterminated message (incorrect message length).
4)  Exceeding the max_size or bytes_left of a stream.
5)  Exceeding the max_size/max_count of a string or array field.
6)  IO errors in your own stream callbacks.
7)  Errors that happen in your callback functions.
8)  Running out of memory, i.e. stack overflow.
9)  Invalid field descriptors (would usually mean a bug in the generator).

## Static assertions

Nanopb code uses static assertions to check the size of structures at
compile time. The `PB_STATIC_ASSERT` macro is defined in `pb.h`. If the ISO C11
standard is available, the C standard `_Static_assert` keyword is used, otherwise a
negative sized array definition trick is used.

Common reasons for static assertion errors are:

1. `FIELDINFO_DOES_NOT_FIT_width2` with `width1` or `width2`:
    Message that is larger than 256 bytes, but nanopb generator does not detect
    it for some reason. Often resolved by giving all `.proto` files as arguments
    to `nanopb_generator.py` at the same time, to ensure submessage definitions
    are found. Alternatively the `(nanopb).descriptorsize = DS_4` option can be
    given manually.

2. `FIELDINFO_DOES_NOT_FIT_width4` with `width4`:
    Message that is larger than 64 kilobytes. There will be a better error
    message for this in a future nanopb version, but currently it asserts here.
    The compile time option `PB_FIELD_32BIT` should be specified either on the
    C compiler command line or by editing `pb.h`. This will increase the sizes
    of integer types used internally in nanopb code.

3. `DOUBLE_MUST_BE_8_BYTES`:
    Some platforms, most notably AVR, do not support the 64-bit `double` type,
    only 32-bit `float`. The compile time option `PB_CONVERT_DOUBLE_FLOAT` can
    be defined to convert between the types automatically.
    The conversion
    results in small rounding errors and takes unnecessary space in transmission,
    so changing the `.proto` to use `float` type is often better.

4. `INT64_T_WRONG_SIZE`:
    The `stdint.h` system header is incorrect for the C compiler being used.
    This can result from an erroneous compiler include path.
    If the compiler actually does not support 64-bit types, the compile time
    option `PB_WITHOUT_64BIT` can be used.

5. `variably modified array size`:
    The compiler used has problems resolving the array-based static assert at
    compile time. Try setting the compiler to C11 standard mode if possible.
    If static assertions cannot be made to work on the compiler used, the
    compile-time option `PB_NO_STATIC_ASSERT` can be specified to turn them off.
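The two mechanisms named at the start of this section can be sketched roughly as follows. This only illustrates the general technique and is not a copy of the `PB_STATIC_ASSERT` definition in `pb.h`. The macro names `MY_STATIC_ASSERT` and `MY_ARRAY_TRICK_ASSERT` are hypothetical, and the checks assume a platform with 8-bit bytes.

```c
#include <stdint.h>

/* Hypothetical fallback macro: declares a typedef for an array whose size
 * is -1 when the condition is false, which is a compile error. The message
 * identifier becomes part of the typedef name, so it shows up in the
 * compiler diagnostic. */
#define MY_ARRAY_TRICK_ASSERT(cond, msg) \
    typedef char my_static_assert_##msg[(cond) ? 1 : -1]

/* Hypothetical wrapper: prefer the C11 keyword when it is available. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
#define MY_STATIC_ASSERT(cond, msg) _Static_assert(cond, #msg)
#else
#define MY_STATIC_ASSERT(cond, msg) MY_ARRAY_TRICK_ASSERT(cond, msg)
#endif

/* Checks similar in spirit to the ones listed above. */
MY_STATIC_ASSERT(sizeof(int64_t) == 8, INT64_T_WRONG_SIZE);
MY_STATIC_ASSERT(sizeof(double) == 8, DOUBLE_MUST_BE_8_BYTES);

/* The array trick also works on a C11 compiler. */
MY_ARRAY_TRICK_ASSERT(sizeof(uint32_t) == 4, UINT32_T_WRONG_SIZE);
```

If one of the conditions is false, compilation stops at that line. With the array trick, the error usually mentions a negative array size together with the typedef name, which is why the message is encoded as an identifier.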