# Nanopb: Basic concepts

The things outlined here are the underlying concepts of the nanopb
design.

## Proto files

All Protocol Buffers implementations use .proto files to describe the
message format. The point of these files is to be a portable interface
description language.

### Compiling .proto files for nanopb

Nanopb comes with a Python script to generate `.pb.c` and
`.pb.h` files from the `.proto` definition:

    user@host:~$ nanopb/generator/nanopb_generator.py message.proto
    Writing to message.pb.h and message.pb.c

Internally this script uses Google `protoc` to parse the
input file. If you do not have it available, you may receive an error
message. You can install either the `grpcio-tools` Python
package using `pip`, or the `protoc` compiler
itself from the `protobuf-compiler` distribution package.
Generally the Python package is recommended, because nanopb requires
protoc version 3.6 or newer to support all features, and some distributions come with an older
version.

### Modifying generator behaviour

Using generator options, you can set maximum sizes for fields in order
to allocate them statically. The preferred way to do this is to create
an .options file with the same name as your .proto file:

    # Foo.proto
    message Foo {
        required string name = 1;
    }

    # Foo.options
    Foo.name max_size:16

For more information on this, see the [Proto file
options](reference.html#proto-file-options) section in the reference
manual.

## Streams

Nanopb uses streams for accessing the data in encoded format. The stream
abstraction is very lightweight, and consists of a structure
(`pb_ostream_t` or `pb_istream_t`) which contains a pointer to a
callback function.

There are a few generic rules for callback functions:

1)  Return false on IO errors. The encoding or decoding process will
    abort immediately.
2)  Use `state` to store your own data, such as a file descriptor.
3)  `bytes_written` and `bytes_left` are updated by pb_write and
    pb_read.
4)  Your callback may be used with substreams. In this case
    `bytes_left`, `bytes_written` and `max_size` have smaller values
    than the original stream. Don't use these values to calculate
    pointers.
5)  Always read or write the full requested length of data. For example,
    POSIX `recv()` needs the `MSG_WAITALL` parameter to accomplish
    this.

### Output streams

    struct _pb_ostream_t
    {
        bool (*callback)(pb_ostream_t *stream, const uint8_t *buf, size_t count);
        void *state;
        size_t max_size;
        size_t bytes_written;
    };

The `callback` for output stream may be NULL, in which case the stream
simply counts the number of bytes written. In this case, `max_size` is
ignored.

Otherwise, if `bytes_written` + bytes_to_be_written is larger than
`max_size`, pb_write returns false before doing anything else. If you
don't want to limit the size of the stream, pass SIZE_MAX.

**Example 1:**

This is the way to get the size of the message without storing it
anywhere:

    Person myperson = ...;
    pb_ostream_t sizestream = {0};
    pb_encode(&sizestream, Person_fields, &myperson);
    printf("Encoded size is %zu\n", sizestream.bytes_written);

**Example 2:**

Writing to stdout:

    bool callback(pb_ostream_t *stream, const uint8_t *buf, size_t count)
    {
        FILE *file = (FILE*) stream->state;
        return fwrite(buf, 1, count, file) == count;
    }

    pb_ostream_t stdoutstream = {&callback, stdout, SIZE_MAX, 0};

### Input streams

For input streams, there is one extra rule:

6)  You don't need to know the length of the message in advance. After
    getting EOF error when reading, set `bytes_left` to 0 and return
    `false`. `pb_decode()` will detect this and if the EOF was in a proper
    position, it will return true.

Here is the structure:

    struct _pb_istream_t
    {
        bool (*callback)(pb_istream_t *stream, uint8_t *buf, size_t count);
        void *state;
        size_t bytes_left;
    };

The `callback` must always be a function pointer. `bytes_left` is an
upper limit on the number of bytes that will be read. You can use
SIZE_MAX if your callback handles EOF as described above.

**Example:**

This function binds an input stream to stdin:

    bool callback(pb_istream_t *stream, uint8_t *buf, size_t count)
    {
        FILE *file = (FILE*)stream->state;
        bool status;

        if (buf == NULL)
        {
            /* Skip mode: discard count bytes, fail if EOF comes first */
            while (count > 0 && fgetc(file) != EOF)
                count--;
            return count == 0;
        }

        status = (fread(buf, 1, count, file) == count);

        if (feof(file))
            stream->bytes_left = 0;

        return status;
    }

    pb_istream_t stdinstream = {&callback, stdin, SIZE_MAX};

## Data types

Most Protocol Buffers datatypes have directly corresponding C datatypes:
for example, `int32` maps to `int32_t`, `float` to `float` and `bool` to `bool`. However, the
variable-length datatypes are more complex:

1)  Strings, bytes and repeated fields of any type map to callback
    functions by default.
2)  If there is a special option `(nanopb).max_size` specified in the
    .proto file, string maps to null-terminated char array and bytes map
    to a structure containing a char array and a size field.
3)  If `(nanopb).fixed_length` is set to `true` and
    `(nanopb).max_size` is also set, then bytes map to an inline byte
    array of fixed size.
4)  If there is a special option `(nanopb).max_count` specified on a
    repeated field, it maps to an array of whatever type is being
    repeated. Another field will be created for the actual number of
    entries stored.
5)  If `(nanopb).fixed_count` is set to `true` and
    `(nanopb).max_count` is also set, the field for the actual number
    of entries will not be created as the count is always assumed to be
    max count.

### Examples of .proto specifications vs. generated structure

**Simple integer field:**\
.proto: `int32 age = 1;`\
.pb.h: `int32_t age;`

**String with unknown length:**\
.proto: `string name = 1;`\
.pb.h: `pb_callback_t name;`

**String with known maximum length:**\
.proto: `string name = 1 [(nanopb).max_length = 40];`\
.pb.h: `char name[41];`

**Repeated string with unknown count:**\
.proto: `repeated string names = 1;`\
.pb.h: `pb_callback_t names;`

**Repeated string with known maximum count and size:**\
.proto: `repeated string names = 1 [(nanopb).max_length = 40, (nanopb).max_count = 5];`\
.pb.h: `pb_size_t names_count;` `char names[5][41];`

**Bytes field with known maximum size:**\
.proto: `bytes data = 1 [(nanopb).max_size = 16];`\
.pb.h: `PB_BYTES_ARRAY_T(16) data;`, where the struct contains `{pb_size_t size; pb_byte_t bytes[n];}`

**Bytes field with fixed length:**\
.proto: `bytes data = 1 [(nanopb).max_size = 16, (nanopb).fixed_length = true];`\
.pb.h: `pb_byte_t data[16];`

**Repeated integer array with known maximum size:**\
.proto: `repeated int32 numbers = 1 [(nanopb).max_count = 5];`\
.pb.h: `pb_size_t numbers_count;` `int32_t numbers[5];`

**Repeated integer array with fixed count:**\
.proto: `repeated int32 numbers = 1 [(nanopb).max_count = 5, (nanopb).fixed_count = true];`\
.pb.h: `int32_t numbers[5];`

The maximum lengths are checked at runtime. If string/bytes/array
exceeds the allocated length, `pb_decode()` will return false.

> **Note:** For the `bytes` datatype, the field length checking may not be
exact.
The compiler may add some padding to the `pb_bytes_t`
structure, and the nanopb runtime doesn't know how much of the
structure size is padding. Therefore it uses the whole length of the
structure for storing data, which is not very smart but shouldn't cause
problems. In practice, this means that if you specify
`(nanopb).max_size=5` on a `bytes` field, you may be able to store 6
bytes there. For the `string` field type, the length limit is exact.

> **Note:** The decoder only keeps track of one `fixed_count` repeated field at a time. Usually this is not an issue because all elements of a repeated field occur end-to-end. Interleaved array elements of several `fixed_count` repeated fields would be a valid protobuf message, but would get rejected by nanopb decoder with error `"wrong size for fixed count field"`.

## Field callbacks

When a field has dynamic length, nanopb cannot statically allocate
storage for it. Instead, it allows you to handle the field in whatever
way you want, using a callback function.

The [pb_callback_t](reference.html#pb-callback-t) structure contains a
function pointer and a `void` pointer called `arg` you can use for
passing data to the callback. If the function pointer is NULL, the field
will be skipped. A pointer to the `arg` is passed to the function, so
that it can modify it and retrieve the value.

The actual behavior of the callback function is different in encoding
and decoding modes. In encoding mode, the callback is called once and
should write out everything, including field tags. In decoding mode, the
callback is called repeatedly for every data item.

To write more complex field callbacks, it is recommended to read the
[Google Protobuf Encoding Specification](https://developers.google.com/protocol-buffers/docs/encoding).
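
As a rough illustration of what "writing out field tags" means at the byte level, the sketch below encodes a varint and a field tag by hand, following the Protocol Buffers wire format. It does not use nanopb at all: the `sketch_` function names and the `WT_` constants are hypothetical and exist only for this example. In real code you would normally rely on nanopb's own encoding helpers instead.

```c
#include <stddef.h>
#include <stdint.h>

/* Wire types as defined by the Protocol Buffers encoding specification. */
enum { WT_VARINT = 0, WT_64BIT = 1, WT_STRING = 2, WT_32BIT = 5 };

/* Write value as a base-128 varint into out (needs up to 10 bytes).
 * Returns the number of bytes written. */
static size_t sketch_encode_varint(uint64_t value, uint8_t *out)
{
    size_t n = 0;
    while (value >= 0x80) {
        /* Low 7 bits of the value, with the continuation bit set */
        out[n++] = (uint8_t)((value & 0x7F) | 0x80);
        value >>= 7;
    }
    out[n++] = (uint8_t)value; /* Final byte, continuation bit clear */
    return n;
}

/* A field tag is the varint encoding of (field_number << 3) | wire_type. */
static size_t sketch_encode_tag(uint32_t field_number, unsigned wire_type,
                                uint8_t *out)
{
    return sketch_encode_varint(((uint64_t)field_number << 3) | wire_type, out);
}

/* Encode the field "int32 age = 1" with the value 300:
 * one tag byte (0x08) followed by the two-byte varint 0xAC 0x02. */
static size_t sketch_encode_age_field(uint8_t *out)
{
    size_t n = sketch_encode_tag(1, WT_VARINT, out);
    n += sketch_encode_varint(300, out + n);
    return n;
}
```

A length-delimited field such as a string uses wire type 2, so the tag for field number 2 would be the single byte `0x12`, followed by a varint length and the raw bytes.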

### Encoding callbacks

    bool (*encode)(pb_ostream_t *stream, const pb_field_iter_t *field, void * const *arg);

|            |                                                                    |
| ---------- | ------------------------------------------------------------------ |
| `stream`   | Output stream to write to                                          |
| `field`    | Iterator for the field currently being encoded or decoded.         |
| `arg`      | Pointer to the `arg` field in the `pb_callback_t` structure.       |

When encoding, the callback should write out complete fields, including
the wire type and field number tag. It can write as many or as few
fields as it likes. For example, if you want to write out an array as
`repeated` field, you should do it all in a single call.

Usually you can use [pb_encode_tag_for_field](reference.html#pb-encode-tag-for-field) to
encode the wire type and tag number of the field. However, if you want
to encode a repeated field as a packed array, you must call
[pb_encode_tag](reference.html#pb-encode-tag) instead to specify a
wire type of `PB_WT_STRING`.

If the callback is used in a submessage, it will be called multiple
times during a single call to [pb_encode](reference.html#pb-encode). In
this case, it must produce the same amount of data every time. If the
callback is directly in the main message, it is called only once.
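
To see why a callback inside a submessage has to produce the same amount of data on every call, it may help to look at a simplified model of length-prefixed encoding. The sketch below is not nanopb code: `sketch_ostream_t`, `sketch_write` and the other `sketch_` names are hypothetical stand-ins, and the one-byte length prefix is a simplification of the real varint length. The idea it shows is that the length must be written before the body, so the callback runs once against a counting-only stream and once more to store the bytes.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an output stream; NOT nanopb's pb_ostream_t. */
typedef struct {
    uint8_t *buf;          /* NULL means sizing only: count, store nothing */
    size_t max_size;
    size_t bytes_written;
} sketch_ostream_t;

typedef bool (*sketch_callback_t)(sketch_ostream_t *s, void *arg);

static bool sketch_write(sketch_ostream_t *s, const uint8_t *data, size_t count)
{
    if (s->buf != NULL) {
        if (count > s->max_size - s->bytes_written)
            return false;
        memcpy(s->buf + s->bytes_written, data, count);
    }
    s->bytes_written += count;
    return true;
}

/* Write a one-byte length prefix followed by the callback's output.
 * To keep the sketch short, bodies of 128 bytes or more are rejected. */
static bool sketch_encode_submessage(sketch_ostream_t *s,
                                     sketch_callback_t cb, void *arg)
{
    sketch_ostream_t sizing = {NULL, 0, 0};
    uint8_t len;
    size_t start;

    /* Pass 1: run the callback against a sizing stream to learn the length. */
    if (!cb(&sizing, arg) || sizing.bytes_written > 127)
        return false;

    len = (uint8_t)sizing.bytes_written;
    if (!sketch_write(s, &len, 1))
        return false;

    /* Pass 2: run the same callback again, this time storing the bytes. */
    start = s->bytes_written;
    if (!cb(s, arg))
        return false;

    /* The prefix is already written, so a different size is an error. */
    return s->bytes_written - start == sizing.bytes_written;
}

/* Example callback: writes the string passed in arg. */
static bool sketch_write_name(sketch_ostream_t *s, void *arg)
{
    const char *name = (const char *)arg;
    return sketch_write(s, (const uint8_t *)name, strlen(name));
}
```

Calling `sketch_encode_submessage` with `sketch_write_name` and the string `"nanopb"` on an empty 32-byte buffer stores a length byte of 6 followed by the six characters. A callback whose output changed between the two passes would make the final size check fail.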

This callback writes out a dynamically sized string:

    bool write_string(pb_ostream_t *stream, const pb_field_iter_t *field, void * const *arg)
    {
        char *str = get_string_from_somewhere();
        if (!pb_encode_tag_for_field(stream, field))
            return false;

        return pb_encode_string(stream, (uint8_t*)str, strlen(str));
    }

### Decoding callbacks

    bool (*decode)(pb_istream_t *stream, const pb_field_iter_t *field, void **arg);

|            |                                                                    |
| ---------- | ------------------------------------------------------------------ |
| `stream`   | Input stream to read from                                          |
| `field`    | Iterator for the field currently being encoded or decoded.         |
| `arg`      | Pointer to the `arg` field in the `pb_callback_t` structure.       |

When decoding, the callback receives a length-limited substream that
reads the contents of a single field. The field tag has already been
read. For `string` and `bytes`, the length value has already been
parsed, and is available at `stream->bytes_left`.

The callback will be called multiple times for repeated fields. For
packed fields, you can either read multiple values until the stream
ends, or leave it to [pb_decode](reference.html#pb-decode) to call your
function over and over until all values have been read.

This callback reads multiple integers and prints them:

    bool read_ints(pb_istream_t *stream, const pb_field_iter_t *field, void **arg)
    {
        while (stream->bytes_left)
        {
            uint64_t value;
            if (!pb_decode_varint(stream, &value))
                return false;
            printf("%lld\n", (long long)value);
        }
        return true;
    }

### Function name bound callbacks

    bool MyMessage_callback(pb_istream_t *istream, pb_ostream_t *ostream, const pb_field_iter_t *field);

|            |                                                                    |
| ---------- | ------------------------------------------------------------------ |
| `istream`  | Input stream to read from, or NULL if called in encoding context.  |
| `ostream`  | Output stream to write to, or NULL if called in decoding context.  |
| `field`    | Iterator for the field currently being encoded or decoded.         |

Storing function pointers in `pb_callback_t` fields inside
the message requires extra storage space and is often cumbersome. As an
alternative, the generator options `callback_function` and
`callback_datatype` can be used to bind a callback function
based on its name.

Typically this feature is used by setting
`callback_datatype` to e.g. `void*` or other
data type used for callback state. Then the generator will automatically
set `callback_function` to
`MessageName_callback` and produce a prototype for it in
generated `.pb.h`. By implementing this function in your own
code, you will receive callbacks for fields without having to separately
set function pointers.

If you want to use function name bound callbacks for some fields and
`pb_callback_t` for other fields, you can call
`pb_default_field_callback` from the message-level
callback. It will then read a function pointer from
`pb_callback_t` and call it.

## Message descriptor

For using the `pb_encode()` and `pb_decode()` functions, you need a
description of all the fields contained in a message. This description
is usually autogenerated from the .proto file.

For example, consider this submessage in the Person.proto file:

~~~~ protobuf
message Person {
    message PhoneNumber {
        required string number = 1 [(nanopb).max_size = 40];
        optional PhoneType type = 2 [default = HOME];
    }
}
~~~~

This in turn generates a macro list in the `.pb.h` file:

    #define Person_PhoneNumber_FIELDLIST(X, a) \
    X(a, STATIC,   REQUIRED, STRING,   number,            1) \
    X(a, STATIC,   OPTIONAL, UENUM,    type,              2)

Inside the `.pb.c` file there is a macro call to
`PB_BIND`:

    PB_BIND(Person_PhoneNumber, Person_PhoneNumber, AUTO)

These macros will in combination generate the `pb_msgdesc_t`
structure and associated lists:

    const uint32_t Person_PhoneNumber_field_info[] = { ... };
    const pb_msgdesc_t * const Person_PhoneNumber_submsg_info[] = { ... };
    const pb_msgdesc_t Person_PhoneNumber_msg = {
        2,
        Person_PhoneNumber_field_info,
        Person_PhoneNumber_submsg_info,
        Person_PhoneNumber_DEFAULT,
        NULL,
    };

The encoding and decoding functions take a pointer to this structure and
use it to process each field in the message.

## Oneof

Protocol Buffers supports
[oneof](https://developers.google.com/protocol-buffers/docs/reference/proto2-spec#oneof_and_oneof_field)
sections, where only one of the fields contained within can be present.
Here is an example of `oneof` usage:

~~~~ protobuf
message MsgType1 {
    required int32 value = 1;
}

message MsgType2 {
    required bool value = 1;
}

message MsgType3 {
    required int32 value1 = 1;
    required int32 value2 = 2;
}

message MyMessage {
    required uint32 uid = 1;
    required uint32 pid = 2;
    required uint32 utime = 3;

    oneof payload {
        MsgType1 msg1 = 4;
        MsgType2 msg2 = 5;
        MsgType3 msg3 = 6;
    }
}
~~~~

Nanopb will generate `payload` as a C union and add an additional field
`which_payload`:

    typedef struct _MyMessage {
        uint32_t uid;
        uint32_t pid;
        uint32_t utime;
        pb_size_t which_payload;
        union {
            MsgType1 msg1;
            MsgType2 msg2;
            MsgType3 msg3;
        } payload;
    } MyMessage;

`which_payload` indicates which of the `oneof` fields is actually set.
The user is expected to set the field manually using the correct field
tag:

    MyMessage msg = MyMessage_init_zero;
    msg.payload.msg2.value = true;
    msg.which_payload = MyMessage_msg2_tag;

Notice that neither `which_payload` field nor the unused fields in
`payload` will consume any space in the resulting encoded message.

When a field inside `oneof` contains `pb_callback_t`
fields, the callback values cannot be set before decoding. This is
because the different fields share the same storage space in C
`union`. Instead either function name bound callbacks or a
separate message level callback can be used. See
[tests/oneof_callback](https://github.com/nanopb/nanopb/tree/master/tests/oneof_callback)
for an example on this.

## Extension fields

Protocol Buffers supports a concept of [extension
fields](https://developers.google.com/protocol-buffers/docs/proto#extensions),
which are additional fields to a message, but defined outside the actual
message. The definition can even be in a completely separate .proto
file.

The base message is declared as extensible by keyword `extensions` in
the .proto file:

~~~~ protobuf
message MyMessage {
    .. fields ..
    extensions 100 to 199;
}
~~~~

For each extensible message, `nanopb_generator.py` declares an
additional callback field called `extensions`. The field and associated
datatype `pb_extension_t` form a linked list of handlers. When an
unknown field is encountered, the decoder calls each handler in turn
until either one of them handles the field, or the list is exhausted.

The actual extensions are declared using the `extend` keyword in the
.proto, and are in the global namespace:

~~~~ protobuf
extend MyMessage {
    optional int32 myextension = 100;
}
~~~~

For each extension, `nanopb_generator.py` creates a constant of type
`pb_extension_type_t`. To link together the base message and the
extension, you have to:

1.  Allocate storage for your field, matching the datatype in the
    .proto. For example, for an `int32` field, you need an `int32_t`
    variable to store the value.
2.  Create a `pb_extension_t` constant, with pointers to your variable
    and to the generated `pb_extension_type_t`.
3.  Set the `message.extensions` pointer to point to the
    `pb_extension_t`.

An example of this is available in `tests/test_encode_extensions.c`
and `tests/test_decode_extensions.c`.

## Default values

Protobuf has two syntax variants, proto2 and proto3. Of these proto2 has
user definable default values that can be given in .proto file:

~~~~ protobuf
message MyMessage {
    optional bytes foo = 1 [default = "ABC\x01\x02\x03"];
    optional string bar = 2 [default = "åäö"];
}
~~~~

Nanopb will generate both static and runtime initialization for the
default values.
In `myproto.pb.h` there will be a
`#define MyMessage_init_default {...}` that can be used to initialize
the whole message to its default values:

    MyMessage msg = MyMessage_init_default;

In addition to this, `pb_decode()` will initialize message
fields to defaults at runtime. If this is not desired,
`pb_decode_ex()` can be used instead with the `PB_DECODE_NOINIT` flag.

## Message framing

Protocol Buffers does not specify a method of framing the messages for
transmission. This is something that must be provided by the library
user, as there is no one-size-fits-all solution. Typical needs for a
framing format are to:

1.  Encode the message length.
2.  Encode the message type.
3.  Perform any synchronization and error checking that may be needed
    depending on application.

For example UDP packets already fulfill all the requirements, and TCP
streams typically only need a way to identify the message length and
type. Lower level interfaces such as serial ports may need a more robust
frame format, such as HDLC (high-level data link control).

Nanopb provides a few helpers to facilitate implementing framing
formats:

1.  Functions `pb_encode_ex` and `pb_decode_ex` can prefix the message
    data with a varint-encoded length when called with the
    `PB_ENCODE_DELIMITED` and `PB_DECODE_DELIMITED` flags.
2.  Union messages and oneofs are supported in order to implement
    top-level container messages.
3.  Message IDs can be specified using the `(nanopb_msgopt).msgid`
    option and can then be accessed from the header.

## Return values and error handling

Most functions in nanopb return bool: `true` means success, `false`
means failure. There is also support for error messages for
debugging purposes: the error messages go in `stream->errmsg`.

The error messages help in guessing what is the underlying cause of the
error. The most common error conditions are:

1)  Invalid protocol buffers binary message.
2)  Mismatch between binary message and .proto message type.
3)  Unterminated message (incorrect message length).
4)  Exceeding the max_size or bytes_left of a stream.
5)  Exceeding the max_size/max_count of a string or array field.
6)  IO errors in your own stream callbacks.
7)  Errors that happen in your callback functions.
8)  Running out of memory, i.e. stack overflow.
9)  Invalid field descriptors (would usually mean a bug in the generator).
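
The bool-plus-message convention described in this section can be modelled in a few lines of plain C. The sketch below is only an illustration under stated assumptions: `sketch_istream_t` and the `sketch_` functions are hypothetical stand-ins, not nanopb's stream type or its error macros (those are defined in nanopb's own headers). It shows one reasonable policy, where the first error message recorded is kept so that the message points at the lowest-level cause.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a stream that carries an error message. */
typedef struct {
    size_t bytes_left;
    const char *errmsg;   /* NULL until the first failure is recorded */
} sketch_istream_t;

/* Record msg only if no earlier error is stored, then report failure. */
static bool sketch_fail(sketch_istream_t *s, const char *msg)
{
    if (s->errmsg == NULL)
        s->errmsg = msg;
    return false;
}

/* Low-level step: consume count bytes or fail. */
static bool sketch_read(sketch_istream_t *s, size_t count)
{
    if (count > s->bytes_left)
        return sketch_fail(s, "end-of-stream");
    s->bytes_left -= count;
    return true;
}

/* Higher-level step: a failure from below propagates as plain false. */
static bool sketch_decode_header(sketch_istream_t *s)
{
    if (!sketch_read(s, 4))
        return sketch_fail(s, "bad header"); /* keeps "end-of-stream" */
    return true;
}

/* Return the stored message, or a placeholder when nothing failed. */
static const char *sketch_get_error(const sketch_istream_t *s)
{
    return s->errmsg ? s->errmsg : "(none)";
}
```

With a stream that has only 2 bytes left, `sketch_decode_header` returns `false` and `sketch_get_error` reports `end-of-stream` rather than `bad header`, because the lower-level message was recorded first.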