# Nanopb: Basic concepts

This page outlines the concepts underlying the nanopb design.

## Proto files

All Protocol Buffers implementations use .proto files to describe the
message format. The point of these files is to be a portable interface
description language.

### Compiling .proto files for nanopb

Nanopb comes with a Python script to generate `.pb.c` and
`.pb.h` files from the `.proto` definition:

    user@host:~$ nanopb/generator/nanopb_generator.py message.proto
    Writing to message.pb.h and message.pb.c

Internally this script uses Google's `protoc` to parse the
input file. If you do not have it available, you may receive an error
message. You can install either the `grpcio-tools` Python
package using `pip`, or the `protoc` compiler
itself from the `protobuf-compiler` distribution package.
Generally the Python package is recommended, because nanopb requires
protoc version 3.6 or newer to support all features, and some distributions come with an older
version.

### Modifying generator behaviour

Using generator options, you can set maximum sizes for fields in order
to allocate them statically. The preferred way to do this is to create
an .options file with the same name as your .proto file:

    # Foo.proto
    message Foo {
       required string name = 1;
    }

    # Foo.options
    Foo.name max_size:16

For more information on this, see the [Proto file
options](reference.html#proto-file-options) section in the reference
manual.

## Streams

Nanopb uses streams for accessing the data in encoded format. The stream
abstraction is very lightweight, and consists of a structure
(`pb_ostream_t` or `pb_istream_t`) which contains a pointer to a
callback function.

There are a few generic rules for callback functions:

1)  Return false on IO errors. The encoding or decoding process will
    abort immediately.
2)  Use `state` to store your own data, such as a file descriptor.
3)  `bytes_written` and `bytes_left` are updated by `pb_write` and
    `pb_read`.
4)  Your callback may be used with substreams. In this case
    `bytes_left`, `bytes_written` and `max_size` have smaller values
    than the original stream. Don't use these values to calculate
    pointers.
5)  Always read or write the full requested length of data. For example,
    POSIX `recv()` needs the `MSG_WAITALL` parameter to accomplish
    this.
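
Rule 5 also matters for plain file descriptors, where a single `read()` may return fewer bytes than requested. The following is a minimal sketch of one way to satisfy it, assuming a POSIX system; `read_full` is a hypothetical helper name and not part of nanopb:

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper: keep calling read() until `count` bytes have
 * arrived, the other end closes the descriptor, or an error occurs. */
bool read_full(int fd, uint8_t *buf, size_t count)
{
    size_t done = 0;

    assert(buf != NULL);
    while (done < count)
    {
        ssize_t n = read(fd, buf + done, count - done);
        if (n > 0)
            done += (size_t)n;      /* partial read, keep going */
        else if (n == 0)
            return false;           /* EOF before the full length */
        else if (errno != EINTR)
            return false;           /* real IO error */
    }
    return true;
}
```

A stream callback could then keep the descriptor in `state` and return the result of such a helper directly.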

### Output streams

    struct _pb_ostream_t
    {
       bool (*callback)(pb_ostream_t *stream, const uint8_t *buf, size_t count);
       void *state;
       size_t max_size;
       size_t bytes_written;
    };

The `callback` for an output stream may be NULL, in which case the stream
simply counts the number of bytes written. In this case, `max_size` is
ignored.

Otherwise, if `bytes_written` + bytes_to_be_written is larger than
`max_size`, `pb_write` returns false before doing anything else. If you
don't want to limit the size of the stream, pass `SIZE_MAX`.
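
The two rules above can be sketched in plain C. This is only an illustration of the documented behaviour, not the library's actual `pb_write`; the names `sketch_ostream_t`, `sketch_write` and `sketch_buf_callback` are made up for this example:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for pb_ostream_t, reduced to the fields discussed above. */
typedef struct sketch_ostream sketch_ostream_t;
struct sketch_ostream
{
    bool (*callback)(sketch_ostream_t *stream, const uint8_t *buf, size_t count);
    void *state;
    size_t max_size;
    size_t bytes_written;
};

/* Mirrors the documented rules; not the library implementation. */
bool sketch_write(sketch_ostream_t *stream, const uint8_t *buf, size_t count)
{
    assert(stream != NULL);
    if (stream->callback != NULL)
    {
        /* Reject the write up front if it would pass max_size.
         * Written this way, the comparison cannot overflow size_t. */
        if (count > stream->max_size ||
            stream->bytes_written > stream->max_size - count)
            return false;

        if (!stream->callback(stream, buf, count))
            return false;           /* IO error reported by the callback */
    }

    /* With a NULL callback the stream only counts bytes. */
    stream->bytes_written += count;
    return true;
}

/* Example callback: copy into a caller-supplied buffer held in state. */
bool sketch_buf_callback(sketch_ostream_t *stream, const uint8_t *buf, size_t count)
{
    uint8_t *dest = (uint8_t *)stream->state;
    memcpy(dest, buf, count);
    stream->state = dest + count;   /* advance instead of using bytes_written */
    return true;
}
```

Note that the buffer callback advances its own pointer in `state` rather than deriving it from `bytes_written`, in line with rule 4 of the generic callback rules.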

**Example 1:**

This is the way to get the size of the message without storing it
anywhere:

    Person myperson = ...;
    pb_ostream_t sizestream = {0};
    pb_encode(&sizestream, Person_fields, &myperson);
    printf("Encoded size is %zu\n", sizestream.bytes_written);

**Example 2:**

Writing to stdout:

    bool callback(pb_ostream_t *stream, const uint8_t *buf, size_t count)
    {
       FILE *file = (FILE*) stream->state;
       return fwrite(buf, 1, count, file) == count;
    }

    pb_ostream_t stdoutstream = {&callback, stdout, SIZE_MAX, 0};

### Input streams

For input streams, there is one extra rule:

6)  You don't need to know the length of the message in advance. After
    getting EOF error when reading, set `bytes_left` to 0 and return
    `false`. `pb_decode()` will detect this and if the EOF was in a proper
    position, it will return true.

Here is the structure:

    struct _pb_istream_t
    {
       bool (*callback)(pb_istream_t *stream, uint8_t *buf, size_t count);
       void *state;
       size_t bytes_left;
    };

The `callback` must always be a function pointer. `bytes_left` is an
upper limit on the number of bytes that will be read. You can use
`SIZE_MAX` if your callback handles EOF as described above.

**Example:**

This function binds an input stream to stdin:

    bool callback(pb_istream_t *stream, uint8_t *buf, size_t count)
    {
       FILE *file = (FILE*)stream->state;
       bool status;

       if (buf == NULL)
       {
           /* A NULL buffer means "skip count bytes". */
           while (count > 0 && fgetc(file) != EOF)
               count--;
           return count == 0;
       }

       status = (fread(buf, 1, count, file) == count);

       /* Signal EOF so that pb_decode() can detect it. */
       if (feof(file))
           stream->bytes_left = 0;

       return status;
    }

    pb_istream_t stdinstream = {&callback, stdin, SIZE_MAX};

## Data types

Most Protocol Buffers datatypes have directly corresponding C datatypes:
for example, `int32` maps to `int32_t`, `float` to `float` and `bool` to `bool`. However, the
variable-length datatypes are more complex:

1)  Strings, bytes and repeated fields of any type map to callback
    functions by default.
2)  If there is a special option `(nanopb).max_size` specified in the
    .proto file, string maps to a null-terminated char array and bytes map
    to a structure containing a char array and a size field.
3)  If `(nanopb).fixed_length` is set to `true` and
    `(nanopb).max_size` is also set, then bytes map to an inline byte
    array of fixed size.
4)  If there is a special option `(nanopb).max_count` specified on a
    repeated field, it maps to an array of whatever type is being
    repeated. Another field will be created for the actual number of
    entries stored.
5)  If `(nanopb).fixed_count` is set to `true` and
    `(nanopb).max_count` is also set, the field for the actual number
    of entries will not be created, as the count is always assumed to be
    the maximum count.

### Examples of .proto specifications vs. generated structure

**Simple integer field:**\
.proto: `int32 age = 1;`\
.pb.h: `int32_t age;`

**String with unknown length:**\
.proto: `string name = 1;`\
.pb.h: `pb_callback_t name;`

**String with known maximum length:**\
.proto: `string name = 1 [(nanopb).max_length = 40];`\
.pb.h: `char name[41];`

**Repeated string with unknown count:**\
.proto: `repeated string names = 1;`\
.pb.h: `pb_callback_t names;`

**Repeated string with known maximum count and size:**\
.proto: `repeated string names = 1 [(nanopb).max_length = 40, (nanopb).max_count = 5];`\
.pb.h: `pb_size_t names_count;` `char names[5][41];`

**Bytes field with known maximum size:**\
.proto: `bytes data = 1 [(nanopb).max_size = 16];`\
.pb.h: `PB_BYTES_ARRAY_T(16) data;`, where the struct contains `{pb_size_t size; pb_byte_t bytes[n];}`

**Bytes field with fixed length:**\
.proto: `bytes data = 1 [(nanopb).max_size = 16, (nanopb).fixed_length = true];`\
.pb.h: `pb_byte_t data[16];`

**Repeated integer array with known maximum size:**\
.proto: `repeated int32 numbers = 1 [(nanopb).max_count = 5];`\
.pb.h: `pb_size_t numbers_count;` `int32_t numbers[5];`

**Repeated integer array with fixed count:**\
.proto: `repeated int32 numbers = 1 [(nanopb).max_count = 5, (nanopb).fixed_count = true];`\
.pb.h: `int32_t numbers[5];`

The maximum lengths are checked at runtime. If a string, bytes field or array
exceeds the allocated length, `pb_decode()` will return false.
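
Application code that fills such statically allocated fields before encoding has to respect the same limits. The sketch below assumes a struct shaped like the "repeated string with known maximum count and size" example; `sketch_msg_t` and `sketch_add_name` are hypothetical names, and the real generated header would use `pb_size_t` for the count:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical struct, shaped like the generated code for
 *   repeated string names = 1
 *       [(nanopb).max_length = 40, (nanopb).max_count = 5];   */
typedef struct
{
    size_t names_count;
    char names[5][41];
} sketch_msg_t;

/* Append a name, refusing input that does not fit the static storage. */
bool sketch_add_name(sketch_msg_t *msg, const char *name)
{
    const size_t max_count = sizeof(msg->names) / sizeof(msg->names[0]);
    const size_t max_length = sizeof(msg->names[0]) - 1;  /* room for '\0' */
    size_t len;

    assert(msg != NULL && name != NULL);
    len = strlen(name);
    if (msg->names_count >= max_count || len > max_length)
        return false;

    memcpy(msg->names[msg->names_count], name, len + 1);  /* copy with '\0' */
    msg->names_count++;
    return true;
}
```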

> **Note:** For the `bytes` datatype, the field length checking may not be
exact. The compiler may add some padding to the `pb_bytes_t`
structure, and the nanopb runtime doesn't know how much of the
structure size is padding. Therefore it uses the whole length of the
structure for storing data, which is not very smart but shouldn't cause
problems. In practice, this means that if you specify
`(nanopb).max_size=5` on a `bytes` field, you may be able to store 6
bytes there. For the `string` field type, the length limit is exact.

> **Note:** The decoder only keeps track of one `fixed_count` repeated field at a time. Usually this is not an issue because all elements of a repeated field occur end-to-end. Interleaved array elements of several `fixed_count` repeated fields would be a valid protobuf message, but would get rejected by the nanopb decoder with the error `"wrong size for fixed count field"`.

## Field callbacks

The easiest way to handle repeated fields is to specify a maximum size for
them, as shown in the previous section. However, sometimes you need to be
able to handle arrays with unlimited length, possibly larger than the available
RAM.

For these cases, nanopb provides a callback interface. Nanopb core invokes
the callback function when it gets to the specific field in the message.
Your code can then handle the field in custom ways, for example decode
the data piece-by-piece and store it to a filesystem.

The [pb_callback_t](reference.html#pb-callback-t) structure contains a
function pointer and a `void` pointer called `arg` you can use for
passing data to the callback. If the function pointer is NULL, the field
will be skipped. A pointer to the `arg` is passed to the function, so
that it can modify it and retrieve the value.

The actual behavior of the callback function is different in encoding
and decoding modes. In encoding mode, the callback is called once and
should write out everything, including field tags. In decoding mode, the
callback is called repeatedly for every data item.

To write more complex field callbacks, it is recommended to read the
[Google Protobuf Encoding Specification](https://developers.google.com/protocol-buffers/docs/encoding).

### Encoding callbacks

    bool (*encode)(pb_ostream_t *stream, const pb_field_iter_t *field, void * const *arg);

|            |                                                                    |
| ---------- | ------------------------------------------------------------------ |
| `stream`   | Output stream to write to                                          |
| `field`    | Iterator for the field currently being encoded or decoded.         |
| `arg`      | Pointer to the `arg` field in the `pb_callback_t` structure.       |

When encoding, the callback should write out complete fields, including
the wire type and field number tag. It can write as many or as few
fields as it likes. For example, if you want to write out an array as
a `repeated` field, you should do it all in a single call.

Usually you can use [pb_encode_tag_for_field](reference.html#pb-encode-tag-for-field) to
encode the wire type and tag number of the field. However, if you want
to encode a repeated field as a packed array, you must call
[pb_encode_tag](reference.html#pb-encode-tag) instead to specify a
wire type of `PB_WT_STRING`.
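
To make the packed layout concrete, here is a self-contained sketch of the bytes such a callback has to produce: a tag with the length-delimited wire type (2), the payload length, and then the values back to back. It works on a plain buffer rather than a `pb_ostream_t`, and the `sketch_*` names are hypothetical, not nanopb functions:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write `value` as a base-128 varint; returns the number of bytes used. */
size_t sketch_put_varint(uint8_t *out, uint64_t value)
{
    size_t n = 0;
    while (value >= 0x80)
    {
        out[n++] = (uint8_t)(value | 0x80);  /* low 7 bits + continuation bit */
        value >>= 7;
    }
    out[n++] = (uint8_t)value;
    return n;
}

/* Encode a packed repeated varint field into `out`.
 * The caller guarantees that `out` is large enough. */
size_t sketch_put_packed(uint8_t *out, uint32_t field_number,
                         const uint64_t *values, size_t count)
{
    uint8_t scratch[10];     /* a 64-bit varint needs at most 10 bytes */
    size_t payload_len = 0;
    size_t n = 0;
    size_t i;

    assert(out != NULL);

    /* First pass: measure the payload so the length can be written first. */
    for (i = 0; i < count; i++)
        payload_len += sketch_put_varint(scratch, values[i]);

    /* Tag = (field number << 3) | wire type, with wire type 2 here. */
    n += sketch_put_varint(out + n, ((uint64_t)field_number << 3) | 2u);
    n += sketch_put_varint(out + n, payload_len);

    /* Second pass: the values themselves, without individual tags. */
    for (i = 0; i < count; i++)
        n += sketch_put_varint(out + n, values[i]);
    return n;
}
```

For field number 4 and the values 3, 270 and 86942, this yields the eight bytes `22 06 03 8E 02 9E A7 05`.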

If the callback is used in a submessage, it will be called multiple
times during a single call to [pb_encode](reference.html#pb-encode),
because the submessage size is calculated before the data is written. In
this case, it must produce the same amount of data every time. If the
callback is directly in the main message, it is called only once.

This callback writes out a dynamically sized string:

    bool write_string(pb_ostream_t *stream, const pb_field_iter_t *field, void * const *arg)
    {
        char *str = get_string_from_somewhere();

        /* Write the tag (field number and wire type) first. */
        if (!pb_encode_tag_for_field(stream, field))
            return false;

        /* Then the length-prefixed string data. */
        return pb_encode_string(stream, (uint8_t*)str, strlen(str));
    }

### Decoding callbacks

    bool (*decode)(pb_istream_t *stream, const pb_field_iter_t *field, void **arg);

|            |                                                                    |
| ---------- | ------------------------------------------------------------------ |
| `stream`   | Input stream to read from                                          |
| `field`    | Iterator for the field currently being encoded or decoded.         |
| `arg`      | Pointer to the `arg` field in the `pb_callback_t` structure.       |

When decoding, the callback receives a length-limited substream that
reads the contents of a single field. The field tag has already been
read. For `string` and `bytes`, the length value has already been
parsed, and is available at `stream->bytes_left`.

The callback will be called multiple times for repeated fields. For
packed fields, you can either read multiple values until the stream
ends, or leave it to [pb_decode](reference.html#pb-decode) to call your
function over and over until all values have been read.

This callback reads multiple integers and prints them:

    bool read_ints(pb_istream_t *stream, const pb_field_iter_t *field, void **arg)
    {
        while (stream->bytes_left)
        {
            uint64_t value;
            if (!pb_decode_varint(stream, &value))
                return false;
            printf("%llu\n", (unsigned long long)value);
        }
        return true;
    }

### Function name bound callbacks

    bool MyMessage_callback(pb_istream_t *istream, pb_ostream_t *ostream, const pb_field_iter_t *field);

|            |                                                                    |
| ---------- | ------------------------------------------------------------------ |
| `istream`  | Input stream to read from, or NULL if called in encoding context.  |
| `ostream`  | Output stream to write to, or NULL if called in decoding context.  |
| `field`    | Iterator for the field currently being encoded or decoded.         |

Storing function pointers in `pb_callback_t` fields inside
the message requires extra storage space and is often cumbersome. As an
alternative, the generator options `callback_function` and
`callback_datatype` can be used to bind a callback function
based on its name.

Typically this feature is used by setting `callback_datatype` to e.g. `void*` or even a struct type used to store encoded or decoded data.
The generator will automatically set `callback_function` to `MessageName_callback` and produce a prototype for it in the generated `.pb.h`.
By implementing this function in your own code, you will receive callbacks for fields without having to separately set function pointers.

If you want to use function name bound callbacks for some fields and
`pb_callback_t` for other fields, you can call
`pb_default_field_callback` from the message-level
callback. It will then read a function pointer from
`pb_callback_t` and call it.

## Message descriptor

For using the `pb_encode()` and `pb_decode()` functions, you need a
description of all the fields contained in a message. This description
is usually autogenerated from the .proto file.

For example, take this submessage in the Person.proto file:

~~~~ protobuf
message Person {
    message PhoneNumber {
        required string number = 1 [(nanopb).max_size = 40];
        optional PhoneType type = 2 [default = HOME];
    }
}
~~~~

This in turn generates a macro list in the `.pb.h` file:

    #define Person_PhoneNumber_FIELDLIST(X, a) \
    X(a, STATIC,   REQUIRED, STRING,   number,            1) \
    X(a, STATIC,   OPTIONAL, UENUM,    type,              2)

Inside the `.pb.c` file there is a macro call to
`PB_BIND`:

    PB_BIND(Person_PhoneNumber, Person_PhoneNumber, AUTO)

These macros will in combination generate the `pb_msgdesc_t`
structure and associated lists:

    const uint32_t Person_PhoneNumber_field_info[] = { ... };
    const pb_msgdesc_t * const Person_PhoneNumber_submsg_info[] = { ... };
    const pb_msgdesc_t Person_PhoneNumber_msg = {
      2,
      Person_PhoneNumber_field_info,
      Person_PhoneNumber_submsg_info,
      Person_PhoneNumber_DEFAULT,
      NULL,
    };

The encoding and decoding functions take a pointer to this structure and
use it to process each field in the message.
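
The field list above is an instance of the general "X-macro" technique: one list, expanded several times with different definitions of `X`. The sketch below shows the idea with made-up `SKETCH_*` names; it is far simpler than what `PB_BIND` actually expands to and is only meant to show how a single list can yield both a table and a count:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical field list in the same shape as the generated one. */
#define SKETCH_PHONE_FIELDLIST(X, a) \
    X(a, STATIC, REQUIRED, STRING, number, 1) \
    X(a, STATIC, OPTIONAL, UENUM,  type,   2)

typedef struct
{
    const char *name;
    unsigned tag;
} sketch_field_t;

/* Expansion 1: each list entry becomes one row of a table. */
#define SKETCH_AS_ROW(a, atype, htype, ltype, fieldname, tag) { #fieldname, tag },
const sketch_field_t sketch_phone_fields[] = {
    SKETCH_PHONE_FIELDLIST(SKETCH_AS_ROW, unused)
};

/* Expansion 2: each list entry adds 1, giving a compile-time field count. */
#define SKETCH_AS_ONE(a, atype, htype, ltype, fieldname, tag) + 1
enum { SKETCH_PHONE_FIELD_COUNT = 0 SKETCH_PHONE_FIELDLIST(SKETCH_AS_ONE, unused) };
```

Because both expansions come from the same list, the table and the count cannot drift apart when a field is added.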

## Oneof

Protocol Buffers supports
[oneof](https://developers.google.com/protocol-buffers/docs/reference/proto2-spec#oneof_and_oneof_field)
sections, where only one of the fields contained within can be present. Here is an example of `oneof` usage:

~~~~ protobuf
message MsgType1 {
    required int32 value = 1;
}

message MsgType2 {
    required bool value = 1;
}

message MsgType3 {
    required int32 value1 = 1;
    required int32 value2 = 2;
}

message MyMessage {
    required uint32 uid = 1;
    required uint32 pid = 2;
    required uint32 utime = 3;

    oneof payload {
        MsgType1 msg1 = 4;
        MsgType2 msg2 = 5;
        MsgType3 msg3 = 6;
    }
}
~~~~

Nanopb will generate `payload` as a C union and add an additional field
`which_payload`:

    typedef struct _MyMessage {
      uint32_t uid;
      uint32_t pid;
      uint32_t utime;
      pb_size_t which_payload;
      union {
          MsgType1 msg1;
          MsgType2 msg2;
          MsgType3 msg3;
      } payload;
    } MyMessage;

`which_payload` indicates which of the `oneof` fields is actually set.
The user is expected to set the field manually using the correct field
tag:

    MyMessage msg = MyMessage_init_zero;
    msg.payload.msg2.value = true;
    msg.which_payload = MyMessage_msg2_tag;

Notice that neither the `which_payload` field nor the unused fields in
`payload` will consume any space in the resulting encoded message.

When a field inside `oneof` contains `pb_callback_t`
fields, the callback values cannot be set before decoding. This is
because the different fields share the same storage space in the C
`union`. Instead, either function name bound callbacks or a
separate message-level callback can be used. See
[tests/oneof_callback](https://github.com/nanopb/nanopb/tree/master/tests/oneof_callback)
for an example of this.

## Extension fields

Protocol Buffers supports a concept of [extension
fields](https://developers.google.com/protocol-buffers/docs/proto#extensions),
which are additional fields to a message, but defined outside the actual
message. The definition can even be in a completely separate .proto
file.

The base message is declared as extensible by the keyword `extensions` in
the .proto file:

~~~~ protobuf
message MyMessage {
    .. fields ..
    extensions 100 to 199;
}
~~~~

For each extensible message, `nanopb_generator.py` declares an
additional callback field called `extensions`. The field and associated
datatype `pb_extension_t` form a linked list of handlers. When an
unknown field is encountered, the decoder calls each handler in turn
until either one of them handles the field, or the list is exhausted.

The actual extensions are declared using the `extend` keyword in the
.proto, and are in the global namespace:

~~~~ protobuf
extend MyMessage {
    optional int32 myextension = 100;
}
~~~~

For each extension, `nanopb_generator.py` creates a constant of type
`pb_extension_type_t`. To link together the base message and the
extension, you have to:

1.  Allocate storage for your field, matching the datatype in the
    .proto. For example, for an `int32` field, you need an `int32_t`
    variable to store the value.
2.  Create a `pb_extension_t` constant, with pointers to your variable
    and to the generated `pb_extension_type_t`.
3.  Set the `message.extensions` pointer to point to the
    `pb_extension_t`.

An example of this is available in `tests/test_encode_extensions.c`
and `tests/test_decode_extensions.c`.

## Default values

Protobuf has two syntax variants, proto2 and proto3. Of these, proto2 has
user-definable default values that can be given in the .proto file:

~~~~ protobuf
message MyMessage {
    optional bytes foo = 1 [default = "ABC\x01\x02\x03"];
    optional string bar = 2 [default = "åäö"];
}
~~~~

Nanopb will generate both static and runtime initialization for the
default values. In `myproto.pb.h` there will be a
`#define MyMessage_init_default {...}` that can be used to initialize
the whole message to its default values:

    MyMessage msg = MyMessage_init_default;

In addition to this, `pb_decode()` will initialize message
fields to defaults at runtime. If this is not desired,
`pb_decode_ex()` with the `PB_DECODE_NOINIT` flag can be used instead.

## Message framing

Protocol Buffers does not specify a method of framing the messages for
transmission. This is something that must be provided by the library
user, as there is no one-size-fits-all solution. Typical needs for a
framing format are to:

1.  Encode the message length.
2.  Encode the message type.
3.  Perform any synchronization and error checking that may be needed
    depending on application.

For example, UDP packets already fulfill all the requirements, and TCP
streams typically only need a way to identify the message length and
type. Lower level interfaces such as serial ports may need a more robust
frame format, such as HDLC (high-level data link control).

Nanopb provides a few helpers to facilitate implementing framing
formats:

1.  Functions `pb_encode_ex` and `pb_decode_ex` support message
    data prefixed with a varint-encoded length (flags
    `PB_ENCODE_DELIMITED` and `PB_DECODE_DELIMITED`).
2.  Union messages and oneofs are supported in order to implement
    top-level container messages.
3.  Message IDs can be specified using the `(nanopb_msgopt).msgid`
    option and can then be accessed from the header.
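
As an illustration of the first helper's wire layout, the sketch below shows what a receiver conceptually has to do with a varint length prefix: parse it, then check that the whole payload has arrived. It operates on a plain buffer and `sketch_unframe` is a hypothetical name; in real code the delimited decoding in nanopb takes care of this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Parse a varint length prefix from buf[0..len) and locate the payload.
 * On success *payload points at the first payload byte and *payload_len
 * holds its size. Returns false if the prefix is malformed or the buffer
 * does not yet contain the whole frame. */
bool sketch_unframe(const uint8_t *buf, size_t len,
                    const uint8_t **payload, size_t *payload_len)
{
    uint64_t value = 0;
    unsigned shift = 0;
    size_t i = 0;

    assert(buf != NULL && payload != NULL && payload_len != NULL);
    for (;;)
    {
        uint8_t byte;

        if (i >= len || shift > 63)
            return false;               /* truncated or over-long prefix */
        byte = buf[i++];
        value |= (uint64_t)(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0)
            break;                      /* last byte of the varint */
        shift += 7;
    }

    if (value > len - i)
        return false;                   /* frame is not complete yet */

    *payload = buf + i;
    *payload_len = (size_t)value;
    return true;
}
```

A 300-byte message, for example, is preceded by the two prefix bytes `AC 02`.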

## Return values and error handling

Most functions in nanopb return bool: `true` means success, `false`
means failure. There is also support for error messages for
debugging purposes: the error messages go in `stream->errmsg`.

The error messages help in guessing what the underlying cause of the
error is. The most common error conditions are:

1)  Invalid protocol buffers binary message.
2)  Mismatch between binary message and .proto message type.
3)  Unterminated message (incorrect message length).
4)  Exceeding the max_size or bytes_left of a stream.
5)  Exceeding the max_size/max_count of a string or array field.
6)  IO errors in your own stream callbacks.
7)  Errors that happen in your callback functions.
8)  Running out of memory, i.e. stack overflow.
9)  Invalid field descriptors (would usually mean a bug in the generator).

## Static assertions

Nanopb code uses static assertions to check the size of structures at compile
time. The `PB_STATIC_ASSERT` macro is defined in `pb.h`. If the ISO C11 standard
is available, the C standard `_Static_assert` keyword is used, otherwise a
negative sized array definition trick is used.
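
The two mechanisms can be sketched as follows. `SKETCH_STATIC_ASSERT` is a hypothetical macro written for this example; the real `PB_STATIC_ASSERT` in `pb.h` may differ in its details:

```c
#include <assert.h>
#include <stdint.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L
/* C11 and newer: use the language keyword, with the name as the message. */
#define SKETCH_STATIC_ASSERT(cond, msg) _Static_assert(cond, #msg);
#else
/* Fallback: declare an array type whose size is -1 when cond is false,
 * which the compiler has to reject. The name shows up in the error. */
#define SKETCH_STATIC_ASSERT(cond, msg) \
    typedef char sketch_assert_##msg[(cond) ? 1 : -1];
#endif

/* Both checks are evaluated at compile time and generate no code. */
SKETCH_STATIC_ASSERT(sizeof(int64_t) == 8, INT64_T_WRONG_SIZE)
SKETCH_STATIC_ASSERT(sizeof(uint8_t) == 1, UINT8_T_WRONG_SIZE)
```

With the fallback, a failed check typically surfaces as an error about the array size, with the assertion name embedded in the typedef name.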

Common reasons for static assertion errors are:

1.  `FIELDINFO_DOES_NOT_FIT_width2` with `width1` or `width2`:
    Message that is larger than 256 bytes, but nanopb generator does not detect
    it for some reason. Often resolved by giving all `.proto` files as argument
    to `nanopb_generator.py` at the same time, to ensure submessage definitions
    are found. Alternatively `(nanopb).descriptorsize = DS_4` option can be
    given manually.

2.  `FIELDINFO_DOES_NOT_FIT_width4` with `width4`:
    Message that is larger than 64 kilobytes. There will be a better error
    message for this in a future nanopb version, but currently it asserts here.
    The compile time option `PB_FIELD_32BIT` should be specified either on
    C compiler command line or by editing `pb.h`. This will increase the sizes
    of integer types used internally in nanopb code.

3.  `DOUBLE_MUST_BE_8_BYTES`:
    Some platforms, most notably AVR, do not support the 64-bit `double` type,
    only 32-bit `float`. The compile time option `PB_CONVERT_DOUBLE_FLOAT` can
    be defined to convert between the types automatically. The conversion
    results in small rounding errors and takes unnecessary space in transmission,
    so changing the `.proto` to use `float` type is often better.

4.  `INT64_T_WRONG_SIZE`:
    The `stdint.h` system header is incorrect for the C compiler being used.
    This can result from erroneous compiler include path.
    If the compiler actually does not support 64-bit types, the compile time
    option `PB_WITHOUT_64BIT` can be used.

5.  `variably modified array size`:
    The compiler used has problems resolving the array-based static assert at
    compile time. Try setting the compiler to C11 standard mode if possible.
    If static assertions cannot be made to work on the compiler used, the
    compile-time option `PB_NO_STATIC_ASSERT` can be specified to turn them off.