# Nanopb: Basic concepts

This page outlines the concepts underlying the nanopb design.

## Proto files

All Protocol Buffers implementations use .proto files to describe the
message format. The point of these files is to be a portable interface
description language.

### Compiling .proto files for nanopb

Nanopb comes with a Python script that generates `.pb.c` and
`.pb.h` files from the `.proto` definition:

    user@host:~$ nanopb/generator/nanopb_generator.py message.proto
    Writing to message.pb.h and message.pb.c

Internally, this script uses Google's `protoc` to parse the
input file. If `protoc` is not available, you may receive an error
message. You can install either the `grpcio-tools` Python
package using `pip`, or the `protoc` compiler
itself from the `protobuf-compiler` distribution package.
The Python package is generally recommended, because nanopb requires
protoc version 3.6 or newer to support all features, and some
distributions ship an older version.

### Modifying generator behaviour

Using generator options, you can set maximum sizes for fields in order
to allocate them statically. The preferred way to do this is to create
an .options file with the same name as your .proto file:

    # Foo.proto
    message Foo {
       required string name = 1;
    }

    # Foo.options
    Foo.name max_size:16

For more information on this, see the [Proto file
options](reference.html#proto-file-options) section in the reference
manual.
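
The same limit can also be written inline in the .proto file using the `(nanopb).max_size` field option. The fragment below is a sketch of that variant; it assumes that `nanopb.proto` (shipped with the generator) is on the import path and that the file uses proto2 syntax:

```protobuf
// Foo.proto, with the option given inline instead of in Foo.options
syntax = "proto2";
import "nanopb.proto";

message Foo {
    required string name = 1 [(nanopb).max_size = 16];
}
```

Keeping the options in a separate .options file leaves the .proto file free of nanopb-specific imports, which is convenient when the same file is shared with other Protocol Buffers implementations.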

## Streams

Nanopb uses streams for accessing the data in encoded format. The stream
abstraction is very lightweight, and consists of a structure
(`pb_ostream_t` or `pb_istream_t`) which contains a pointer to a
callback function.

There are a few generic rules for callback functions:

1)  Return false on IO errors. The encoding or decoding process will
    abort immediately.
2)  Use `state` to store your own data, such as a file descriptor.
3)  `bytes_written` and `bytes_left` are updated by `pb_write` and
    `pb_read`.
4)  Your callback may be used with substreams. In this case
    `bytes_left`, `bytes_written` and `max_size` have smaller values
    than the original stream. Don't use these values to calculate
    pointers.
5)  Always read or write the full requested length of data. For example,
    POSIX `recv()` needs the `MSG_WAITALL` parameter to accomplish
    this.
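
Rule 5 matters because plain POSIX `read()` may return fewer bytes than requested. The helper below is a minimal sketch of one way to honor the rule for a file descriptor; `read_exact` is a hypothetical name and not part of nanopb:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical helper (not part of nanopb): read exactly `count` bytes
 * from a POSIX file descriptor, looping over short reads. */
static bool read_exact(int fd, uint8_t *buf, size_t count)
{
    size_t done = 0;

    while (done < count)
    {
        ssize_t n = read(fd, buf + done, count - done);

        if (n > 0)
            done += (size_t)n;      /* partial read, keep going */
        else if (n == 0)
            return false;           /* EOF before the full length arrived */
        else if (errno != EINTR)
            return false;           /* real IO error */
    }

    return true;
}
```

An input stream callback could keep the descriptor in `state` and simply return the result of such a helper, so that a short read is never reported as success.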

### Output streams

    struct _pb_ostream_t
    {
       bool (*callback)(pb_ostream_t *stream, const uint8_t *buf, size_t count);
       void *state;
       size_t max_size;
       size_t bytes_written;
    };

The `callback` for an output stream may be NULL, in which case the stream
simply counts the number of bytes written. In this case, `max_size` is
ignored.

Otherwise, if `bytes_written` + bytes_to_be_written is larger than
`max_size`, `pb_write` returns false before doing anything else. If you
don't want to limit the size of the stream, pass `SIZE_MAX`.
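
The two cases above (NULL callback versus size-limited callback) can be modelled in a few lines. The sketch below uses stand-in names (`demo_ostream_t`, `demo_write`, `demo_cursor_t`, `buffer_callback`) so that it compiles without nanopb; it illustrates the described rules and is not nanopb's actual `pb_write`:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for pb_ostream_t so the sketch compiles without nanopb. */
typedef struct demo_ostream demo_ostream_t;
struct demo_ostream
{
    bool (*callback)(demo_ostream_t *stream, const uint8_t *buf, size_t count);
    void *state;
    size_t max_size;
    size_t bytes_written;
};

/* Models the rules described above; this is not nanopb's pb_write. */
static bool demo_write(demo_ostream_t *stream, const uint8_t *buf, size_t count)
{
    if (stream->callback != NULL)
    {
        /* Reject the write up front if it would exceed max_size. Written
         * as a subtraction so the comparison itself cannot overflow. */
        if (count > stream->max_size - stream->bytes_written)
            return false;
        if (!stream->callback(stream, buf, count))
            return false;
    }

    /* With a NULL callback only the byte count is tracked. */
    stream->bytes_written += count;
    return true;
}

/* The write position lives in `state`, not in bytes_written, so the
 * callback keeps working if it is ever used through a substream. */
typedef struct { uint8_t *next; } demo_cursor_t;

static bool buffer_callback(demo_ostream_t *stream, const uint8_t *buf, size_t count)
{
    demo_cursor_t *cursor = (demo_cursor_t *)stream->state;
    memcpy(cursor->next, buf, count);
    cursor->next += count;
    return true;
}
```

A stream initialised as `{NULL, NULL, 0, 0}` therefore acts as a pure size counter, while `{&buffer_callback, &cursor, sizeof(storage), 0}` refuses any write that would run past `storage`.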

**Example 1:**

This is the way to get the size of the message without storing it
anywhere:

    Person myperson = ...;
    pb_ostream_t sizestream = {0};
    pb_encode(&sizestream, Person_fields, &myperson);
    printf("Encoded size is %zu\n", sizestream.bytes_written);

**Example 2:**

Writing to stdout:

    bool callback(pb_ostream_t *stream, const uint8_t *buf, size_t count)
    {
       FILE *file = (FILE*) stream->state;
       return fwrite(buf, 1, count, file) == count;
    }

    pb_ostream_t stdoutstream = {&callback, stdout, SIZE_MAX, 0};

### Input streams

For input streams, there is one extra rule:

6)  You don't need to know the length of the message in advance. After
    getting an EOF error when reading, set `bytes_left` to 0 and return
    `false`. `pb_decode()` will detect this and if the EOF was in a proper
    position, it will return true.

Here is the structure:

    struct _pb_istream_t
    {
       bool (*callback)(pb_istream_t *stream, uint8_t *buf, size_t count);
       void *state;
       size_t bytes_left;
    };

The `callback` must always be a function pointer. `bytes_left` is an
upper limit on the number of bytes that will be read. You can use
`SIZE_MAX` if your callback handles EOF as described above.

**Example:**

This function binds an input stream to stdin:

    bool callback(pb_istream_t *stream, uint8_t *buf, size_t count)
    {
       FILE *file = (FILE*)stream->state;
       bool status;

       if (buf == NULL)
       {
           /* Skip `count` bytes without storing them. */
           while (count > 0 && fgetc(file) != EOF)
               count--;
           return count == 0;
       }

       status = (fread(buf, 1, count, file) == count);

       if (feof(file))
           stream->bytes_left = 0;

       return status;
    }

    pb_istream_t stdinstream = {&callback, stdin, SIZE_MAX};

## Data types

Most Protocol Buffers datatypes have directly corresponding C datatypes:
for example, `int32` maps to `int32_t`, `float` to `float` and `bool` to
`bool`. However, the variable-length datatypes are more complex:

1)  Strings, bytes and repeated fields of any type map to callback
    functions by default.
2)  If the special option `(nanopb).max_size` is specified in the
    .proto file, a string maps to a null-terminated char array and bytes
    map to a structure containing a byte array and a size field.
3)  If `(nanopb).fixed_length` is set to `true` and
    `(nanopb).max_size` is also set, then bytes map to an inline byte
    array of fixed size.
4)  If the special option `(nanopb).max_count` is specified on a
    repeated field, it maps to an array of whatever type is being
    repeated. Another field will be created for the actual number of
    entries stored.
5)  If `(nanopb).fixed_count` is set to `true` and
    `(nanopb).max_count` is also set, the field for the actual number
    of entries will not be created, as the count is always assumed to be
    the maximum count.

### Examples of .proto specifications vs. generated structure

**Simple integer field:**\
.proto: `int32 age = 1;`\
.pb.h: `int32_t age;`

**String with unknown length:**\
.proto: `string name = 1;`\
.pb.h: `pb_callback_t name;`

**String with known maximum length:**\
.proto: `string name = 1 [(nanopb).max_length = 40];`\
.pb.h: `char name[41];`

**Repeated string with unknown count:**\
.proto: `repeated string names = 1;`\
.pb.h: `pb_callback_t names;`

**Repeated string with known maximum count and size:**\
.proto: `repeated string names = 1 [(nanopb).max_length = 40, (nanopb).max_count = 5];`\
.pb.h: `pb_size_t names_count;` `char names[5][41];`

**Bytes field with known maximum size:**\
.proto: `bytes data = 1 [(nanopb).max_size = 16];`\
.pb.h: `PB_BYTES_ARRAY_T(16) data;`, where the struct contains `{pb_size_t size; pb_byte_t bytes[n];}`

**Bytes field with fixed length:**\
.proto: `bytes data = 1 [(nanopb).max_size = 16, (nanopb).fixed_length = true];`\
.pb.h: `pb_byte_t data[16];`

**Repeated integer array with known maximum size:**\
.proto: `repeated int32 numbers = 1 [(nanopb).max_count = 5];`\
.pb.h: `pb_size_t numbers_count;` `int32_t numbers[5];`

**Repeated integer array with fixed count:**\
.proto: `repeated int32 numbers = 1 [(nanopb).max_count = 5, (nanopb).fixed_count = true];`\
.pb.h: `int32_t numbers[5];`

The maximum lengths are checked at runtime. If a string, bytes or array
field exceeds the allocated length, `pb_decode()` will return false.
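
When the application fills a statically allocated field itself before encoding, it has to respect the same bound. The following is a minimal sketch for the `max_length = 40` string case; `demo_person_t` and `demo_set_name` are hypothetical stand-ins, not generated code:

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for a generated struct with
 * `string name = 1 [(nanopb).max_length = 40];` */
typedef struct
{
    char name[41];   /* 40 characters plus the terminating NUL */
} demo_person_t;

/* Hypothetical helper: copy `src` into the field only if it fits. */
static bool demo_set_name(demo_person_t *person, const char *src)
{
    size_t len = strlen(src);

    if (len >= sizeof(person->name))
        return false;                     /* would not fit with its NUL */

    memcpy(person->name, src, len + 1);   /* copy includes the terminator */
    return true;
}
```

Checking against `sizeof(person->name)` rather than a literal keeps the helper correct if the option value in the .proto file changes later.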

> **Note:** For the `bytes` datatype, the field length checking may not be
exact. The compiler may add some padding to the `pb_bytes_array_t`
structure, and the nanopb runtime doesn't know how much of the
structure size is padding. Therefore it uses the whole length of the
structure for storing data, which is not very smart but shouldn't cause
problems. In practice, this means that if you specify
`(nanopb).max_size=5` on a `bytes` field, you may be able to store 6
bytes there. For the `string` field type, the length limit is exact.

> **Note:** The decoder only keeps track of one `fixed_count` repeated field at a time. Usually this is not an issue because all elements of a repeated field occur end-to-end. Interleaved array elements of several `fixed_count` repeated fields would be a valid protobuf message, but would get rejected by the nanopb decoder with the error `"wrong size for fixed count field"`.

## Field callbacks

When a field has dynamic length, nanopb cannot statically allocate
storage for it. Instead, it allows you to handle the field in whatever
way you want, using a callback function.

The [pb_callback_t](reference.html#pb-callback-t) structure contains a
function pointer and a `void` pointer called `arg` you can use for
passing data to the callback. If the function pointer is NULL, the field
will be skipped. A pointer to the `arg` is passed to the function, so
that it can modify it and retrieve the value.

The actual behavior of the callback function is different in encoding
and decoding modes. In encoding mode, the callback is called once and
should write out everything, including field tags. In decoding mode, the
callback is called repeatedly for every data item.

To write more complex field callbacks, it is recommended to read the
[Google Protobuf Encoding Specification](https://developers.google.com/protocol-buffers/docs/encoding).

### Encoding callbacks

    bool (*encode)(pb_ostream_t *stream, const pb_field_iter_t *field, void * const *arg);

|            |                                                                    |
| ---------- | ------------------------------------------------------------------ |
| `stream`   | Output stream to write to                                          |
| `field`    | Iterator for the field currently being encoded.                    |
| `arg`      | Pointer to the `arg` field in the `pb_callback_t` structure.       |

When encoding, the callback should write out complete fields, including
the wire type and field number tag. It can write as many or as few
fields as it likes. For example, if you want to write out an array as a
`repeated` field, you should do it all in a single call.

Usually you can use [pb_encode_tag_for_field](reference.html#pb-encode-tag-for-field) to
encode the wire type and tag number of the field. However, if you want
to encode a repeated field as a packed array, you must call
[pb_encode_tag](reference.html#pb-encode-tag) instead to specify a
wire type of `PB_WT_STRING`.
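
To see why a packed array needs the length-delimited wire type, it helps to look at the bytes involved: one tag, one total length, then the values back to back. The sketch below builds that layout by hand following the Protocol Buffers encoding rules; `put_varint`, `varint_size` and `put_packed_uint32` are hypothetical helpers and do not use the nanopb API:

```c
#include <stddef.h>
#include <stdint.h>

/* Append `value` as a base-128 varint; returns the number of bytes written. */
static size_t put_varint(uint8_t *out, uint64_t value)
{
    size_t n = 0;
    while (value >= 0x80)
    {
        out[n++] = (uint8_t)(value | 0x80);   /* low 7 bits, continuation bit set */
        value >>= 7;
    }
    out[n++] = (uint8_t)value;
    return n;
}

/* Number of bytes put_varint() would produce for `value`. */
static size_t varint_size(uint64_t value)
{
    size_t n = 1;
    while (value >= 0x80)
    {
        value >>= 7;
        n++;
    }
    return n;
}

/* Hypothetical helper: encode a packed repeated uint32 field by hand.
 * The caller must provide a large enough `out` buffer. */
static size_t put_packed_uint32(uint8_t *out, uint32_t field_number,
                                const uint32_t *values, size_t count)
{
    size_t payload = 0;
    size_t n = 0;

    /* The length prefix must be known before the values are written. */
    for (size_t i = 0; i < count; i++)
        payload += varint_size(values[i]);

    /* Tag = field number shifted left by 3, ORed with wire type 2
     * (length-delimited, the wire type nanopb calls PB_WT_STRING). */
    n += put_varint(out + n, ((uint64_t)field_number << 3) | 2);
    n += put_varint(out + n, payload);

    for (size_t i = 0; i < count; i++)
        n += put_varint(out + n, values[i]);

    return n;
}
```

For field number 4 and the values 3, 270 and 86942, this produces the eight bytes `22 06 03 8E 02 9E A7 05`: a single tag and length followed by three varints, instead of three separately tagged fields.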

If the callback is used in a submessage, it will be called multiple
times during a single call to [pb_encode](reference.html#pb-encode). In
this case, it must produce the same amount of data every time. If the
callback is directly in the main message, it is called only once.

This callback writes out a dynamically sized string:

    bool write_string(pb_ostream_t *stream, const pb_field_iter_t *field, void * const *arg)
    {
        const char *str = get_string_from_somewhere();
        if (!pb_encode_tag_for_field(stream, field))
            return false;

        return pb_encode_string(stream, (const uint8_t*)str, strlen(str));
    }

### Decoding callbacks

    bool (*decode)(pb_istream_t *stream, const pb_field_iter_t *field, void **arg);

|            |                                                                    |
| ---------- | ------------------------------------------------------------------ |
| `stream`   | Input stream to read from                                          |
| `field`    | Iterator for the field currently being decoded.                    |
| `arg`      | Pointer to the `arg` field in the `pb_callback_t` structure.       |

When decoding, the callback receives a length-limited substream that
reads the contents of a single field. The field tag has already been
read. For `string` and `bytes`, the length value has already been
parsed, and is available at `stream->bytes_left`.

The callback will be called multiple times for repeated fields. For
packed fields, you can either read multiple values until the stream
ends, or leave it to [pb_decode](reference.html#pb-decode) to call your
function over and over until all values have been read.

This callback reads multiple integers and prints them:

    bool read_ints(pb_istream_t *stream, const pb_field_iter_t *field, void **arg)
    {
        while (stream->bytes_left)
        {
            uint64_t value;
            if (!pb_decode_varint(stream, &value))
                return false;
            printf("%llu\n", (unsigned long long)value);
        }
        return true;
    }

### Function name bound callbacks

    bool MyMessage_callback(pb_istream_t *istream, pb_ostream_t *ostream, const pb_field_iter_t *field);

|            |                                                                    |
| ---------- | ------------------------------------------------------------------ |
| `istream`  | Input stream to read from, or NULL if called in encoding context.  |
| `ostream`  | Output stream to write to, or NULL if called in decoding context.  |
| `field`    | Iterator for the field currently being encoded or decoded.         |

Storing function pointers in `pb_callback_t` fields inside
the message requires extra storage space and is often cumbersome. As an
alternative, the generator options `callback_function` and
`callback_datatype` can be used to bind a callback function
based on its name.

Typically this feature is used by setting
`callback_datatype` to e.g. `void*` or another
data type used for callback state. Then the generator will automatically
set `callback_function` to
`MessageName_callback` and produce a prototype for it in the
generated `.pb.h`. By implementing this function in your own
code, you will receive callbacks for fields without having to separately
set function pointers.

If you want to use function name bound callbacks for some fields and
`pb_callback_t` for other fields, you can call
`pb_default_field_callback` from the message-level
callback. It will then read a function pointer from
`pb_callback_t` and call it.

## Message descriptor

To use the `pb_encode()` and `pb_decode()` functions, you need a
description of all the fields contained in a message. This description
is usually autogenerated from the .proto file.

For example, consider this submessage in the Person.proto file:

~~~~ protobuf
message Person {
    message PhoneNumber {
        required string number = 1 [(nanopb).max_size = 40];
        optional PhoneType type = 2 [default = HOME];
    }
}
~~~~

This in turn generates a macro list in the `.pb.h` file:

    #define Person_PhoneNumber_FIELDLIST(X, a) \
    X(a, STATIC,   REQUIRED, STRING,   number,            1) \
    X(a, STATIC,   OPTIONAL, UENUM,    type,              2)

Inside the `.pb.c` file there is a macro call to
`PB_BIND`:

    PB_BIND(Person_PhoneNumber, Person_PhoneNumber, AUTO)

Together, these macros generate the `pb_msgdesc_t`
structure and associated lists:

    const uint32_t Person_PhoneNumber_field_info[] = { ... };
    const pb_msgdesc_t * const Person_PhoneNumber_submsg_info[] = { ... };
    const pb_msgdesc_t Person_PhoneNumber_msg = {
      2,
      Person_PhoneNumber_field_info,
      Person_PhoneNumber_submsg_info,
      Person_PhoneNumber_DEFAULT,
      NULL,
    };

The encoding and decoding functions take a pointer to this structure and
use it to process each field in the message.

## Oneof

Protocol Buffers supports
[oneof](https://developers.google.com/protocol-buffers/docs/reference/proto2-spec#oneof_and_oneof_field)
sections, where only one of the fields contained within can be present. Here is an example of `oneof` usage:

~~~~ protobuf
message MsgType1 {
    required int32 value = 1;
}

message MsgType2 {
    required bool value = 1;
}

message MsgType3 {
    required int32 value1 = 1;
    required int32 value2 = 2;
}

message MyMessage {
    required uint32 uid = 1;
    required uint32 pid = 2;
    required uint32 utime = 3;

    oneof payload {
        MsgType1 msg1 = 4;
        MsgType2 msg2 = 5;
        MsgType3 msg3 = 6;
    }
}
~~~~

Nanopb will generate `payload` as a C union and add an additional field
`which_payload`:

    typedef struct _MyMessage {
      uint32_t uid;
      uint32_t pid;
      uint32_t utime;
      pb_size_t which_payload;
      union {
          MsgType1 msg1;
          MsgType2 msg2;
          MsgType3 msg3;
      } payload;
    } MyMessage;

`which_payload` indicates which of the `oneof` fields is actually set.
The user is expected to set the field manually using the correct field
tag:

    MyMessage msg = MyMessage_init_zero;
    msg.payload.msg2.value = true;
    msg.which_payload = MyMessage_msg2_tag;

Notice that neither the `which_payload` field nor the unused fields in
`payload` will consume any space in the resulting encoded message.
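
On the receiving side, the same `which_payload` field tells which union member is valid. The sketch below shows one way to dispatch on it. The type and tag definitions are stand-ins that mirror the generated code above so that the example compiles without the generated header (the tag values follow the field numbers in the .proto, and `pb_size_t` is assumed to be a small unsigned integer type); `payload_sum` is a hypothetical consumer:

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for the generated definitions; a real project would include
 * the generated .pb.h file instead. */
typedef uint16_t pb_size_t;
typedef struct { int32_t value; } MsgType1;
typedef struct { bool value; } MsgType2;
typedef struct { int32_t value1; int32_t value2; } MsgType3;

typedef struct {
    uint32_t uid;
    uint32_t pid;
    uint32_t utime;
    pb_size_t which_payload;
    union {
        MsgType1 msg1;
        MsgType2 msg2;
        MsgType3 msg3;
    } payload;
} MyMessage;

#define MyMessage_msg1_tag 4
#define MyMessage_msg2_tag 5
#define MyMessage_msg3_tag 6

/* Hypothetical consumer: reduce whichever payload is present to one number. */
static int32_t payload_sum(const MyMessage *msg)
{
    switch (msg->which_payload)
    {
        case MyMessage_msg1_tag:
            return msg->payload.msg1.value;
        case MyMessage_msg2_tag:
            return msg->payload.msg2.value ? 1 : 0;
        case MyMessage_msg3_tag:
            return msg->payload.msg3.value1 + msg->payload.msg3.value2;
        default:
            return 0;   /* no payload set */
    }
}
```

Reading only the member selected by `which_payload` matters because all members share the same storage, so the other members hold meaningless data.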

When a field inside `oneof` contains `pb_callback_t`
fields, the callback values cannot be set before decoding. This is
because the different fields share the same storage space in the C
`union`. Instead, either function name bound callbacks or a
separate message-level callback can be used. See
[tests/oneof_callback](https://github.com/nanopb/nanopb/tree/master/tests/oneof_callback)
for an example of this.

## Extension fields

Protocol Buffers supports a concept of [extension
fields](https://developers.google.com/protocol-buffers/docs/proto#extensions),
which are additional fields to a message, but defined outside the actual
message. The definition can even be in a completely separate .proto
file.

The base message is declared as extensible by the keyword `extensions` in
the .proto file:

~~~~ protobuf
message MyMessage {
    .. fields ..
    extensions 100 to 199;
}
~~~~

For each extensible message, `nanopb_generator.py` declares an
additional callback field called `extensions`. The field and the associated
datatype `pb_extension_t` form a linked list of handlers. When an
unknown field is encountered, the decoder calls each handler in turn
until either one of them handles the field, or the list is exhausted.

The actual extensions are declared using the `extend` keyword in the
.proto, and are in the global namespace:

~~~~ protobuf
extend MyMessage {
    optional int32 myextension = 100;
}
~~~~

For each extension, `nanopb_generator.py` creates a constant of type
`pb_extension_type_t`. To link together the base message and the
extension, you have to:

1.  Allocate storage for your field, matching the datatype in the
    .proto. For example, for an `int32` field, you need an `int32_t`
    variable to store the value.
2.  Create a `pb_extension_t` constant, with pointers to your variable
    and to the generated `pb_extension_type_t`.
3.  Set the `message.extensions` pointer to point to the
    `pb_extension_t`.

An example of this is available in `tests/test_encode_extensions.c`
and `tests/test_decode_extensions.c`.

## Default values

Protobuf has two syntax variants, proto2 and proto3. Of these, proto2
supports user-definable default values that can be given in the .proto
file:

~~~~ protobuf
message MyMessage {
    optional bytes foo = 1 [default = "ABC\x01\x02\x03"];
    optional string bar = 2 [default = "åäö"];
}
~~~~

Nanopb will generate both static and runtime initialization for the
default values. In `myproto.pb.h` there will be a
`#define MyMessage_init_default {...}` that can be used to initialize the
whole message to its default values:

    MyMessage msg = MyMessage_init_default;

In addition to this, `pb_decode()` will initialize message
fields to defaults at runtime. If this is not desired,
`pb_decode_ex()` with the `PB_DECODE_NOINIT` flag can be used instead.

## Message framing

Protocol Buffers does not specify a method of framing the messages for
transmission. This is something that must be provided by the library
user, as there is no one-size-fits-all solution. Typical needs for a
framing format are to:

1.  Encode the message length.
2.  Encode the message type.
3.  Perform any synchronization and error checking that may be needed
    depending on the application.

For example, UDP packets already fulfill all the requirements, and TCP
streams typically only need a way to identify the message length and
type. Lower level interfaces such as serial ports may need a more robust
frame format, such as HDLC (high-level data link control).

Nanopb provides a few helpers to facilitate implementing framing
formats:

1.  Functions `pb_encode_ex` and `pb_decode_ex` can prefix the message
    data with a varint-encoded length (flags `PB_ENCODE_DELIMITED` and
    `PB_DECODE_DELIMITED`).
2.  Union messages and oneofs are supported in order to implement
    top-level container messages.
3.  Message IDs can be specified using the `(nanopb_msgopt).msgid`
    option and can then be accessed from the header.
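
To illustrate the first helper, the sketch below shows what a receiver has to do with a varint length prefix before it can hand a complete message to the decoder. It is a self-contained illustration that assumes the prefix is a plain Protocol Buffers varint as described above; `find_frame` is a hypothetical helper, not a nanopb function:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: locate one length-prefixed frame in `buf`.
 * On success, *payload points at the message bytes and *payload_len is
 * their length. Returns false if the prefix is malformed or the buffer
 * does not yet hold the complete frame. */
static bool find_frame(const uint8_t *buf, size_t buf_len,
                       const uint8_t **payload, size_t *payload_len)
{
    uint64_t length = 0;
    size_t pos = 0;
    unsigned shift = 0;

    /* Decode the varint: 7 payload bits per byte, least significant
     * group first, top bit set on every byte except the last. */
    for (;;)
    {
        if (pos >= buf_len || shift >= 64)
            return false;

        uint8_t byte = buf[pos++];
        length |= (uint64_t)(byte & 0x7F) << shift;

        if ((byte & 0x80) == 0)
            break;
        shift += 7;
    }

    if (length > buf_len - pos)
        return false;               /* frame is truncated */

    *payload = buf + pos;
    *payload_len = (size_t)length;
    return true;
}
```

On a byte stream such as TCP, a receiver would typically retry this after each read until it returns true, then decode the payload and continue at `payload + payload_len`.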

## Return values and error handling

Most functions in nanopb return bool: `true` means success, `false`
means failure. There is also support for error messages for
debugging purposes: the error messages go in `stream->errmsg`.

The error messages help in identifying the underlying cause of the
error. The most common error conditions are:

1)  Invalid protocol buffers binary message.
2)  Mismatch between binary message and .proto message type.
3)  Unterminated message (incorrect message length).
4)  Exceeding the max_size or bytes_left of a stream.
5)  Exceeding the max_size/max_count of a string or array field.
6)  IO errors in your own stream callbacks.
7)  Errors that happen in your callback functions.
8)  Running out of memory, i.e. stack overflow.
9)  Invalid field descriptors (would usually mean a bug in the generator).