# Nanopb: New features in nanopb 0.4

## What's new in nanopb 0.4

Long in the making, nanopb 0.4 has seen some wide-reaching improvements
in reaction to the development of the rest of the protobuf ecosystem.
This document showcases features that are not immediately visible, but
that you may want to take advantage of.

A lot of effort has been spent on retaining backwards and forwards
compatibility with previous nanopb versions. For a list of breaking
changes, see the [migration document](migration.html).

### New field descriptor format

The basic design of nanopb has always been that the information about
messages is stored in a compact descriptor format, which is iterated at
runtime. Initially it was very tightly tied to the encoder and decoder
logic.

In nanopb-0.3.0 the field iteration logic was separated into
`pb_common.c`. Already at that point it was clear that the old format
was getting too limited, but it wasn't extended at that time.

Now in 0.4, the descriptor format has been completely decoupled from the
encoder and decoder logic, and redesigned to meet new demands.
Previously each field was stored as a `pb_field_t` struct, which was
between 8 and 32 bytes in size, depending on compilation options and
platform. Now information about fields is stored as a variable-length
sequence of `uint32_t` data words. There are 1, 2, 4 and 8 word formats,
with the 8 word format containing plenty of space for future
extensibility.

One benefit of the variable-length format is that most messages now take
less storage space. Most fields use 2 words, while simple fields in
small messages require only 1 word. The benefit is larger if code previously
required the `PB_FIELD_16BIT` or `PB_FIELD_32BIT` options. In
the `AllTypes` test case, 0.3 had a data size of 1008 bytes in the
8-bit configuration and 1408 bytes in the 16-bit configuration. The new
format in 0.4 takes 896 bytes for either of these.
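
As a rough illustration of how code can walk such a variable-length table, here is a minimal C sketch. The bit layout is an assumption made for this example only (the two lowest bits of an entry's first word select its length, and the next six bits hold a small tag number); it is not a description of nanopb's actual descriptor encoding, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed layout for this sketch only: bits 0-1 of an entry's first word
   encode its length (0: 1 word, 1: 2 words, 2: 4 words, 3: 8 words) and
   bits 2-7 hold a small tag number. */
static size_t entry_words(uint32_t word0)
{
    return (size_t)1 << (word0 & 3u);
}

static unsigned entry_tag(uint32_t word0)
{
    return (unsigned)((word0 >> 2) & 0x3Fu);
}

/* Walk a packed descriptor and count its fields; returns 0 if an entry
   claims more words than the table still holds. */
static size_t count_fields(const uint32_t *desc, size_t total_words)
{
    size_t pos = 0, fields = 0;
    while (pos < total_words) {
        size_t n = entry_words(desc[pos]);
        if (n > total_words - pos)
            return 0;
        pos += n;
        fields++;
    }
    return fields;
}

/* Example table: a 1-word entry (tag 1), a 2-word entry (tag 2) and a
   4-word entry (tag 3); the extra words of each entry are left zero. */
static const uint32_t example_desc[] = {
    (1u << 2) | 0u,
    (2u << 2) | 1u, 0u,
    (3u << 2) | 2u, 0u, 0u, 0u,
};
```

In this example the three entries occupy 7 words; giving every field a fixed slot large enough for the 8-word format would take 24 words for the same three fields.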

In addition, the new decoupling has allowed moving most of the field
descriptor data into flash on Harvard architectures, such as AVR.
Previously nanopb was quite RAM-heavy on AVR, which cannot put normal
constants in flash like most other platforms do.
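
On AVR, data placed in flash with avr-libc's `PROGMEM` attribute cannot be read through an ordinary pointer; it has to be fetched with helpers such as `pgm_read_dword()`. The sketch below shows one way to hide that difference behind an access macro so the same table-reading code builds on both kinds of platform. The macro and table names are hypothetical and the table contents are arbitrary.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__AVR__)
#include <avr/pgmspace.h>
/* AVR: keep the table in program memory and read it with pgm_read_dword(). */
#define TABLE_IN_FLASH PROGMEM
#define TABLE_READ_U32(addr) pgm_read_dword(addr)
#else
/* Most other platforms: const data already stays in flash/ROM and can be
   dereferenced directly. */
#define TABLE_IN_FLASH
#define TABLE_READ_U32(addr) (*(addr))
#endif

/* Hypothetical descriptor words; the values carry no meaning here. */
static const uint32_t field_table[] TABLE_IN_FLASH = {
    0x00000004u, 0x00000009u, 0x00000000u, 0x0000000Eu,
};

/* Sum all words through the access macro, so the loop is identical whether
   the table is in normal address space or in AVR program memory. */
static uint32_t table_checksum(void)
{
    uint32_t sum = 0;
    for (size_t i = 0; i < sizeof field_table / sizeof field_table[0]; i++)
        sum += TABLE_READ_U32(&field_table[i]);
    return sum;
}
```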

### Python packaging for generator

The nanopb generator is now available as a Python package, installable
using the `pip` package manager. This reduces the need for binary
packages: if you already have Python installed, you can just run
`pip install nanopb` and have the generator available on your path as
`nanopb_generator`.

The generator can also take advantage of the Python-based `protoc`
available in the `grpcio-tools` Python package. If you also install that,
there is no longer a need to have a binary `protoc` available.

### Generator now automatically calls protoc

Initially, the nanopb generator was used in two steps: first calling
`protoc` to parse the `.proto` file into `.pb` binary
format, and then calling `nanopb_generator.py` to output the
`.pb.h` and `.pb.c` files.

Nanopb 0.2.3 added support for running as a `protoc` plugin, which
allowed single-step generation using the `--nanopb_out` parameter. However,
the plugin mode has two complications: passing options to the nanopb
generator itself becomes more difficult, and the generator does not know
the actual path of input files. The second limitation has been
particularly problematic for locating `.options` files.

Both of these older methods still work and will remain supported.
However, `nanopb_generator` can now also take `.proto` files
directly, and it will transparently call `protoc` in the background.

### Callbacks bound by function name

Since its very beginning, nanopb has supported field callbacks to allow
processing structures that are larger than what could fit in memory at
once. So far the callback functions have been stored in the message
structure in a `pb_callback_t` struct.

Storing function pointers alongside user data is somewhat risky from a
security point of view. In addition, it has caused problems with `oneof`
fields, which reuse the same storage space for multiple submessages.
Because there is no separate area for each submessage, there is no space
to store the callback pointers either.

Nanopb-0.4.0 introduces callbacks that are referenced by function
name instead of by setting the pointers separately. This should work well
for most applications that have a single callback function for each
message type. For more complex needs, `pb_callback_t` will also remain
supported.

Function name callbacks also allow specifying custom data types for
inclusion in the message structure. For example, you could have a
`MyObject*` pointer along with other message fields, and then process
that object in a custom way in your callback.
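
The following self-contained C sketch illustrates the idea without depending on nanopb itself. All type and function names in it (`MyMessage`, `MyObject`, `MyMessage_callback`, `decode_field`) are hypothetical stand-ins, not nanopb's generated code or API. It only shows the shape of the design: the descriptor marks which fields are callbacks, the decoder calls a function it knows by name, and the message struct carries a custom `MyObject *` instead of a `pb_callback_t`.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical custom datatype stored directly in the message struct. */
typedef struct {
    size_t bytes_seen;
} MyObject;

/* Hypothetical message struct: no function pointer or void *arg, just the
   application's own pointer type for the callback field. */
typedef struct {
    int id;
    MyObject *payload;
} MyMessage;

/* Hypothetical descriptor entry: only records whether a field is a
   callback field; the function itself is bound by name at link time. */
typedef struct {
    unsigned tag;
    bool is_callback;
} field_desc_t;

static const field_desc_t MyMessage_fields[] = {
    {1, false},
    {2, true},
};

/* The function the decoder refers to by name. */
bool MyMessage_callback(MyMessage *msg, unsigned tag,
                        const unsigned char *data, size_t len)
{
    (void)data;
    if (tag == 2 && msg->payload != NULL)
        msg->payload->bytes_seen += len;
    return true;
}

/* Simplified decoder step: route callback fields to the named function. */
static bool decode_field(MyMessage *msg, unsigned tag,
                         const unsigned char *data, size_t len)
{
    for (size_t i = 0; i < sizeof MyMessage_fields / sizeof MyMessage_fields[0]; i++) {
        if (MyMessage_fields[i].tag != tag)
            continue;
        if (MyMessage_fields[i].is_callback)
            return MyMessage_callback(msg, tag, data, len);
        return true; /* a plain field would be stored into the struct here */
    }
    return false; /* unknown tag: rejected in this sketch */
}
```

Because the callback is resolved when the program is linked, the only thing that lives in the (possibly attacker-influenced) message memory is the application's data pointer.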

This feature is demonstrated in the
[tests/oneof_callback](https://github.com/nanopb/nanopb/tree/master/tests/oneof_callback) test case and the
[examples/network_server](https://github.com/nanopb/nanopb/tree/master/examples/network_server) example.

### Message level callback for oneofs

As mentioned above, callbacks inside submessages inside oneofs have been
problematic to use. To make `pb_callback_t`-style callbacks usable there,
a new generator option `submsg_callback` was added.

Setting this option to true will cause a new message level callback to
be added before the `which_field` of the oneof. This callback will be
called when the submessage tag number is known, but before the actual
message is decoded. The callback can either set callback pointers
inside the submessage, or completely decode the submessage there and
then. If any unread data remains after the callback returns, normal
submessage decoding will continue.
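
The control flow described above can be sketched in plain C. This is a simplified model, not nanopb's implementation: the stream is reduced to a count of unread bytes, and the names `OuterMsg`, `cb_payload`, `default_decode` and `decode_oneof_member` are hypothetical. It shows both ways a callback can behave: consume the whole submessage, or only prepare state and let normal decoding continue.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical minimal input stream: only tracks how many bytes are unread. */
typedef struct {
    size_t bytes_left;
} istream_t;

typedef struct { int x; } SubA;
typedef struct { int y; } SubB;

/* Message level callback, called once the oneof member's tag is known. */
typedef bool (*submsg_cb_t)(istream_t *stream, unsigned tag,
                            void *submsg, void *arg);

typedef struct {
    submsg_cb_t cb_payload;   /* placed before which_payload */
    void *cb_arg;
    unsigned which_payload;
    union {
        SubA a;
        SubB b;
    } payload;
} OuterMsg;

static int normal_decode_calls = 0;

/* Stand-in for the library's normal submessage decoding. */
static bool default_decode(istream_t *stream, void *submsg)
{
    (void)submsg;
    normal_decode_calls++;
    stream->bytes_left = 0;   /* pretend the submessage was consumed */
    return true;
}

static bool decode_oneof_member(OuterMsg *msg, istream_t *stream, unsigned tag)
{
    void *submsg = (tag == 1) ? (void *)&msg->payload.a : (void *)&msg->payload.b;
    msg->which_payload = tag;

    /* The callback gets the first chance at the submessage. */
    if (msg->cb_payload != NULL &&
        !msg->cb_payload(stream, tag, submsg, msg->cb_arg))
        return false;

    /* Continue with normal decoding only if data remains unread. */
    if (stream->bytes_left > 0)
        return default_decode(stream, submsg);
    return true;
}

/* Example callback that decodes the whole submessage by itself. */
static bool consume_everything(istream_t *stream, unsigned tag,
                               void *submsg, void *arg)
{
    (void)tag; (void)submsg; (void)arg;
    stream->bytes_left = 0;
    return true;
}

/* Example callback that only records which member arrived, e.g. as the
   place to set up callback pointers inside that submessage. */
static bool prepare_only(istream_t *stream, unsigned tag,
                         void *submsg, void *arg)
{
    (void)stream; (void)submsg;
    *(unsigned *)arg = tag;
    return true;
}
```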

There is an example of this in the [tests/oneof_callback](https://github.com/nanopb/nanopb/tree/master/tests/oneof_callback) test case.

### Binding message types to custom structures

It is often said that good C code is chock full of macros. Or maybe I
got it wrong. But since nanopb 0.2, the field descriptor generation has
heavily relied on macros. This allows it to automatically adapt to
differences in type alignment on different platforms, and to decouple
the Python generation logic from how the message descriptors are
implemented on the C side.

Now in 0.4.0, I've made the macros even more abstract. Time will tell
whether this was as great an idea as I think it is, but now the
complete list of fields in each message is available in the `.pb.h` file.
This allows a kind of metaprogramming using X-macros.

One feature that this can be used for is binding the message descriptor
to a custom structure or C++ class type. You could have a bunch of other
fields in the structure, and even the datatypes can differ to an
extent; nanopb will automatically detect the size and position of
each field. The generated `.pb.c` files now just contain calls of
`PB_BIND(msgname, structname, width)`. Adding a similar
call to your own code will bind the message to your own structure.
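
To show the general X-macro technique, here is a minimal C sketch. The field list format `X(a, ctype, name, tag)` and the `BIND_FIELDS` macro are simplified, hypothetical stand-ins; the real nanopb field lists carry more columns, and `PB_BIND` generates a complete descriptor. The same list is expanded once to declare a default struct and once per struct to build a descriptor table, with `offsetof` and `sizeof` letting the compiler locate each member.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical field list: X(a, ctype, name, tag). The extra argument `a`
   passes context (here, the target struct type) into each expansion. */
#define SENSORREADING_FIELDLIST(X, a) \
    X(a, uint32_t, sensor_id, 1)      \
    X(a, float,    value,     2)      \
    X(a, int64_t,  timestamp, 3)

/* Expansion 1: a default struct with one member per field. */
#define STRUCT_MEMBER(a, ctype, name, tag) ctype name;
typedef struct {
    SENSORREADING_FIELDLIST(STRUCT_MEMBER, unused)
} SensorReading;

/* Minimal descriptor entry: tag, byte offset and byte size of the member. */
typedef struct {
    uint8_t tag;
    uint16_t offset;
    uint16_t size;
} field_info_t;

/* Expansion 2: one descriptor entry per field, measured in struct `a`. */
#define FIELD_INFO(a, ctype, name, tag) \
    { tag, (uint16_t)offsetof(a, name), (uint16_t)sizeof(((a *)0)->name) },

#define BIND_FIELDS(fieldlist, structname) \
    { fieldlist(FIELD_INFO, structname) }

/* A custom application struct: different member order, plus members that
   are not part of the protobuf message at all. */
typedef struct {
    const char *label;
    int64_t timestamp;
    float value;
    uint32_t sensor_id;
    void *user_context;
} MySensorStruct;

static const field_info_t SensorReading_fields[] =
    BIND_FIELDS(SENSORREADING_FIELDLIST, SensorReading);
static const field_info_t MySensorStruct_fields[] =
    BIND_FIELDS(SENSORREADING_FIELDLIST, MySensorStruct);
```

Because the offsets come from `offsetof` at compile time, reordering `MySensorStruct` or adding more members to it needs no change to the field list.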

### UTF-8 validation

The protobuf format defines that strings should consist of valid UTF-8
codepoints. Previously nanopb did not enforce this, requiring extra
care in the user code. Now optional UTF-8 validation is available with
the compilation option `PB_VALIDATE_UTF8`.
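
For reference, the kind of check involved can be sketched as a small standalone validator. This is not nanopb's code; it is an illustrative implementation of the UTF-8 well-formedness rules from RFC 3629, and `utf8_is_valid` is a hypothetical name. It rejects invalid lead and continuation bytes, truncated sequences, overlong encodings, UTF-16 surrogates and code points above U+10FFFF.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Returns true if buf[0..len) is well-formed UTF-8. */
static bool utf8_is_valid(const uint8_t *buf, size_t len)
{
    size_t i = 0;
    while (i < len) {
        uint8_t c = buf[i];
        size_t n;       /* number of continuation bytes */
        uint32_t cp;    /* code point being decoded */
        uint32_t min;   /* smallest code point allowed for this length */

        if (c < 0x80) { i++; continue; }
        else if ((c & 0xE0) == 0xC0) { n = 1; cp = c & 0x1Fu; min = 0x80; }
        else if ((c & 0xF0) == 0xE0) { n = 2; cp = c & 0x0Fu; min = 0x800; }
        else if ((c & 0xF8) == 0xF0) { n = 3; cp = c & 0x07u; min = 0x10000; }
        else return false;            /* stray continuation or invalid lead byte */

        if (len - i <= n)
            return false;             /* sequence truncated by end of buffer */
        for (size_t k = 1; k <= n; k++) {
            uint8_t cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return false;         /* expected a continuation byte */
            cp = (cp << 6) | (cc & 0x3Fu);
        }
        if (cp < min)
            return false;             /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return false;             /* UTF-16 surrogate */
        if (cp > 0x10FFFF)
            return false;             /* beyond the Unicode range */
        i += n + 1;
    }
    return true;
}
```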

### Double to float conversion

Some platforms such as `AVR` do not support the `double`
datatype, instead making it an alias for `float`. This has resulted in
problems when trying to process message types containing `double` fields
generated on other machines. There has been an example of how to
manually perform the conversion between `double` and
`float`.

Now that example is integrated as an optional feature in nanopb core. By
defining `PB_CONVERT_DOUBLE_FLOAT`, the required conversion between 32-bit
and 64-bit floating point formats happens automatically on decoding and
encoding.
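
The core of such a conversion can be done with integer operations alone, which matters on targets where `double` is only 32 bits wide. The sketch below converts the bit pattern of an IEEE 754 binary64 value into binary32 bits. It is an illustration, not nanopb's implementation, and it makes two stated simplifications: results below the smallest normal float are flushed to signed zero, and exact ties round away from zero instead of to even. The function names are hypothetical, and `convert_double()` uses the host's own `double` only to feed test values in.

```c
#include <stdint.h>
#include <string.h>

/* Convert IEEE 754 binary64 bits to binary32 bits using integer math only. */
static uint32_t double_bits_to_float_bits(uint64_t d)
{
    uint32_t sign = (uint32_t)(d >> 63) << 31;
    int exponent = (int)((d >> 52) & 0x7FF) - 1023;     /* unbiased exponent */
    uint64_t fraction = d & UINT64_C(0xFFFFFFFFFFFFF);  /* 52 fraction bits */
    uint32_t mantissa = (uint32_t)(fraction >> 28);     /* top 24 fraction bits */

    if (exponent == 1024)   /* infinity stays infinity; any NaN -> quiet NaN */
        return sign | (fraction ? 0x7FC00000u : 0x7F800000u);
    if (exponent > 127)
        return sign | 0x7F800000u;                      /* overflow: infinity */
    if (exponent < -126)
        return sign;                                    /* underflow: zero */

    /* Drop the lowest of the 24 bits, rounding up when it is set. */
    mantissa = (mantissa + 1u) >> 1;
    if (mantissa & 0x800000u) {
        /* Rounding carried into the implicit leading bit. */
        mantissa = 0;
        exponent++;
        if (exponent > 127)
            return sign | 0x7F800000u;
    }
    return sign | ((uint32_t)(exponent + 127) << 23) | mantissa;
}

/* Helper: run a host double through the integer-only path. */
static float convert_double(double x)
{
    uint64_t in;
    uint32_t out;
    float f;
    memcpy(&in, &x, sizeof in);
    out = double_bits_to_float_bits(in);
    memcpy(&f, &out, sizeof f);
    return f;
}
```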

### Improved testing

Testing on embedded platforms has been integrated into the continuous
testing environment. Now all of the 80+ test cases are automatically run
on STM32 and AVR targets. Previously only a few specialized test cases
were manually tested on embedded systems.

The nanopb fuzzer has also been integrated into Google's [OSS-Fuzz](https://google.github.io/oss-fuzz/)
platform, giving a huge boost in the CPU power available for randomized
testing.
