# Nanopb: New features in nanopb 0.4

## What's new in nanopb 0.4

Long in the making, nanopb 0.4 has seen some wide-reaching improvements
in reaction to the development of the rest of the protobuf ecosystem.
This document showcases features that are not immediately visible, but
that you may want to take advantage of.

A lot of effort has been spent in retaining backwards and forwards
compatibility with previous nanopb versions. For a list of breaking
changes, see the [migration document](migration.html).

### New field descriptor format

The basic design of nanopb has always been that the information about
messages is stored in a compact descriptor format, which is iterated at
runtime. Initially it was very tightly tied to the encoder and decoder
logic.

In nanopb-0.3.0 the field iteration logic was separated into
`pb_common.c`. Already at that point it was clear that the old format
was getting too limited, but it wasn't extended at that time.

Now in 0.4, the descriptor format was completely decoupled from the
encoder and decoder logic, and redesigned to meet new demands.
Previously each field was stored as a `pb_field_t` struct, which was
between 8 and 32 bytes in size, depending on compilation options and
platform. Now information about fields is stored as a variable length
sequence of `uint32_t` data words. There are 1, 2, 4 and 8 word formats,
with the 8 word format containing plenty of space for future
extensibility.

One benefit of the variable length format is that most messages now take
less storage space. Most fields use 2 words, while simple fields in
small messages require only 1 word. The benefit is larger if the code
previously required the `PB_FIELD_16BIT` or `PB_FIELD_32BIT` options. In
the `AllTypes` test case, 0.3 had a data size of 1008 bytes in the
8-bit configuration and 1408 bytes in the 16-bit configuration. The new
format in 0.4 takes 896 bytes for either of these.
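
To make the idea of a variable length word sequence concrete, here is a minimal sketch of how such a descriptor array could be walked. It assumes, purely for illustration, that the two lowest bits of each descriptor's first word select the 1, 2, 4 or 8 word format. The function names are hypothetical and this is not the actual nanopb iterator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative assumption: the two lowest bits of a descriptor's first
 * word select its length: 0 -> 1 word, 1 -> 2 words, 2 -> 4 words,
 * 3 -> 8 words. */
static size_t descriptor_words(uint32_t first_word)
{
    return (size_t)1 << (first_word & 3u);
}

/* Count the descriptors packed into words[0..len). Returns 0 if the
 * last descriptor would run past the end of the array. */
static size_t count_descriptors(const uint32_t *words, size_t len)
{
    size_t count = 0;
    size_t i = 0;

    assert(words != NULL || len == 0);

    while (i < len) {
        size_t n = descriptor_words(words[i]);
        if (n > len - i) {
            return 0; /* truncated descriptor */
        }
        i += n;      /* skip the remaining words of this descriptor */
        count++;
    }
    return count;
}
```

With this layout, a message with one 1-word field, one 2-word field and one 4-word field occupies 7 words, or 28 bytes, where fixed-size entries would have to reserve the largest size for every field.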

In addition, the new decoupling has allowed moving most of the field
descriptor data into flash on Harvard architectures, such as AVR.
Previously nanopb was quite RAM-heavy on AVR, which cannot put normal
constants in flash like most other platforms do.

### Python packaging for generator

The nanopb generator is now available as a Python package, installable
using the `pip` package manager. This will reduce the need for binary
packages: if you already have Python installed, you can just run
`pip install nanopb` and have the generator available on your path as
`nanopb_generator`.

The generator can also take advantage of the Python-based `protoc`
available in the `grpcio-tools` Python package. If you also install
that, there is no longer a need to have a binary `protoc` available.

### Generator now automatically calls protoc

Initially, the nanopb generator was used in two steps: first calling
`protoc` to parse the `.proto` file into the `.pb` binary
format, and then calling `nanopb_generator.py` to output the
`.pb.h` and `.pb.c` files.

Nanopb 0.2.3 added support for running as a `protoc` plugin, which
allowed single-step generation using the `--nanopb_out` parameter.
However, the plugin mode has two complications: passing options to the
nanopb generator itself becomes more difficult, and the generator does
not know the actual path of the input files. The second limitation has
been particularly problematic for locating `.options` files.

Both of these older methods still work and will remain supported.
However, now `nanopb_generator` can also take `.proto` files
directly and it will transparently call `protoc` in the background.

### Callbacks bound by function name

Since its very beginnings, nanopb has supported field callbacks to allow
processing structures that are larger than what could fit in memory at
once. So far the callback functions have been stored in the message
structure in a `pb_callback_t` struct.
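
The pattern of keeping a function pointer and its user data inside the message structure can be sketched as follows. This is a simplified, self-contained illustration: `demo_callback_t`, `DemoMessage`, `sum_bytes` and `demo_feed` are hypothetical stand-ins, and the real `pb_callback_t` callbacks take nanopb stream and field arguments instead of a raw byte chunk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a callback slot: a function pointer plus
 * user data, both stored inside the message structure. */
typedef struct {
    bool (*decode)(const uint8_t *chunk, size_t len, void *arg);
    void *arg;
} demo_callback_t;

typedef struct {
    int32_t id;
    demo_callback_t payload; /* large field, processed chunk by chunk */
} DemoMessage;

/* Example callback: adds up the bytes it sees without buffering them. */
static bool sum_bytes(const uint8_t *chunk, size_t len, void *arg)
{
    uint32_t *total = (uint32_t *)arg;
    for (size_t i = 0; i < len; i++) {
        *total += chunk[i];
    }
    return true;
}

/* Simulated decoder: hands the field data to the callback in pieces of
 * at most chunk_size bytes. A missing callback means "skip the field". */
static bool demo_feed(const DemoMessage *msg, const uint8_t *data,
                      size_t len, size_t chunk_size)
{
    size_t pos = 0;

    assert(msg != NULL);
    if (msg->payload.decode == NULL) {
        return true;
    }
    if (chunk_size == 0) {
        return false;
    }
    while (pos < len) {
        size_t n = (len - pos < chunk_size) ? (len - pos) : chunk_size;
        if (!msg->payload.decode(data + pos, n, msg->payload.arg)) {
            return false;
        }
        pos += n;
    }
    return true;
}
```

The caller fills in `payload.decode` and `payload.arg` before decoding, so the pointers travel with every instance of the message.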

Storing pointers along with user data is somewhat risky from a security
point of view. In addition it has caused problems with `oneof` fields,
which reuse the same storage space for multiple submessages. Because
there is no separate area for each submessage, there is no space to
store the callback pointers either.

Nanopb-0.4.0 introduces callbacks that are referenced by the function
name instead of setting the pointers separately. This should work well
for most applications that have a single callback function for each
message type. For more complex needs, `pb_callback_t` will also remain
supported.

Function name callbacks also allow specifying custom data types for
inclusion in the message structure. For example, you could have a
`MyObject*` pointer along with other message fields, and then process
that object in a custom way in your callback.

This feature is demonstrated in the
[tests/oneof_callback](https://github.com/nanopb/nanopb/tree/master/tests/oneof_callback) test case and the
[examples/network_server](https://github.com/nanopb/nanopb/tree/master/examples/network_server) example.

### Message level callback for oneofs

As mentioned above, callbacks inside submessages inside oneofs have been
problematic to use. To make using `pb_callback_t`-style callbacks there
possible, a new generator option `submsg_callback` was added.

Setting this option to true will cause a new message level callback to
be added before the `which_field` of the oneof. This callback will be
called when the submessage tag number is known, but before the actual
message is decoded. The callback can either choose to set callback
pointers inside the submessage, or just completely decode the submessage
there and then. If any unread data remains after the callback returns,
normal submessage decoding will continue.
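
The ordering described above can be simulated without nanopb. In the sketch below every type and function (`DemoEnvelope`, `cb_payload`, `demo_decode_member` and so on) is a hypothetical stand-in rather than generated nanopb code. It only shows why the message level callback is useful: the shared oneof storage is reset when a member is selected, so the callback is the first point at which pointers inside the submessage can be set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All types here are hypothetical stand-ins, not nanopb definitions. */
typedef struct {
    bool (*on_item)(int32_t item, void *arg);
    void *arg;
} demo_callback_t;

typedef struct { demo_callback_t items; } DemoList; /* callback field */
typedef struct { int32_t value; } DemoScalar;

enum { DEMO_LIST_TAG = 1, DEMO_SCALAR_TAG = 2 };

typedef struct DemoEnvelope DemoEnvelope;
struct DemoEnvelope {
    /* Message level callback, placed before which_payload. */
    bool (*cb_payload)(DemoEnvelope *msg, uint32_t tag, void *arg);
    void *cb_arg;
    uint32_t which_payload;
    union { DemoList list; DemoScalar scalar; } payload;
};

/* Simulated decoding of one oneof member that carries n items. */
static bool demo_decode_member(DemoEnvelope *msg, uint32_t tag,
                               const int32_t *items, size_t n)
{
    assert(msg != NULL);

    /* Selecting a member resets the shared storage, so any callback
     * pointers stored there earlier would be lost. */
    if (tag == DEMO_LIST_TAG) {
        const DemoList empty = {{NULL, NULL}};
        msg->payload.list = empty;
    } else if (tag == DEMO_SCALAR_TAG) {
        msg->payload.scalar.value = 0;
    } else {
        return false;
    }
    msg->which_payload = tag;

    /* The tag is known but nothing is decoded yet: let user code
     * prepare the submessage. */
    if (msg->cb_payload != NULL && !msg->cb_payload(msg, tag, msg->cb_arg)) {
        return false;
    }

    /* Normal submessage decoding continues. */
    if (tag == DEMO_LIST_TAG) {
        const demo_callback_t *cb = &msg->payload.list.items;
        for (size_t i = 0; i < n; i++) {
            if (cb->on_item != NULL && !cb->on_item(items[i], cb->arg)) {
                return false;
            }
        }
        return true;
    }
    if (n != 1) {
        return false;
    }
    msg->payload.scalar.value = items[0];
    return true;
}

/* Field callback that accumulates the items into an int32_t. */
static bool add_item(int32_t item, void *arg)
{
    *(int32_t *)arg += item;
    return true;
}

/* Message level callback: installs the field callback for list members. */
static bool prepare_payload(DemoEnvelope *msg, uint32_t tag, void *arg)
{
    if (tag == DEMO_LIST_TAG) {
        msg->payload.list.items.on_item = add_item;
        msg->payload.list.items.arg = arg;
    }
    return true;
}
```

Only `cb_payload` and `cb_arg` live outside the union, so they are the only pointers that survive from before decoding started.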

A complete example of the `submsg_callback` option is in the [tests/oneof_callback](https://github.com/nanopb/nanopb/tree/master/tests/oneof_callback) test case.

### Binding message types to custom structures

It is often said that good C code is chock full of macros. Or maybe I
got it wrong. But since nanopb 0.2, the field descriptor generation has
heavily relied on macros. This allows it to automatically adapt to
differences in type alignment on different platforms, and to decouple
the Python generation logic from how the message descriptors are
implemented on the C side.

Now in 0.4.0, I've made the macros even more abstract. Time will tell
whether this was as great an idea as I think it is, but now the
complete list of fields in each message is available in the `.pb.h`
file. This allows a kind of metaprogramming using [X-macros]().

One feature that this can be used for is binding the message descriptor
to a custom structure or C++ class type. You could have a bunch of other
fields in the structure and even the datatypes can be different to an
extent, and nanopb will automatically detect the size and position of
each field. The generated `.pb.c` files now just have calls of
`PB_BIND(msgname, structname, width)`. Adding a similar
call to your own code will bind the message to your own structure.

### UTF-8 validation

The protobuf format defines that strings should consist of valid UTF-8
codepoints. Previously nanopb has not enforced this, requiring extra
care in the user code. Now optional UTF-8 validation is available with
the compilation option `PB_VALIDATE_UTF8`.

### Double to float conversion

Some platforms such as `AVR` do not support the `double`
datatype, instead making it an alias for `float`. This has resulted in
problems when trying to process message types containing `double` fields
generated on other machines.
Previously, there was only an example showing how to
perform the conversion between `double` and `float` manually.

Now that example is integrated as an optional feature in nanopb core. By
defining `PB_CONVERT_DOUBLE_FLOAT`, the required conversion between 32-
and 64-bit floating point formats happens automatically on decoding and
encoding.

### Improved testing

Testing on embedded platforms has been integrated in the continuous
testing environment. Now all of the 80+ test cases are automatically run
on STM32 and AVR targets. Previously only a few specialized test cases
were manually tested on embedded systems.

The nanopb fuzzer has also been integrated in Google's [OSS-Fuzz](https://google.github.io/oss-fuzz/)
platform, giving a huge boost in the CPU power available for randomized
testing.