nanopb-3.4.0/docs/whats_new.md

# Nanopb: New features in nanopb 0.4

## What's new in nanopb 0.4

Long in the making, nanopb 0.4 has seen some wide reaching improvements
in reaction to the development of the rest of the protobuf ecosystem.
This document showcases features that are not immediately visible, but
that you may want to take advantage of.

A lot of effort has been spent in retaining backwards and forwards
compatibility with previous nanopb versions. For a list of breaking
changes, see [migration document](migration.html)

### New field descriptor format

The basic design of nanopb has always been that the information about
messages is stored in a compact descriptor format, which is iterated in
runtime. Initially it was very tightly tied with encoder and decoder
logic.

In nanopb-0.3.0 the field iteration logic was separated to
`pb_common.c`. Already at that point it was clear that the old format
was getting too limited, but it wasn't extended at that time.

Now in 0.4, the descriptor format was completely decoupled from the
encoder and decoder logic, and redesigned to meet new demands.
Previously each field was stored as `pb_field_t` struct, which was
between 8 and 32 bytes in size, depending on compilation options and
platform. Now information about fields is stored as a variable length
sequence of `uint32_t` data words. There are 1, 2, 4 and 8 word formats,
with the 8 word format containing plenty of space for future
extensibility.

One benefit of the variable length format is that most messages now take
less storage space. Most fields use 2 words, while simple fields in
small messages require only 1 word. Benefit is larger if code previously
required `PB_FIELD_16BIT` or `PB_FIELD_32BIT` options. In
the `AllTypes` test case, 0.3 had data size of 1008 bytes in
8-bit configuration and 1408 bytes in 16-bit configuration. New format
in 0.4 takes 896 bytes for either of these.

In addition, the new decoupling has allowed moving most of the field
descriptor data into FLASH on Harvard architectures, such as AVR.
Previously nanopb was quite RAM-heavy on AVR, which cannot put normal
constants in flash like most other platforms do.

### Python packaging for generator

Nanopb generator is now available as a Python package, installable using
`pip` package manager. This will reduce the need for binary
packages, as if you have Python already installed you can just
`pip install nanopb` and have the generator available on path as
`nanopb_generator`.

The generator can also take advantage of the Python-based `protoc`
available in `grpcio-tools` Python package. If you also install that,
there is no longer a need to have binary `protoc` available.

### Generator now automatically calls protoc

Initially, nanopb generator was used in two steps: first calling
`protoc` to parse the `.proto` file into `.pb` binary
format, and then calling `nanopb_generator.py` to output the
`.pb.h` and `.pb.c` files.

Nanopb 0.2.3 added support for running as a `protoc` plugin, which
allowed single-step generation using `--nanopb_out` parameter. However,
the plugin mode has two complications: passing options to nanopb
generator itself becomes more difficult, and the generator does not know
the actual path of input files. The second limitation has been
particularly problematic for locating `.options` files.

Both of these older methods still work and will remain supported.
However, now `nanopb_generator` can also take `.proto` files
directly and it will transparently call `protoc` in the background.

### Callbacks bound by function name

Since its very beginnings, nanopb has supported field callbacks to allow
processing structures that are larger than what could fit in memory at
once. So far the callback functions have been stored in the message
structure in a `pb_callback_t` struct.

Storing pointers along with user data is somewhat risky from a security
point of view. In addition it has caused problems with `oneof` fields,
which reuse the same storage space for multiple submessages. Because
there is no separate area for each submessage, there is no space to
store the callback pointers either.

Nanopb-0.4.0 introduces callbacks that are referenced by the function
name instead of setting the pointers separately. This should work well
for most applications that have a single callback function for each
message type. For more complex needs, `pb_callback_t` will also remain
supported.

Function name callbacks also allow specifying custom data types for
inclusion in the message structure. For example, you could have
`MyObject*` pointer along with other message fields, and then process
that object in custom way in your callback.

This feature is demonstrated in
[tests/oneof_callback](https://github.com/nanopb/nanopb/tree/master/tests/oneof_callback) test case and
[examples/network_server](https://github.com/nanopb/nanopb/tree/master/examples/network_server) example.

### Message level callback for oneofs

As mentioned above, callbacks inside submessages inside oneofs have been
problematic to use. To make using `pb_callback_t`-style callbacks there
possible, a new generator option `submsg_callback` was added.

Setting this option to true will cause a new message level callback to
be added before the `which_field` of the oneof. This callback will be
called when the submessage tag number is known, but before the actual
message is decoded. The callback can either choose to set callback
pointers inside the submessage, or just completely decode the submessage
there and then. If any unread data remains after the callback returns,
normal submessage decoding will continue.

There is an example of this in [tests/oneof_callback](https://github.com/nanopb/nanopb/tree/master/tests/oneof_callback) test case.

### Binding message types to custom structures

It is often said that good C code is chock full of macros. Or maybe I
got it wrong. But since nanopb 0.2, the field descriptor generation has
heavily relied on macros. This allows it to automatically adapt to
differences in type alignment on different platforms, and to decouple
the Python generation logic from how the message descriptors are
implemented on the C side.

Now in 0.4.0, I've made the macros even more abstract. Time will tell
whether this was such a great idea that I think it is, but now the
complete list of fields in each message is available in `.pb.h` file.
This allows a kind of metaprogramming using [X-macros]()

One feature that this can be used for is binding the message descriptor
to a custom structure or C++ class type. You could have a bunch of other
fields in the structure and even the datatypes can be different to an
extent, and nanopb will automatically detect the size and position of
each field. The generated `.pb.c` files now just have calls of
`PB_BIND(msgname, structname, width)`. Adding a similar
call to your own code will bind the message to your own structure.

### UTF-8 validation

Protobuf format defines that strings should consist of valid UTF-8
codepoints. Previously nanopb has not enforced this, requiring extra
care in the user code. Now optional UTF-8 validation is available with
compilation option `PB_VALIDATE_UTF8`.

### Double to float conversion

Some platforms such as `AVR` do not support the `double`
datatype, instead making it an alias for `float`. This has resulted in
problems when trying to process message types containing `double` fields
generated on other machines. There has been an example on how to
manually perform the conversion between `double` and
`float`.

Now that example is integrated as an optional feature in nanopb core. By
defining `PB_CONVERT_DOUBLE_FLOAT`, the required conversion between 32-
and 64-bit floating point formats happens automatically on decoding and
encoding.

### Improved testing

Testing on embedded platforms has been integrated in the continuous
testing environment. Now all of the 80+ test cases are automatically run
on STM32 and AVR targets. Previously only a few specialized test cases
were manually tested on embedded systems.

Nanopb fuzzer has also been integrated in Google's [OSSFuzz](https://google.github.io/oss-fuzz/)
platform, giving a huge boost in the CPU power available for randomized
testing.