zcbor architecture
=====================

Since zcbor is a Python script that generates C code, this document is split into two sections:

1. Architecture of the Python script
2. Architecture of the generated C code

Architecture of the Python script
=================================

The `zcbor.py` script is located in [zcbor/zcbor.py](zcbor/zcbor.py).

The functionality is spread across 5 classes:

1. CddlParser
2. CddlXcoder (inherits from CddlParser)
3. DataTranslator (inherits from CddlXcoder)
4. CodeGenerator (inherits from CddlXcoder)
5. CodeRenderer

CddlParser
----------

Each CddlParser object represents a CDDL type.
Since CDDL types can contain other types, CddlParser recursively parses a CDDL string, and spawns new instances of itself to represent the (child) types it contains.
The two most important member variables in CddlParser are `self.value` and `self.type`.
`self.type` is a string representing the base CDDL type. The options are listed below (the corresponding CBOR types are given in the form #majortype.val):

 - `"UINT"` (#0)
 - `"INT"` (#0 or #1)
 - `"NINT"` (#1)
 - `"BSTR"` (#2)
 - `"TSTR"` (#3)
 - `"FLOAT"` (#7.25, #7.26 or #7.27)
 - `"BOOL"` (#7.20 or #7.21)
 - `"NIL"` (#7.22)
 - `"UNDEF"` (#7.23)
 - `"LIST"` (#4)
 - `"MAP"` (#5)
 - `"GROUP"` (N/A)
 - `"UNION"` (N/A)
 - `"ANY"` (#0 - #5 or #7)
 - `"OTHER"` (N/A)

`"OTHER"` means another type defined in the same document, and is used as a pointer to that type definition.
The CDDL code that can give rise to each of these is described in the [README](README.md).

`self.value` means different things for different values of `self.type`.
E.g. for `"UINT"`, `self.value` has the value dictated by the type, or `None` if different values are allowed.
So in the following example, `Foo` will have a `self.value` of 5, and `Bar` will have `None`.

```cddl
Foo = 5
Bar = uint
```

For container types, i.e. `"LIST"`, `"MAP"`, `"GROUP"`, and `"UNION"`, `self.value` contains a list of their contents.
The code usually refers to the elements/contents in `self.value` as "children".
For `"OTHER"`, `self.value` is a string with the name of the type it refers to.
The following example shows use of both container and `"OTHER"` types.

```cddl
Foo = uint
Bar = [*Foo]
```

This will spawn 3 CddlParser objects:

1. `Foo`, which has `self.type = "UINT"` and `self.value = None`
2. An anonymous child of `Bar`, which has `self.type = "OTHER"` and `self.value = "Foo"`
3. `Bar`, which has `self.type = "LIST"`, and whose `self.value` is a Python `list` containing the above object.

CDDL supports many other constraints on the types, and these all have member variables in CddlParser, e.g. `self.min_qty` and `self.max_qty` which are the minimum and maximum quantity/repetitions of this type.

Children of `"MAP"` objects come in key/value pairs.
These are represented such that the values will be children of the `"MAP"` object, and the keys can be found as `self.key` in these children.

There is a member called `self.my_types`, which is a dict of all the types defined in the CDDL file.
The elements are of the form `<name>: <CddlParser object>`.
An `"OTHER"` object looks up `self.my_types[self.value]` to find its own definition.

The actual parsing of the CDDL happens with regular expressions.
There are one or more expressions for each base type.
The expressions consume a number of characters from the input string, and also capture a substring to use as the value of the type.
For container types, the regex will match the whole type, and then recursively parse the children within the matched string.

CddlXcoder
----------

CddlXcoder inherits from CddlParser, and provides common functionality for DataTranslator and CodeGenerator.

Most of the functionality falls into one of two categories:

- Common names for members in generated code. A single type may need multiple member variables in the generated code to describe it, like
  - the value
  - the key associated with this value
  - the number of times it repeats
- Condition functions that make inferences about the type based on the member variables in CddlParser, like:
  - `key_var_condition()`: Whether it needs a key member
  - `is_unambiguous()`: Whether the type is completely specified, i.e. whether we know beforehand exactly how the encoding will look (e.g. `Foo = 5`).

DataTranslator
--------------

DataTranslator is for handling and manipulating CBOR on the "host".
For example, the user can compose data in YAML or JSON files and have them converted to CBOR and validated against the CDDL.
Or they can decode binary CBOR and write Python code to inspect it, or just convert it back into YAML or JSON.

DataTranslator inherits from CddlXcoder and allows converting data between a number of different representations like CBOR/YAML/JSON strings, but also internal Python representations.
While the conversion is happening, the data is validated against the CDDL description.

This class relies heavily on decoding libraries for CBOR/YAML/JSON:

- [cbor2](https://pypi.org/project/cbor2/)
- [PyYAML](https://pypi.org/project/PyYAML/)
- [json](https://docs.python.org/3/library/json.html)

All three use the same internal representation of the decoded data, so it's trivial to convert between them.
The representation for all three is 1-to-1 with the corresponding Python types (list -> list, map -> dict, uint -> int, bstr -> bytes etc.).
In addition, the following cbor2-specific Python classes are used: `cbor2.CBORTag` for CBOR tags, `cbor2.undefined` for CBOR `undefined` values, and `cbor2.CBORSimpleValue` for CBOR simple values (#7.0 -> #7.255 excluding bools, nil, and undefined).

One caveat is that CBOR supports more features than YAML/JSON, namely:

- non-text map keys
- byte strings
- tags

zcbor allows creating bespoke representations via `--yaml-compatibility`, see the README or CLI docs for more info.

Finally, DataTranslator can also generate a separate internal representation using `namedtuple`s to allow browsing CBOR data by the names given in the CDDL.
(This is more analogous to how the data is accessed in the C code.)

DataTranslator functionality is tested in [tests/scripts](tests/scripts).

CodeGenerator
-------------

CodeGenerator, like DataTranslator, inherits from CddlXcoder.
Its primary purpose is to construct the individual decoding/encoding functions for the types specified in the given CDDL document.
It also constructs struct definitions used to hold the decoded data/data to be encoded.

CodeGenerator contains optimizations to reduce both the verbosity of the code and the level of indirection in the types.
For example:

 - If the type is unambiguous (i.e. its value is predetermined, like in `Foo = 5`), the code will validate it, but CodeGenerator won't include the actual value in the encompassing struct definition.
 - If a `"GROUP"` or `"UNION"` has only one child, it can be removed as a level of indirection.
 - If the type needs only a single member variable (i.e. no additional `foo_count` or `foo_key` etc.), that variable can instead be added to the parent struct, and its decoding/encoding code moved into the parent's function.
 - `"UNION"`s are typically implemented as anonymous `union`s, which removes one level of indirection when accessing them.

A CodeGenerator object operates in one of two modes: `"encode"` or `"decode"`.
The generated code for the two is not very different, but they call into different libraries.

Base types, like `"UINT"`, `"BOOL"`, `"FLOAT"` are represented by native C types.
`"BSTR"`s and `"TSTR"`s are represented by a zcbor-specific `struct zcbor_string`, which is just a `uint8_t` pointer and a length.
These types are decoded/encoded with C code that is not generated.
More on this in the "Architecture of the generated C code" section below.

When a type is repeated (`max_qty > 1` or `max_qty > min_qty`), there needs to be a distinction between `repeated_foo()` and `foo()` (these can be either encoding or decoding functions).
`repeated_foo()` concerns itself with the individual value, while `foo()` concerns itself with the value including repetitions.

When invoking CodeGenerator, the user must decide which types they will need direct access to decode/encode.
These types are called "entry types" and they are typically the "outermost" types, or the types the data is expected to have.

The user can also use entry types when there are `"BSTR"`s that are CBOR encoded, specified as `Foo = bstr .cbor Bar`.
Usually such strings are automatically decoded/encoded by the generated code, and the decoded objects become part of the encompassing struct.
However, if the user instead wants to manually decode/encode such strings, they can add them to `self.entry_types`.
In this case, the strings will be stored as a regular `struct zcbor_string` instead of being decoded/encoded.

CodeRenderer
------------

CodeRenderer is a standalone class that takes the result of the CodeGenerator class and constructs files.
There are up to 4 files constructed:

- The C file with the decoding/encoding functions.
- The H file with the public API to some functions in the C file.
- The H file with all the struct definitions (the type file). If both decoding and encoding files are generated for the same CDDL, they can share the same type file.
- An optional cmake file for building the generated code together with the zcbor C libraries.

CodeRenderer conducts some pruning and deduplication of the list of types and functions received from CodeGenerator.


Architecture of the generated C code
====================================

In the generated C file, each type from the CDDL file gets its own decoding/encoding function, unless it is a trivial type, like `Foo = uint`.
These functions are all `static`.
In addition, all entry types get public wrapper functions.

All decoding/encoding functions operate on a state variable of type `zcbor_state_t` which keeps track of:

- The current position in the payload, and the end of the payload.
- The current position in a list or map, and the maximum expected number of elements.
- A list of backup states, used for saving states so they can be restored if decoding/encoding fails while processing an optional element.

Each function returns a `bool` indicating whether it was successful at decoding/encoding.
In most cases, a failure in one function will result in a failure of the whole operation.

However, in the following scenarios, a failure is fine because we don't know ahead of time whether the object will be found or not:

- An object with an unknown number of repetitions (`min_qty` and `max_qty` are not the same).
- `"UNION"`s, since only one of the children should be present.

In these cases, the code attempts to decode the object. If it fails, it restores the state to before the attempt, and then tries decoding the next candidate type.

All generated functions take the form of a single if statement.
This if statement performs boolean algebra on statements depending on the children (typically only container types get a generated function).
The assignment of values in the structs mostly happens in the non-generated code.
The generated code mostly combines and validates calls into the non-generated code or other generated functions.
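
The single-if-statement shape and the backup/restore handling of `"UNION"`s can be sketched with a self-contained toy model. Everything below is hypothetical and only for illustration: `toy_state_t`, `toy_backup()`, `toy_restore()`, `toy_uint_decode()`, `toy_bool_decode()`, `struct msg` and `decode_msg()` are stand-ins invented for this sketch, not the zcbor API and not actual zcbor output. The toy type is a union of two groups, `(uint, uint)` or `(uint, bool)`, and the toy primitives only understand one-byte values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for zcbor_state_t: a payload cursor plus a single backup slot. */
typedef struct {
    const uint8_t *payload;     /* current position in the payload */
    const uint8_t *payload_end; /* one past the last payload byte */
    const uint8_t *backup;      /* position saved by toy_backup() */
} toy_state_t;

/* Hypothetical housekeeping helpers; they always succeed in this sketch. */
static bool toy_backup(toy_state_t *state)
{
    state->backup = state->payload;
    return true;
}

static bool toy_restore(toy_state_t *state)
{
    state->payload = state->backup;
    return true;
}

/* Stand-in for a non-generated primitive: one-byte CBOR uint (values 0 to 23). */
static bool toy_uint_decode(toy_state_t *state, uint32_t *result)
{
    assert(state != NULL && result != NULL);
    if (state->payload >= state->payload_end || *state->payload > 0x17) {
        return false;
    }
    *result = *state->payload++;
    return true;
}

/* Stand-in for a non-generated primitive: CBOR bool (0xf4 false, 0xf5 true). */
static bool toy_bool_decode(toy_state_t *state, bool *result)
{
    assert(state != NULL && result != NULL);
    if (state->payload >= state->payload_end
        || (*state->payload != 0xf4 && *state->payload != 0xf5)) {
        return false;
    }
    *result = (*state->payload++ == 0xf5);
    return true;
}

/* Result struct: the union is anonymous, and `choice` records which child matched. */
struct msg {
    uint32_t id;
    union {
        uint32_t count;
        bool flag;
    };
    enum { MSG_COUNT, MSG_FLAG } choice;
};

/* Shape of a generated function: one if statement combining the child calls.
 * If the first alternative fails part-way, the state is restored before the
 * second alternative is tried. */
static bool decode_msg(toy_state_t *state, struct msg *result)
{
    if (!(toy_backup(state)
          && ((toy_uint_decode(state, &result->id)
               && toy_uint_decode(state, &result->count)
               && ((result->choice = MSG_COUNT), true))
              || (toy_restore(state)
                  && toy_uint_decode(state, &result->id)
                  && toy_bool_decode(state, &result->flag)
                  && ((result->choice = MSG_FLAG), true))))) {
        return false;
    }
    return true;
}
```

With the two-byte payload `0x05 0xf5`, the first alternative consumes the `5` and then fails on the second element, so the restore step is what lets the second alternative start again from the first byte.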

All functions (generated and not) have the same API structure: `bool <name>(zcbor_state_t *state, <type> *result)`.
The number of arguments is kept to a minimum to reduce code size.

The exceptions to the API structure are `zcbor_multi_decode`/`zcbor_multi_encode` and `zcbor_present_decode`/`zcbor_present_encode`.
These functions accept function pointers with the above API and run them multiple times.
When this happens, the function pointers are cast to a generic function pointer type, and processed without knowledge of the type.

The non-generated files provide decoding/encoding functions for all the basic types except `"OTHER"`.
There are also housekeeping functions for managing state and logging.
This code is documented in the header files in the [include](include) folder.

The C tests for the code generation can be found in the [tests/decode](tests/decode) and [tests/encode](tests/encode) folders.
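
As a rough illustration of the "run a decoder multiple times without knowledge of the type" pattern described above for `zcbor_multi_decode`, here is a minimal self-contained sketch. All names (`toy_state_t`, `toy_decoder_t`, `toy_uint_decode()`, `toy_multi_decode()`) and the parameter list are assumptions made for this example, not the zcbor API. One deliberate difference from the description: the element decoder here is declared with a `void *` result directly, so the sketch needs no function pointer cast.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-in for zcbor_state_t: only a payload cursor. */
typedef struct {
    const uint8_t *payload;     /* current position in the payload */
    const uint8_t *payload_end; /* one past the last payload byte */
} toy_state_t;

/* Generic decoder type: the result is passed as an untyped pointer. */
typedef bool (*toy_decoder_t)(toy_state_t *state, void *result);

/* Element decoder for a one-byte CBOR uint (values 0 to 23).
 * The caller must pass a pointer to a uint32_t. */
static bool toy_uint_decode(toy_state_t *state, void *result)
{
    assert(state != NULL && result != NULL);
    if (state->payload >= state->payload_end || *state->payload > 0x17) {
        return false;
    }
    *(uint32_t *)result = *state->payload++;
    return true;
}

/* Runs `decoder` up to `max` times, writing results `result_len` bytes apart.
 * The element type is never inspected; only its size is needed.
 * Succeeds if at least `min` elements were decoded. */
static bool toy_multi_decode(size_t min, size_t max, size_t *num_decoded,
                             toy_decoder_t decoder, toy_state_t *state,
                             void *result, size_t result_len)
{
    size_t i;

    for (i = 0; i < max; i++) {
        const uint8_t *backup = state->payload;

        if (!decoder(state, (uint8_t *)result + i * result_len)) {
            state->payload = backup; /* undo the failed attempt */
            break;
        }
    }
    *num_decoded = i;
    return i >= min;
}
```

A caller would pass an array such as `uint32_t values[5]` together with `sizeof(values[0])` as `result_len`, and read back the number of decoded elements from `num_decoded`, which plays the role of a `foo_count` style member.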