1.. _logging_cs_stm: 2 3Multi-domain logging using ARM Coresight STM 4############################################ 5 6The Arm CoreSight SoC-400 is a comprehensive library of components for the creation of debug and 7trace functionality within a system. The STM (System Trace Macrocell) is a trace source that 8is integrated into a CoreSight system, designed primarily for high-bandwidth trace of 9instrumentation embedded into software. This instrumentation is made up of memory-mapped writes to 10the STMESP (STM Extended Stimulus Port) peripheral registers. Multiple cores can share and directly 11access the STM without being aware of each other. Each core has 65536 stimulus ports (identical 12register sets) which can be independently accessed thus there is no need for locking when various 13contexts want to log the data. 14 15STM Extended Stimulus Port (STMESP) 16*********************************** 17 18Local domain (each core) has access to its own set of STMESP peripherals. Each set has registers for writing data (with or without timestamp and with or without marking) and writing a flag. Each write 19to STMESP register results in the user data being encoded into MIPI System Trace Protocol v2 (STPv2). 20Hardware manages multiplexing of data from different STMESP register sets and from different cores 21by adding Major and Channel opcodes into the stream when necessary. Timestamp (from common source for all cores) and synchronization opcodes are autonomously added by the STM. 22 23Data stream generated by STM can end up in various sinks. For example it can be directly send to 24TPIU (Trace Port Interface Unit) or saved in the RAM Circular Buffer which is called ETR (Embedded 25Trace Router). TPIU is a 5 pin interface (4 data pins and clock) and external tool is required to 26capture the data (e.g. J-Trace PRO). 27When ETR is used then one core in the system is responsible for processing that data, for example 28dumping it over UART to the host. 29 30ARM Coresight Embedded Trace Router (ETR) 31***************************************** 32 33ETR is a circular RAM buffer to which tracing data is saved. Since it may contain data from 34various sources additional wrapping protocol is used to multiplex them. ``Coresight Trace 35Formatter`` is used for that. Data is encoded into 16 bytes frames which holds up to 15 byte 36of data. Circular RAM buffer size is limited to 4k thus there is a risk of data overflow. If 37overflow occurs then data is lost but due to presence of synchronization opcodes in the STPv2 38data stream synchronization can be restored. 39 40Data from ETR is handled on the device. It can be forwarded as-is to the host (e.g. using UART), 41where a host tool is decoding the data or data can be decoded on-chip to output the data in human readable format. 42 43Nordic Semiconductor NRF54H20 implementation instead of using ETR peripheral for getting 44buffer busyness is using own wrapper: Trace Buffer Monitor (TBM). 45 46STMESP Logging frontend 47*********************** 48 49:ref:`log_frontend` API is implemented. Custom frontend assumes that all required operations are executed in the context of the logging macro call. It is also assumed that frontend provides the timestamp. Frontend utilizes a fact that there are multiple STMESP register sets and use multiple sets to avoid locking context. The atomically incremented channel counter is used to pick unique STMESP set and 50whole message is written to that STMESP. Limited pool of sets is used leaving most of channels for 51other purposes. 52 53Early logging 54============= 55 56STMESP cannot be used before STM or ETR/TPIU is setup. In order to support early logging, 57before infrastructure is ready a dedicated RAM buffer is used to which data is written. When 58STM infrastructure readiness is notified (:c:func:`log_frontend_stmesp_etr_ready`) then RAM buffer 59content is written to STMESP. Early logging is applicable only to the core which owns and configures 60Coresight infrastructure (e.g. in case of NRF54H20 it is the Secure core). 61 62Trace points 63============ 64 65There may be cases when logging is too slow (even though it is very fast). For cases like that a dedicated ``trace point`` API is added. It is extremely fast because it is just a single 66write to one STMESP register. There are 2 functions in the API: 67 68* :c:func:`log_frontend_stmesp_tp` - It accepts single argument - index. Index is between 69 0 and 65280. 70* :c:func:`log_frontend_stmesp_tp_d32` - It accepts two arguments - index and user data. 71 Index is between 0 and 65280. User data is 32 bit word. 72 73It takes less than 100 ns to log a single trace point on NRF54H20 and it is approx. 7 times faster than the fastest log message. 74 75Logging using STM 76***************** 77 78STM logging has 2 modes of operations: 79 80* Dictionary-based - assisted mode which is using dictionary-based logging. In this mode logging 81 strings can be removed from the binary because they are only needed for decoding and decoding 82 is performed by the host tool. This mode takes less memory and is faster (2-3 times faster). 83 There is less data written so there is less chance of data overflow in ETR buffer. 84* Standalone - data is decoded on chip and human-readable strings are printed. This mode requires 85 more device resources and processing power. 86 87Following figure presents multi-domain logging using ARM Coresight STM. 88 89.. figure:: images/coresight_architecture.png 90 91Each core (local domain) is using logging frontend which writes logging data to STMESP registers. 92 93If TPIU is used as the output then there is no more software components needed. STPv2 stream 94of data is composed by STM and send over TPIU. 95 96If ETR RAM buffer is used then buffer owner core (``proxy``) is responsible for handling that data. 97If dictionary-based logging is used then proxy just sends data as-is over UART. 98If standalone logging is used then proxy is using :ref:`cs_trace_defmt` and 99:ref:`mipi_stp_decoder` to decode data and demultiplex messages. Messages are formatted to human 100readable strings using logging :ref:`log_output`. 101 102Dictionary-based logging 103======================== 104 105Assisted multicore logging uses dictionary-based logging to send messages without redundant strings 106to STM, and is based on the :ref:`logging_guide_dictionary` feature of the logging API provided by 107Zephyr. 108Instead of including the format strings in the log messages, it logs the addresses (message IDs) 109where the strings are stored, which reduces the size of the logging subsystem. 110If the data goes to the ETR buffer, the proxy core's responsibility is to dump this data. 111The host PC, equipped with a decoder tool, translates these addresses back into human-readable text 112using a JSON database generated during the build process. 113 114When using logs, this method has the following advantages: 115 116* It reduces the size of the binary, as the strings used in the log messages are not stored in the binary itself. Logging infrastructure is also very limited. On local domains it is just the frontend. There is even no need to have string formatting function. 117* It reduces the amount of data that needs to be sent to and processed by the application core, as the string formatting is offloaded to the host side. 118* Logging is fast. It takes less than 1 us to log a simple message (up to 2 arguments) on NRF54H20. 119 120Proxy core is using Nordic specific peripheral (TBM) to get ETR buffer busyness and send data over 121UART. Nordic specific driver for ETR buffer is located in 122:zephyr_file:`drivers/misc/coresight/nrf_etr.c`. 123 124Configuration 125------------- 126 127For Nordic SoC a dedicated snippet (:ref:`nordic-log-stm-dict`) shall be used to enable logging. 128Each core shall be build with that snippet. If any core wants to use it then application core is 129required to have it enabled too since it acts as the proxy (ETR buffer processing). All cores must use the same configuration of the logging. 130 131Reading the logs 132---------------- 133 134To read the dictionary-based STM log output, do the following: 135 1361. Set up the log capture. 137 138 Use the ``nrfutil trace stm`` command to start capturing logs from the device, specifying the database configuration for each domain ID, as well as the serial port, the baud rate, and the output file name:: 139 140 nrfutil trace stm --database-config 34:build/zephyr/log_dictionary.json,35:build_rad/zephyr/log_dictionary.json --input-serialport /dev/ttyACM1 --baudrate 115200 --output-ascii out.txt 141 142#. Capture and decode the logs. 143 144 nrfutil will capture the log data from the specified UART port and use the provided dictionary databases to decode the logs into a human-readable format. 145 The decoded logs will be saved in the specified output file (the :file:`out.txt` file in the previous example). 146 147#. Open the output file to review the decoded log messages. 148 149 The file will contain timestamps and the log messages in a format that is human-readable. 150 151If the log capture fails to find a sync, rerun the capture process. 152 153.. note:: 154 Decoding artifacts or incorrect timestamps might occur when rerunning the process. 155 156Each log line contains a domain-related or core-related prefix between the log level and the module name, indicating the core that generated the log entry. 157The following are the prefixes used to indicate the cores: 158 159.. csv-table:: nRF54H20 log prefixes 160 :header: "Core", "Prefix", "ID" 161 162 Secure Domain, ``sec``, 0x21 163 Application core, ``app``, 0x22 164 Radio core, ``rad``, 0x23 165 System Controller (SysCtrl), ``sys``, 0x2c 166 Fast Lightweight Processor (FLPR), ``flpr``, 0x2d 167 Peripheral Processor (PPR), ``ppr``, 0x2e 168 , ``mod``, 0x24 169 170Stand-alone logging 171=================== 172 173Frontend is writing to STMESP registers. Message format is aligned with the on chip decoder 174in :zephyr_file:`subsys/logging/frontends/log_frontend_stmesp_demux.c`. 175 176``Proxy`` is using Nordic specific peripheral (TBM) to get ETR buffer busyness and read and 177decode data and send human-readable data over UART. Nordic specific driver for ETR buffer is 178located in :zephyr_file:`drivers/misc/coresight/nrf_etr.c`. It is using :ref:`cs_trace_defmt` and 179:ref:`mipi_stp_decoder` and above-mentioned demultiplexer to decode messages. 180 181Logging messages contains read-only format string used in the logging macros thus they cannot be 182removed from the binary. This mode uses more read-only memory and has lower throughput because 183messages are longer and processing is time consuming. Compared to the dictionary-based mode logging 184is 2-3 slower. 185 186Configuration 187------------- 188 189For Nordic SoC a dedicated snippet (:ref:`nordic-log-stm`) shall be used to enable logging. 190Each core shall be build with that snippet. If any core wants to use it then application core is 191required to have it enabled too since it acts as the proxy (ETR buffer processing). All cores must use the same configuration of the logging. 192 193Reading the logs 194---------------- 195 196Logs are printed using Console UART. 197 198.. note:: 199 To use UART in your application, the UART's node must be described in devicetree. 200 For more details, see :ref:`devicetree-intro`. 201 202The following is an example log output:: 203 204 [00:00:00.154,790] <inf> app/spsc_pbuf: alloc in 0x2f0df800 205 [00:00:00.163,319] <inf> app/spsc_pbuf: alloc 0x2f0df800 wr_idx:20 206 [00:00:00.181,112] <inf> app/spsc_pbuf: commit in 0x2f0df800 207 [00:00:00.189,090] <inf> app/spsc_pbuf: commit 0x2f0df800, len:20 wr_idx: 44 208 [00:00:00.202,577] <inf> rad/icmsg: mbox_callback 209 [00:00:00.214,750] <inf> rad/spsc_pbuf: claim 0x2f0df800 rd_idx:20 210 [00:00:00.235,823] <inf> rad/spsc_pbuf: free 0x2f0df800 len:20 rd_idx: 44 211 [00:00:00.244,507] <inf> rad/spsc_pbuf: read done 0x2f0df800 len:20 212 [00:00:00.272,444] <inf> rad/host: ep recv 0x330021f0, len:20 213 [00:00:00.283,939] <inf> rad/host: rx:00 exp:00 214 [00:00:00.292,200] <inf> rad/icmsg: read 0 215 [00:00:05.077,026] <inf> rad/spsc_pbuf: alloc in 0x2f0df000 216 [00:00:05.077,068] <inf> rad/spsc_pbuf: alloc 0x2f0df000 wr_idx:44 217 [00:00:05.077,098] <inf> rad/spsc_pbuf: commit in 0x2f0df000 218 [00:00:05.077,134] <inf> rad/spsc_pbuf: commit 0x2f0df000, len:20 wr_idx 219 220Each log line contains a domain-related or core-related prefix between the log level and the module name, indicating the core that generated the log entry. 221The following are the prefixes used to indicate the cores: 222 223.. csv-table:: nRF54H20 log prefixes 224 :header: "Core", "Prefix" 225 226 Secure Domain, ``sec`` 227 Application core, ``app`` 228 Radio core, ``rad`` 229 System Controller (SysCtrl), ``sys`` 230 Fast Lightweight Processor (FLPR), ``flpr`` 231 Peripheral Processor (PPR), ``ppr`` 232 233Additional considerations 234========================= 235 236When using STM logging, consider the following: 237 238* Use optimized log macros (having up to 2 word size numeric arguments, like ``LOG_INF("%d %c", (int)x, (char)y)``) to improve the size and speed of logging. 239* For memory constrained applications (for example, when running on the PPR core), disable the ``printk()`` function by setting both the :kconfig:option:`CONFIG_PRINTK` and :kconfig:option:`CONFIG_BOOT_BANNER` Kconfig options to ``n`` in your project configuration. 240* When working with multiple domains, such as the Radio core and Application core, ensure that each database is prefixed with the correct domain ID. 241* Some log messages might be dropped due to the limited size of the RAM buffer that stores STM logs. 242