1.. _logging_cs_stm:
2
3Multi-domain logging using ARM Coresight STM
4############################################
5
6The Arm CoreSight SoC-400 is a comprehensive library of components for the creation of debug and
7trace functionality within a system. The STM (System Trace Macrocell) is a trace source that
8is integrated into a CoreSight system, designed primarily for high-bandwidth trace of
9instrumentation embedded into software. This instrumentation is made up of memory-mapped writes to
10the STMESP (STM Extended Stimulus Port) peripheral registers. Multiple cores can share and directly
11access the STM without being aware of each other. Each core has 65536 stimulus ports (identical
12register sets) which can be independently accessed thus there is no need for locking when various
13contexts want to log the data.
14
15STM Extended Stimulus Port (STMESP)
16***********************************
17
18Local domain (each core) has access to its own set of STMESP peripherals. Each set has registers for writing data (with or without timestamp and with or without marking) and writing a flag. Each write
19to STMESP register results in the user data being encoded into MIPI System Trace Protocol v2 (STPv2).
20Hardware manages multiplexing of data from different STMESP register sets and from different cores
21by adding Major and Channel opcodes into the stream when necessary. Timestamp (from common source for all cores) and synchronization opcodes are autonomously added by the STM.
22
23Data stream generated by STM can end up in various sinks. For example it can be directly send to
24TPIU (Trace Port Interface Unit) or saved in the RAM Circular Buffer which is called ETR (Embedded
25Trace Router). TPIU is a 5 pin interface (4 data pins and clock) and external tool is required to
26capture the data (e.g. J-Trace PRO).
27When ETR is used then one core in the system is responsible for processing that data, for example
28dumping it over UART to the host.
29
30ARM Coresight Embedded Trace Router (ETR)
31*****************************************
32
33ETR is a circular RAM buffer to which tracing data is saved. Since it may contain data from
34various sources additional wrapping protocol is used to multiplex them. ``Coresight Trace
35Formatter`` is used for that. Data is encoded into 16 bytes frames which holds up to 15 byte
36of data. Circular RAM buffer size is limited to 4k thus there is a risk of data overflow. If
37overflow occurs then data is lost but due to presence of synchronization opcodes in the STPv2
38data stream synchronization can be restored.
39
40Data from ETR is handled on the device. It can be forwarded as-is to the host (e.g. using UART),
41where a host tool is decoding the data or data can be decoded on-chip to output the data in human readable format.
42
43Nordic Semiconductor NRF54H20 implementation instead of using ETR peripheral for getting
44buffer busyness is using own wrapper: Trace Buffer Monitor (TBM).
45
46STMESP Logging frontend
47***********************
48
49:ref:`log_frontend` API is implemented. Custom frontend assumes that all required operations are executed in the context of the logging macro call. It is also assumed that frontend provides the timestamp. Frontend utilizes a fact that there are multiple STMESP register sets and use multiple sets to avoid locking context. The atomically incremented channel counter is used to pick unique STMESP set and
50whole message is written to that STMESP. Limited pool of sets is used leaving most of channels for
51other purposes.
52
53Early logging
54=============
55
56STMESP cannot be used before STM or ETR/TPIU is setup. In order to support early logging,
57before infrastructure is ready a dedicated RAM buffer is used to which data is written. When
58STM infrastructure readiness is notified (:c:func:`log_frontend_stmesp_etr_ready`) then RAM buffer
59content is written to STMESP. Early logging is applicable only to the core which owns and configures
60Coresight infrastructure (e.g. in case of NRF54H20 it is the Secure core).
61
62Trace points
63============
64
65There may be cases when logging is too slow (even though it is very fast). For cases like that a dedicated ``trace point`` API is added. It is extremely fast because it is just a single
66write to one STMESP register. There are 2 functions in the API:
67
68* :c:func:`log_frontend_stmesp_tp` - It accepts single argument - index. Index is between
69  0 and 65280.
70* :c:func:`log_frontend_stmesp_tp_d32` - It accepts two arguments - index and user data.
71  Index is between 0 and 65280. User data is 32 bit word.
72
73It takes less than 100 ns to log a single trace point on NRF54H20 and it is approx. 7 times faster than the fastest log message.
74
75Logging using STM
76*****************
77
78STM logging has 2 modes of operations:
79
80* Dictionary-based - assisted mode which is using dictionary-based logging. In this mode logging
81  strings can be removed from the binary because they are only needed for decoding and decoding
82  is performed by the host tool. This mode takes less memory and is faster (2-3 times faster).
83  There is less data written so there is less chance of data overflow in ETR buffer.
84* Standalone - data is decoded on chip and human-readable strings are printed. This mode requires
85  more device resources and processing power.
86
87Following figure presents multi-domain logging using ARM Coresight STM.
88
89.. figure:: images/coresight_architecture.png
90
91Each core (local domain) is using logging frontend which writes logging data to STMESP registers.
92
93If TPIU is used as the output then there is no more software components needed. STPv2 stream
94of data is composed by STM and send over TPIU.
95
96If ETR RAM buffer is used then buffer owner core (``proxy``) is responsible for handling that data.
97If dictionary-based logging is used then proxy just sends data as-is over UART.
98If standalone logging is used then proxy is using :ref:`cs_trace_defmt` and
99:ref:`mipi_stp_decoder` to decode data and demultiplex messages. Messages are formatted to human
100readable strings using logging :ref:`log_output`.
101
102Dictionary-based logging
103========================
104
105Assisted multicore logging uses dictionary-based logging to send messages without redundant strings
106to STM, and is based on the :ref:`logging_guide_dictionary` feature of the logging API provided by
107Zephyr.
108Instead of including the format strings in the log messages, it logs the addresses (message IDs)
109where the strings are stored, which reduces the size of the logging subsystem.
110If the data goes to the ETR buffer, the proxy core's responsibility is to dump this data.
111The host PC, equipped with a decoder tool, translates these addresses back into human-readable text
112using a JSON database generated during the build process.
113
114When using logs, this method has the following advantages:
115
116* It reduces the size of the binary, as the strings used in the log messages are not stored in the binary itself. Logging infrastructure is also very limited. On local domains it is just the frontend. There is even no need to have string formatting function.
117* It reduces the amount of data that needs to be sent to and processed by the application core, as the string formatting is offloaded to the host side.
118* Logging is fast. It takes less than 1 us to log a simple message (up to 2 arguments) on NRF54H20.
119
120Proxy core is using Nordic specific peripheral (TBM) to get ETR buffer busyness and send data over
121UART. Nordic specific driver for ETR buffer is located in
122:zephyr_file:`drivers/misc/coresight/nrf_etr.c`.
123
124Configuration
125-------------
126
127For Nordic SoC a dedicated snippet (:ref:`nordic-log-stm-dict`) shall be used to enable logging.
128Each core shall be build with that snippet. If any core wants to use it then application core is
129required to have it enabled too since it acts as the proxy (ETR buffer processing). All cores must use the same configuration of the logging.
130
131Reading the logs
132----------------
133
134To read the dictionary-based STM log output, do the following:
135
1361. Set up the log capture.
137
138   Use the ``nrfutil trace stm`` command to start capturing logs from the device, specifying the database configuration for each domain ID, as well as the serial port, the baud rate, and the output file name::
139
140      nrfutil trace stm --database-config 34:build/zephyr/log_dictionary.json,35:build_rad/zephyr/log_dictionary.json --input-serialport /dev/ttyACM1 --baudrate 115200 --output-ascii out.txt
141
142#. Capture and decode the logs.
143
144   nrfutil will capture the log data from the specified UART port and use the provided dictionary databases to decode the logs into a human-readable format.
145   The decoded logs will be saved in the specified output file (the :file:`out.txt` file in the previous example).
146
147#. Open the output file to review the decoded log messages.
148
149   The file will contain timestamps and the log messages in a format that is human-readable.
150
151If the log capture fails to find a sync, rerun the capture process.
152
153.. note::
154   Decoding artifacts or incorrect timestamps might occur when rerunning the process.
155
156Each log line contains a domain-related or core-related prefix between the log level and the module name, indicating the core that generated the log entry.
157The following are the prefixes used to indicate the cores:
158
159.. csv-table:: nRF54H20 log prefixes
160   :header: "Core", "Prefix", "ID"
161
162   Secure Domain, ``sec``, 0x21
163   Application core, ``app``, 0x22
164   Radio core, ``rad``, 0x23
165   System Controller (SysCtrl), ``sys``, 0x2c
166   Fast Lightweight Processor (FLPR), ``flpr``, 0x2d
167   Peripheral Processor (PPR), ``ppr``, 0x2e
168    , ``mod``, 0x24
169
170Stand-alone logging
171===================
172
173Frontend is writing to STMESP registers. Message format is aligned with the on chip decoder
174in :zephyr_file:`subsys/logging/frontends/log_frontend_stmesp_demux.c`.
175
176``Proxy`` is using Nordic specific peripheral (TBM) to get ETR buffer busyness and read and
177decode data and send human-readable data over UART. Nordic specific driver for ETR buffer is
178located in :zephyr_file:`drivers/misc/coresight/nrf_etr.c`. It is using :ref:`cs_trace_defmt` and
179:ref:`mipi_stp_decoder` and above-mentioned demultiplexer to decode messages.
180
181Logging messages contains read-only format string used in the logging macros thus they cannot be
182removed from the binary. This mode uses more read-only memory and has lower throughput because
183messages are longer and processing is time consuming. Compared to the dictionary-based mode logging
184is 2-3 slower.
185
186Configuration
187-------------
188
189For Nordic SoC a dedicated snippet (:ref:`nordic-log-stm`) shall be used to enable logging.
190Each core shall be build with that snippet. If any core wants to use it then application core is
191required to have it enabled too since it acts as the proxy (ETR buffer processing). All cores must use the same configuration of the logging.
192
193Reading the logs
194----------------
195
196Logs are printed using Console UART.
197
198.. note::
199   To use UART in your application, the UART's node must be described in devicetree.
200   For more details, see :ref:`devicetree-intro`.
201
202The following is an example log output::
203
204   [00:00:00.154,790] <inf> app/spsc_pbuf: alloc in 0x2f0df800
205   [00:00:00.163,319] <inf> app/spsc_pbuf: alloc 0x2f0df800 wr_idx:20
206   [00:00:00.181,112] <inf> app/spsc_pbuf: commit in 0x2f0df800
207   [00:00:00.189,090] <inf> app/spsc_pbuf: commit 0x2f0df800, len:20 wr_idx: 44
208   [00:00:00.202,577] <inf> rad/icmsg: mbox_callback
209   [00:00:00.214,750] <inf> rad/spsc_pbuf: claim 0x2f0df800 rd_idx:20
210   [00:00:00.235,823] <inf> rad/spsc_pbuf: free 0x2f0df800 len:20 rd_idx: 44
211   [00:00:00.244,507] <inf> rad/spsc_pbuf: read done 0x2f0df800 len:20
212   [00:00:00.272,444] <inf> rad/host: ep recv 0x330021f0, len:20
213   [00:00:00.283,939] <inf> rad/host: rx:00 exp:00
214   [00:00:00.292,200] <inf> rad/icmsg: read 0
215   [00:00:05.077,026] <inf> rad/spsc_pbuf: alloc in 0x2f0df000
216   [00:00:05.077,068] <inf> rad/spsc_pbuf: alloc 0x2f0df000 wr_idx:44
217   [00:00:05.077,098] <inf> rad/spsc_pbuf: commit in 0x2f0df000
218   [00:00:05.077,134] <inf> rad/spsc_pbuf: commit 0x2f0df000, len:20 wr_idx
219
220Each log line contains a domain-related or core-related prefix between the log level and the module name, indicating the core that generated the log entry.
221The following are the prefixes used to indicate the cores:
222
223.. csv-table:: nRF54H20 log prefixes
224   :header: "Core", "Prefix"
225
226   Secure Domain, ``sec``
227   Application core, ``app``
228   Radio core, ``rad``
229   System Controller (SysCtrl), ``sys``
230   Fast Lightweight Processor (FLPR), ``flpr``
231   Peripheral Processor (PPR), ``ppr``
232
233Additional considerations
234=========================
235
236When using STM logging, consider the following:
237
238* Use optimized log macros (having up to 2 word size numeric arguments, like ``LOG_INF("%d %c", (int)x, (char)y)``) to improve the size and speed of logging.
239* For memory constrained applications (for example, when running on the PPR core), disable the ``printk()`` function by setting both the :kconfig:option:`CONFIG_PRINTK` and :kconfig:option:`CONFIG_BOOT_BANNER` Kconfig options to ``n`` in your project configuration.
240* When working with multiple domains, such as the Radio core and Application core, ensure that each database is prefixed with the correct domain ID.
241* Some log messages might be dropped due to the limited size of the RAM buffer that stores STM logs.
242