                Coresight - HW Assisted Tracing on ARM
                ======================================

   Author:   Mathieu Poirier <mathieu.poirier@linaro.org>
   Date:     September 11th, 2014

Introduction
------------

Coresight is an umbrella of technologies allowing for the debugging of
ARM-based SoCs. It includes solutions for JTAG and HW assisted tracing. This
document is concerned with the latter.

HW assisted tracing is becoming increasingly useful when dealing with systems
that have many SoCs and other components like GPU and DMA engines. ARM has
developed a HW assisted tracing solution by means of different components, each
being added to a design at synthesis time to cater to specific tracing needs.
Components are generally categorised as sources, links and sinks and are
(usually) discovered using the AMBA bus.

"Sources" generate a compressed stream representing the processor instruction
path based on tracing scenarios as configured by users. From there the stream
flows through the coresight system (via the ATB bus) using links that connect
the emanating source to one or more sinks. Sinks serve as endpoints to the
coresight implementation, either storing the compressed stream in a memory
buffer or creating an interface to the outside world where data can be
transferred to a host without fear of filling up the onboard coresight memory
buffer.
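
The source/link/sink split described above is also visible in the device names
the framework exposes under /sys/bus/coresight/devices later in this document
(for example 2201c000.ptm or 20040000.funnel). The following is a minimal
sketch, not part of the framework: the function name "cs_class" is hypothetical,
and guessing the class from the name suffix is an assumption based only on the
listings shown in this document.

```shell
#!/bin/sh
# Hypothetical helper: guess the class of a coresight component from the
# suffix of its device name. The suffix convention is an assumption drawn
# from the device listings in this document, not a framework guarantee.
cs_class() {
    # ${1##*.} strips everything up to the last dot; a name without a dot
    # (such as "replicator") is left unchanged.
    case "${1##*.}" in
        etm|ptm|stm|itm)    echo source ;;
        funnel|replicator)  echo link ;;
        etb|tpiu)           echo sink ;;
        etf|etr)            echo tmc ;;      # role depends on the TMC configuration
        *)                  echo unknown ;;
    esac
}
```

For instance, "cs_class 2201c000.ptm" prints "source" and "cs_class replicator"
prints "link". TMC devices are reported separately because their role depends
on how the block is configured.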

A typical coresight system would look like this:

  [Block diagram; its layout did not survive extraction. Source: ARM ltd.
   Blocks shown: two CPUs, one paired with an ETM and the other with a PTM;
   several CTIs; an STM; system memory; a DAP reached over SWD/JTAG; the
   AMBA AXI, AMBA Debug APB and AMBA Advanced Trace Bus (ATB) buses; the
   Cross Trigger Matrix (CTM); a funnel feeding a replicator (REP), which in
   turn feeds an ETB and a TPIU; the TPIU leads to the trace port.]

  DAP = Debug Access Port
  ETM = Embedded Trace Macrocell
  PTM = Program Trace Macrocell
  CTI = Cross Trigger Interface
  ETB = Embedded Trace Buffer
  TPIU= Trace Port Interface Unit
  SWD = Serial Wire Debug

While on-target configuration of the components is done via the APB bus,
all trace data are carried out-of-band on the ATB bus. The CTM provides
a way to aggregate and distribute signals between CoreSight components.

The coresight framework provides a central point to represent, configure and
manage coresight devices on a platform. This first implementation centers on
the basic tracing functionality, enabling components such as ETM/PTM, funnel,
replicator, TMC, TPIU and ETB. Future work will enable more
intricate IP blocks such as STM and CTI.


Acronyms and Classification
---------------------------

Acronyms:

PTM:     Program Trace Macrocell
ETM:     Embedded Trace Macrocell
STM:     System Trace Macrocell
ETB:     Embedded Trace Buffer
ITM:     Instrumentation Trace Macrocell
TPIU:    Trace Port Interface Unit
TMC-ETR: Trace Memory Controller, configured as Embedded Trace Router
TMC-ETF: Trace Memory Controller, configured as Embedded Trace FIFO
CTI:     Cross Trigger Interface

Classification:

Source:
   ETMv3.x, ETMv4, PTMv1.0, PTMv1.1, STM, STM500, ITM
Link:
   Funnel, replicator (intelligent or not), TMC-ETR
Sinks:
   ETBv1.0, ETBv1.1, TPIU, TMC-ETF
Misc:
   CTI


Device Tree Bindings
--------------------

See Documentation/devicetree/bindings/arm/coresight.txt for details.

As of this writing drivers for ITM, STMs and CTIs are not provided but are
expected to be added as the solution matures.


Framework and implementation
----------------------------

The coresight framework provides a central point to represent, configure and
manage coresight devices on a platform.
Any coresight compliant device can
register with the framework as long as it uses the right APIs:

struct coresight_device *coresight_register(struct coresight_desc *desc);
void coresight_unregister(struct coresight_device *csdev);

The registering function takes a "struct coresight_desc *desc" and
registers the device with the core framework. The unregister function takes
a reference to a "struct coresight_device", obtained at registration time.

If everything goes well during the registration process the new devices will
show up under /sys/bus/coresight/devices, as shown here for a TC2 platform:

root:~# ls /sys/bus/coresight/devices/
replicator    20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
20010000.etb  20040000.funnel  2201d000.ptm  2203d000.etm
root:~#

The registering function takes a "struct coresight_desc", which looks like this:

struct coresight_desc {
        enum coresight_dev_type type;
        struct coresight_dev_subtype subtype;
        const struct coresight_ops *ops;
        struct coresight_platform_data *pdata;
        struct device *dev;
        const struct attribute_group **groups;
};


The "coresight_dev_type" identifies what the device is, i.e. source, link or
sink, while the "coresight_dev_subtype" will characterise that type further.

The "struct coresight_ops" is mandatory and will tell the framework how to
perform base operations related to the components, each component having
a different set of requirements. For that "struct coresight_ops_sink",
"struct coresight_ops_link" and "struct coresight_ops_source" have been
provided.

The next field, "struct coresight_platform_data *pdata", is acquired by calling
"of_get_coresight_platform_data()" as part of the driver's _probe routine, and
"struct device *dev" gets the device reference embedded in the "amba_device":

static int etm_probe(struct amba_device *adev, const struct amba_id *id)
{
        ...
        ...
        drvdata->dev = &adev->dev;
        ...
}

Specific classes of device (source, link, or sink) have generic operations
that can be performed on them (see "struct coresight_ops"). The
"**groups" is a list of sysfs entries pertaining to operations
specific to that component only. "Implementation defined" customisations are
expected to be accessed and controlled using those entries.


How to use the tracer modules
-----------------------------

There are two ways to use the Coresight framework: 1) interacting directly
with the Coresight devices using the sysFS interface and 2) using the perf cmd
line tools. Preference is given to the latter as using the sysFS interface
requires a deep understanding of the Coresight HW. The following sections
provide details on using both methods.

1) Using the sysFS interface:

Before trace collection can start, a coresight sink needs to be identified.
There is no limit on the number of sinks (nor sources) that can be enabled at
any given moment. As a generic operation, all devices pertaining to the sink
class will have an "enable_sink" entry in sysfs:

root:/sys/bus/coresight/devices# ls
replicator    20030000.tpiu    2201c000.ptm  2203c000.etm  2203e000.etm
20010000.etb  20040000.funnel  2201d000.ptm  2203d000.etm
root:/sys/bus/coresight/devices# ls 20010000.etb
enable_sink  status  trigger_cntr
root:/sys/bus/coresight/devices# echo 1 > 20010000.etb/enable_sink
root:/sys/bus/coresight/devices# cat 20010000.etb/enable_sink
1
root:/sys/bus/coresight/devices#

At boot time the current etm3x driver will configure the first address
comparator with "_stext" and "_etext", essentially tracing any instruction
that falls within that range.
As such "enabling" a source will immediately
trigger a trace capture:

root:/sys/bus/coresight/devices# echo 1 > 2201c000.ptm/enable_source
root:/sys/bus/coresight/devices# cat 2201c000.ptm/enable_source
1
root:/sys/bus/coresight/devices# cat 20010000.etb/status
Depth:          0x2000
Status:         0x1
RAM read ptr:   0x0
RAM wrt ptr:    0x19d3   <----- The write pointer is moving
Trigger cnt:    0x0
Control:        0x1
Flush status:   0x0
Flush ctrl:     0x2001
root:/sys/bus/coresight/devices#

Trace collection is stopped the same way:

root:/sys/bus/coresight/devices# echo 0 > 2201c000.ptm/enable_source
root:/sys/bus/coresight/devices#

The content of the ETB buffer can be harvested directly from /dev:

root:/sys/bus/coresight/devices# dd if=/dev/20010000.etb \
of=~/cstrace.bin

64+0 records in
64+0 records out
32768 bytes (33 kB) copied, 0.00125258 s, 26.2 MB/s
root:/sys/bus/coresight/devices#

The file cstrace.bin can be decompressed using "ptm2human", DS-5 or Trace32.

Following is a DS-5 output of an experimental loop that increments a variable up
to a certain value. The example is simple and yet provides a glimpse of the
wealth of possibilities that coresight provides.

Info                                                 Tracing enabled
Instruction  106378866  0x8026B53C  E52DE004  false  PUSH {lr}
Instruction  0          0x8026B540  E24DD00C  false  SUB  sp,sp,#0xc
Instruction  0          0x8026B544  E3A03000  false  MOV  r3,#0
Instruction  0          0x8026B548  E58D3004  false  STR  r3,[sp,#4]
Instruction  0          0x8026B54C  E59D3004  false  LDR  r3,[sp,#4]
Instruction  0          0x8026B550  E3530004  false  CMP  r3,#4
Instruction  0          0x8026B554  E2833001  false  ADD  r3,r3,#1
Instruction  0          0x8026B558  E58D3004  false  STR  r3,[sp,#4]
Instruction  0          0x8026B55C  DAFFFFFA  true   BLE  {pc}-0x10 ; 0x8026b54c
Timestamp                                            Timestamp: 17106715833
Instruction  319        0x8026B54C  E59D3004  false  LDR  r3,[sp,#4]
Instruction  0          0x8026B550  E3530004  false  CMP  r3,#4
Instruction  0          0x8026B554  E2833001  false  ADD  r3,r3,#1
Instruction  0          0x8026B558  E58D3004  false  STR  r3,[sp,#4]
Instruction  0          0x8026B55C  DAFFFFFA  true   BLE  {pc}-0x10 ; 0x8026b54c
Instruction  9          0x8026B54C  E59D3004  false  LDR  r3,[sp,#4]
Instruction  0          0x8026B550  E3530004  false  CMP  r3,#4
Instruction  0          0x8026B554  E2833001  false  ADD  r3,r3,#1
Instruction  0          0x8026B558  E58D3004  false  STR  r3,[sp,#4]
Instruction  0          0x8026B55C  DAFFFFFA  true   BLE  {pc}-0x10 ; 0x8026b54c
Instruction  7          0x8026B54C  E59D3004  false  LDR  r3,[sp,#4]
Instruction  0          0x8026B550  E3530004  false  CMP  r3,#4
Instruction  0          0x8026B554  E2833001  false  ADD  r3,r3,#1
Instruction  0          0x8026B558  E58D3004  false  STR  r3,[sp,#4]
Instruction  0          0x8026B55C  DAFFFFFA  true   BLE  {pc}-0x10 ; 0x8026b54c
Instruction  7          0x8026B54C  E59D3004  false  LDR  r3,[sp,#4]
Instruction  0          0x8026B550  E3530004  false  CMP  r3,#4
Instruction  0          0x8026B554  E2833001  false  ADD  r3,r3,#1
Instruction  0          0x8026B558  E58D3004  false  STR  r3,[sp,#4]
Instruction  0          0x8026B55C  DAFFFFFA  true   BLE  {pc}-0x10 ; 0x8026b54c
Instruction  10         0x8026B54C  E59D3004  false  LDR  r3,[sp,#4]
Instruction  0          0x8026B550  E3530004  false  CMP  r3,#4
Instruction  0          0x8026B554  E2833001  false  ADD  r3,r3,#1
Instruction  0          0x8026B558  E58D3004  false  STR  r3,[sp,#4]
Instruction  0          0x8026B55C  DAFFFFFA  true   BLE  {pc}-0x10 ; 0x8026b54c
Instruction  6          0x8026B560  EE1D3F30  false  MRC  p15,#0x0,r3,c13,c0,#1
Instruction  0          0x8026B564  E1A0100D  false  MOV  r1,sp
Instruction  0          0x8026B568  E3C12D7F  false  BIC  r2,r1,#0x1fc0
Instruction  0          0x8026B56C  E3C2203F  false  BIC  r2,r2,#0x3f
Instruction  0          0x8026B570  E59D1004  false  LDR  r1,[sp,#4]
Instruction  0          0x8026B574  E59F0010  false  LDR  r0,[pc,#16] ; [0x8026B58C] = 0x80550368
Instruction  0          0x8026B578  E592200C  false  LDR  r2,[r2,#0xc]
Instruction  0          0x8026B57C  E59221D0  false  LDR  r2,[r2,#0x1d0]
Instruction  0          0x8026B580  EB07A4CF  true   BL   {pc}+0x1e9344 ; 0x804548c4
Info                                                 Tracing enabled
Instruction  13570831   0x8026B584  E28DD00C  false  ADD  sp,sp,#0xc
Instruction  0          0x8026B588  E8BD8000  true   LDM  sp!,{pc}
Timestamp                                            Timestamp: 17107041535

2) Using perf framework:

Coresight tracers are represented using the Perf framework's Performance
Monitoring Unit (PMU) abstraction. As such the perf framework takes charge of
controlling when tracing gets enabled based on when the process of interest is
scheduled. When configured in a system, Coresight PMUs will be listed when
queried by the perf command line tool:

    linaro@linaro-nano:~$ ./perf list pmu

    List of pre-defined events (to be used in -e):

      cs_etm//                                    [Kernel PMU event]

    linaro@linaro-nano:~$

Regardless of the number of tracers available in a system (usually equal to the
number of processor cores), the "cs_etm" PMU will be listed only once.

A Coresight PMU works the same way as any other PMU, i.e. the name of the PMU is
listed along with configuration options within forward slashes '/'. Since a
Coresight system will typically have more than one sink, the name of the sink to
work with needs to be specified as an event option.
Names for sinks to choose
from are listed in sysFS under ($SYSFS)/bus/coresight/devices:

    root@linaro-nano:~# ls /sys/bus/coresight/devices/
    20010000.etf  20040000.funnel  20100000.stm  22040000.etm
    22140000.etm  230c0000.funnel  23240000.etm  20030000.tpiu
    20070000.etr  20120000.replicator  220c0000.funnel
    23040000.etm  23140000.etm  23340000.etm

    root@linaro-nano:~# perf record -e cs_etm/@20070000.etr/u --per-thread program

The syntax within the forward slashes '/' is important. The '@' character
tells the parser that a sink is about to be specified and that this is the sink
to use for the trace session.

More information on the above and other examples of how to use Coresight with
the perf tools can be found in the "HOWTO.md" file of the openCSD GitHub
repository [3].

2.1) AutoFDO analysis using the perf tools:

perf can be used to record and analyze traces of programs.

Execution can be recorded using 'perf record' with the cs_etm event,
specifying the name of the sink to record to, e.g.:

    perf record -e cs_etm/@20070000.etr/u --per-thread

The 'perf report' and 'perf script' commands can be used to analyze execution,
synthesizing instruction and branch events from the instruction trace.
'perf inject' can be used to replace the trace data with the synthesized events.
The --itrace option controls the type and frequency of synthesized events
(see perf documentation).

Note that only 64-bit programs are currently supported - further work is
required to support instruction decode of 32-bit Arm programs.


Generating coverage files for Feedback Directed Optimization: AutoFDO
---------------------------------------------------------------------

'perf inject' accepts the --itrace option in which case tracing data is
removed and replaced with the synthesized events. e.g.

    perf inject --itrace --strip -i perf.data -o perf.data.new

Below is an example of using ARM ETM for autoFDO. It requires autofdo
(https://github.com/google/autofdo) and gcc version 5. The bubble
sort example is from the AutoFDO tutorial (https://gcc.gnu.org/wiki/AutoFDO/Tutorial).

    $ gcc-5 -O3 sort.c -o sort
    $ taskset -c 2 ./sort
    Bubble sorting array of 30000 elements
    5910 ms

    $ perf record -e cs_etm/@20070000.etr/u --per-thread taskset -c 2 ./sort
    Bubble sorting array of 30000 elements
    12543 ms
    [ perf record: Woken up 35 times to write data ]
    [ perf record: Captured and wrote 69.640 MB perf.data ]

    $ perf inject -i perf.data -o inj.data --itrace=il64 --strip
    $ create_gcov --binary=./sort --profile=inj.data --gcov=sort.gcov -gcov_version=1
    $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
    $ taskset -c 2 ./sort_autofdo
    Bubble sorting array of 30000 elements
    5806 ms


How to use the STM module
-------------------------

Using the System Trace Macrocell module is the same as the tracers - the only
difference is that clients are driving the trace capture rather
than the program flow through the code.
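
Since the STM is driven through the same sysfs entries as the tracers, the
enable/disable sequence from the sysFS section can be wrapped in a small
helper. The following is a minimal sketch under stated assumptions, not part of
the framework: the function names (cs_set, cs_start, cs_stop) and the
CS_DEVICES variable are hypothetical, and only the "enable_sink" and
"enable_source" entries shown in this document are used.

```shell
#!/bin/sh
# Sketch only. CS_DEVICES defaults to the sysfs directory used throughout
# this document; it can be overridden to try the helpers on a scratch
# directory. All names below are hypothetical.
CS_DEVICES="${CS_DEVICES:-/sys/bus/coresight/devices}"

# Write a value to a device attribute, failing if the attribute is missing.
# $1 = device name, $2 = attribute name, $3 = value
cs_set() {
    attr="$CS_DEVICES/$1/$2"
    [ -e "$attr" ] || { echo "missing: $attr" >&2; return 1; }
    echo "$3" > "$attr"
}

# Enable the sink first, then the source, in the order this document uses.
# $1 = sink device, $2 = source device
cs_start() {
    cs_set "$1" enable_sink 1 && cs_set "$2" enable_source 1
}

# Stop the source only; the sink is left untouched, mirroring the sequence
# shown in the sysFS section.
# $1 = source device
cs_stop() {
    cs_set "$1" enable_source 0
}
```

With the device names used in this document, "cs_start 20010000.etf
20100000.stm" would select the ETF as sink and enable the STM, and "cs_stop
20100000.stm" would disable the STM again.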

As with any other CoreSight component, specifics about the STM tracer can be
found in sysfs with more information on each entry being found in [1]:

root@genericarmv8:~# ls /sys/bus/coresight/devices/20100000.stm
enable_source   hwevent_select  port_enable     subsystem       uevent
hwevent_enable  mgmt            port_select     traceid
root@genericarmv8:~#

Like any other source a sink needs to be identified and the STM enabled before
being used:

root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20010000.etf/enable_sink
root@genericarmv8:~# echo 1 > /sys/bus/coresight/devices/20100000.stm/enable_source

From there user space applications can request and use channels using the devfs
interface provided for that purpose by the generic STM API:

root@genericarmv8:~# ls -l /dev/20100000.stm
crw-------    1 root     root       10,  61 Jan  3 18:11 /dev/20100000.stm
root@genericarmv8:~#

Details on how to use the generic STM API can be found here [2].

[1]. Documentation/ABI/testing/sysfs-bus-coresight-devices-stm
[2]. Documentation/trace/stm.rst
[3]. https://github.com/Linaro/perf-opencsd