perf-c2c.txt - OpenGrok cross reference for /Linux-v5.10/tools/perf/Documentation/perf-c2c.txt

1 perf-c2c(1)
5 ----
6 perf-c2c - Shared Data C2C/HITM Analyzer.
9 --------
12 'perf c2c record' [<options>] -- [<record command options>] <command>
16 -----------
19 The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows
22 On x86, the tool is based on load latency and precise store facility events
27   - memory address of the access
28   - type of the access (load and store details)
29   - latency (in cycles) of the load access
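
As a rough illustration of the three sample fields listed above, the sketch below models one sample and derives the cacheline it falls in. The `Sample` class, its field names, and the 64-byte cacheline size are assumptions made for illustration; they are not perf's internal data structures.

```python
from dataclasses import dataclass
from typing import Tuple

CACHELINE_SIZE = 64  # assumption: 64-byte cachelines, common on x86


@dataclass(frozen=True)
class Sample:
    """Hypothetical model of one sampled memory access."""
    address: int      # memory address of the access
    access_type: str  # "load" or "store"
    latency: int      # latency of the load access, in cycles


def cacheline_of(sample: Sample) -> Tuple[int, int]:
    """Return (cacheline base address, offset within the cacheline)."""
    base = sample.address & ~(CACHELINE_SIZE - 1)
    return base, sample.address - base


sample = Sample(address=0x7F001234ABCD, access_type="load", latency=120)
base, offset = cacheline_of(sample)
print(hex(base), hex(offset))
```

Samples that share a base address compete for the same cacheline, which is the unit the report groups by.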
31 The c2c tool provides means to record this data and report back access details
32 for cachelines with the highest contention, that is, the highest number of HITM accesses.
35 The user runs the record command to record event data and the report command to
40 --------------
41 -e::
42 --event=::
43 	Select the PMU event. Use 'perf c2c record -e list'
46 -v::
47 --verbose::
50 -l::
51 --ldlat::
52 	Configure mem-loads latency. (x86 only)
54 -k::
55 --all-kernel::
58 -u::
59 --all-user::
63 --------------
64 -k::
65 --vmlinux=<file>::
68 -v::
69 --verbose::
72 -i::
73 --input::
76 -N::
77 --node-info::
80 -c::
81 --coalesce::
86 -g::
87 --call-graph::
89 	Please refer to perf-report man page for details.
91 --stdio::
94 --stats::
97 --full-symbols::
100 --no-source::
103 --show-all::
106 -f::
107 --force::
110 -d::
111 --display::
114 --stitch-lbr::
116 	callgraph. The perf.data file must have been obtained using
117 	perf c2c record --call-graph lbr.
126 ----------
133   -W,-d,--phys-data,--sample-cpu
135 Unless specified otherwise with the '-e' option, the following events are monitored by
138   cpu/mem-loads,ldlat=30/P
139   cpu/mem-stores/P
143   cpu/mem-loads/
144   cpu/mem-stores/
146 The user can pass any 'perf record' option after the '--' mark, for example (to enable
149   $ perf c2c record -- -g -a
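
When the record command is launched from a script, the synopsis layout (c2c options, then '--', then 'perf record' options, then the command) can be assembled as an argument list. The sketch below is a minimal illustration: the helper name `c2c_record_argv` and the `sleep 5` workload are hypothetical, and only the `--ldlat` and `--all-user` options named in this document are used.

```python
from typing import List, Optional, Sequence


def c2c_record_argv(
    command: Sequence[str],
    ldlat: Optional[int] = None,
    all_user: bool = False,
    record_opts: Sequence[str] = (),
) -> List[str]:
    """Assemble: perf c2c record [<options>] -- [<record command options>] <command>."""
    argv = ["perf", "c2c", "record"]
    if ldlat is not None:
        argv += ["--ldlat", str(ldlat)]  # mem-loads latency (x86 only)
    if all_user:
        argv.append("--all-user")
    argv.append("--")                    # everything after this goes to 'perf record'
    argv += list(record_opts)            # e.g. ["-g", "-a"]
    argv += list(command)
    return argv


argv = c2c_record_argv(["sleep", "5"], ldlat=50, record_opts=["-g", "-a"])
print(" ".join(argv))
# To actually run it (requires perf and supporting hardware):
#   import subprocess; subprocess.run(argv, check=True)
```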
154 ----------
155 The perf c2c report command displays shared data analysis.  It comes in two
159   - sort all the data based on the cacheline address
160   - store access details for each cacheline
161   - sort all cachelines based on user settings
162   - display data
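
The four processing steps above can be sketched in a few lines. This is only an illustration of the idea, not perf's implementation: the sample tuples are made up, a 64-byte cacheline is assumed, and the "user setting" is reduced to sorting by HITM count.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

CACHELINE_SIZE = 64  # assumption: 64-byte cachelines

# Hypothetical samples: (data address, whether the load was a HITM)
samples: List[Tuple[int, bool]] = [
    (0x1000, True), (0x1008, True), (0x1030, False),
    (0x2040, True), (0x2044, False),
    (0x3000, False),
]


def sort_cachelines(samples):
    # Bucket samples by cacheline address and keep per-cacheline access details.
    lines: Dict[int, Dict[str, int]] = defaultdict(lambda: {"hitm": 0, "total": 0})
    for addr, is_hitm in samples:
        base = addr & ~(CACHELINE_SIZE - 1)
        lines[base]["total"] += 1
        lines[base]["hitm"] += int(is_hitm)
    # Sort cachelines; the sort key here is simply the HITM count, descending.
    return sorted(lines.items(), key=lambda kv: kv[1]["hitm"], reverse=True)


# Display the data, most contended cacheline first.
for index, (base, stats) in enumerate(sort_cachelines(samples)):
    print(f"{index:>3}  {base:#x}  hitm={stats['hitm']}  total={stats['total']}")
```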
168 For each cacheline in the 1) list we display the following data:
172   - zero based index to identify the cacheline
175   - cacheline address (hex number)
178   - cacheline percentage of all Remote/Local HITM accesses
180   LLC Load Hitm - Total, LclHitm, RmtHitm
181   - count of Total/Local/Remote load HITMs
184   - sum of all cacheline accesses
187   - sum of all load accesses
190   - sum of all store accesses
192   Store Reference - L1Hit, L1Miss
193     L1Hit - store accesses that hit L1
194     L1Miss - store accesses that missed L1
196   Core Load Hit - FB, L1, L2
197   - count of load hits in FB (Fill Buffer), L1 and L2 cache
199   LLC Load Hit - LlcHit, LclHitm
200   - count of LLC load accesses, includes LLC hits and LLC HITMs
202   RMT Load Hit - RmtHit, RmtHitm
203   - count of remote load accesses, includes remote hits and remote HITMs
205   Load Dram - Lcl, Rmt
206   - count of local and remote DRAM accesses
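
To make the HITM columns concrete, the sketch below computes each cacheline's share of all Remote and Local HITM accesses, plus a total HITM count. The counts are invented, and treating the total as local plus remote is an assumption based on the column names above.

```python
from typing import Dict, List

# Hypothetical per-cacheline HITM counts.
cachelines: List[Dict[str, int]] = [
    {"address": 0x1000, "lcl_hitm": 6, "rmt_hitm": 3},
    {"address": 0x2040, "lcl_hitm": 2, "rmt_hitm": 1},
    {"address": 0x3000, "lcl_hitm": 0, "rmt_hitm": 0},
]


def hitm_share(lines: List[Dict[str, int]], key: str) -> List[float]:
    """Percentage of all `key` HITMs attributed to each cacheline."""
    total = sum(line[key] for line in lines)
    return [100.0 * line[key] / total if total else 0.0 for line in lines]


rmt_pct = hitm_share(cachelines, "rmt_hitm")
lcl_pct = hitm_share(cachelines, "lcl_hitm")
for line, rmt, lcl in zip(cachelines, rmt_pct, lcl_pct):
    total_hitm = line["lcl_hitm"] + line["rmt_hitm"]  # assumed: Total = Lcl + Rmt
    print(f"{line['address']:#x}  tot={total_hitm}  rmt={rmt:.1f}%  lcl={lcl:.1f}%")
```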
208 For each offset in the 2) list we display the following data:
210   HITM - Rmt, Lcl
211   - % of Remote/Local HITM accesses for given offset within cacheline
213   Store Refs - L1 Hit, L1 Miss
214   - % of store accesses that hit/missed L1 for given offset within cacheline
216   Data address - Offset
217   - offset address
220   - pid of the process responsible for the accesses
223   - tid of the process responsible for the accesses
226   - code address responsible for the accesses
228   cycles - rmt hitm, lcl hitm, load
229     - sum of cycles for given accesses - Remote/Local HITM and generic load
232     - number of CPUs that participated in the access
235     - code symbol related to the 'Code address' value
238     - shared object name related to the 'Code address' value
241     - source information related to the 'Code address' value
244     - nodes participating in the access (see NODE INFO section)
247 ---------
250   - node IDs separated by ','
251   - node IDs with stats for each ID, in the following format:
253   - node IDs with list of affected CPUs in the following format:
256 The user can switch between the above flavors with the -N option or
260 --------
266   tid   - coalesced by process TIDs
267   pid   - coalesced by process PIDs
268   iaddr - coalesced by code address, the following fields are displayed:
270   dso   - coalesced by shared object
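
Coalescing can be pictured as merging entries that agree on the selected keys. The sketch below is a simplified model under that assumption: the access records, the `coalesce` helper, and the `libfoo.so` name are hypothetical, and only an access count is aggregated.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical per-access entries for one cacheline.
accesses: List[Dict[str, object]] = [
    {"pid": 100, "tid": 101, "iaddr": 0x4005D0, "dso": "app"},
    {"pid": 100, "tid": 102, "iaddr": 0x4005D0, "dso": "app"},
    {"pid": 100, "tid": 101, "iaddr": 0x4006A8, "dso": "app"},
    {"pid": 200, "tid": 201, "iaddr": 0x7F10AA, "dso": "libfoo.so"},
]


def coalesce(entries: List[Dict[str, object]], keys: Tuple[str, ...]) -> Counter:
    """Merge entries that agree on every field in `keys`, counting accesses."""
    return Counter(tuple(entry[k] for k in keys) for entry in entries)


by_pid = coalesce(accesses, ("pid",))
by_pid_iaddr = coalesce(accesses, ("pid", "iaddr"))
print(by_pid)
print(by_pid_iaddr)
```

Fewer keys give a coarser view (one row per process), while adding `iaddr` separates the code locations within each process.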
275 ------------
276 The stdio output displays data on standard output.
280   - overall statistics of memory accesses
283   - overall statistics on shared cachelines
285   Shared Data Cache Line Table
286   - list of most expensive cachelines
289   - list of all accessed offsets for each cacheline
292 ----------
299 -------
305 --------
307   https://joemario.github.io/blog/2016/09/01/c2c-blog/
310 --------
311 linkperf:perf-record[1], linkperf:perf-mem[1]