1 perf-c2c(1)
5 ----
6 perf-c2c - Shared Data C2C/HITM Analyzer.
9 --------
12 'perf c2c record' [<options>] -- [<record command options>] <command>
16 -----------
19 The perf c2c tool provides means for Shared Data C2C/HITM analysis. It allows
22 On x86, the tool is based on load latency and precise store facility events
27 - memory address of the access
28 - type of the access (load and store details)
29 - latency (in cycles) of the load access
31 The c2c tool provides a means to record this data and report back access details
32 for cachelines with the highest contention, that is, the highest number of HITM accesses.
35 The user runs the record command to record event data and the report command to
40 --------------
41 -e::
42 --event=::
43 Select the PMU event. Use 'perf c2c record -e list'
46 -v::
47 --verbose::
50 -l::
51 --ldlat::
52 Configure mem-loads latency. (x86 only)
54 -k::
55 --all-kernel::
58 -u::
59 --all-user::
63 --------------
64 -k::
65 --vmlinux=<file>::
68 -v::
69 --verbose::
72 -i::
73 --input::
76 -N::
77 --node-info::
80 -c::
81 --coalesce::
86 -g::
87 --call-graph::
89 Please refer to perf-report man page for details.
91 --stdio::
94 --stats::
97 --full-symbols::
100 --no-source::
103 --show-all::
106 -f::
107 --force::
110 -d::
111 --display::
114 --stitch-lbr::
116 callgraph. The perf.data file must have been obtained using
117 perf c2c record --call-graph lbr.
126 ----------
133 -W,-d,--phys-data,--sample-cpu
135 Unless specified otherwise with the '-e' option, the following events are monitored by
138 cpu/mem-loads,ldlat=30/P
139 cpu/mem-stores/P
143 cpu/mem-loads/
144 cpu/mem-stores/
146 The user can pass any 'perf record' option after the '--' mark, for example (to enable
149 $ perf c2c record -- -g -a
154 ----------
155 The perf c2c report command displays shared data analysis. It comes in two
159 - sort all the data based on the cacheline address
160 - store access details for each cacheline
161 - sort all cachelines based on user settings
162 - display data
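
The processing steps above can be sketched in a few lines of Python. This is only an illustration of the grouping and sorting idea, not the actual perf implementation: the 64-byte cacheline size, the (address, kind, is_hitm) sample tuples, and the names cacheline_of and build_report are all assumptions made for this example.

```python
from collections import defaultdict

CACHELINE_SIZE = 64  # assumption: 64-byte cachelines


def cacheline_of(addr):
    """Return the cacheline base address that contains addr."""
    return addr & ~(CACHELINE_SIZE - 1)


def build_report(samples, sort_key="hitm"):
    """Group hypothetical (addr, kind, is_hitm) samples per cacheline.

    Returns a list of (cacheline_address, details) pairs, sorted so that
    the cacheline with the highest value of sort_key comes first.
    """
    lines = defaultdict(
        lambda: {"hitm": 0, "loads": 0, "stores": 0, "offsets": defaultdict(int)}
    )
    # Step 1: walk the samples in cacheline address order.
    for addr, kind, is_hitm in sorted(samples, key=lambda s: cacheline_of(s[0])):
        base = cacheline_of(addr)
        entry = lines[base]
        # Step 2: store access details for the cacheline.
        entry["loads" if kind == "load" else "stores"] += 1
        entry["hitm"] += int(is_hitm)
        entry["offsets"][addr - base] += 1
    # Step 3: sort the cachelines by the requested key (here: HITM count).
    return sorted(lines.items(), key=lambda kv: kv[1][sort_key], reverse=True)


if __name__ == "__main__":
    demo = [
        (0x1000, "load", True),
        (0x1008, "store", False),
        (0x1040, "load", False),
        (0x1010, "load", True),
    ]
    # Step 4: display the data.
    for base, details in build_report(demo):
        print(hex(base), details["hitm"], details["loads"], details["stores"])
```

Sorting by HITM count mirrors the idea that the most contended cachelines are listed first; a different sort_key stands in for the user settings.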
168 For each cacheline in the 1) list we display the following data:
172 - zero based index to identify the cacheline
175 - cacheline address (hex number)
178 - cacheline percentage of all Remote/Local HITM accesses
180 LLC Load Hitm - Total, LclHitm, RmtHitm
181 - count of Total/Local/Remote load HITMs
184 - sum of all cacheline accesses
187 - sum of all load accesses
190 - sum of all store accesses
192 Store Reference - L1Hit, L1Miss
193 L1Hit - store accesses that hit L1
194 L1Miss - store accesses that missed L1
196 Core Load Hit - FB, L1, L2
197 - count of load hits in FB (Fill Buffer), L1 and L2 cache
199 LLC Load Hit - LlcHit, LclHitm
200 - count of LLC load accesses, includes LLC hits and LLC HITMs
202 RMT Load Hit - RmtHit, RmtHitm
203 - count of remote load accesses, includes remote hits and remote HITMs
205 Load Dram - Lcl, Rmt
206 - count of local and remote DRAM accesses
208 For each offset in the 2) list we display the following data:
210 HITM - Rmt, Lcl
211 - % of Remote/Local HITM accesses for given offset within cacheline
213 Store Refs - L1 Hit, L1 Miss
214 - % of store accesses that hit/missed L1 for given offset within cacheline
216 Data address - Offset
217 - offset address
220 - pid of the process responsible for the accesses
223 - tid of the process responsible for the accesses
226 - code address responsible for the accesses
228 cycles - rmt hitm, lcl hitm, load
229 - sum of cycles for given accesses - Remote/Local HITM and generic load
232 - number of CPUs that participated in the access
235 - code symbol related to the 'Code address' value
238 - shared object name related to the 'Code address' value
241 - source information related to the 'Code address' value
244 - nodes participating in the access (see NODE INFO section)
247 ---------
250 - node IDs separated by ','
251 - node IDs with stats for each ID, in the following format:
253 - node IDs with list of affected CPUs in the following format:
256 The user can switch between the above flavors with the -N option or
260 --------
266 tid - coalesced by process TIDs
267 pid - coalesced by process PIDs
268 iaddr - coalesced by code address, the following fields are displayed:
270 dso - coalesced by shared object
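
As a rough illustration of what coalescing by one of the keys above means, the following Python sketch merges access records that share the chosen key and counts them. It is not how perf implements the feature: the record layout (dicts with pid, tid, iaddr and dso entries) and the function name coalesce are assumptions made for this example.

```python
from collections import Counter

COALESCE_KEYS = ("tid", "pid", "iaddr", "dso")


def coalesce(accesses, key):
    """Merge hypothetical access records that share the same key value.

    Returns (value, access_count) pairs, most frequent first.
    """
    if key not in COALESCE_KEYS:
        raise ValueError(f"unsupported coalesce key: {key!r}")
    return Counter(access[key] for access in accesses).most_common()


if __name__ == "__main__":
    demo = [
        {"pid": 1, "tid": 10, "iaddr": 0x400, "dso": "a.out"},
        {"pid": 1, "tid": 11, "iaddr": 0x400, "dso": "a.out"},
        {"pid": 2, "tid": 20, "iaddr": 0x500, "dso": "libc.so"},
    ]
    for key in COALESCE_KEYS:
        print(key, coalesce(demo, key))
```

A coarser key such as pid or dso yields fewer, larger rows; a finer key such as tid or iaddr keeps more rows apart.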
275 ------------
276 The stdio output displays data on standard output.
280 - overall statistics of memory accesses
283 - overall statistics on shared cachelines
285 Shared Data Cache Line Table
286 - list of most expensive cachelines
289 - list of all accessed offsets for each cacheline
292 ----------
299 -------
305 --------
307 https://joemario.github.io/blog/2016/09/01/c2c-blog/
310 --------
311 linkperf:perf-record[1], linkperf:perf-mem[1]