perf-stat.txt - OpenGrok cross reference for /Linux-v5.10/tools/perf/Documentation/perf-stat.txt

1 perf-stat(1)
5 ----
6 perf-stat - Run a command and gather performance counter statistics
9 --------
11 'perf stat' [-e <EVENT> | --event=EVENT] [-a] <command>
12 'perf stat' [-e <EVENT> | --event=EVENT] [-a] -- <command> [<options>]
13 'perf stat' [-e <EVENT> | --event=EVENT] [-a] record [-o file] -- <command> [<options>]
14 'perf stat' report [-i file]
17 -----------
23 -------
33 -e::
34 --event=::
37 	- a symbolic event name (use 'perf list' to list all events)
39 	- a raw PMU event (eventsel+umask) in the form of rNNN where NNN is a
42         - a symbolic or raw PMU event followed by an optional colon
43 	  and a list of event modifiers, e.g., cpu-cycles:p.  See the
44 	  linkperf:perf-list[1] man page for details on event modifiers.
46 	- a symbolically formed event like 'pmu/param1=0x3,param2/' where
52 	  perf stat -A -a -e cpu/event,percore=1/,otherevent ...
54 	- a symbolically formed event like 'pmu/config=M,config1=N,config2=K/'
67 -i::
68 --no-inherit::
70 -p::
71 --pid=<pid>::
74 -t::
75 --tid=<tid>::
79 --pfm-events events::
81 including support for event filters. For example '--pfm-events
84 events cannot be mixed together. The latter must be used with the -e
85 option. The -e option and this one can be mixed and matched.  Events
89 -a::
90 --all-cpus::
91         system-wide collection from all CPUs (default if no target is specified)
93 --no-scale::
96 -d::
97 --detailed::
100        -d:     detailed events, L1 and LLC data cache
101     -d -d:     more detailed events, dTLB and iTLB events
102  -d -d -d:     very detailed events, adding prefetch events
104 -r::
105 --repeat=<n>::
108 -B::
109 --big-num::
111 	Enabled by default. Use "--no-big-num" to disable.
112 	Default setting can be changed with "perf config stat.big-num=false".
114 -C::
115 --cpu=::
117 comma-separated list with no space: 0,1. Ranges of CPUs are specified with -: 0-2.
118 In per-thread mode, this option is ignored. The -a option is still necessary
119 to activate system-wide monitoring. Default is to count on all CPUs.
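The list syntax accepted by -C (comma-separated entries, with ranges written as first-last) can be illustrated with a small shell helper. This is only a sketch of the syntax described above; the function name expand_cpu_list is hypothetical and is not part of perf.

```sh
# Hypothetical helper (not part of perf): expand a CPU list such as "0,2-4"
# into the individual CPU numbers it names.
expand_cpu_list() {
    out=""
    old_ifs=$IFS
    IFS=,                        # split the argument on commas
    for part in $1; do
        case $part in
            *-*)                 # a range such as 0-2
                lo=${part%-*}
                hi=${part#*-}
                i=$lo
                while [ "$i" -le "$hi" ]; do
                    out="$out $i"
                    i=$((i + 1))
                done
                ;;
            *)                   # a single CPU number
                out="$out $part"
                ;;
        esac
    done
    IFS=$old_ifs
    echo "${out# }"
}

expand_cpu_list 0,1    # prints: 0 1
expand_cpu_list 0-2    # prints: 0 1 2
```

A list such as 0,2-4 therefore selects CPUs 0, 2, 3 and 4.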
121 -A::
122 --no-aggr::
125 -n::
126 --null::
127         null run - don't start any counters
129 -v::
130 --verbose::
133 -x SEP::
134 --field-separator SEP::
135 print counts using a CSV-style output to make it easy to import directly into
138 --table:: Display time for each run (-r option), in a table format, e.g.:
140   $ perf stat --null -r 5 --table perf bench sched pipe
145              5.189 (-0.293) #
146              5.189 (-0.294) #
147              5.186 (-0.296) #
152              5.483 +- 0.198 seconds time elapsed  ( +-  3.62% )
154 -G name::
155 --cgroup name::
157 in per-cpu mode. The cgroup filesystem must be mounted. All threads belonging to
161 an empty cgroup (monitor all the time) using, e.g., -G foo,,bar. Cgroups must have
164 use '-e e1 -e e2 -G foo,foo' or just use '-e e1 -e e2 -G foo'.
167 command line can be used: 'perf stat -e cycles -G cgroup_name -a -e cycles'.
169 --for-each-cgroup name::
171 by comma).  This has the same effect as repeating the -e and -G options for
172 each event x name.  This option cannot be used with the -G/--cgroup option.
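The "each event x name" pairing can be sketched as a shell helper that writes out one -e per pair and a single matching -G list, following the positional '-e e1 -e e2 -G foo,foo' form shown for -G. The helper name is hypothetical, and the exact expansion perf performs internally is an assumption here, not something the text above specifies.

```sh
# Hypothetical helper (not part of perf): print -e/-G arguments that pair
# every event with every cgroup, relying on the positional -G syntax.
expand_for_each_cgroup() {
    events=$1
    cgroups=$2
    ev_args=""
    cg_list=""
    old_ifs=$IFS
    IFS=,                         # both arguments are comma-separated lists
    for cg in $cgroups; do
        for ev in $events; do
            ev_args="$ev_args -e $ev"
            cg_list="$cg_list,$cg"
        done
    done
    IFS=$old_ifs
    echo "${ev_args# } -G ${cg_list#,}"
}

expand_for_each_cgroup cycles,instructions A,B
# prints: -e cycles -e instructions -e cycles -e instructions -G A,A,B,B
```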
174 -o file::
175 --output file::
178 --append::
179 Append to the output file designated with the -o option. Ignored if -o is not specified.
181 --log-fd::
183 Log output to fd, instead of stderr.  Complementary to --output, and mutually exclusive
184 with it.  --append may be used here.  Examples:
185      3>results  perf stat --log-fd 3          -- $cmd
186      3>>results perf stat --log-fd 3 --append -- $cmd
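The two examples differ only in how the shell opens descriptor 3: 3> truncates the file, 3>> appends to it. The sketch below shows that shell mechanism with a stand-in function in place of perf; write_report is hypothetical and only imitates a program that writes its report to a descriptor given by number.

```sh
# Hypothetical stand-in for a command that logs to a numbered descriptor.
write_report() {
    fd=$1
    shift
    echo "report for: $*" >&"$fd"
}

out=$(mktemp)
write_report 3 first run  3>"$out"     # descriptor 3 opened with truncation
write_report 3 second run 3>>"$out"    # descriptor 3 opened in append mode
cat "$out"
# prints:
# report for: first run
# report for: second run
```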
188 --control=fifo:ctl-fifo[,ack-fifo]::
189 --control=fd:ctl-fd[,ack-fd]::
190 ctl-fifo / ack-fifo are opened and used as ctl-fd / ack-fd as follows.
191 Listen on ctl-fd descriptor for command to control measurement ('enable': enable events,
193 --delay=-1 option. Optionally send control command completion ('ack\n') to ack-fd descriptor
202  test -p ${ctl_fifo} && unlink ${ctl_fifo}
207  test -p ${ctl_ack_fifo} && unlink ${ctl_ack_fifo}
211  perf stat -D -1 -e cpu-cycles -a -I 1000       \
212            --control fd:${ctl_fd},${ctl_fd_ack} \
213            -- sleep 30 &
216  sleep 5  && echo 'enable' >&${ctl_fd} && read -u ${ctl_fd_ack} e1 && echo "enabled(${e1})"
217  sleep 10 && echo 'disable' >&${ctl_fd} && read -u ${ctl_fd_ack} d1 && echo "disabled(${d1})"
219  exec {ctl_fd_ack}>&-
222  exec {ctl_fd}>&-
225  wait -n ${perf_pid}
229 --pre::
230 --post::
233 perf stat --repeat 10 --null --sync --pre 'make -s O=defconfig-build/clean' -- make -s -j64 O=defco…
235 -I msecs::
236 --interval-print msecs::
239 	example: 'perf stat -I 1000 -e cycles -a sleep 5'
243 --interval-count times::
245 This option should be used together with "-I" option.
246 	example: 'perf stat -I 1000 --interval-count 2 -e cycles -a'
248 --interval-clear::
251 --timeout msecs::
253 This option is not supported with the "-I" option.
254 	example: 'perf stat --timeout 2000 -e cycles -a'
256 --metric-only::
258 Don't show any raw values. Not supported with --per-thread.
260 --per-socket::
261 Aggregate counts per processor socket for system-wide mode measurements.  This
263 use --per-socket in addition to -a (system-wide).  The output includes the
267 --per-die::
268 Aggregate counts per processor die for system-wide mode measurements.  This
270 use --per-die in addition to -a (system-wide).  The output includes the
274 --per-core::
275 Aggregate counts per physical processor for system-wide mode measurements.  This
277 use --per-core in addition to -a (system-wide).  The output includes the
280 --per-thread::
281 Aggregate counts per monitored thread, when monitoring threads (-t option)
282 or processes (-p option).
284 --per-node::
285 Aggregate counts per NUMA node for system-wide mode measurements. This
287 mode, use --per-node in addition to -a (system-wide).
289 -D msecs::
290 --delay msecs::
291 After starting the program, wait msecs before measuring (-1: start with events
295 -T::
296 --transaction::
300 --metric-no-group::
303 --metric-no-group option places events outside of groups and may
304 increase the chance of the event being scheduled - leading to more
306 for metrics like instructions per cycle can be lower - as both metrics
309 --metric-no-merge::
320 -----------
323 -o file::
324 --output file::
328 -----------
331 -i file::
332 --input file::
335 --per-socket::
336 Aggregate counts per processor socket for system-wide mode measurements.
338 --per-die::
339 Aggregate counts per processor die for system-wide mode measurements.
341 --per-core::
342 Aggregate counts per physical processor for system-wide mode measurements.
344 -M::
345 --metrics::
351 -A::
352 --no-aggr::
355 --topdown::
356 Print top down level 1 metrics if supported by the CPU. This helps
357 determine bottlenecks in the CPU pipeline for CPU-bound workloads,
361 Frontend bound means that the CPU cannot fetch and decode instructions fast
363 neck. Bad Speculation means that the CPU wasted cycles due to branch
364 mispredictions and similar issues. Retiring means that the CPU computed without
366 if the workload is actually bound by the CPU and not by something else.
369 mode like -I 1000, as the bottleneck of workloads can change often.
371 This enables --metric-only, unless overridden with --no-metric-only.
376 The top down metrics are collected per core instead of per
377 CPU thread. Per core mode is automatically enabled
378 and -a (global monitoring) is needed, requiring root rights or
379 kernel.perf_event_paranoid=-1.
391 --no-merge::
404 --smi-cost::
407 During the measurement, /sys/devices/cpu/freeze_on_smi will be set to
410 The cost of SMI can be measured by (aperf - unhalted core cycles).
413 oriented analysis. --metric-only will be applied by default.
414 The output is SMI cycles%, equal to (aperf - unhalted core cycles) / aperf.
416 Users who want the actual values can apply --no-metric-only.
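As a worked illustration of the SMI cycles% formula, the sketch below evaluates (aperf - unhalted core cycles) / aperf as a percentage. The function name and the counter values are hypothetical, chosen only to make the arithmetic easy to follow; they are not measurements.

```sh
# Hypothetical helper: SMI cycles% from an aperf count and an unhalted
# core cycles count, using the formula given above.
smi_cycles_pct() {
    awk -v aperf="$1" -v core="$2" \
        'BEGIN { printf "%.1f\n", (aperf - core) * 100 / aperf }'
}

# Made-up counts: 1,000,000 aperf cycles, 990,000 unhalted core cycles.
smi_cycles_pct 1000000 990000    # prints: 1.0
```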
418 --all-kernel::
421 --all-user::
424 --percore-show-thread::
426 for all hardware threads in a core and show the counts per core.
429 counts for all hardware threads in a core but show the sum counts per
433 --summary::
434 Print summary for interval mode (-I).
437 --------
439 $ perf stat -- make
443         83723.452481      task-clock:u (msec)       #    1.004 CPUs utilized
444                    0      context-switches:u        #    0.000 K/sec
445                    0      cpu-migrations:u          #    0.000 K/sec
446            3,228,188      page-faults:u             #    0.039 M/sec
448      313,163,853,778      instructions:u            #    1.36  insn per cycle
450        2,078,861,393      branch-misses:u           #    2.98% of all branches
458 -------
473 ----------
475 With -x, perf stat can produce output in a not-quite-CSV format.
477 it is recommended to use a different character like -x \;
481 	- optional usec time stamp in fractions of a second (with -I xxx)
482 	- optional CPU, core, or socket identifier
483 	- optional number of logical CPUs aggregated
484 	- counter value
485 	- unit of the counter value or empty
486 	- event name
487 	- run time of counter
488 	- percentage of measurement time the counter was running
489 	- optional variance if multiple values are collected with -r
490 	- optional metric value
491 	- optional unit of metric
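A minimal sketch of reading such output with awk follows. It assumes ';' as the separator and a run without -I, without an aggregation option and without -r, so that each line starts at the counter value; with those options the field positions shift. The helper name and the two sample lines are hypothetical and hand-written for illustration; they are not captured perf output.

```sh
# Hypothetical helper: print "event=value [unit]" for each line, assuming
# the fields are: value;unit;event;run time;percentage;metric;metric unit
parse_stat_csv() {
    awk -F';' '{ printf "%s=%s%s\n", $3, $1, ($2 != "" ? " " $2 : "") }'
}

printf '%s\n' \
    '1200;;instructions;5000;100.00;1.50;insn per cycle' \
    '83.72;msec;task-clock;83720000;100.00;;' | parse_stat_csv
# prints:
# instructions=1200
# task-clock=83.72 msec
```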
496 --------
497 linkperf:perf-top[1], linkperf:perf-list[1]