1 perf-intel-pt(1)
5 ----
6 perf-intel-pt - Support for Intel Processor Trace within perf tools
9 --------
11 'perf record' -e intel_pt//
14 -----------
16 Intel Processor Trace (Intel PT) is an extension of Intel Architecture that
19 Technical details are documented in the Intel 64 and IA-32 Architectures
20 Software Developer Manuals, Chapter 36 Intel Processor Trace.
23 processors that are based on the Intel micro-architecture code name Broadwell.
25 Trace data is collected by 'perf record' and stored within the perf.data file.
28 Trace data must be 'decoded' which involves walking the object code and matching
29 the trace data packets. For example a TNT packet only tells whether a
33 Decoding is done on-the-fly. The decoder outputs samples in the same format as
43 builds, however the executed images are needed - which makes use in JIT-compiled
44 environments, or with self-modified code, a challenge. Also symbols need to be
47 A limitation of Intel PT is that it produces huge amounts of trace data
51 vary depending on the use-case and architecture.
55 ----------
61 Data is captured with 'perf record' e.g. to trace 'ls' userspace-only:
63 perf record -e intel_pt//u ls
69 To also trace kernel space presents a problem, namely kernel self-modifying
73 --kcore is used, but access to /proc/kcore is restricted e.g.
75 sudo perf record -o pt_ls --kcore -e intel_pt// -- ls
82 sudo perf report -i pt_ls
84 Because samples are synthesized after-the-fact, the sampling period can be
87 sudo perf report -i pt_ls --itrace=i1usge
89 See the sections below for more information about the --itrace option.
103 perf record -e intel_pt//u ls
104 perf script --itrace=ibxwpe
109 perf script --itrace=ibxwpe -F+flags
112 system, asynchronous, interrupt, transaction abort, trace begin, trace end,
113 in transaction, VM-entry, and VM-exit respectively.
117 perf script --insn-trace --xed
120 Dumping all instructions in a long trace can be fairly slow. It is usually better
123 perf script --call-trace
127 perf script --call-ret-trace
132 perf script --time starttime,stoptime --insn-trace --xed
134 While examining the trace it's also useful to filter on specific CPUs using
135 the -C option:
137 perf script --time starttime,stoptime --insn-trace --xed -C 1
144 perf script --itrace=be -F+ipc
146 There are two ways that instructions-per-cycle (IPC) can be calculated depending
151 used - refer to the 'mtc' config term. When MTC is used, however, the values
166 Another note, in the case of "branches" events, non-taken branches are not
168 TNT packet that starts with a non-taken branch. To see every possible IPC
169 value, "instructions" events can be used e.g. --itrace=i0ns
173 Refer to script export-to-sqlite.py or export-to-postgresql.py for more details,
174 and to script exported-sql-viewer.py for an example of using the database.
176 There is also script intel-pt-events.py which provides an example of how to
180 --insn-trace - instruction trace
181 --src-trace - source trace
188 by inability to access the executed image, self-modified or JIT-ed code, or the
189 inability to match side-band information (such as context switches and mmaps)
193 resulting in data lost because the buffer was full. See 'Buffer handling' below
198 -----------
207 -e intel_pt//
211 -e intel_pt/tsc,noretcomp=0/
215 -e intel_pt/tsc=1,noretcomp=0/
217 Note there are now new config terms - see section 'config terms' further below.
224 $ grep -H . /sys/bus/event_source/devices/intel_pt/format/*
226 /sys/bus/event_source/devices/intel_pt/format/cyc_thresh:config:19-22
228 /sys/bus/event_source/devices/intel_pt/format/mtc_period:config:14-17
230 /sys/bus/event_source/devices/intel_pt/format/psb_period:config:24-27
235 -e intel_pt/noretcomp=0/
239 -e intel_pt/tsc=1,noretcomp=0/
243 -e intel_pt/tsc=0/
247 -e intel_pt/config=0x400/
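The relationship between named config terms and a raw config value can be sketched as follows. This is only an illustration, not how perf itself builds the value: the three range specs are copied from the sysfs format listing above, while the position of the 'tsc' bit (bit 10) is an assumption inferred from config=0x400, and the helper names parse_spec and pack_config are hypothetical.

```python
import re

# Format specs in the "<register>:<low>-<high>" form printed by sysfs.
# The three ranges come from the listing above; the "tsc" entry is an
# assumption inferred from config=0x400 (bit 10). Check your own sysfs output.
FORMAT_SPECS = {
    "cyc_thresh": "config:19-22",
    "mtc_period": "config:14-17",
    "psb_period": "config:24-27",
    "tsc": "config:10",  # assumed single-bit field
}


def parse_spec(spec):
    """Return (low_bit, high_bit) for 'config:19-22' or 'config:10'."""
    match = re.fullmatch(r"config:(\d+)(?:-(\d+))?", spec)
    if match is None:
        raise ValueError(f"unsupported format spec: {spec!r}")
    low = int(match.group(1))
    high = int(match.group(2)) if match.group(2) is not None else low
    return low, high


def pack_config(terms, specs=FORMAT_SPECS):
    """Combine {term: value} into one integer, checking each field width."""
    config = 0
    for name, value in terms.items():
        low, high = parse_spec(specs[name])
        width = high - low + 1
        if not 0 <= value < (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bit(s)")
        config |= value << low
    return config


print(hex(pack_config({"tsc": 1})))  # 0x400
```

Under these assumptions, setting only 'tsc' gives the same value as the raw config=0x400 form.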
262 perf_event_attr is displayed if the -vv option is used e.g.
264 ------------------------------------------------------------
278 ------------------------------------------------------------
279 sys_perf_event_open: pid 31104 cpu 0 group_fd -1 flags 0x8
280 sys_perf_event_open: pid 31104 cpu 1 group_fd -1 flags 0x8
281 sys_perf_event_open: pid 31104 cpu 2 group_fd -1 flags 0x8
282 sys_perf_event_open: pid 31104 cpu 3 group_fd -1 flags 0x8
283 ------------------------------------------------------------
289 The June 2015 version of Intel 64 and IA-32 Architectures Software Developer
290 Manuals, Chapter 36 Intel Processor Trace, defined new Intel PT features.
296 without timing information, for example a per-thread context
327 trace bytes between PSB packets as:
336 $ perf record -e intel_pt/psb_period=15/u uname
337 Invalid psb_period for intel_pt. Valid values are: 0-5
343 If decoding is expected to be reliable and the buffer is large
363 The frequency of MTC packets can also be specified - see
366 mtc_period Specifies how frequently MTC packets are produced - see mtc
378 CTC-frequency / (2 ^ value)
380 e.g. value 3 means one eighth of CTC-frequency
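As a worked example of the formula above, the following sketch computes the MTC packet frequency for a given mtc_period. The 24 MHz CTC frequency is a made-up input for illustration only, and mtc_frequency is a hypothetical helper name.

```python
def mtc_frequency(ctc_frequency_hz, mtc_period):
    """MTC packet frequency: CTC-frequency / (2 ^ value)."""
    if mtc_period < 0:
        raise ValueError("mtc_period must not be negative")
    return ctc_frequency_hz / (2 ** mtc_period)


# Hypothetical 24 MHz CTC: value 3 gives one eighth of that frequency.
print(mtc_frequency(24_000_000, 3))  # 3000000.0
```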
388 $ perf record -e intel_pt/mtc_period=15/u uname
409 a threshold - see cyc_thresh below.
411 cyc_thresh Specifies how frequently CYC packets are produced - see cyc
425 2 ^ (value - 1)
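The formula above can be tabulated with a short sketch. The helper name cyc_thresh_cycles is hypothetical, and treating value 0 as "no minimum" is an assumption, since the formula as written only yields a whole number of cycles for values of 1 and above.

```python
def cyc_thresh_cycles(value):
    """Minimum CPU cycles between CYC packets: 2 ^ (value - 1)."""
    if value < 0:
        raise ValueError("cyc_thresh must not be negative")
    if value == 0:
        # Assumption: 0 means no minimum number of cycles.
        return 0
    return 2 ** (value - 1)


for value in (1, 4, 12):
    print(value, cyc_thresh_cycles(value))
```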
434 $ perf record -e intel_pt/cyc,cyc_thresh=15/u uname
435 Invalid cyc_thresh for intel_pt. Valid values are: 0-12
439 pt Specifies pass-through which enables the 'branch' config term.
466 changes to the CPU C-state.
481 --aux-sample
485 --aux-sample=8192
489 -e intel_pt//u
492 following will create Intel PT samples on the branch-misses event, note the
495 perf record --aux-sample -e '{intel_pt//u,branch-misses:u}'
497 An alternative to '--aux-sample' is to add the config term 'aux-sample-size' to
500 perf record -e intel_pt//u -e branch-misses/aux-sample-size=8192/u
504 perf record -e '{intel_pt//u,branch-misses/aux-sample-size=8192/u}'
508 …perf record -e intel_pt//u --filter 'filter * @/bin/ls' -e branch-misses/aux-sample-size=8192/u --…
522 difficult to know how big the event might be without the trace sample attached,
529 The difference between full trace and snapshot from the kernel's perspective is
530 that in full trace we don't overwrite trace data that the user hasn't collected
532 the trace run and overwrite older data in the buffer so that whenever something
538 -S
542 -S0x100000
550 The snapshot size is displayed if the option -vv is used e.g.
558 Intel PT buffer size is specified by an addition to the -m option e.g.
560 -m,16
562 selects a buffer size of 16 pages i.e. 64KiB.
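The page-to-byte conversion behind that figure can be sketched as below. The 4 KiB page size is an assumption (it is what makes 16 pages equal 64KiB), aux_buffer_bytes is a hypothetical helper, and the power-of-two check mirrors the buffer size rule this document gives for the auxtrace mmap size.

```python
PAGE_SIZE = 4096  # assumed page size in bytes; differs on some systems


def aux_buffer_bytes(pages, page_size=PAGE_SIZE):
    """Convert an AUX buffer size in pages (the value after the comma in -m) to bytes."""
    # A power of two has exactly one bit set, so pages & (pages - 1) is zero.
    if pages <= 0 or pages & (pages - 1):
        raise ValueError("page count must be a power of two")
    return pages * page_size


print(aux_buffer_bytes(16) // 1024)  # 64 (KiB)
```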
564 Note that the existing functionality of -m is unchanged. The auxtrace mmap size
578 In full-trace mode, powers of two are allowed for buffer size, with a minimum
582 The mmap size and auxtrace mmap size are displayed if the -vv option is used e.g.
592 full-trace mode
596 Full-trace mode traces continuously e.g.
598 perf record -e intel_pt//u uname
602 perf record --aux-sample -e intel_pt//u -e branch-misses:u
607 perf record -v -e intel_pt//u -S ./loopy 1000000000 &
609 kill -USR2 11435
613 Note that "Recording AUX area tracing snapshot" is displayed because the -v
623 $ sudo ~/bin/perf record --control fifo:perf.control,perf.ack -S -e intel_pt//u -- sleep 60 &
625 $ ps -e | grep perf
627 $ kill -USR2 15244
628 bash: kill: (15244) - Operation not permitted
635 Buffer handling
638 There may be buffer limitations (i.e. single ToPa entry) which means that actual
639 buffer sizes are limited to powers of 2 up to 4MiB (MAX_ORDER). In order to
643 a) the interrupt may not be handled in time so that the current buffer
644 becomes full and some trace data is lost.
648 If trace data is lost, the driver sets 'truncated' in the PERF_RECORD_AUX event
651 In full-trace mode, the driver waits for data to be copied out before allowing
652 the (logical) buffer to wrap-around. If data is not copied out quickly enough,
655 that happens, perf tools always re-enable the intel_pt event after copying out
662 By default "perf record" post-processes the event stream to find all build ids
670 perf buildid-list
674 perf buildid-list --with-hits
682 collection of side-band information. In order to prevent that, a dummy
685 there is complete side-band information to allow the decoding of subsequent
708 "per thread" mode is selected by -t or by --per-thread (with -p or -u or just a
710 "per cpu" is selected by -C or -a.
714 In per-thread mode an exact list of threads is traced. There is no inheritance.
715 Each thread has its own event buffer.
717 In per-cpu mode all processes (or processes from the selected cgroup i.e. -G
718 option, or processes selected with -p or -u) are traced. Each cpu has its own
719 buffer. Inheritance is allowed.
721 In workload-only mode, the workload is traced but with per-cpu buffers.
722 Inheritance is allowed. Note that you can now trace a workload in per-thread
723 mode by using the --per-thread option.
726 Privileged vs non-privileged users
729 Unless /proc/sys/kernel/perf_event_paranoid is set to -1, unprivileged users
730 have memory limits imposed upon them. That affects what buffer sizes they can
745 Unless /proc/sys/kernel/perf_event_paranoid is set to -1, unprivileged users are
746 not permitted to use tracepoints which means there is insufficient side-band
747 information to decode Intel PT in per-cpu mode, and potentially workload-only
750 Note also, that to use tracepoints, read-access to debugfs is required. So if
751 debugfs is not mounted or the user does not have read-access, it will again not
752 be possible to decode Intel PT in per-cpu mode.
758 The sched_switch tracepoint is used to provide side-band data for Intel PT
765 $ perf record -vv -e intel_pt//u uname
766 ------------------------------------------------------------
780 ------------------------------------------------------------
781 sys_perf_event_open: pid 31104 cpu 0 group_fd -1 flags 0x8
782 sys_perf_event_open: pid 31104 cpu 1 group_fd -1 flags 0x8
783 sys_perf_event_open: pid 31104 cpu 2 group_fd -1 flags 0x8
784 sys_perf_event_open: pid 31104 cpu 3 group_fd -1 flags 0x8
785 ------------------------------------------------------------
796 ------------------------------------------------------------
797 sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8
798 sys_perf_event_open: pid -1 cpu 1 group_fd -1 flags 0x8
799 sys_perf_event_open: pid -1 cpu 2 group_fd -1 flags 0x8
800 sys_perf_event_open: pid -1 cpu 3 group_fd -1 flags 0x8
801 ------------------------------------------------------------
820 ------------------------------------------------------------
821 sys_perf_event_open: pid 31104 cpu 0 group_fd -1 flags 0x8
822 sys_perf_event_open: pid 31104 cpu 1 group_fd -1 flags 0x8
823 sys_perf_event_open: pid 31104 cpu 2 group_fd -1 flags 0x8
824 sys_perf_event_open: pid 31104 cpu 3 group_fd -1 flags 0x8
827 perf event ring buffer mmapped per cpu
834 and only in per-cpu mode.
838 cannot be matched against the Intel PT trace.
842 -----------
844 By default, perf script will decode trace data found in the perf.data file.
845 This can be further controlled by the --itrace option.
848 New --itrace option
853 --itrace
857 --itrace=cepwx
876 Z prefer to ignore timestamps (so-called "timeless" decoding)
878 "Instructions" events look like they were recorded by "perf record -e
881 "Branches" events look like they were recorded by "perf record -e branches". "c"
899 "Power" events correspond to power event packets and CBR (core-to-bus ratio)
903 C-state changes, whereas CBR is indicative of CPU frequency. perf script
907 pwre: hw: 0 cstate: 2 sub-cstate: 0
911 "cbr" includes the frequency and the percentage of maximum non-turbo
913 "pwre" shows C-state transitions (to a C-state deeper than C0) and
918 For more details refer to the Intel 64 and IA-32 Architectures Software
921 PSB events show when a PSB+ occurred and also the byte-offset in the trace.
926 Error events show where the decoder lost the trace. Error events
929 will or will not be reported. Each flag must be preceded by either '+' or '-'.
931 -o Suppress overflow errors
932 -l Suppress trace data lost errors
935 --itrace=e-o-l
941 must be preceded by either '+' or '-'. The flags supported by Intel PT are:
942 -a Suppress logging of perf events
949 --itrace=i10us
952 microseconds of trace. Alternatives to "us" are "ms" (milliseconds),
967 'instructions' (i.e. --itrace=i1i).
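To make the period syntax concrete, here is a minimal sketch that splits a period such as "10us" (the part following the event letter in --itrace=i10us) into a number and a unit. It illustrates the notation only and is not perf's own parser; parse_itrace_period is a hypothetical name, and the "ns" and "t" suffixes are assumed from the full --itrace documentation rather than from the lines shown here.

```python
import re

# Unit suffixes for the sampling period. "us", "ms" and "i" appear in the
# text above; "ns" and "t" are assumed from the full --itrace documentation.
UNITS = {
    "i": "instructions",
    "t": "TSC ticks",
    "ns": "nanoseconds",
    "us": "microseconds",
    "ms": "milliseconds",
}


def parse_itrace_period(text):
    """Split a period such as '10us' into (10, 'microseconds')."""
    match = re.fullmatch(r"(\d+)(ms|us|ns|i|t)", text)
    if match is None:
        raise ValueError(f"unrecognised period: {text!r}")
    return int(match.group(1)), UNITS[match.group(2)]


print(parse_itrace_period("10us"))  # (10, 'microseconds')
print(parse_itrace_period("1i"))    # (1, 'instructions')
```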
972 --itrace=ig32
973 --itrace=xg32
978 --itrace=il10
979 --itrace=xl10
986 instead of synthesized events. For example, to record branch-misses events for
987 'ls' and then add a call chain derived from the Intel PT trace:
989 perf record --aux-sample -e '{intel_pt//u,branch-misses:u}' -- ls
990 perf report --itrace=Ge
998 into the event buffer in one go. That reduces interrupts, but can give very
999 late timestamps. Because the Intel PT trace is synchronized by timestamps,
1000 the PEBS events do not match the trace. Currently, Large PEBS is used only in
1002 - hardware supports it
1003 - PEBS is used
1004 - event period is specified, instead of frequency
1005 - the sample type is limited to the following flags:
1014 cases, avoid specifying the event period i.e. avoid the 'perf record' -c option,
1015 --count option, or 'period' config term.
1017 To disable trace decoding entirely, use the option --no-itrace.
1022 --itrace=i0nss1000000
1026 The q option changes the way the trace is decoded. The decoding is much faster
1033 ranges that could then be decoded fully using the --time option.
1037 - direct calls and jmps
1038 - conditional branches
1039 - non-branch instructions
1043 - asynchronous branches such as interrupts
1044 - indirect branches
1045 - function return target address *if* the noretcomp config term (refer
1047 - start of (control-flow) tracing
1048 - end of (control-flow) tracing, if it is not out of context
1049 - power events, ptwrite, transaction start and abort
1050 - instruction pointer associated with PSB packets
1055 Repeating the q option (double-q i.e. qq) results in even faster decoding and even
1058 PSBEND). Note PSB packets occur regularly in the trace based on the psb_period
1064 - everything except instruction pointer associated with PSB packets
1068 - instruction pointer associated with PSB packets
1070 The Z option is equivalent to having recorded a trace without TSC
1072 decoding a trace of a virtual machine.
1078 perf script has an option (-D) to "dump" the events i.e. display the binary
1081 When -D is used, Intel PT packets are displayed. The packet decoder does not
1082 pay attention to PSB packets, but just decodes the bytes - so the packets seen
1084 One example of that would be when the buffer-switching interrupt has been too
1085 slow, and the buffer has been filled completely. In that case, the last packet
1086 in the buffer might be truncated and immediately followed by a PSB as the trace
1087 continues in the next buffer.
1089 To disable the display of Intel PT packets, combine the -D option with
1090 --no-itrace.
1094 -----------
1096 By default, perf report will decode trace data found in the perf.data file.
1097 This can be further controlled by the --itrace option, exactly the same as
1098 perf script, with the exception that the default is --itrace=igxe.
1102 -----------
1104 perf inject also accepts the --itrace option in which case tracing data is
1107 perf inject --itrace -i perf.data -o perf.data.new
1114 $ gcc-5 -O3 sort.c -o sort_optimized
1120 [intel-pt]
1121 mispred-all = on
1123 $ perf record -e intel_pt//u ./sort 3000
1128 $ perf inject -i perf.data -o inj --itrace=i100usle --strip
1129 $ ./create_gcov --binary=./sort --profile=inj --gcov=sort.gcov -gcov_version=1
1130 $ gcc-5 -O3 -fauto-profile=sort.gcov sort.c -o sort_autofdo
1140 -----------------
1142 Some hardware has the feature to redirect PEBS records to the Intel PT trace.
1143 Recording is selected by using the aux-output config term e.g.
1145 perf record -c 10000 -e '{intel_pt/branch=0/,cycles/aux-output/ppp}' uname
1149 To display PEBS events from the Intel PT trace, use the itrace 'o' option e.g.
1151 perf script --itrace=oe
1154 ---
1156 include::build-xed.txt[]
1160 ------------------------
1169 VMX controls may block the perf NMI to the host potentially resulting in lost trace data
1170 …Guest kernel self-modifying code (e.g. jump labels or JIT-compiled eBPF) will result in decoding e…
1182 Mount the guest file system. Note sshfs needs -o direct_io to enable reading of proc files. root …
1185 $ sshfs -o direct_io root@vm0:/ vm0
1189 $ perf buildid-cache -v --kcore vm0/proc/kcore
1190 …kcore added to build-id cache directory /home/user/.debug/[kernel.kcore]/9600f316a53a0f54278885e8d…
1195 $ ps -eLl | grep 'KVM\|PID'
1197 3 S 64055 1430 1 1440 1 80 0 - 1921718 - ? 00:02:47 CPU 0/KVM
1198 3 S 64055 1430 1 1441 1 80 0 - 1921718 - ? 00:02:41 CPU 1/KVM
1199 3 S 64055 1430 1 1442 1 80 0 - 1921718 - ? 00:02:38 CPU 2/KVM
1200 3 S 64055 1430 1 1443 2 80 0 - 1921718 - ? 00:03:18 CPU 3/KVM
1202 Start an open-ended perf record, tracing the VM process, do something on the VM, and then ctrl-C to…
1206 Intel PT traces both the host and the guest so --guest and --host need to be specified.
1207 Without timestamps, --per-thread must be specified to distinguish threads.
1209 …$ sudo perf kvm --guest --host --guestkallsyms $KALLSYMS record --kcore -e intel_pt/tsc=0,mtc=0,cy…
1214 perf script can be used to provide an instruction trace
1216 $ perf script --guestkallsyms $KALLSYMS --insn-trace --xed -F+ipc | grep -C10 vmresume | head -21
1246 Mount the guest file system. Note sshfs needs -o direct_io to enable reading of proc files. root …
1248 $ mkdir -p vm0
1249 $ sshfs -o direct_io root@vm0:/ vm0
1253 $ perf buildid-cache -v --kcore vm0/proc/kcore
1259 $ ps -eLl | grep 'KVM\|PID'
1261 3 S 64055 16998 1 17005 13 80 0 - 1818189 - ? 00:00:16 CPU 0/KVM
1262 3 S 64055 16998 1 17006 4 80 0 - 1818189 - ? 00:00:05 CPU 1/KVM
1263 3 S 64055 16998 1 17007 3 80 0 - 1818189 - ? 00:00:04 CPU 2/KVM
1264 3 S 64055 16998 1 17008 4 80 0 - 1818189 - ? 00:00:05 CPU 3/KVM
1266 Start an open-ended perf record, tracing the VM process, do something on the VM, and then ctrl-C to…
1269 Intel PT traces both the host and the guest so --guest and --host need to be specified.
1271 …$ sudo perf kvm --guest --host --guestkallsyms $KALLSYMS record --kcore -e intel_pt/cyc=1/k -p 169…
1276 only 7 bytes, so the TSC Offset might differ from the actual value in the 8th byte. That will
1279 $ perf inject -i perf.data.kvm --vm-time-correlation=dry-run
1295 $ perf inject -i perf.data.kvm --vm-time-correlation="dry-run 0xffffe42722c64c41"
1297 Note the options for 'perf inject' --vm-time-correlation are:
1299 [ dry-run ] [ <TSC Offset> [ : <VMCS> [ , <VMCS> ]... ] ]...
1302 The option "dry-run" will cause the file to be processed but without updating it.
1303 Note it is also possible to get an intel_pt.log file by adding option --itrace=d
1307 $ perf inject -i perf.data.kvm --vm-time-correlation=0xffffe42722c64c41 --force
1311 $ perf script -i perf.data.kvm --guestkallsyms $KALLSYMS --itrace=e-o
1315 'perf script' can be used to provide an instruction trace showing timestamps
1317 …$ perf script -i perf.data.kvm --guestkallsyms $KALLSYMS --insn-trace --xed -F+ipc | grep -C10 vmr…
1343 --------
1345 linkperf:perf-record[1], linkperf:perf-script[1], linkperf:perf-report[1],
1346 linkperf:perf-inject[1]