-----------------------------------

methodology to break down CPU pipeline execution into 4 bottlenecks:

Traditionally this was implemented by events in generic counters

perf stat --topdown implements this.

Full Top Down includes more levels that can break down the

fixed counters and do not require generic counters. This allows

% perf stat -a --topdown -I1000
metric event, and allow user programs to read the performance counters.
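A minimal sketch of that setup follows. It assumes Linux with kernel headers installed and calls the raw syscall, because glibc provides no `perf_event_open()` wrapper. The raw encodings 0x0400 (SLOTS) and 0x8000 (first metric event) are our reading of the kernel's x86 perf definitions and should be checked against your kernel. `sys_perf_event_open` and `open_topdown_group` are hypothetical helper names.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed raw encodings (check against arch/x86 perf headers). */
#define TD_SLOTS_CONFIG   0x0400  /* TOPDOWN.SLOTS */
#define TD_METRICS_CONFIG 0x8000  /* first TopDown metric event */

/* glibc has no perf_event_open() wrapper, so invoke the syscall directly. */
static int sys_perf_event_open(struct perf_event_attr *attr, pid_t pid,
                               int cpu, int group_fd, unsigned long flags)
{
    return (int)syscall(SYS_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/* Hypothetical helper: open SLOTS as the group leader and the metric event
 * as a member of that group. Returns 0 on success and -1 on failure, e.g.
 * when the CPU lacks TopDown metrics or perf access is not permitted. */
static int open_topdown_group(pid_t pid, int *slots_fd, int *metrics_fd)
{
    struct perf_event_attr slots, metrics;

    assert(slots_fd != NULL && metrics_fd != NULL);

    memset(&slots, 0, sizeof(slots));
    slots.type = PERF_TYPE_RAW;
    slots.size = sizeof(slots);
    slots.config = TD_SLOTS_CONFIG;
    slots.exclude_kernel = 1;

    metrics = slots;                 /* same settings, different event */
    metrics.config = TD_METRICS_CONFIG;

    /* pid, cpu = -1: count this task on any CPU; group_fd = -1: new leader */
    *slots_fd = sys_perf_event_open(&slots, pid, -1, -1, 0);
    if (*slots_fd < 0)
        return -1;

    /* group_fd = SLOTS leader, so both counters are scheduled together */
    *metrics_fd = sys_perf_event_open(&metrics, pid, -1, *slots_fd, 0);
    if (*metrics_fd < 0) {
        close(*slots_fd);
        *slots_fd = -1;
        return -1;
    }
    return 0;
}
```

Passing pid 0 measures the calling thread. On machines or containers without TopDown support, or with a restrictive `perf_event_paranoid` setting, the helper simply returns -1.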
int slots_fd = perf_event_open(&slots, 0, -1, -1, 0);

int metrics_fd = perf_event_open(&metrics, 0, -1, slots_fd, 0);

#define RDPMC_FIXED (1 << 30) /* return fixed counters */
#define RDPMC_METRIC (1 << 29) /* return metric counters */

_rdpmc calls should not be mixed with reading the metrics and slots counters
through system calls, as the kernel will reset these counters after each system
call.
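The sketch below shows how the RDPMC selector values can be formed and how the counters might be read purely through `rdpmc`, without mixing in system-call reads. It assumes that SLOTS is fixed counter 3 and that the L1/L2 metrics are metric counter 0. `FIXED_COUNTER_SLOTS`, `METRIC_COUNTER_TOPDOWN` and the helper names are hypothetical. The inline-assembly reader only compiles on x86 and faults unless the kernel has enabled user-space `rdpmc` for the task (typically after the perf event fd has been `mmap()`ed).

```c
#include <assert.h>
#include <stdint.h>

#define RDPMC_FIXED  (1 << 30)  /* return fixed counters */
#define RDPMC_METRIC (1 << 29)  /* return metric counters */

/* Assumed counter indices (hypothetical names). */
#define FIXED_COUNTER_SLOTS    3
#define METRIC_COUNTER_TOPDOWN 0

static_assert((RDPMC_FIXED & RDPMC_METRIC) == 0, "selector flags must not overlap");

/* ECX values passed to RDPMC for each counter. */
static inline uint32_t slots_selector(void)   { return RDPMC_FIXED | FIXED_COUNTER_SLOTS; }
static inline uint32_t metrics_selector(void) { return RDPMC_METRIC | METRIC_COUNTER_TOPDOWN; }

#if defined(__x86_64__) || defined(__i386__)
/* RDPMC returns the selected counter in EDX:EAX. */
static inline uint64_t rdpmc_raw(uint32_t ecx)
{
    uint32_t lo, hi;
    __asm__ __volatile__("rdpmc" : "=a"(lo), "=d"(hi) : "c"(ecx));
    return ((uint64_t)hi << 32) | lo;
}

static inline uint64_t read_slots(void)   { return rdpmc_raw(slots_selector()); }
static inline uint64_t read_metrics(void) { return rdpmc_raw(metrics_selector()); }
#endif
```

Reading both values back to back with `read_slots()` and `read_metrics()` gives one snapshot. Because these reads bypass the kernel, no reset happens between them.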
retiring_slots = GET_METRIC(metric_b, 0) * slots_b - retiring_slots_a;
bad_spec_slots = GET_METRIC(metric_b, 1) * slots_b - bad_spec_slots_a;
fe_bound_slots = GET_METRIC(metric_b, 2) * slots_b - fe_bound_slots_a;
be_bound_slots = GET_METRIC(metric_b, 3) * slots_b - be_bound_slots_a;

slots_delta = slots_b - slots_a;
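As a runnable sketch of the L1 computation above, the code below takes two snapshots a and b and returns each category's share of the slots in between. It assumes that each 8-bit field of the metrics value is a fraction of SLOTS scaled to 0xff, so `td_fraction()` divides by 0xff explicitly. `struct td_snapshot`, `td_fraction` and `td_l1_ratios` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* One reading of the SLOTS counter and the packed metrics value. */
struct td_snapshot {
    uint64_t slots;
    uint64_t metrics;
};

/* Assumption: field i is a fraction of SLOTS in units of 1/0xff. */
static double td_fraction(uint64_t metrics, int i)
{
    return (double)((metrics >> (i * 8)) & 0xff) / 0xff;
}

enum { TD_RETIRING, TD_BAD_SPEC, TD_FE_BOUND, TD_BE_BOUND, TD_L1_COUNT };

/* Level 1 ratios for the interval a..b:
 * (fraction_b * slots_b - fraction_a * slots_a) / (slots_b - slots_a) */
static void td_l1_ratios(const struct td_snapshot *a,
                         const struct td_snapshot *b,
                         double ratio[TD_L1_COUNT])
{
    assert(b->slots > a->slots);   /* need a non-empty interval */
    double slots_delta = (double)(b->slots - a->slots);

    for (int i = 0; i < TD_L1_COUNT; i++) {
        double slots_i = td_fraction(b->metrics, i) * b->slots
                       - td_fraction(a->metrics, i) * a->slots;
        ratio[i] = slots_i / slots_delta;
    }
}
```

For example, with a = {1000, 0x66333333} and b = {2000, 0x33333366}, retiring covers 0.4 * 2000 - 0.2 * 1000 = 600 of the 1000 new slots, which gives a ratio of 0.6.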
recreated from L1 and L2 metric counters. (Available on Sapphire Rapids and

heavy_ops_slots = GET_METRIC(metric_b, 4) * slots_b - heavy_ops_slots_a;
br_mispredict_slots = GET_METRIC(metric_b, 5) * slots_b - br_mispredict_slots_a;
fetch_lat_slots = GET_METRIC(metric_b, 6) * slots_b - fetch_lat_slots_a;
mem_bound_slots = GET_METRIC(metric_b, 7) * slots_b - mem_bound_slots_a;

slots_delta = slots_b - slots_a;

light_ops_ratio = retiring_ratio - heavy_ops_ratio;

machine_clears_ratio = bad_spec_ratio - br_mispredict_ratio;

fetch_bw_ratio = fe_bound_ratio - fetch_lat_ratio;

core_bound_ratio = be_bound_ratio - mem_bound_ratio;
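The L2 steps above can be combined into one sketch. It measures fields 4 to 7 directly and derives each sibling as its L1 parent minus the measured L2 metric. As before, it assumes every field is a fraction of SLOTS in units of 1/0xff. All struct and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

struct td_snapshot {
    uint64_t slots;
    uint64_t metrics;
};

/* Assumption: field i is a fraction of SLOTS in units of 1/0xff. */
static double td_fraction(uint64_t metrics, int i)
{
    return (double)((metrics >> (i * 8)) & 0xff) / 0xff;
}

/* Share of the slots in a..b attributed to field i. */
static double td_ratio(const struct td_snapshot *a,
                       const struct td_snapshot *b, int i)
{
    double slots_delta = (double)(b->slots - a->slots);
    return (td_fraction(b->metrics, i) * b->slots
            - td_fraction(a->metrics, i) * a->slots) / slots_delta;
}

struct td_level2 {
    double heavy_ops, light_ops;          /* split of retiring        */
    double br_mispredict, machine_clears; /* split of bad speculation */
    double fetch_lat, fetch_bw;           /* split of frontend bound  */
    double mem_bound, core_bound;         /* split of backend bound   */
};

/* Fields 0..3 hold L1 metrics, fields 4..7 the measured L2 metrics. */
static struct td_level2 td_l2_ratios(const struct td_snapshot *a,
                                     const struct td_snapshot *b)
{
    assert(b->slots > a->slots);
    struct td_level2 r;
    r.heavy_ops      = td_ratio(a, b, 4);
    r.light_ops      = td_ratio(a, b, 0) - r.heavy_ops;
    r.br_mispredict  = td_ratio(a, b, 5);
    r.machine_clears = td_ratio(a, b, 1) - r.br_mispredict;
    r.fetch_lat      = td_ratio(a, b, 6);
    r.fetch_bw       = td_ratio(a, b, 2) - r.fetch_lat;
    r.mem_bound      = td_ratio(a, b, 7);
    r.core_bound     = td_ratio(a, b, 3) - r.mem_bound;
    return r;
}
```

Returning the ratios in a struct keeps each derived sibling next to the measured metric it is computed from.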
Resetting metrics counters

fraction bit shrinks. So the counters need to be reset regularly.
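The following worked arithmetic illustrates the precision loss. It assumes that each 8-bit field is exact to within one unit, i.e. 1/0xff of the SLOTS value it was read with. Under that assumption, the per-category count f_b * slots_b - f_a * slots_a can be off by up to (slots_a + slots_b) / 0xff slots, and that error is then divided by slots_delta. `td_worst_case_ratio_error` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: each metric field is accurate to one unit of 1/0xff.
 * Returns the worst-case absolute error of a ratio over slots_a..slots_b. */
static double td_worst_case_ratio_error(uint64_t slots_a, uint64_t slots_b)
{
    assert(slots_b > slots_a);
    double slots_delta = (double)(slots_b - slots_a);
    return ((double)slots_a + (double)slots_b) / (0xff * slots_delta);
}
```

Right after a reset (slots_a = 0), the bound is 1/0xff, about 0.4%. For a region of 10^6 slots measured after 10^9 slots without a reset, the bound exceeds 1, so the ratio carries no useful information.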
When using perf stat, it is recommended to always use the -I option,

perf stat -I 1000 --topdown ...
Four pseudo TopDown metric events are exposed to end users:
topdown-retiring, topdown-bad-spec, topdown-fe-bound and topdown-be-bound.

- All the TopDown metric events must be in a group with the SLOTS event.
- The SLOTS event must be the leader of the group.
- The PERF_FORMAT_GROUP flag must be applied for each TopDown metric

For example, perf record -e '{slots, $sampling_event, topdown-retiring}:S'
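The group rules listed above can be expressed as a small checker. This is a hypothetical sketch: `struct td_member` describes an event only by its perf name and whether PERF_FORMAT_GROUP is set. Metric events are recognised by their "topdown-" prefix, which is an assumption of this sketch and not how perf itself does it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical description of one event in a perf group. */
struct td_member {
    const char *name;   /* e.g. "slots", "topdown-retiring" */
    bool format_group;  /* PERF_FORMAT_GROUP set in read_format */
};

static bool td_is_metric(const char *name)
{
    return strncmp(name, "topdown-", 8) == 0;
}

/* members[0] is the group leader. */
static bool td_group_valid(const struct td_member *members, size_t n)
{
    bool has_metric = false;

    assert(members != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (!td_is_metric(members[i].name))
            continue;
        has_metric = true;
        if (!members[i].format_group)
            return false;  /* each metric event needs PERF_FORMAT_GROUP */
    }
    if (!has_metric)
        return true;       /* no TopDown metrics: nothing to enforce */
    /* metrics present: SLOTS must be in the group and lead it */
    return n > 0 && strcmp(members[0].name, "slots") == 0;
}
```

A layout such as {slots, cycles, topdown-retiring} passes, with `cycles` standing in for `$sampling_event`. Moving `slots` out of the leader position fails the check.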
The upper half is also divided into four 8-bit fields for the new level 2
metrics. Four more TopDown metric events are exposed to end users:
topdown-heavy-ops, topdown-br-mispredict, topdown-fetch-lat and
topdown-mem-bound.

Light_Operations = Retiring - Heavy_Operations
Machine_Clears = Bad_Speculation - Branch_Mispredicts
Fetch_Bandwidth = Frontend_Bound - Fetch_Latency
Core_Bound = Backend_Bound - Memory_Bound
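The sketch below maps the 64-bit metrics value to the eight event names. The field order is taken from the GET_METRIC indices used earlier in this document: bytes 0 to 3 hold the level 1 metrics and bytes 4 to 7 the level 2 metrics. `td_field_names`, `td_decode` and `td_field_index` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Byte i of the metrics value corresponds to td_field_names[i]. */
static const char *const td_field_names[8] = {
    "topdown-retiring",  "topdown-bad-spec",      "topdown-fe-bound",  "topdown-be-bound",
    "topdown-heavy-ops", "topdown-br-mispredict", "topdown-fetch-lat", "topdown-mem-bound",
};

/* Split the metrics value into its eight 8-bit fields. */
static void td_decode(uint64_t metrics, uint8_t field[8])
{
    assert(field != NULL);
    for (int i = 0; i < 8; i++)
        field[i] = (uint8_t)(metrics >> (i * 8));
}

/* Byte index of a TopDown event name, or -1 if it is not a metric. */
static int td_field_index(const char *name)
{
    for (int i = 0; i < 8; i++)
        if (strcmp(name, td_field_names[i]) == 0)
            return i;
    return -1;
}
```

With this mapping, `td_field_index("topdown-mem-bound")` selects byte 7, the same index that `mem_bound_slots` uses.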