Lines Matching +full:power +full:- +full:delivery
4 …"MetricExpr": "topdown\\-fe\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-ret…
7 …-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into mi…
12 …"MetricExpr": "(5 * IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE - INT_MISC.UOP_DROPPING) / SLO…
15 …he CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB mi…
39 … corrected path; following all sorts of miss-predicted branches. For example; branchy code with lo…
52 …"MetricExpr": "(1 - (BR_MISP_RETIRED.ALL_BRANCHES / (BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS…
71 …o switches from DSB to MITE pipelines. The DSB (decoded i-cache) is a Uop Cache where the front-en…
83 … the fraction of cycles when the CPU was stalled due to switches of uop delivery to the Microcode …
87 … switches of uop delivery to the Microcode Sequencer (MS). Commonly used instructions are optimize…
92 "MetricExpr": "max(0, tma_frontend_bound - tma_fetch_latency)",
100 "MetricExpr": "(IDQ.MITE_CYCLES_ANY - IDQ.MITE_CYCLES_OK) / CORE_CLKS / 2",
103 …the legacy decode pipeline). This pipeline is used for code that was not pre-cached in the DSB or …
107 …"BriefDescription": "This metric represents fraction of cycles where decoder-0 was the only active…
108 …"MetricExpr": "(cpu@INST_DECODED.DECODERS\\,cmask\\=1@ - cpu@INST_DECODED.DECODERS\\,cmask\\=2@) /…
115 "MetricExpr": "(cpu@IDQ.MITE_UOPS\\,cmask\\=4@ - cpu@IDQ.MITE_UOPS\\,cmask\\=5@) / CLKS",
122 "MetricExpr": "(IDQ.DSB_CYCLES_ANY - IDQ.DSB_CYCLES_OK) / CORE_CLKS / 2",
130 "MetricExpr": "(LSD.CYCLES_ACTIVE - LSD.CYCLES_OK) / CORE_CLKS / 2",
133 …oes well sustaining Uop supply. However; in some rare cases; optimal uop-delivery could not be rea…
138 "MetricExpr": "max(1 - (tma_frontend_bound + tma_backend_bound + tma_retiring), 0)",
141 …s for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For…
149 …etched from an incorrectly speculated program path; or stalls when the out-of-order part of the ma…
154 "MetricExpr": "max(0, tma_bad_speculation - tma_branch_mispredicts)",
157 …-of-order portion of the machine needs to recover its state after the clear. For example; this can…
162 …"MetricExpr": "topdown\\-be\\-bound / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-ret…
165 …-of-order scheduler dispatches ready uops into their respective execution units; and once complete…
173 …o demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory d…
178 … "MetricExpr": "max((CYCLE_ACTIVITY.STALLS_MEM_ANY - CYCLE_ACTIVITY.STALLS_L1D_MISS) / CLKS, 0)",
181 … TLB. These cases are characterized by execution unit stalls; while some non-completed demand load…
186 …mask\\=1@ + DTLB_LOAD_MISSES.WALK_ACTIVE, max(CYCLE_ACTIVITY.CYCLES_MEM_ANY - CYCLE_ACTIVITY.CYCLE…
189 …-aside Buffers) are processor caches for recently used entries out of the Page Tables that are use…
193 … the (first level) DTLB was missed by load accesses, that later on hit in second-level TLB (STLB)",
194 "MetricExpr": "tma_dtlb_load - tma_load_stlb_miss",
200 …"BriefDescription": "This metric estimates the fraction of cycles where the Second-level TLB (STLB…
211 …perations in the pipeline; a load can avoid waiting for memory if a prior in-flight store is writi…
216 …"MetricExpr": "(16 * max(0, MEM_INST_RETIRED.LOCK_LOADS - L2_RQSTS.ALL_RFO) + (MEM_INST_RETIRED.LO…
223 … estimates fraction of cycles handling memory load split accesses - load that cross 64-byte cache …
227 … estimates fraction of cycles handling memory load split accesses - load that cross 64-byte cache …
235 …sible; which incur a few cycles load re-issue. However; the short re-issue duration is often hidde…
248 …ISS))) + L1D_PEND_MISS.FB_FULL_PERIODS)) * ((CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALL…
256 "MetricExpr": "(CYCLE_ACTIVITY.STALLS_L2_MISS - CYCLE_ACTIVITY.STALLS_L3_MISS) / CLKS",
271 …n of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses",
275 … cycles while the memory subsystem was handling synchronizations due to data-sharing accesses. Dat…
287 …f cycles where the Super Queue (SQ) was full taking into account all request-types and both hardwa…
291 …f cycles where the Super Queue (SQ) was full taking into account all request-types and both hardwa…
296 …ALLS_L3_MISS / CLKS + ((CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS) / CLKS) - …
307 …-core traffic is generated by all IA cores. This metric does not aggregate non-data-read requests …
312 …CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD) / CLKS - tma_mem_bandwidth",
319 … CPU was stalled due to RFO store memory accesses; RFO store issue a read-for-ownership request b…
323 …ses; RFO store issue a read-for-ownership request before the write. Even though store accesses do …
328 …tricExpr": "((L2_RQSTS.RFO_HIT * 10 * (1 - (MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STO…
331 …-of-order core performance; however; holding resources for a longer time can lead to undesired imp…
339 …hreading hiccup; where multiple Logical Processors contend on different data-elements mapped into …
347 …resents rate of split store accesses. Consider aligning your data to the 64-byte cache line granu…
355 …uired by RFO stores. Even though store accesses do not typically stall out-of-order CPUs; there ar…
359 …: "This metric roughly estimates the fraction of cycles spent handling first-level data TLB store …
363 …-level data TLB store misses. As with ordinary data caching; focus on improving data locality and…
367 …tion of cycles where the TLB was missed by store accesses, hitting in the second-level TLB (STLB)",
368 "MetricExpr": "tma_dtlb_store - tma_store_stlb_miss",
381 …"BriefDescription": "This metric represents fraction of slots where Core non-memory issues were of…
382 "MetricExpr": "max(0, tma_backend_bound - tma_memory_bound)",
385 …-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in s…
397 … the CPU performance was potentially limited due to Core computation issues (non divider-related)",
398 …PORTS_UTIL)) / CLKS if (ARITH.DIVIDER_ACTIVE < (CYCLE_ACTIVITY.STALLS_TOTAL - CYCLE_ACTIVITY.STALL…
401 …-related). Two distinct categories can be attributed into this metric: (1) heavy data-dependency …
406 …k\\=0x80@ / CLKS + tma_serializing_operation * (CYCLE_ACTIVITY.STALLS_TOTAL - CYCLE_ACTIVITY.STALL…
409 …t (Logical Processor cycles since ICL, Physical Core cycles otherwise). Long-latency instructions …
413 …"BriefDescription": "This metric represents fraction of cycles the CPU issue-pipeline was stalled …
417 …ycles the CPU issue-pipeline was stalled due to serializing operations. Instructions like CPUID; W…
441 …-dependency among software instructions; or oversubscribing a particular hardware resource. I…
449 …cal Core cycles otherwise). Loop Vectorization -most compilers feature auto-Vectorization options…
511 …"MetricExpr": "topdown\\-retiring / (topdown\\-fe\\-bound + topdown\\-bad\\-spec + topdown\\-retir…
514 …ions-per-cycle (see IPC metric). Note that a high Retiring value does not necessarily mean there is …
518 …slots where the CPU was retiring light-weight operations -- instructions that require no more than…
519 "MetricExpr": "max(0, tma_retiring - tma_heavy_operations)",
522 …-weight operations -- instructions that require no more than one uop (micro-operation). This corre…
526 …"BriefDescription": "This metric represents overall arithmetic floating-point (FP) operations frac…
530 …-point (FP) operations fraction the CPU has executed (retired). Note this metric's value may excee…
542 …"BriefDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction …
546 …"PublicDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction…
550 …"BriefDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction …
554 …"PublicDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction…
558 …tric approximates arithmetic FP vector uops fraction the CPU has retired for 128-bit wide vectors",
562 … approximates arithmetic FP vector uops fraction the CPU has retired for 128-bit wide vectors. May…
566 …tric approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors",
570 … approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors. May…
574 …tric approximates arithmetic FP vector uops fraction the CPU has retired for 512-bit wide vectors",
578 … approximates arithmetic FP vector uops fraction the CPU has retired for 512-bit wide vectors. May…
582 … represents fraction of slots where the CPU was retiring memory operations -- uops for memory load…
600 …o op) instructions. Compilers often use NOPs for certain address alignments - e.g. start address o…
604 …is metric represents the remaining light uops fraction the CPU has executed - remaining means not …
605 …"MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_memory_operations + tma_branch_in…
611 …tric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructio…
612 …"MetricExpr": "tma_microcode_sequencer + tma_retiring * (UOPS_DECODED.DEC0 - cpu@UOPS_DECODED.DEC0…
615 …he CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro…
620 "MetricExpr": "tma_heavy_operations - tma_microcode_sequencer",
623 …t are decoded into two or up to ([SNB+] four; [ADL+] five) uops. This highly-correlates with the n…
639 …er-cases for operations that cannot be handled natively by the execution pipeline. For example; wh…
644 "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)",
647 … as in the case of read-modify-write as an example. Since these instructions require multiple uops…
663 … "Total pipeline cost of Memory Latency related bottlenecks (external memory and off-core caches)",
669 …ription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)",
675 …Total pipeline cost of branch related instructions (used for program control-flow including functi…
676 … 3 * BR_INST_RETIRED.NEAR_CALL + (BR_INST_RETIRED.NEAR_TAKEN - BR_INST_RETIRED.COND_TAKEN - 2 * BR…
681 …of instruction fetch related bottlenecks by large code footprint programs (i-side cache; TLB and B…
688 …- tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_ica…
717 … "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.",
723 …"BriefDescription": "Total issue-pipeline slots (per-Physical Core till ICL; per-Logical Processor…
729 … "BriefDescription": "Fraction of Physical Core issue-slots utilized by this Logical Processor",
735 "BriefDescription": "The ratio of Executed- by Issued-Uops",
739 …"PublicDescription": "The ratio of Executed- by Issued-Uops. Ratio > 1 suggests high rate of uop m…
742 "BriefDescription": "Instructions Per Cycle across hyper-threads (per physical core)",
754 …BriefDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardles…
758 …-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width…
761 …"BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is …
767 … "BriefDescription": "Probability of Core Bound bottleneck hidden by SMT-profiling artifacts",
768 …"MetricExpr": "(1 - tma_core_bound / tma_ports_utilization if tma_core_bound < tma_ports_utilizati…
828 …"BriefDescription": "Instructions per FP Arithmetic Scalar Single-Precision instruction (lower num…
832 …"PublicDescription": "Instructions per FP Arithmetic Scalar Single-Precision instruction (lower nu…
835 …"BriefDescription": "Instructions per FP Arithmetic Scalar Double-Precision instruction (lower num…
839 …"PublicDescription": "Instructions per FP Arithmetic Scalar Double-Precision instruction (lower nu…
842 …"BriefDescription": "Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number mean…
846 …"PublicDescription": "Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number mea…
849 …"BriefDescription": "Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means h…
853 …"PublicDescription": "Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means …
856 …"BriefDescription": "Instructions per FP Arithmetic AVX 512-bit instruction (lower number means hi…
860 …"PublicDescription": "Instructions per FP Arithmetic AVX 512-bit instruction (lower number means h…
887 "BriefDescription": "Average number of Uops issued by front-end when it issued something",
905 …tion": "Average number of cycles of a switch from the DSB fetch-unit to MITE fetch unit - see DSB_…
911 …"BriefDescription": "Total penalty related to DSB (uop cache) misses - subset of the Instruction_F…
917 …"BriefDescription": "Number of Instructions per non-speculative DSB miss (lower number means highe…
923 …"BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear) (lo…
929 …"BriefDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative b…
935 "BriefDescription": "Fraction of branches that are non-taken conditionals",
954 …"MetricExpr": "(BR_INST_RETIRED.NEAR_TAKEN - BR_INST_RETIRED.COND_TAKEN - 2 * BR_INST_RETIRED.NEAR…
960 "MetricExpr": "1 - (Cond_NT + Cond_TK + CallRet + Jump)",
965 …"BriefDescription": "Actual Average Latency for L1 data-cache miss demand load operations (in core…
971 …BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is …
996 …"MetricExpr": "1000 * ((OFFCORE_REQUESTS.ALL_DATA_RD - OFFCORE_REQUESTS.DEMAND_DATA_RD) + L2_RQSTS…
1019 … instructions for retired demand loads (L1D misses that merge into ongoing miss-handling entries)",
1032 "BriefDescription": "Average per-core data fill bandwidth to the L1 data cache [GB / sec]",
1038 "BriefDescription": "Average per-core data fill bandwidth to the L2 cache [GB / sec]",
1044 "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
1050 "BriefDescription": "Average per-core data access bandwidth to the L3 cache [GB / sec]",
1056 … "BriefDescription": "Average per-thread data fill bandwidth to the L1 data cache [GB / sec]",
1062 "BriefDescription": "Average per-thread data fill bandwidth to the L2 cache [GB / sec]",
1068 "BriefDescription": "Average per-thread data fill bandwidth to the L3 cache [GB / sec]",
1074 "BriefDescription": "Average per-thread data access bandwidth to the L3 cache [GB / sec]",
1088 "MetricGroup": "Power;Summary",
1096 … supported options of: FP precisions, scalar and vector instructions, vector-width and AMX engine."
1101 "MetricGroup": "Power",
1105 …"BriefDescription": "Fraction of Core cycles where the core was running with power-delivery for ba…
1107 "MetricGroup": "Power",
1109 … was running with power-delivery for baseline license level 0. This includes non-AVX codes, SSE, …
1112 …"BriefDescription": "Fraction of Core cycles where the core was running with power-delivery for li…
1114 "MetricGroup": "Power",
1116 …as running with power-delivery for license level 1. This includes high current AVX 256-bit instru…
1119 …"BriefDescription": "Fraction of Core cycles where the core was running with power-delivery for li…
1121 "MetricGroup": "Power",
1123 …ere the core was running with power-delivery for license level 2 (introduced in SKX). This includ…
1127 …"MetricExpr": "1 - CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / CPU_CLK_UNHALTED.REF_DISTRIBUTED if #SMT_o…
1157 "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100",
1158 "MetricGroup": "Power",
1163 "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100",
1164 "MetricGroup": "Power",
1169 "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100",
1170 "MetricGroup": "Power",
1175 "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100",
1176 "MetricGroup": "Power",
1181 "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100",
1182 "MetricGroup": "Power",
1187 "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100",
1188 "MetricGroup": "Power",
1193 "MetricExpr": "(cstate_pkg@c8\\-residency@ / msr@tsc@) * 100",
1194 "MetricGroup": "Power",
1199 "MetricExpr": "(cstate_pkg@c9\\-residency@ / msr@tsc@) * 100",
1200 "MetricGroup": "Power",
1205 "MetricExpr": "(cstate_pkg@c10\\-residency@ / msr@tsc@) * 100",
1206 "MetricGroup": "Power",
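
Several of the expressions matched above are short enough to evaluate by hand. The sketch below mirrors three of them: the "parent minus child, clamped at zero" pattern (as in `max(0, tma_frontend_bound - tma_fetch_latency)`), the bad-speculation remainder `max(1 - (tma_frontend_bound + tma_backend_bound + tma_retiring), 0)`, and the C-state residency percentage `(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100`. The function names and sample values are hypothetical, chosen only for illustration; this is not how perf itself evaluates metrics.

```python
def clamped_remainder(parent: float, child: float) -> float:
    """Mirror of 'max(0, parent - child)', e.g. frontend_bound minus fetch_latency."""
    return max(0.0, parent - child)


def bad_speculation(frontend_bound: float, backend_bound: float, retiring: float) -> float:
    """Mirror of 'max(1 - (fe + be + ret), 0)': whatever share of slots is left over."""
    return max(1.0 - (frontend_bound + backend_bound + retiring), 0.0)


def cstate_residency_percent(residency_count: float, tsc_count: float) -> float:
    """Mirror of '(residency / tsc) * 100'; both counters are assumed to share a time base."""
    if tsc_count <= 0:
        raise ValueError("tsc_count must be positive")
    return residency_count / tsc_count * 100.0


if __name__ == "__main__":
    # Hypothetical sample fractions, not measurements.
    print(clamped_remainder(0.30, 0.10))
    print(bad_speculation(0.20, 0.30, 0.40))
    print(cstate_residency_percent(250, 1000))
```

The clamp in the first two functions matters because the inputs are sampled estimates: a child metric can slightly exceed its parent, and the three top-level fractions can sum to more than 1, which would otherwise yield a negative share.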