Lines Matching +full:high +full:- +full:bandwidth

7 …-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into mi…
15 …he CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB mi…
39 … corrected path; following all sorts of miss-predicted branches. For example; branchy code with lo…
52 …"MetricExpr": "(1 - (BR_MISP_RETIRED.ALL_BRANCHES / (BR_MISP_RETIRED.ALL_BRANCHES + MACHINE_CLEARS…
71 …-cache) is a Uop Cache where the front-end directly delivers Uops (micro operations) avoiding heav…
87 … Commonly used instructions are optimized for delivery by the DSB (decoded i-cache) or MITE (legac…
91 …": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues",
92 "MetricExpr": "tma_frontend_bound - tma_fetch_latency",
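The expression on line 92 derives one metric by subtracting a child node from its parent. A minimal Python sketch of that arithmetic follows, assuming both inputs were already computed as fractions of pipeline slots. The function name and the sample values are hypothetical and do not come from the metrics file.

```python
def frontend_bandwidth_fraction(tma_frontend_bound: float,
                                tma_fetch_latency: float) -> float:
    """Return the slots fraction attributed to frontend bandwidth.

    Mirrors the expression "tma_frontend_bound - tma_fetch_latency":
    whatever part of the frontend-bound share is not explained by
    fetch latency is attributed to fetch bandwidth.
    """
    return tma_frontend_bound - tma_fetch_latency


if __name__ == "__main__":
    # Hypothetical sample values, chosen only for illustration.
    print(frontend_bandwidth_fraction(0.5, 0.25))
```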
95 …bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for ca…
100 "MetricExpr": "(IDQ.ALL_MITE_CYCLES_ANY_UOPS - IDQ.ALL_MITE_CYCLES_4_UOPS) / CORE_CLKS / 2",
103 …re-cached in the DSB or LSD. For example; inefficiencies due to asymmetric decoders; use of long i…
107 …"BriefDescription": "This metric represents fraction of cycles where decoder-0 was the only active…
108 …"MetricExpr": "(cpu@INST_DECODED.DECODERS\\,cmask\\=1@ - cpu@INST_DECODED.DECODERS\\,cmask\\=2@) /…
115 "MetricExpr": "(IDQ.ALL_DSB_CYCLES_ANY_UOPS - IDQ.ALL_DSB_CYCLES_4_UOPS) / CORE_CLKS / 2",
123 …"MetricExpr": "(UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * ((INT_MISC.RECOVERY_CYCLES_ANY /…
126 …s for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For…
134 …etched from an incorrectly speculated program path; or stalls when the out-of-order part of the ma…
139 "MetricExpr": "tma_bad_speculation - tma_branch_mispredicts",
142 …-of-order portion of the machine needs to recover its state after the clear. For example; this can…
147 …"MetricExpr": "1 - tma_frontend_bound - (UOPS_ISSUED.ANY + 4 * ((INT_MISC.RECOVERY_CYCLES_ANY / 2)…
150 …-of-order scheduler dispatches ready uops into their respective execution units; and once complete…
158 …o demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory d…
163 … "MetricExpr": "max((CYCLE_ACTIVITY.STALLS_MEM_ANY - CYCLE_ACTIVITY.STALLS_L1D_MISS) / CLKS, 0)",
166 …high latency even though it is being satisfied by the L1. Another example is loads who miss in the…
171 …mask\\=1@ + DTLB_LOAD_MISSES.WALK_ACTIVE, max(CYCLE_ACTIVITY.CYCLES_MEM_ANY - CYCLE_ACTIVITY.CYCLE…
174 …-aside Buffers) are processor caches for recently used entries out of the Page Tables that are use…
178 … the (first level) DTLB was missed by load accesses, that later on hit in second-level TLB (STLB)",
179 "MetricExpr": "tma_dtlb_load - tma_load_stlb_miss",
185 …"BriefDescription": "This metric estimates the fraction of cycles where the Second-level TLB (STLB…
196 …perations in the pipeline; a load can avoid waiting for memory if a prior in-flight store is writi…
201 …"MetricExpr": "(12 * max(0, MEM_INST_RETIRED.LOCK_LOADS - L2_RQSTS.ALL_RFO) + (MEM_INST_RETIRED.LO…
208 … estimates fraction of cycles handling memory load split accesses - load that cross 64-byte cache …
212 … estimates fraction of cycles handling memory load split accesses - load that cross 64-byte cache …
220 …d re-issue. However; the short re-issue duration is often hidden by the out-of-order core and HW o…
228 …isfied from (metric values >1 are valid). Often it hints on approaching bandwidth limits (to L2 ca…
233 …cpu@L1D_PEND_MISS.FB_FULL\\,cmask\\=1@)) * ((CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALL…
241 "MetricExpr": "(CYCLE_ACTIVITY.STALLS_L2_MISS - CYCLE_ACTIVITY.STALLS_L3_MISS) / CLKS",
256 …n of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses",
257 … (MEM_LOAD_L3_HIT_RETIRED.XSNP_HIT + MEM_LOAD_L3_HIT_RETIRED.XSNP_HITM * (1 - (OCR.DEMAND_DATA_RD.…
260 … cycles while the memory subsystem was handling synchronizations due to data-sharing accesses. Dat…
272 …f cycles where the Super Queue (SQ) was full taking into account all request-types and both hardwa…
276 …f cycles where the Super Queue (SQ) was full taking into account all request-types and both hardwa…
281 …MISS / CLKS + ((CYCLE_ACTIVITY.STALLS_L1D_MISS - CYCLE_ACTIVITY.STALLS_L2_MISS) / CLKS) - tma_l2_b…
288 … cycles where the core's performance was likely hurt due to approaching bandwidth limits of extern…
292 …bandwidth limits of external memory (DRAM). The underlying heuristic assumes that a similar off-c…
297 …CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD) / CLKS - tma_mem_bandwidth",
316 …ystem was handling loads from remote memory. This is caused often due to non-optimal NUMA allocati…
324 …r sockets including synchronizations issues. This is caused often due to non-optimal NUMA allocati…
328 … on idle latencies) how often the CPU was stalled on accesses to external 3D-Xpoint (Crystal Ridge…
329 …- ((19 * (MEM_LOAD_L3_MISS_RETIRED.REMOTE_DRAM * (1 + (MEM_LOAD_RETIRED.FB_HIT / MEM_LOAD_RETIRED.…
332 … on idle latencies) how often the CPU was stalled on accesses to external 3D-Xpoint (Crystal Ridge…
336 … CPU was stalled due to RFO store memory accesses; RFO store issue a read-for-ownership request b…
340 …ses; RFO store issue a read-for-ownership request before the write. Even though store accesses do …
345 …tricExpr": "((L2_RQSTS.RFO_HIT * 11 * (1 - (MEM_INST_RETIRED.LOCK_LOADS / MEM_INST_RETIRED.ALL_STO…
348 …-of-order core performance; however; holding resources for longer time can lead into undesired imp…
356 …hreading hiccup; where multiple Logical Processors contend on different data-elements mapped into …
364 …resents rate of split store accesses. Consider aligning your data to the 64-byte cache line granu…
368 …: "This metric roughly estimates the fraction of cycles spent handling first-level data TLB store …
372 …-level data TLB store misses.  As with ordinary data caching; focus on improving data locality and…
376 …tion of cycles where the TLB was missed by store accesses, hitting in the second-level TLB (STLB)",
377 "MetricExpr": "tma_dtlb_store - tma_store_stlb_miss",
390 …"BriefDescription": "This metric represents fraction of slots where Core non-memory issues were of…
391 "MetricExpr": "tma_backend_bound - tma_memory_bound",
394 …-memory issues were of a bottleneck.  Shortage in hardware compute resources; or dependencies in s…
406 … the CPU performance was potentially limited due to Core computation issues (non divider-related)",
407 …PORTS_UTIL)) / CLKS if (ARITH.DIVIDER_ACTIVE < (CYCLE_ACTIVITY.STALLS_TOTAL - CYCLE_ACTIVITY.STALL…
410 …-related). Two distinct categories can be attributed into this metric: (1) heavy data-dependency …
415 …S_EXECUTED.CORE_CYCLES_NONE / 2 if #SMT_on else CYCLE_ACTIVITY.STALLS_TOTAL - CYCLE_ACTIVITY.STALL…
418 …t (Logical Processor cycles since ICL, Physical Core cycles otherwise). Long-latency instructions …
422 …"BriefDescription": "This metric represents fraction of cycles the CPU issue-pipeline was stalled …
426 …ycles the CPU issue-pipeline was stalled due to serializing operations. Instructions like CPUID; W…
447 …"MetricExpr": "((UOPS_EXECUTED.CORE_CYCLES_GE_1 - UOPS_EXECUTED.CORE_CYCLES_GE_2) / 2 if #SMT_on e…
450 …-dependency among software instructions; or over oversubscribing a particular hardware resource. I…
455 …"MetricExpr": "((UOPS_EXECUTED.CORE_CYCLES_GE_2 - UOPS_EXECUTED.CORE_CYCLES_GE_3) / 2 if #SMT_on e…
458 …cal Core cycles otherwise). Loop Vectorization -most compilers feature auto-Vectorization options…
505 …HED_PORT.PORT_2 + UOPS_DISPATCHED_PORT.PORT_3 + UOPS_DISPATCHED_PORT.PORT_7 - UOPS_DISPATCHED_PORT…
511 …ion of cycles CPU dispatched uops on execution port 2 ([SNB+]Loads and Store-address; [ICL+] Loads…
518 …ion of cycles CPU dispatched uops on execution port 3 ([SNB+]Loads and Store-address; [ICL+] Loads…
532 …sents Core fraction of cycles CPU dispatched uops on execution port 4 (Store-data) Sample with: UO…
539 …action of cycles CPU dispatched uops on execution port 7 ([HSW+]simple Store-address) Sample with:…
550 …ions-per-cycle (see IPC metric). Note that a high Retiring value does not necessarily mean there is …
554 …slots where the CPU was retiring light-weight operations -- instructions that require no more than…
555 "MetricExpr": "tma_retiring - tma_heavy_operations",
558 …-weight operations -- instructions that require no more than one uop (micro-operation). This corre…
562 …"BriefDescription": "This metric represents overall arithmetic floating-point (FP) operations frac…
566 …-point (FP) operations fraction the CPU has executed (retired). Note this metric's value may excee…
574 …FP arithmetic operations; hence may be used as a thermometer to avoid X87 high usage and preferabl…
578 …"BriefDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction …
582 …"PublicDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction…
586 …"BriefDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction …
590 …"PublicDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction…
594 …tric approximates arithmetic FP vector uops fraction the CPU has retired for 128-bit wide vectors",
598 … approximates arithmetic FP vector uops fraction the CPU has retired for 128-bit wide vectors. May…
602 …tric approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors",
606 … approximates arithmetic FP vector uops fraction the CPU has retired for 256-bit wide vectors. May…
610 …tric approximates arithmetic FP vector uops fraction the CPU has retired for 512-bit wide vectors",
614 … approximates arithmetic FP vector uops fraction the CPU has retired for 512-bit wide vectors. May…
618 … represents fraction of slots where the CPU was retiring memory operations -- uops for memory load…
625 …represents fraction of slots where the CPU was retiring fused instructions -- where one uop can re…
629 …represents fraction of slots where the CPU was retiring fused instructions -- where one uop can re…
634 …"MetricExpr": "tma_light_operations * (BR_INST_RETIRED.ALL_BRANCHES - UOPS_RETIRED.MACRO_FUSED) / …
637 …lots where the CPU was retiring branch instructions that were not fused. Non-conditional branches …
645 …o op) instructions. Compilers often use NOPs for certain address alignments - e.g. start address o…
649 …is metric represents the remaining light uops fraction the CPU has executed - remaining means not …
650 …"MetricExpr": "max(0, tma_light_operations - (tma_fp_arith + tma_memory_operations + tma_fused_ins…
656 …tric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructio…
657 … "MetricExpr": "(UOPS_RETIRED.RETIRE_SLOTS + UOPS_RETIRED.MACRO_FUSED - INST_RETIRED.ANY) / SLOTS",
660 …he CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro…
665 "MetricExpr": "tma_heavy_operations - tma_microcode_sequencer",
668 …t are decoded into two or up to ([SNB+] four; [ADL+] five) uops. This highly-correlates with the n…
684 …er-cases for operations that cannot be handled natively by the execution pipeline. For example; wh…
689 "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)",
692 … as in the case of read-modify-write as an example. Since these instructions require multiple uops…
702 … "BriefDescription": "Total pipeline cost of (external) Memory Bandwidth related bottlenecks",
708 … "Total pipeline cost of Memory Latency related bottlenecks (external memory and off-core caches)",
714 …ription": "Total pipeline cost of Memory Address Translation related bottlenecks (data-side TLBs)",
720 …Total pipeline cost of branch related instructions (used for program control-flow including functi…
721 …TIRED.NEAR_CALL + (BR_INST_RETIRED.NEAR_TAKEN - (BR_INST_RETIRED.CONDITIONAL - BR_INST_RETIRED.NOT…
726 …of instruction fetch related bottlenecks by large code footprint programs (i-side cache; TLB and B…
732 … "BriefDescription": "Total pipeline cost of instruction fetch bandwidth related bottlenecks",
733 …- tma_fetch_latency * tma_mispredicts_resteers / (tma_branch_resteers + tma_dsb_switches + tma_ica…
762 … "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.",
768 …"BriefDescription": "Total issue-pipeline slots (per-Physical Core till ICL; per-Logical Processor…
774 "BriefDescription": "The ratio of Executed- by Issued-Uops",
778 …iption": "The ratio of Executed- by Issued-Uops. Ratio > 1 suggests high rate of uop micro-fusions…
781 "BriefDescription": "Instructions Per Cycle across hyper-threads (per physical core)",
793 …BriefDescription": "Actual per-core usage of the Floating Point non-X87 execution units (regardles…
797 …-core usage of the Floating Point non-X87 execution units (regardless of precision or vector-width…
800 …"BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is …
806 … "BriefDescription": "Probability of Core Bound bottleneck hidden by SMT-profiling artifacts",
807 …"MetricExpr": "(1 - tma_core_bound / tma_ports_utilization if tma_core_bound < tma_ports_utilizati…
867 …"BriefDescription": "Instructions per FP Arithmetic Scalar Single-Precision instruction (lower num…
871 …"PublicDescription": "Instructions per FP Arithmetic Scalar Single-Precision instruction (lower nu…
874 …"BriefDescription": "Instructions per FP Arithmetic Scalar Double-Precision instruction (lower num…
878 …"PublicDescription": "Instructions per FP Arithmetic Scalar Double-Precision instruction (lower nu…
881 …"BriefDescription": "Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number mean…
885 …"PublicDescription": "Instructions per FP Arithmetic AVX/SSE 128-bit instruction (lower number mea…
888 …"BriefDescription": "Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means h…
892 …"PublicDescription": "Instructions per FP Arithmetic AVX* 256-bit instruction (lower number means …
895 …"BriefDescription": "Instructions per FP Arithmetic AVX 512-bit instruction (lower number means hi…
899 …"PublicDescription": "Instructions per FP Arithmetic AVX 512-bit instruction (lower number means h…
926 "BriefDescription": "Average number of Uops issued by front-end when it issued something",
938 …tion": "Average number of cycles of a switch from the DSB fetch-unit to MITE fetch unit - see DSB_…
944 …"BriefDescription": "Total penalty related to DSB (uop cache) misses - subset of the Instruction_F…
950 …"BriefDescription": "Number of Instructions per non-speculative DSB miss (lower number means highe…
956 …"BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear) (lo…
962 …"BriefDescription": "Branch Misprediction Cost: Fraction of TMA slots wasted per non-speculative b…
968 "BriefDescription": "Fraction of branches that are non-taken conditionals",
975 …"MetricExpr": "(BR_INST_RETIRED.CONDITIONAL - BR_INST_RETIRED.NOT_TAKEN) / BR_INST_RETIRED.ALL_BRA…
987 …"MetricExpr": "(BR_INST_RETIRED.NEAR_TAKEN - (BR_INST_RETIRED.CONDITIONAL - BR_INST_RETIRED.NOT_TA…
992 …"BriefDescription": "Actual Average Latency for L1 data-cache miss demand load operations (in core…
998 …BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is …
1035 "MetricExpr": "1000 * (L2_RQSTS.REFERENCES - L2_RQSTS.MISS) / INST_RETIRED.ANY",
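The expression on line 1035 subtracts misses from references to get hits, then scales the result to a count per thousand retired instructions. A minimal Python sketch of the same calculation follows, assuming the three event counts were already collected. The function name, parameter names, sample counts, and the zero-instruction check are illustrative additions, not part of the metrics file.

```python
def l2_hits_per_kilo_instruction(l2_references: int,
                                 l2_misses: int,
                                 instructions_retired: int) -> float:
    """Evaluate 1000 * (references - misses) / instructions.

    The arguments stand in for the L2_RQSTS.REFERENCES, L2_RQSTS.MISS
    and INST_RETIRED.ANY event counts named in the expression.
    """
    if instructions_retired <= 0:
        # Added safeguard (an assumption): the ratio is undefined
        # when no instructions retired in the measurement window.
        raise ValueError("instructions_retired must be positive")
    return 1000 * (l2_references - l2_misses) / instructions_retired


if __name__ == "__main__":
    # Hypothetical counts, chosen only for illustration.
    print(l2_hits_per_kilo_instruction(5_000, 1_000, 2_000_000))
```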
1052 … instructions for retired demand loads (L1D misses that merge into ongoing miss-handling entries)",
1065 "BriefDescription": "Average per-core data fill bandwidth to the L1 data cache [GB / sec]",
1071 "BriefDescription": "Average per-core data fill bandwidth to the L2 cache [GB / sec]",
1077 "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
1083 "BriefDescription": "Average per-core data access bandwidth to the L3 cache [GB / sec]",
1101 … "BriefDescription": "Average per-thread data fill bandwidth to the L1 data cache [GB / sec]",
1107 "BriefDescription": "Average per-thread data fill bandwidth to the L2 cache [GB / sec]",
1113 "BriefDescription": "Average per-thread data fill bandwidth to the L3 cache [GB / sec]",
1119 "BriefDescription": "Average per-thread data access bandwidth to the L3 cache [GB / sec]",
1141 … supported options of: FP precisions, scalar and vector instructions, vector-width and AMX engine."
1150 …"BriefDescription": "Fraction of Core cycles where the core was running with power-delivery for ba…
1154 …s running with power-delivery for baseline license level 0. This includes non-AVX codes, SSE, AVX…
1157 …"BriefDescription": "Fraction of Core cycles where the core was running with power-delivery for li…
1161 … running with power-delivery for license level 1. This includes high current AVX 256-bit instruct…
1164 …"BriefDescription": "Fraction of Core cycles where the core was running with power-delivery for li…
1168 …e the core was running with power-delivery for license level 2 (introduced in SKX). This includes…
1172 …"MetricExpr": "1 - CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / (CPU_CLK_UNHALTED.REF_XCLK_ANY / 2) if #SM…
1189 "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]",
1207 …cy of data read request to external 3D X-Point memory [in nanoseconds]. Accounts for demand loads …
1213 …o external DRAM memory [in nanoseconds]. Accounts for demand loads and L1/L2 data-read prefetches",
1219 "BriefDescription": "Average 3DXP Memory Bandwidth Use for reads [GB / sec]",
1225 "BriefDescription": "Average 3DXP Memory Bandwidth Use for Writes [GB / sec]",
1231 "BriefDescription": "Average IO (network or disk) Bandwidth Use for Writes [GB / sec]",
1237 "BriefDescription": "Average IO (network or disk) Bandwidth Use for Reads [GB / sec]",
1256 "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100",
1262 "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100",
1268 "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100",
1274 "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100",
1280 "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100",
1286 "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100",
1292 "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100",
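The seven C-state expressions above share one shape: a residency counter divided by the TSC count over the same interval, times 100. A minimal Python sketch of that shared shape follows, assuming both counters were sampled over the same window. The function name, parameter names, and sample values are hypothetical.

```python
def cstate_residency_percent(residency_count: int, tsc_count: int) -> float:
    """Evaluate (residency / tsc) * 100 for one C-state counter.

    residency_count stands in for a counter such as
    cstate_core@c6\\-residency@, and tsc_count for msr@tsc@.
    """
    if tsc_count <= 0:
        # Added safeguard (an assumption): no elapsed TSC ticks
        # means the percentage cannot be computed.
        raise ValueError("tsc_count must be positive")
    return (residency_count / tsc_count) * 100


if __name__ == "__main__":
    # Hypothetical counter values, chosen only for illustration.
    print(cstate_residency_percent(250, 1_000))
```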
1464 … "BriefDescription": "Intel(R) Ultra Path Interconnect (UPI) data transmit bandwidth (MB/sec)",
1471 "BriefDescription": "DDR memory read bandwidth (MB/sec)",
1478 "BriefDescription": "DDR memory write bandwidth (MB/sec)",
1485 "BriefDescription": "DDR memory bandwidth (MB/sec)",
1492 … "BriefDescription": "Intel(R) Optane(TM) Persistent Memory(PMEM) memory read bandwidth (MB/sec)",
1499 … "BriefDescription": "Intel(R) Optane(TM) Persistent Memory(PMEM) memory write bandwidth (MB/sec)",
1506 "BriefDescription": "Intel(R) Optane(TM) Persistent Memory(PMEM) memory bandwidth (MB/sec)",
1513 …"BriefDescription": "Bandwidth of IO reads that are initiated by end device controllers that are r…
1520 …"BriefDescription": "Bandwidth of IO writes that are initiated by end device controllers that are …
1534 …"BriefDescription": "Uops delivered from legacy decode pipeline (Micro-instruction Translation Eng…
1548 …"BriefDescription": "Bandwidth (MB/sec) of read requests that miss the last level cache (LLC) and …
1555 …"BriefDescription": "Bandwidth (MB/sec) of write requests that miss the last level cache (LLC) and…
1562 …"BriefDescription": "Bandwidth (MB/sec) of read requests that miss the last level cache (LLC) and …
1569 …ly does well sustaining Uop supply. However; in some rare cases; optimal uop-delivery could not be…
1570 …"MetricExpr": "100 * ( ( LSD.CYCLES_ACTIVE - LSD.CYCLES_4_UOPS ) / ( ( CPU_CLK_UNHALTED.THREAD_ANY…