Lines Matching +full:high +full:- +full:bandwidth
7 …-lines are fetched from the memory subsystem; parsed into instructions; and lastly decoded into mi…
15 …he CPU was stalled due to Frontend latency issues. For example; instruction-cache misses; iTLB mi…
20 "MetricExpr": "ICACHE.IFETCH_STALL / CLKS - tma_itlb_misses",
38 … corrected path; following all sorts of mispredicted branches. For example; branchy code with lo…
46 …-cache) is a Uop Cache where the front-end directly delivers Uops (micro operations) avoiding heav…
62 … Commonly used instructions are optimized for delivery by the DSB (decoded i-cache) or MITE (legac…
66 …": "This metric represents fraction of slots the CPU was stalled due to Frontend bandwidth issues",
67 "MetricExpr": "tma_frontend_bound - tma_fetch_latency",
70 …bandwidth issues. For example; inefficiencies at the instruction decoders; or restrictions for ca…
75 "MetricExpr": "(IDQ.ALL_MITE_CYCLES_ANY_UOPS - IDQ.ALL_MITE_CYCLES_4_UOPS) / CORE_CLKS / 2",
78 …re-cached in the DSB or LSD. For example; inefficiencies due to asymmetric decoders; use of long i…
83 "MetricExpr": "(IDQ.ALL_DSB_CYCLES_ANY_UOPS - IDQ.ALL_DSB_CYCLES_4_UOPS) / CORE_CLKS / 2",
91 …"MetricExpr": "(UOPS_ISSUED.ANY - UOPS_RETIRED.RETIRE_SLOTS + 4 * ((INT_MISC.RECOVERY_CYCLES_ANY /…
94 …s for which the issue-pipeline was blocked due to recovery from earlier incorrect speculation. For…
102 …etched from an incorrectly speculated program path; or stalls when the out-of-order part of the ma…
107 "MetricExpr": "tma_bad_speculation - tma_branch_mispredicts",
110 …-of-order portion of the machine needs to recover its state after the clear. For example; this can…
115 "MetricExpr": "1 - (tma_frontend_bound + tma_bad_speculation + tma_retiring)",
118 …-of-order scheduler dispatches ready uops into their respective execution units; and once complete…
123 …XECUTED.CYCLES_GE_1_UOP_EXEC - UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC if (IPC > 1.8) else UOPS_EXECUT…
126 …o demand load or store instructions. This accounts mainly for (1) non-completed in-flight memory d…
131 …"MetricExpr": "max((min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.STALLS_LDM_PENDING) - CYCLE_ACTIVI…
134 …high latency even though it is being satisfied by the L1. Another example is loads that miss in the…
142 …-aside Buffers) are processor caches for recently used entries out of the Page Tables that are use…
150 …perations in the pipeline; a load can avoid waiting for memory if a prior in-flight store is writi…
162 … estimates fraction of cycles handling memory load split accesses - loads that cross 64-byte cache …
166 … estimates fraction of cycles handling memory load split accesses - loads that cross 64-byte cache …
174 …d re-issue. However; the short re-issue duration is often hidden by the out-of-order core and HW o…
182 …isfied from (metric values >1 are valid). Often it hints at approaching bandwidth limits (to L2 ca…
187 … "MetricExpr": "(CYCLE_ACTIVITY.STALLS_L1D_PENDING - CYCLE_ACTIVITY.STALLS_L2_PENDING) / CLKS",
210 …n of cycles while the memory subsystem was handling synchronizations due to data-sharing accesses",
214 … cycles while the memory subsystem was handling synchronizations due to data-sharing accesses. Dat…
226 …f cycles where the Super Queue (SQ) was full taking into account all request-types and both hardwa…
230 …f cycles where the Super Queue (SQ) was full taking into account all request-types and both hardwa…
235 …"MetricExpr": "(1 - (MEM_LOAD_UOPS_RETIRED.LLC_HIT / (MEM_LOAD_UOPS_RETIRED.LLC_HIT + 7 * MEM_LOAD…
242 … cycles where the core's performance was likely hurt due to approaching bandwidth limits of extern…
246 …bandwidth limits of external memory (DRAM). The underlying heuristic assumes that a similar off-c…
251 …CLK_UNHALTED.THREAD, OFFCORE_REQUESTS_OUTSTANDING.CYCLES_WITH_DATA_RD) / CLKS - tma_mem_bandwidth",
270 …ystem was handling loads from remote memory. This is often caused by non-optimal NUMA allocati…
278 …r sockets including synchronization issues. This is often caused by non-optimal NUMA allocati…
282 … CPU was stalled due to RFO store memory accesses; RFO stores issue a read-for-ownership request b…
286 …ses; RFO stores issue a read-for-ownership request before the write. Even though store accesses do …
291 …etricExpr": "((L2_RQSTS.RFO_HIT * 9 * (1 - (MEM_UOPS_RETIRED.LOCK_LOADS / MEM_UOPS_RETIRED.ALL_STO…
294 …-of-order core performance; however; holding resources for a longer time can lead to undesired imp…
302 …hreading hiccup; where multiple Logical Processors contend on different data-elements mapped into …
310 …resents rate of split store accesses. Consider aligning your data to the 64-byte cache line granu…
314 …: "This metric roughly estimates the fraction of cycles spent handling first-level data TLB store …
318 …-level data TLB store misses. As with ordinary data caching; focus on improving data locality and…
322 …"BriefDescription": "This metric represents fraction of slots where Core non-memory issues were of…
323 "MetricExpr": "tma_backend_bound - tma_memory_bound",
326 …-memory issues were of a bottleneck. Shortage in hardware compute resources; or dependencies in s…
338 … the CPU performance was potentially limited due to Core computation issues (non divider-related)",
339 …- UOPS_EXECUTED.CYCLES_GE_3_UOPS_EXEC if (IPC > 1.8) else UOPS_EXECUTED.CYCLES_GE_2_UOPS_EXEC - RS…
342 …-related). Two distinct categories can be attributed into this metric: (1) heavy data-dependency …
347 …SMT_on else (min(CPU_CLK_UNHALTED.THREAD, CYCLE_ACTIVITY.CYCLES_NO_EXECUTE) - RS_EVENTS.EMPTY_CYCL…
350 …t (Logical Processor cycles since ICL, Physical Core cycles otherwise). Long-latency instructions …
355 …_EXECUTED.CORE\\,cmask\\=1@ - cpu@UOPS_EXECUTED.CORE\\,cmask\\=2@) / 2 if #SMT_on else (UOPS_EXECU…
358 …-dependency among software instructions; or oversubscribing a particular hardware resource. I…
363 …_EXECUTED.CORE\\,cmask\\=2@ - cpu@UOPS_EXECUTED.CORE\\,cmask\\=3@) / 2 if #SMT_on else (UOPS_EXECU…
366 …cal Core cycles otherwise). Loop Vectorization - most compilers feature auto-Vectorization options…
406 …"MetricExpr": "(UOPS_DISPATCHED_PORT.PORT_2 + UOPS_DISPATCHED_PORT.PORT_3 - UOPS_DISPATCHED_PORT.P…
412 …ion of cycles CPU dispatched uops on execution port 2 ([SNB+]Loads and Store-address; [ICL+] Loads…
419 …ion of cycles CPU dispatched uops on execution port 3 ([SNB+]Loads and Store-address; [ICL+] Loads…
433 …sents Core fraction of cycles CPU dispatched uops on execution port 4 (Store-data) Sample with: UO…
444 …ions-per-cycle (see IPC metric). Note that a high Retiring value does not necessarily mean there is …
448 …slots where the CPU was retiring light-weight operations -- instructions that require no more than…
449 "MetricExpr": "tma_retiring - tma_heavy_operations",
452 …-weight operations -- instructions that require no more than one uop (micro-operation). This corre…
456 …"BriefDescription": "This metric represents overall arithmetic floating-point (FP) operations frac…
460 …-point (FP) operations fraction the CPU has executed (retired). Note this metric's value may excee…
468 …FP arithmetic operations; hence may be used as a thermometer to avoid X87 high usage and preferabl…
472 …"BriefDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction …
476 …"PublicDescription": "This metric approximates arithmetic floating-point (FP) scalar uops fraction…
480 …"BriefDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction …
484 …"PublicDescription": "This metric approximates arithmetic floating-point (FP) vector uops fraction…
488 …tric represents fraction of slots where the CPU was retiring heavy-weight operations -- instructio…
492 …he CPU was retiring heavy-weight operations -- instructions that require two or more uops or micro…
508 …er-cases for operations that cannot be handled natively by the execution pipeline. For example; wh…
513 "MetricExpr": "max(0, tma_microcode_sequencer - tma_assists)",
516 … as in the case of read-modify-write as an example. Since these instructions require multiple uops…
544 … "BriefDescription": "Per-Logical Processor actual clocks when the Logical Processor is active.",
550 …"BriefDescription": "Total issue-pipeline slots (per-Physical Core till ICL; per-Logical Processor…
556 "BriefDescription": "The ratio of Executed- by Issued-Uops",
560 …iption": "The ratio of Executed- by Issued-Uops. Ratio > 1 suggests high rate of uop micro-fusions…
563 "BriefDescription": "Instructions Per Cycle across hyper-threads (per physical core)",
575 …"BriefDescription": "Instruction-Level-Parallelism (average number of uops executed when there is …
654 …"BriefDescription": "Number of Instructions per non-speculative Branch Misprediction (JEClear) (lo…
660 …"BriefDescription": "Actual Average Latency for L1 data-cache miss demand load operations (in core…
666 …BriefDescription": "Memory-Level-Parallelism (average number of L1 miss demand load when there is …
697 "BriefDescription": "Average per-core data fill bandwidth to the L1 data cache [GB / sec]",
703 "BriefDescription": "Average per-core data fill bandwidth to the L2 cache [GB / sec]",
709 "BriefDescription": "Average per-core data fill bandwidth to the L3 cache [GB / sec]",
715 … "BriefDescription": "Average per-thread data fill bandwidth to the L1 data cache [GB / sec]",
721 "BriefDescription": "Average per-thread data fill bandwidth to the L2 cache [GB / sec]",
727 "BriefDescription": "Average per-thread data fill bandwidth to the L3 cache [GB / sec]",
733 "BriefDescription": "Average per-thread data access bandwidth to the L3 cache [GB / sec]",
755 … supported options of: FP precisions, scalar and vector instructions, vector-width and AMX engine."
765 …"MetricExpr": "1 - CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE / (CPU_CLK_UNHALTED.REF_XCLK_ANY / 2) if #SM…
782 "BriefDescription": "Average external Memory Bandwidth Use for reads and writes [GB / sec]",
801 "MetricExpr": "(cstate_core@c3\\-residency@ / msr@tsc@) * 100",
807 "MetricExpr": "(cstate_core@c6\\-residency@ / msr@tsc@) * 100",
813 "MetricExpr": "(cstate_core@c7\\-residency@ / msr@tsc@) * 100",
819 "MetricExpr": "(cstate_pkg@c2\\-residency@ / msr@tsc@) * 100",
825 "MetricExpr": "(cstate_pkg@c3\\-residency@ / msr@tsc@) * 100",
831 "MetricExpr": "(cstate_pkg@c6\\-residency@ / msr@tsc@) * 100",
837 "MetricExpr": "(cstate_pkg@c7\\-residency@ / msr@tsc@) * 100",
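
Several of the MetricExpr hits above are shown in full and share a simple shape: a child metric is derived as a residual of its parent (for example "tma_backend_bound - tma_memory_bound"), optionally clamped at zero ("max(0, tma_microcode_sequencer - tma_assists)"), and the C-state metrics are a residency counter divided by the TSC count, times 100. The sketch below mirrors only those fully visible expressions in plain Python. The function names and the sample input values are hypothetical and chosen for illustration; this is not how perf itself evaluates metrics, and the truncated expressions are deliberately not reconstructed.

```python
def backend_bound(frontend_bound, bad_speculation, retiring):
    """Mirror of: 1 - (tma_frontend_bound + tma_bad_speculation + tma_retiring)."""
    return 1 - (frontend_bound + bad_speculation + retiring)


def residual(parent, child):
    """Mirror of the 'parent - child' pattern, e.g.
    tma_backend_bound - tma_memory_bound or tma_retiring - tma_heavy_operations."""
    return parent - child


def clamped_residual(parent, child):
    """Mirror of: max(0, tma_microcode_sequencer - tma_assists)."""
    return max(0, parent - child)


def cstate_residency_percent(residency_count, tsc_count):
    """Mirror of: (cstate_core@c6\\-residency@ / msr@tsc@) * 100."""
    return residency_count / tsc_count * 100


if __name__ == "__main__":
    # Hypothetical sample fractions of pipeline slots, not measured data.
    be = backend_bound(frontend_bound=0.2, bad_speculation=0.1, retiring=0.4)
    core = residual(be, 0.25)  # 0.25 stands in for a memory-bound fraction
    print(f"backend_bound={be:.3f} core_bound={core:.3f}")
    print(f"c6 residency={cstate_residency_percent(250, 1000):.1f}%")
```

The residual form means a child metric inherits any measurement error in its parent, which is presumably why one of the expressions above wraps the subtraction in max(0, ...).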