KOJAK Patterns
General Patterns
- Keywords:
- CPU allocation time
- Unit:
- Seconds
- Description:
- Time spent on program execution, including the idle
times of CPUs reserved for slave threads during OpenMP sequential
execution. The total assumes that every thread of a process is allocated a
separate CPU during the entire runtime of the process.
- Parent:
- None
- Children:
- Execution, Idle Threads
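As a rough illustration of the assumption above (one CPU per thread for the whole runtime), the total could be estimated as runtime multiplied by thread count, summed over all processes. The function name and the input layout below are hypothetical and not part of KOJAK.

```python
def cpu_allocation_time(processes):
    """Sum runtime * thread count over all processes.

    `processes` is a list of (runtime_seconds, num_threads) pairs; this
    layout is an assumption made only for this example.
    """
    return sum(runtime * threads for runtime, threads in processes)


# Two processes, each running for 10 s with 4 threads.
total = cpu_allocation_time([(10.0, 4), (10.0, 4)])
```

Under these assumptions the result is 80 CPU-seconds, even if most threads sit idle during sequential phases.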
- Keywords:
- Execution time
- Unit:
- Seconds
- Description:
- Time spent on program execution but without the
idle times of slave threads during OpenMP sequential execution. Note
that for pure MPI applications, this pattern is equal to Time.
- Parent:
- Time
- Children:
- MPI,
OpenMP,
SHMEM
- Keywords:
- MPI
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in MPI
calls.
- Parent:
- Execution
- Children:
- Communication,
IO (MPI),
Synchronization (MPI)
- Keywords:
- MPI, communication
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in MPI
communication calls.
- Parent:
- MPI
- Children:
-
Collective (MPI),
Point-to-Point (MPI),
RMA Communication (MPI-2)
- Keywords:
- MPI, collective communication
- Unit:
- Seconds
- Description:
- Time spent on MPI collective communication.
- Parent:
- Communication (MPI)
- Children:
- Early Reduce,
Late Broadcast (MPI),
Wait at N x N (MPI)
- Keywords:
- MPI, n-to-1 communication
- Unit:
- Seconds
- Description:
- Collective communication operations that send
data from all processes to one destination process (i.e., n-to-1) may
suffer from waiting times if the destination process enters the
operation earlier than its sending counterparts, that is, before any
data could have been sent. The pattern refers to the time lost as a
result of this situation. It applies to MPI calls MPI_Reduce(),
MPI_Gather() and MPI_Gatherv().
- Parent:
- Collective (MPI)
- Children:
- None
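A minimal sketch of how this waiting time might be derived from enter timestamps, assuming the destination can only make progress once the first sender has entered the operation. The function and argument names are illustrative and not KOJAK's API.

```python
def early_reduce_wait(root_enter, sender_enters):
    """Waiting time of the destination (root) of an n-to-1 operation.

    Assumption: the root waits from its own enter time until the earliest
    sender enters, because no data can have been sent before then.
    """
    first_sender = min(sender_enters)
    return max(0.0, first_sender - root_enter)


# Root enters at t=1.0 s, the earliest sender at t=3.0 s.
wait = early_reduce_wait(1.0, [3.0, 4.0])
```

If the root enters after the first sender, the sketch reports no waiting time.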
- Keywords:
- MPI, 1-to-n communication
- Unit:
- Seconds
- Description:
- Collective communication operations that send data
from one source process to all processes (i.e., 1-to-n) may suffer
from waiting times if destination processes enter the operation
earlier than the source process, that is, before any data could have
been sent. The pattern refers to the time lost as a result of this
situation. It applies to MPI calls MPI_Bcast(), MPI_Scatter() and
MPI_Scatterv().
- Parent:
- Collective (MPI)
- Children:
- None
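The same idea can be sketched for the 1-to-n case: each destination that enters before the source loses the time until the source arrives. This is a simplified model with hypothetical names, not the analyzer's actual implementation.

```python
def late_broadcast_waits(root_enter, dest_enters):
    """Per-destination waiting time in a 1-to-n operation.

    Assumption: a destination waits from its own enter time until the
    source (root) enters; destinations arriving later do not wait.
    """
    return [max(0.0, root_enter - t) for t in dest_enters]


# Source enters at t=5.0 s; three destinations enter at different times.
waits = late_broadcast_waits(5.0, [2.0, 4.5, 6.0])
```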
- Keywords:
- MPI, n-to-n communication
- Unit:
- Seconds
- Description:
- Collective communication operations that send data
from all processes to all processes (i.e., n-to-n) exhibit an inherent
synchronization among all participants, that is, no process can finish
the operation until the last process has started it. This pattern
covers the time spent in n-to-n operations until all processes have
reached it. It applies to MPI calls MPI_Reduce_scatter(), MPI_Allgather(),
MPI_Allgatherv(), MPI_Allreduce(), MPI_Alltoall() and MPI_Alltoallv().
- Parent:
- Collective (MPI)
- Children:
- None
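Because no participant can finish before the last one has started, the waiting time of each process can be sketched as the gap between its own enter time and the latest enter time. The helper below is an illustrative model with assumed names.

```python
def wait_at_nxn(enter_times):
    """Per-process time spent in an n-to-n operation before all have entered.

    Assumption: each process waits from its own enter time until the last
    participant enters the operation.
    """
    last_enter = max(enter_times)
    return [last_enter - t for t in enter_times]


# Three processes entering an n-to-n operation at different times.
waits = wait_at_nxn([1.0, 2.5, 4.0])
```

The last process to arrive has zero waiting time in this model, while the earliest one accumulates the most.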
- Keywords:
- MPI, point-to-point communication
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in MPI
point-to-point communication calls.
- Parent:
- Communication (MPI)
- Children:
- Late Receiver, Late Sender
- Keywords:
- MPI, delayed sender
- Unit:
- Seconds
- Description:
- A send operation is blocked until the
corresponding receive operation is called. This can happen for several
reasons. Either the MPI implementation is working in synchronous mode
by default or the size of the message to be sent exceeds the available
MPI-internal buffer space and the operation is blocked until the data
is transferred to the receiver. The pattern refers to the time spent
waiting as a result of this situation.
- Parent:
- Point-to-Point
- Children:
- Messages in Wrong Order
(Late Receiver)
- Keywords:
- MPI, sending order of messages
- Unit:
- Seconds
- Description:
- A Late Receiver
situation may be the result of messages that are sent in the wrong
order. If a process sends messages to processes that are not ready to
receive them, the sender's MPI-internal buffer may overflow so that
from then on the process needs to send in synchronous mode causing a
Late Receiver situation. This pattern refers to the time spent in a
wait state as a result of this situation.
- Parent:
- Late Receiver
- Children:
- None
- Keywords:
- MPI, delayed receiver
- Unit:
- Seconds
- Description:
- Time lost waiting in a blocking
receive operation (e.g., MPI_Recv() or MPI_Wait()) that is posted earlier
than the corresponding send operation.
- Parent:
- Point-to-Point
- Children:
- Messages in Wrong Order
(Late Sender)
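A minimal sketch of the Late Sender waiting time for a single message, assuming the receiver waits from the moment it posts the blocking receive until the matching send starts. Names are hypothetical, and transfer time is ignored.

```python
def late_sender_wait(recv_enter, send_enter):
    """Time a blocking receive waits for a send that is posted later.

    Assumption: waiting lasts from the receive's enter time to the send's
    enter time; a receive posted after the send does not wait.
    """
    return max(0.0, send_enter - recv_enter)


# Receive posted at t=2.0 s, matching send issued at t=5.0 s.
wait = late_sender_wait(2.0, 5.0)
```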
- Keywords:
- MPI, acceptance order of messages
- Unit:
- Seconds
- Description:
- A Late Sender
situation may be the result of messages that are received in the wrong
order. If a process expects messages from one or more processes in a
certain order, although these processes are sending them in a
different order, the receiver may need to wait for a message if it
tries to receive a message early that has been sent late. The
situation can be avoided by receiving messages in the order in which
they are sent instead. This pattern refers to the time spent in a wait
state as a result of this situation.
- Parent:
- Late Sender
- Children:
- None
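The effect of the receive order can be illustrated with a small simulation. It assumes zero transfer time, a receiver that starts at t=0, and a fixed amount of work after each receive; all names and numbers are made up for the example.

```python
def total_receive_wait(send_times, receive_order, work_per_message):
    """Total time a receiver blocks when receiving in the given order.

    `send_times` maps a message name to the time it is sent. After each
    receive the receiver spends `work_per_message` seconds processing.
    """
    now = 0.0
    waited = 0.0
    for name in receive_order:
        arrival = send_times[name]
        if arrival > now:
            # The message has not been sent yet: block until it is.
            waited += arrival - now
            now = arrival
        now += work_per_message
    return waited


send_times = {"a": 1.0, "b": 5.0}
wrong_order = total_receive_wait(send_times, ["b", "a"], 3.0)
sent_order = total_receive_wait(send_times, ["a", "b"], 3.0)
```

In this toy setup, receiving in the order in which the messages are sent lets the receiver overlap its work with the delay of the later message, so it blocks for less time.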
- Keywords:
- MPI-2, RMA, Remote Memory Access, 1-Sided Communication
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in MPI
RMA communication calls. RMA communication calls are MPI_Get(), MPI_Put() and
MPI_Accumulate().
- Parent:
- Communication (MPI)
- Children:
- Early Transfer
- Keywords:
- MPI-2, RMA, Remote Memory Access, 1-Sided Communication
- Unit:
- Seconds
- Description:
-
Time lost waiting in a blocking RMA transfer operation (e.g.,
MPI_Get() or MPI_Put()) that is posted before the corresponding exposure
epoch begins.
- Parent:
- RMA Communication (MPI)
- Children:
- None
- Keywords:
- MPI, IO
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in MPI IO calls.
- Parent:
- MPI
- Children:
- None
- Keywords:
- MPI, barrier
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in MPI
barriers and RMA synchronization calls.
- Parent:
- MPI
- Children:
-
Barrier (MPI),
RMA Synchronization,
Init/Exit (MPI)
- Keywords:
- MPI, synchronization
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in MPI
barriers.
- Parent:
- Synchronization (MPI)
- Children:
- Barrier Completion (MPI),
Wait at Barrier (MPI)
- Keywords:
- MPI, synchronization
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in MPI
barriers after the first process has left the operation.
- Parent:
- Barrier (MPI)
- Children:
- None
- Keywords:
- MPI, barrier
- Unit:
- Seconds
- Description:
- This pattern covers the time spent waiting in
front of an MPI barrier, which is the time inside the barrier call
until the last process has reached the barrier. A large amount of
waiting time spent in front of barriers can be an indication of load
imbalance.
- Parent:
- Barrier (MPI)
- Children:
- None
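The two barrier sub-patterns (waiting in front of the barrier and barrier completion) can be sketched from per-process enter and exit timestamps. The split below is an assumed simplification: waiting ends when the last process enters, and completion time is counted from the earliest exit.

```python
def barrier_breakdown(enter_times, exit_times):
    """Split barrier time into waiting and completion parts per process.

    Assumptions: a process waits from its enter time until the last enter;
    completion time is its exit time minus the earliest exit.
    """
    last_enter = max(enter_times)
    first_exit = min(exit_times)
    waits = [last_enter - t for t in enter_times]
    completions = [t - first_exit for t in exit_times]
    return waits, completions


# Three processes with made-up enter and exit timestamps in seconds.
waits, completions = barrier_breakdown([1.0, 2.0, 4.0], [4.5, 4.25, 5.0])
```

A large spread in the waiting values would point to load imbalance before the barrier, as noted in the description.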
- Keywords:
- MPI-2, RMA, Synchronization, Remote Memory Access, 1-Sided Communication
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in MPI
RMA synchronization calls. RMA synchronization calls are MPI_Win_fence(),
MPI_Win_lock(), MPI_Win_unlock(), MPI_Win_post(), MPI_Win_wait(), MPI_Win_test(),
MPI_Win_start(), MPI_Win_complete(), MPI_Win_create() and MPI_Win_free().
- Parent:
- Synchronization (MPI)
- Children:
- Window Management,
Fence,
General Active Target Synchronization,
Passive Target Synchronization (Locks)
- Keywords:
- MPI-2, RMA, Window, Remote Memory Access, 1-Sided Communication
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in collective
window construction/destruction calls: MPI_Win_create() and MPI_Win_free().
- Parent:
- RMA Synchronization
- Children:
- Wait at Create,
Wait at Free
- Keywords:
- MPI-2, RMA, Window, Remote Memory Access, 1-Sided Communication
- Unit:
- Seconds
- Description:
- This pattern covers the time spent waiting in
front of an MPI_Win_create(), which is the time inside the collective window creation call
until the last process has reached the MPI_Win_create(). A large amount of
waiting time spent in front of MPI_Win_create() can be an indication of load
imbalance.
- Parent:
- Window Management
- Children:
- None
- Keywords:
- MPI-2, RMA, Window, Remote Memory Access, 1-Sided Communication
- Unit:
- Seconds
- Description:
- This pattern covers the time spent waiting in
front of an MPI_Win_free(), which is the time inside the collective window destruction call
until the last process has reached the MPI_Win_free(). A large amount of
waiting time spent in front of MPI_Win_free() can be an indication of load
imbalance.
- Parent:
- Window Management
- Children:
- None
- Keywords:
- MPI-2, RMA, Collective Synchronization, Fence,
Remote Memory Access, 1-Sided Communication
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in the collective
RMA synchronization call MPI_Win_fence().
- Parent:
- RMA Synchronization
- Children:
- Wait at Fence
- Keywords:
- MPI-2, RMA, Collective Synchronization, Fence, Remote Memory Access, 1-Sided Communication
- Unit:
- Seconds
- Description:
- This pattern covers the time spent waiting in
front of an MPI_Win_fence(), which is the time inside the collective synchronization call
until the last process has reached the MPI_Win_fence(). A large amount of
waiting time spent in front of MPI_Win_fence() can be an indication of load
imbalance.
- Parent:
- Fence
- Children:
- None
- Keywords:
- MPI-2, RMA, Synchronization, GATS, Remote Memory Access, 1-Sided Communication
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in general active target
synchronization calls. These are MPI_Win_post(), MPI_Win_wait(), MPI_Win_test(),
MPI_Win_start() and MPI_Win_complete().
- Parent:
- RMA Synchronization
- Children:
- Early Wait,
Late Post
- Keywords:
- MPI-2, RMA, Synchronization, GATS, Remote Memory Access, 1-Sided Communication
- Unit:
- Seconds
- Description:
- Time lost in the MPI_Win_wait() call, which blocks until
all matching calls to MPI_Win_complete() have occurred. Part of the lost time can be
caused by Late Complete.
- Parent:
- General Active Target Synchronization
- Children:
- Late Complete
- Keywords:
- MPI-2, RMA, Synchronization, GATS, Remote Memory Access, 1-Sided Communication
- Unit:
- Seconds
- Description:
- The end of the exposure epoch marked by an MPI_Win_wait() call is
delayed because one or more MPI_Win_complete() calls are executed too late (i.e., not
immediately after the last communication call).
- Parent:
- Early Wait
- Children:
- None
- Keywords:
- MPI-2, RMA, Synchronization, GATS, Remote Memory Access, 1-Sided Communication
- Unit:
- Seconds
- Description:
- The access to the target window is delayed in either
the MPI_Win_start() or the MPI_Win_complete() RMA synchronization call until the window is
exposed.
- Parent:
- General Active Target Synchronization
- Children:
- None
- Keywords:
- MPI-2, RMA, Synchronization, Locks, Remote Memory Access, 1-Sided Communication
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in MPI_Win_lock() and MPI_Win_unlock()
function calls.
- Parent:
- RMA Synchronization
- Children:
- None
- Keywords:
- MPI, initialize, finalize
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent
on MPI initialization and finalization calls. It applies to MPI_Init() and MPI_Finalize()
calls.
- Parent:
- Synchronization (MPI)
- Children:
- None
- Keywords:
- OpenMP
- Unit:
- Seconds
- Description:
- Time spent on behalf of OpenMP. This includes
time spent in OpenMP API calls as well as time spent in code generated
by the OpenMP compiler.
- Parent:
- Execution
- Children:
- Flush, Fork, Synchronization (OpenMP)
- Keywords:
- OpenMP, flush directive
- Unit:
- Seconds
- Description:
- Time spent in OpenMP flush directives.
- Parent:
- OpenMP
- Children:
- None
- Keywords:
- OpenMP, team creation
- Unit:
- Seconds
- Description:
- Time spent by the master thread creating a team of
threads.
- Parent:
- OpenMP
- Children:
- None
- Keywords:
- OpenMP, synchronization
- Unit:
- Seconds
- Description:
- Time spent in OpenMP barrier or lock
synchronization. Lock synchronization may be accomplished using either
API calls or critical sections.
- Parent:
- OpenMP
- Children:
- Barrier (OpenMP), Lock Competition (OpenMP)
- Keywords:
- OpenMP, barrier
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in implicit
(compiler-generated) or explicit (user-specified) OpenMP barrier
synchronization. Note that during measurement implicit barriers are
treated similarly to explicit ones. The instrumentation procedure
replaces an implicit barrier with an explicit barrier enclosed by the
parallel construct. This is done by adding a nowait clause and a
barrier directive as the last statement of the parallel construct. In
cases where the implicit barrier cannot be removed (i.e., parallel
region), the explicit barrier is executed in front of the implicit
barrier, which will then be negligible because the team will already
be synchronized when reaching it. The synthetic explicit barrier
appears in the display as a special implicit barrier construct.
- Parent:
- OpenMP
- Children:
- Explicit, Implicit
- Keywords:
- OpenMP, explicit barrier
- Unit:
- Seconds
- Description:
- Time spent in explicit (i.e., user-specified)
OpenMP barriers.
- Parent:
- Barrier (OpenMP)
- Children:
- Wait at Barrier (Explicit)
- Keywords:
- OpenMP, explicit barrier
- Unit:
- Seconds
- Description:
- This pattern covers the time spent waiting in
front of an explicit (user-specified) OpenMP barrier. It refers to the
time spent in the barrier until all threads have reached it.
- Parent:
- Explicit
- Children:
- None
- Keywords:
- OpenMP, implicit barrier
- Unit:
- Seconds
- Description:
- Time spent in implicit (i.e., compiler-generated)
OpenMP barriers.
- Parent:
- Barrier (OpenMP)
- Children:
- Wait at Barrier (Implicit)
- Keywords:
- OpenMP, implicit barrier
- Unit:
- Seconds
- Description:
- This pattern covers the time spent waiting in
front of an implicit (compiler-generated) OpenMP barrier. It refers to
the time spent in the barrier until all threads have reached it.
- Parent:
- Implicit
- Children:
- None
- Keywords:
- OpenMP, lock synchronization
- Unit:
- Seconds
- Description:
- This pattern refers to the time a thread spent
waiting for a lock that had been previously acquired by another
thread. The lock may have been acquired either transparently at the
beginning of a critical section or using an explicit API call.
- Parent:
- Synchronization (OpenMP)
- Children:
- API Lock Synchronization,
Critical
- Keywords:
- OpenMP, API lock routines
- Unit:
- Seconds
- Description:
- This pattern refers to the time a thread spent in
an OpenMP API lock routine waiting for a lock that had been
previously acquired by another thread.
- Parent:
- Lock Competition (OpenMP)
- Children:
- None
- Keywords:
- OpenMP, critical section
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent waiting in
front of a critical section occupied by another thread.
- Parent:
- Lock Competition (OpenMP)
- Children:
- None
- Keywords:
- SHMEM
- Unit:
- Seconds
- Description:
- Time spent in SHMEM API calls.
- Parent:
- Execution
- Children:
- Communication (SHMEM),
Synchronization (SHMEM)
- Keywords:
- SHMEM, communication
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in SHMEM
RMA, collective and atomic communication calls. SHMEM RMA calls are the get and put transfer
calls.
- Parent:
- SHMEM
- Children:
-
Collective (SHMEM),
RMA Communication (SHMEM)
- Keywords:
- SHMEM, collective communication
- Unit:
- Seconds
- Description:
- Time spent on SHMEM collective communication. It applies
to SHMEM calls: shmem_broadcast(), shmem_broadcast_all(), shmem_and(), shmem_max(),
shmem_min(), shmem_or(), shmem_prod(), shmem_sum(), shmem_xor(), shmem_collect() and
shmem_fcollect().
- Parent:
- Communication (SHMEM)
- Children:
- Late Broadcast (SHMEM),
Wait at N x N (SHMEM)
- Keywords:
- SHMEM, 1-to-n communication, Broadcast
- Unit:
- Seconds
- Description:
- Collective communication operations that send data
from one source process to all processes (i.e., 1-to-n) may suffer
from waiting times if destination processes enter the operation
earlier than the source process, that is, before any data could have
been sent. The pattern refers to the time lost as a result of this
situation.
- Parent:
- Collective (SHMEM)
- Children:
- None
- Keywords:
- SHMEM, n-to-n communication
- Unit:
- Seconds
- Description:
- Collective communication operations that send data
from all processes to all processes (i.e., n-to-n) exhibit an inherent
synchronization among all participants, that is, no process can finish
the operation until the last process has started it. This pattern
covers the time spent in n-to-n operations until all processes have
reached it. It applies to SHMEM calls: shmem_and(), shmem_max(), shmem_min(),
shmem_or(), shmem_prod(), shmem_sum(), shmem_xor(), shmem_collect() and
shmem_fcollect().
- Parent:
- Collective (SHMEM)
- Children:
- None
- Keywords:
- SHMEM, RMA, 1-Sided Communication
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in SHMEM
RMA communication calls. RMA communication calls are SHMEM get and put transfers and
SHMEM atomic operations. The atomic operations are shmem_swap(), shmem_cswap(), shmem_mswap(),
shmem_inc(), shmem_finc(), shmem_add() and shmem_fadd().
- Parent:
- Communication (SHMEM)
- Children:
- None
- Keywords:
- SHMEM
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in SHMEM
synchronization calls. This applies to SHMEM barriers, point-to-point synchronization
and management function calls.
- Parent:
- SHMEM
- Children:
-
Barrier (SHMEM),
P2P Synchronization,
Init/Exit (SHMEM),
Memory Management (SHMEM)
- Keywords:
- SHMEM, synchronization
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in SHMEM
barriers.
- Parent:
- Synchronization (SHMEM)
- Children:
- Wait at Barrier (SHMEM)
- Keywords:
- SHMEM, barrier
- Unit:
- Seconds
- Description:
- This pattern covers the time spent waiting in
front of an SHMEM barrier, which is the time inside the barrier call
until the last process has reached the barrier. A large amount of
waiting time spent in front of barriers can be an indication of load
imbalance.
- Parent:
- Barrier (SHMEM)
- Children:
- None
- Keywords:
- SHMEM, RMA
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent in SHMEM
point-to-point synchronization calls.
- Parent:
- Synchronization (SHMEM)
- Children:
- Lock Competition (SHMEM),
Wait Until
- Keywords:
- SHMEM, lock synchronization
- Unit:
- Seconds
- Description:
- This pattern refers to the time a PE spent
waiting for a lock that had been previously acquired by another PE.
- Parent:
- P2P Synchronization
- Children:
- None
- Keywords:
- SHMEM, wait, wait_until, synchronization
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent
waiting for a shared variable to be changed by a remote write or
atomic swap issued by a different PE. It applies to the SHMEM calls
shmem_wait() and shmem_wait_until().
- Parent:
- P2P Synchronization
- Children:
- None
- Keywords:
- SHMEM, initialize, finalize
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent
on SHMEM initialization and finalization calls. It applies to shmem_init() and shmem_finalize()
calls.
- Parent:
- Synchronization (SHMEM)
- Children:
- None
- Keywords:
- SHMEM, memory allocation, reallocation, free
- Unit:
- Seconds
- Description:
- This pattern refers to the time spent
on SHMEM memory management calls. It applies to shmalloc(), shmalloc_nb(),
shfree() and shrealloc() calls.
- Parent:
- Synchronization (SHMEM)
- Children:
- None
- Keywords:
- OpenMP, sequential execution
- Unit:
- Seconds
- Description:
- This pattern refers to idle times on CPUs reserved
for slave threads when a process is executed sequentially before or
after an OpenMP parallel region.
- Parent:
- Time
- Children:
- None
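A minimal sketch of how this idle time could be estimated, assuming one CPU is reserved per thread and only the master thread is busy during sequential phases. The function name and parameters are hypothetical.

```python
def idle_thread_time(sequential_seconds, num_threads):
    """Idle CPU time of slave threads during sequential execution.

    Assumption: during the sequential phase one CPU (the master thread)
    is busy and the remaining num_threads - 1 reserved CPUs are idle.
    """
    return sequential_seconds * (num_threads - 1)


# 2.5 s of sequential execution in a process with 4 OpenMP threads.
idle = idle_thread_time(2.5, 4)
```

With a single thread the estimate is zero, which matches the earlier remark that pure MPI applications have no such idle time.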
- Keywords:
- Trace generation overhead
- Unit:
- Seconds
- Description:
- Time spent performing major tasks
related to trace generation, such as time synchronization or dumping
the trace-buffer contents to a file. Note that the normal per-event
overhead is not included.
- Parent:
- Time
- Children:
- None
- Keywords:
- Function calls
- Unit:
- Number of visits
- Description:
- Number of times a certain call path has
been visited.
- Parent:
- None
- Children:
- None
CPU & Memory Patterns
- Keywords:
- Hardware counter
- Unit:
- Number of processor cycles
- Description:
- Total processor cycles
- Parent:
- None
- Children:
- BUSY
+ IDLE
+ STALL
- Keywords:
- Hardware counter
- Unit:
- Number of instructions
- Description:
- Total instructions completed
- Parent:
- None
- Children:
- BRANCH
+ FLOATING_POINT
+ INTEGER
+ MEMORY
+ VECTOR
- Keywords:
- Hardware counter
- Unit:
- Number of instructions
- Description:
- Number of branch instructions
- Parent:
- INSTRUCTION
- Children:
- COND_BRANCH
+ UNCOND_BRANCH
- Keywords:
- Hardware counter
- Unit:
- Number of instructions
- Description:
- Number of branch instructions which were correctly predicted
- Parent:
- BRANCH
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of instructions
- Description:
- Number of branch instructions which were mis-predicted
- Parent:
- BRANCH
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of instructions
- Description:
- Number of floating-point instructions
- Parent:
- INSTRUCTION
- Children:
- FP_ADD
+ FP_MUL
+ FP_FMA
+ FP_DIV
+ FP_INV
+ FP_SQRT
+ FP_MISC
- Keywords:
- Hardware counter
- Unit:
- Number of instructions
- Description:
- Number of floating-point addition instructions
- Parent:
- FLOATING_POINT
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of instructions
- Description:
- Number of floating-point multiplication instructions
- Parent:
- FLOATING_POINT
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of instructions
- Description:
- Number of floating-point fused multiply-add instructions
- Parent:
- FLOATING_POINT
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of instructions
- Description:
- Number of floating-point division instructions
- Parent:
- FLOATING_POINT
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of instructions
- Description:
- Number of floating-point inverse (reciprocal?) instructions
- Parent:
- FLOATING_POINT
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of instructions
- Description:
- Number of floating-point square-root instructions
- Parent:
- FLOATING_POINT
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of instructions
- Description:
- Number of miscellaneous floating-point instructions
such as moves and estimates
- Parent:
- FLOATING_POINT
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of instructions
- Description:
- Number of fixed-point (integer) instructions
- Parent:
- INSTRUCTION
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of instructions
- Description:
- Number of memory-referencing instructions
- Parent:
- INSTRUCTION
- Children:
- LOAD
+ STORE
+ SYNCH
- Keywords:
- Hardware counter
- Unit:
- Number of instructions
- Description:
- Number of memory load (read) instructions
- Parent:
- MEMORY
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of instructions
- Description:
- Number of memory store (write) instructions
- Parent:
- MEMORY
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of instructions
- Description:
- Number of memory synchronization instructions
- Parent:
- MEMORY
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of instructions
- Description:
- Number of vector instructions
- Parent:
- INSTRUCTION
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of data accesses
- Description:
- Total data accesses
- Parent:
- None
- Children:
- DATA_HIT_L1$
+ DATA_HIT_L2$
+ DATA_HIT_L3$
+ DATA_HIT_MEM
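Since the children partition the total number of data accesses by the level that satisfies them, their counts can be turned into per-level fractions. The sketch below uses made-up counter values and a hypothetical helper name.

```python
def data_hit_fractions(l1_hits, l2_hits, l3_hits, mem_hits):
    """Fraction of data accesses satisfied at each memory level.

    Assumption: the four counters add up to the total number of data
    accesses, as the parent/children relation above suggests.
    """
    total = l1_hits + l2_hits + l3_hits + mem_hits
    levels = {"L1": l1_hits, "L2": l2_hits, "L3": l3_hits, "MEM": mem_hits}
    if total == 0:
        return {name: 0.0 for name in levels}
    return {name: count / total for name, count in levels.items()}


# Hypothetical counter readings for one call path.
fractions = data_hit_fractions(6000, 1000, 500, 500)
```

A high fraction for memory relative to the cache levels would suggest poor data locality for that call path.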
- Synonym:
- L1_D_HIT
- Keywords:
- Hardware counter
- Unit:
- Number of data accesses
- Description:
- Total data accesses (stores and loads) which hit in 1st-level cache
- Parent:
- DATA_ACCESS
- Children:
- DATA_STORE_INTO_L1$
+ DATA_LOAD_FROM_L1$
- Synonyms:
- L1_D_WRITE_HIT
- Keywords:
- Hardware counter
- Unit:
- Number of data accesses
- Description:
- Total data stores (writes) which hit in 1st-level cache
- Parent:
- DATA_HIT_L1$
- Children:
- None
- Synonyms:
- L1_D_READ_HIT
- Keywords:
- Hardware counter
- Unit:
- Number of data accesses
- Description:
- Total data loads (reads) which hit in 1st-level cache
- Parent:
- DATA_HIT_L1$
- Children:
- None
- Synonym:
- L2_D_HIT
- Keywords:
- Hardware counter
- Unit:
- Number of data accesses
- Description:
- Total data accesses (stores and loads) which miss in 1st-level cache and hit in 2nd-level cache
- Parent:
- DATA_ACCESS
- Children:
- DATA_STORE_INTO_L2$
+ DATA_LOAD_FROM_L2$
- Synonyms:
- L2_D_WRITE_HIT
- Keywords:
- Hardware counter
- Unit:
- Number of data accesses
- Description:
- Total data stores (writes) which miss in 1st-level cache and hit in 2nd-level cache
- Parent:
- DATA_HIT_L2$
- Children:
- None
- Synonyms:
- L2_D_READ_HIT
- Keywords:
- Hardware counter
- Unit:
- Number of data accesses
- Description:
- Total data loads (reads) which miss in 1st-level cache and hit in 2nd-level cache
- Parent:
- DATA_HIT_L2$
- Children:
- None
- Synonym:
- L3_D_HIT
- Keywords:
- Hardware counter
- Unit:
- Number of data accesses
- Description:
- Total data accesses (stores and loads) which miss in 1st-level and 2nd-level caches and hit in 3rd-level cache
- Parent:
- DATA_ACCESS
- Children:
- DATA_STORE_INTO_L3$
+ DATA_LOAD_FROM_L3$
- Synonyms:
- L3_D_WRITE_HIT
- Keywords:
- Hardware counter
- Unit:
- Number of data accesses
- Description:
- Total data stores (writes) which miss in 1st-level and 2nd-level caches and hit in 3rd-level cache
- Parent:
- DATA_HIT_L3$
- Children:
- None
- Synonyms:
- L3_D_READ_HIT
- Keywords:
- Hardware counter
- Unit:
- Number of data accesses
- Description:
- Total data loads (reads) which miss in 1st-level and 2nd-level caches and hit in 3rd-level cache
- Parent:
- DATA_HIT_L3$
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of data accesses
- Description:
- Total data accesses (stores and loads) which miss in all caches and must go to memory (system)
- Parent:
- DATA_ACCESS
- Children:
- DATA_STORE_INTO_MEM
+ DATA_LOAD_FROM_MEM
- Keywords:
- Hardware counter
- Unit:
- Number of data accesses
- Description:
- Total data stores (writes) which miss in all caches and must go to memory (system)
- Parent:
- DATA_HIT_MEM
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of data accesses
- Description:
- Total data loads (reads) which miss in all caches and must go to memory (system)
- Parent:
- DATA_HIT_MEM
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of instruction accesses
- Description:
- Total instruction accesses (fetches)
- Parent:
- None
- Children:
- INST_HIT_PREF
+ INST_HIT_L1$
+ INST_HIT_L2$
+ INST_HIT_L3$
+ INST_HIT_MEM
- Keywords:
- Hardware counter
- Unit:
- Number of instruction accesses
- Description:
- Total instruction prefetches
- Parent:
- INST_ACCESS
- Children:
- None
- Synonym:
- L1_I_HIT
- Keywords:
- Hardware counter
- Unit:
- Number of instruction accesses
- Description:
- Total instruction accesses (fetches) which hit in 1st-level cache
- Parent:
- INST_ACCESS
- Children:
- None
- Synonym:
- L2_I_HIT
- Keywords:
- Hardware counter
- Unit:
- Number of instruction accesses
- Description:
- Total instruction accesses (fetches) which miss in 1st-level cache and hit in 2nd-level cache
- Parent:
- INST_ACCESS
- Children:
- None
- Synonym:
- L3_I_HIT
- Keywords:
- Hardware counter
- Unit:
- Number of instruction accesses
- Description:
- Total instruction accesses (fetches) which miss in 1st-level and 2nd-level caches and hit in 3rd-level cache
- Parent:
- INST_ACCESS
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of instruction accesses
- Description:
- Total instruction accesses (fetches) which miss in all caches and must go to memory (system)
- Parent:
- INST_ACCESS
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of accesses
- Description:
- Total 1st-level cache accesses
- Parent:
- None
- Children:
- L1_INST
+ L1_LOAD
+ L1_STORE
- Keywords:
- Hardware counter
- Unit:
- Number of accesses
- Description:
- Total 1st-level instruction-cache accesses
- Parent:
- L1_ACCESS
- Children:
- L1_INST_HIT
+ L1_INST_MISS
- Synonym:
- L1_I_HIT
- Keywords:
- Hardware counter
- Unit:
- Number of accesses
- Description:
- Total 1st-level instruction-cache hits
- Parent:
- L1_INST
- Children:
- None
- Synonym:
- L1_I_MISS
- Keywords:
- Hardware counter
- Unit:
- Number of accesses
- Description:
- Total 1st-level instruction-cache misses
- Parent:
- L1_INST
- Children:
- None
- Synonym:
- L1_D_READ
- Keywords:
- Hardware counter
- Unit:
- Number of accesses
- Description:
- Total 1st-level data-cache loads (reads)
- Parent:
- L1_ACCESS
- Children:
- L1_LOAD_HIT
+ L1_LOAD_MISS
- Keywords:
- Hardware counter
- Unit:
- Number of accesses
- Description:
- Total 1st-level data-cache load (read) hits
- Parent:
- L1_LOAD
- Children:
- None
- Synonym:
- L1_D_READ_MISS
- Keywords:
- Hardware counter
- Unit:
- Number of accesses
- Description:
- Total 1st-level data-cache load (read) misses
- Parent:
- L1_LOAD
- Children:
- None
- Synonym:
- L1_D_WRITE
- Keywords:
- Hardware counter
- Unit:
- Number of accesses
- Description:
- Total 1st-level data-cache stores (writes)
- Parent:
- L1_ACCESS
- Children:
- L1_STORE_HIT
+ L1_STORE_MISS
- Keywords:
- Hardware counter
- Unit:
- Number of accesses
- Description:
- Total 1st-level data-cache store (write) hits
- Parent:
- L1_STORE
- Children:
- None
- Synonym:
- L1_D_WRITE_MISS
- Keywords:
- Hardware counter
- Unit:
- Number of accesses
- Description:
- Total 1st-level data-cache store (write) misses
- Parent:
- L1_STORE
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of accesses
- Description:
- Total 1st-level data-cache misses
- Parent:
- None (Not currently parented)
- Children:
- L1_D_READ_MISS
+ L1_D_WRITE_MISS
- Keywords:
- Hardware counter
- Unit:
- Number of accesses
- Description:
- Total 2nd-level cache accesses
- Parent:
- None
- Children:
- L2_HIT
+ L2_MISS
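Given that the total is the sum of hits and misses, a hit rate can be derived from the two child counters. This is an illustrative helper with assumed names, not a metric KOJAK itself defines.

```python
def l2_hit_rate(l2_hits, l2_misses):
    """Fraction of 2nd-level cache accesses that hit.

    Assumption: total accesses = hits + misses, following the
    parent/children relation above.
    """
    accesses = l2_hits + l2_misses
    if accesses == 0:
        return 0.0
    return l2_hits / accesses


# Hypothetical counter readings: 300 hits and 100 misses.
rate = l2_hit_rate(300, 100)
```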
- Keywords:
- Hardware counter
- Unit:
- Number of access hits
- Description:
- Total 2nd-level cache hits
- Parent:
- L2_ACCESS
- Children:
- L2_INST_HIT
+ L2_LOAD_HIT
+ L2_STORE_HIT
- Keywords:
- Hardware counter
- Unit:
- Number of accesses
- Description:
- Total 2nd-level instruction-cache hits
- Parent:
- L2_HIT
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of accesses
- Description:
- Total 2nd-level data-cache load (read) hits
- Parent:
- L2_HIT
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of accesses
- Description:
- Total 2nd-level data-cache store (write) hits
- Parent:
- L2_HIT
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of access misses
- Description:
- Total 2nd-level cache misses
- Parent:
- L2_ACCESS
- Children:
- L2_INST_MISS
+ L2_LOAD_MISS
+ L2_STORE_MISS
- Keywords:
- Hardware counter
- Unit:
- Number of accesses
- Description:
- Total 2nd-level instruction-cache misses
- Parent:
- L2_MISS
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of accesses
- Description:
- Total 2nd-level data-cache load (read) misses
- Parent:
- L2_MISS
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of accesses
- Description:
- Total 2nd-level data-cache store (write) misses
- Parent:
- L2_MISS
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of TLB accesses
- Description:
- Total TLB (Translation Lookaside Buffer) accesses
- Parent:
- None
- Children:
- DATA_TLB_ACCESS
+ INST_TLB_ACCESS
- Keywords:
- Hardware counter
- Unit:
- Number of Data-TLB accesses
- Description:
- Total Data-TLB (Translation Lookaside Buffer) accesses
- Parent:
- TLB_ACCESS
- Children:
- DATA_TLB_HIT
+ DATA_TLB_MISS
- Keywords:
- Hardware counter
- Unit:
- Number of Data-TLB hits
- Description:
- Data-TLB (Translation Lookaside Buffer) hits
- Parent:
- DATA_TLB_ACCESS
- Children:
- None
- Synonym:
- TLB_D_MISS
- Keywords:
- Hardware counter
- Unit:
- Number of Data-TLB misses
- Description:
- Data-TLB (Translation Lookaside Buffer) misses
- Parent:
- DATA_TLB_ACCESS
- Children:
- None
- Keywords:
- Hardware counter
- Unit:
- Number of Instruction-TLB accesses
- Description:
- Total Instruction-TLB (Translation Lookaside Buffer) accesses
- Parent:
- TLB_ACCESS
- Children:
- INST_TLB_HIT
+ INST_TLB_MISS
- Keywords:
- Hardware counter
- Unit:
- Number of Instruction-TLB hits
- Description:
- Instruction-TLB (Translation Lookaside Buffer) hits
- Parent:
- INST_TLB_ACCESS
- Children:
- None
- Synonym:
- TLB_I_MISS
- Keywords:
- Hardware counter
- Unit:
- Number of Instruction-TLB misses
- Description:
- Instruction-TLB (Translation Lookaside Buffer) misses
- Parent:
- INST_TLB_ACCESS
- Children:
- None