Local data share (LDS)
LDS Speed-of-Light
Warning
The theoretical maximum throughput for some metrics in this section is
currently computed using the maximum achievable clock frequency of the
accelerator, as reported by `rocminfo`. This may not be realistic for all
workloads.
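As a rough illustration of why the clock choice matters, the sketch below computes a percent-of-peak figure against two different clock frequencies. Every number in it (bytes per cycle per CU, CU count, clock values, achieved bandwidth) is a hypothetical placeholder, not a value read from `rocminfo` or taken from a specific accelerator, and the function names are illustrative only.

```python
def peak_bandwidth_gbps(bytes_per_cycle_per_cu, num_cus, clock_mhz):
    """Peak bandwidth in GB/s for a given clock (all inputs are assumptions)."""
    bytes_per_second = bytes_per_cycle_per_cu * num_cus * clock_mhz * 1e6
    return bytes_per_second / 1e9


def percent_of_peak(achieved_gbps, peak_gbps):
    """Express an achieved bandwidth as a percentage of a peak value."""
    return 100.0 * achieved_gbps / peak_gbps


# Hypothetical accelerator: 128 bytes/cycle/CU and 100 CUs.
peak_at_max = peak_bandwidth_gbps(128, 100, 1500)        # maximum clock
peak_at_sustained = peak_bandwidth_gbps(128, 100, 1200)  # clock actually sustained

achieved = 9600.0  # GB/s, hypothetical measurement
pct_vs_max = percent_of_peak(achieved, peak_at_max)
pct_vs_sustained = percent_of_peak(achieved, peak_at_sustained)
print(pct_vs_max, pct_vs_sustained)
```

With these placeholder inputs, the same achieved bandwidth looks noticeably lower when measured against the maximum clock than against a lower sustained clock, which is the effect the warning describes.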
The LDS speed-of-light chart compares a number of key LDS metrics against their peak achievable values.
Metric | Description | Unit |
---|---|---|
Utilization | Indicates the percentage of the kernel’s duration during which the LDS was actively executing instructions (including, but not limited to, load, store, atomic and HIP’s | Percent |
Access Rate | Indicates the percentage of SIMDs in the VALU [1] actively issuing LDS instructions, averaged over the lifetime of the kernel. Calculated as the ratio of the total number of cycles spent by the scheduler issuing LDS instructions over the total CU cycles. | Percent |
Theoretical Bandwidth (% of Peak) | Indicates the maximum number of bytes that could have been loaded from, stored to, or atomically updated in the LDS in this kernel, as a percentage of the peak achievable LDS bandwidth. See the LDS bandwidth example for more detail. | Percent |
Bank Conflict Rate | Indicates the percentage of active LDS cycles that were spent servicing bank conflicts. Calculated as the ratio of LDS cycles spent servicing bank conflicts over the number of LDS cycles that would have been required to move the same amount of data in an uncontended access. [2] | Percent |
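The ratio-based metrics in this table can be sketched as follows. This is a minimal illustration that follows the descriptions above. The variable names and counter values are hypothetical and do not correspond to real hardware counter names or profiler output.

```python
def percent(numerator, denominator):
    """Return 100 * numerator / denominator, or 0.0 if the denominator is 0."""
    return 100.0 * numerator / denominator if denominator else 0.0


# Hypothetical raw values for a single kernel (not real profiler output).
total_cu_cycles = 1_000_000    # total CU cycles over the kernel's lifetime
lds_issue_cycles = 150_000     # scheduler cycles spent issuing LDS instructions
conflict_cycles = 20_000       # LDS cycles spent servicing bank conflicts
uncontended_cycles = 80_000    # LDS cycles the same data would need without conflicts

# Access Rate: LDS issue cycles over total CU cycles.
access_rate = percent(lds_issue_cycles, total_cu_cycles)

# Bank Conflict Rate: conflict cycles over the uncontended cycle count.
bank_conflict_rate = percent(conflict_cycles, uncontended_cycles)

print(f"Access Rate: {access_rate:.1f}%")
print(f"Bank Conflict Rate: {bank_conflict_rate:.1f}%")
```

Guarding against a zero denominator is a design choice of this sketch, so that a kernel with no LDS activity reports 0 rather than raising an error.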
Footnotes
Statistics
The LDS statistics panel gives a more detailed view of the hardware:
Metric | Description | Unit |
---|---|---|
LDS Instructions | The total number of LDS instructions (including, but not limited to, read/write/atomics and HIP’s | Instructions per normalization unit |
Theoretical Bandwidth | Indicates the maximum number of bytes that could have been loaded from, stored to, or atomically updated in the LDS per normalization unit. Does not take into account the execution mask of the wavefront when the instruction was executed. See the LDS bandwidth example for more detail. | Bytes per normalization unit |
LDS Latency | The average number of round-trip cycles (that is, from issue to data return or acknowledgment) required for an LDS instruction to complete. | Cycles |
Bank Conflicts/Access | The ratio of the number of cycles spent in the LDS scheduler due to bank conflicts (as determined by the conflict resolution hardware) to the base number of cycles that would be spent in the LDS scheduler in a completely uncontended case. This is the unnormalized form of the Bank Conflict Rate. | Conflicts/Access |
Index Accesses | The total number of cycles spent in the LDS scheduler over all operations per normalization unit. | Cycles per normalization unit |
Atomic Return Cycles | The total number of cycles spent on LDS atomics with return per normalization unit. | Cycles per normalization unit |
Bank Conflicts | The total number of cycles spent in the LDS scheduler due to bank conflicts (as determined by the conflict resolution hardware) per normalization unit. | Cycles per normalization unit |
Address Conflicts | The total number of cycles spent in the LDS scheduler due to address conflicts (as determined by the conflict resolution hardware) per normalization unit. | Cycles per normalization unit |
Unaligned Stall | The total number of cycles spent in the LDS scheduler due to stalls from non-dword aligned addresses per normalization unit. | Cycles per normalization unit |
Memory Violations | The total number of out-of-bounds accesses made to the LDS, per normalization unit. This is unused and expected to be zero in most configurations for modern CDNA™ accelerators. | Accesses per normalization unit |
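To make the "per normalization unit" and Bank Conflicts/Access descriptions more concrete, here is a small sketch. It assumes, purely for illustration, that the uncontended cycle count can be approximated as the total scheduler cycles (Index Accesses) minus the bank conflict cycles, and that the normalization unit is "per wavefront". Neither assumption is stated in this section, and all names and numbers are hypothetical.

```python
def per_unit(raw_total, normalization_count):
    """Scale a raw counter total to a per-normalization-unit value."""
    if normalization_count <= 0:
        raise ValueError("normalization_count must be positive")
    return raw_total / normalization_count


def bank_conflicts_per_access(conflict_cycles, index_cycles):
    """Ratio of conflict cycles to uncontended scheduler cycles.

    Assumption (not from the source): uncontended cycles are approximated
    as total scheduler cycles minus conflict cycles.
    """
    base_cycles = index_cycles - conflict_cycles
    return conflict_cycles / base_cycles if base_cycles > 0 else 0.0


# Hypothetical kernel-wide totals (not real profiler output).
index_cycles = 100_000     # total LDS scheduler cycles (Index Accesses)
conflict_cycles = 20_000   # scheduler cycles due to bank conflicts
num_waves = 4_000          # hypothetical normalization count: wavefronts

index_per_wave = per_unit(index_cycles, num_waves)
conflicts_per_wave = per_unit(conflict_cycles, num_waves)
conflicts_per_access = bank_conflicts_per_access(conflict_cycles, index_cycles)

print(index_per_wave, conflicts_per_wave, conflicts_per_access)
```

Because Bank Conflicts/Access is a ratio of two cycle counts, it does not change with the normalization unit, whereas metrics such as Index Accesses and Bank Conflicts scale with whichever unit is selected.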