Local data share (LDS)

Contents

Local data share (LDS)#

LDS Speed-of-Light#

Warning

The theoretical maximum throughput for some metrics in this section are currently computed with the maximum achievable clock frequency, as reported by rocminfo, for an accelerator. This may not be realistic for all workloads.

The LDS speed-of-light chart shows a number of key metrics for the LDS as a comparison with the peak achievable values of those metrics.

Metric	Description	Unit
Utilization	Indicates what percent of the kernel’s duration the LDS was actively executing instructions (including, but not limited to, load, store, atomic and HIP’s `__shfl` operations). Calculated as the ratio of the total number of cycles LDS was active over the total CU cycles.	Percent
Access Rate	Indicates the percentage of SIMDs in the VALU [1] actively issuing LDS instructions, averaged over the lifetime of the kernel. Calculated as the ratio of the total number of cycles spent by the scheduler issuing LDS instructions over the total CU cycles.	Percent
Theoretical Bandwidth (% of Peak)	Indicates the maximum amount of bytes that could have been loaded from, stored to, or atomically updated in the LDS in this kernel, as a percent of the peak LDS bandwidth achievable. See the LDS bandwidth example for more detail.	Percent
Bank Conflict Rate	Indicates the percentage of active LDS cycles that were spent servicing bank conflicts. Calculated as the ratio of LDS cycles spent servicing bank conflicts over the number of LDS cycles that would have been required to move the same amount of data in an uncontended access. [2]	Percent

Footnotes

Statistics#

The LDS statistics panel gives a more detailed view of the hardware:

Metric	Description	Unit
LDS Instructions	The total number of LDS instructions (including, but not limited to, read/write/atomics and HIP’s `__shfl` instructions) executed per normalization unit.	Instructions per normalization unit
Theoretical Bandwidth	Indicates the maximum amount of bytes that could have been loaded from, stored to, or atomically updated in the LDS per normalization unit. Does not take into account the execution mask of the wavefront when the instruction was executed. See the LDS bandwidth example for more detail.	Bytes per normalization unit
LDS Latency	The average number of round-trip cycles (i.e., from issue to data-return / acknowledgment) required for an LDS instruction to complete.	Cycles
Bank Conflicts/Access	The ratio of the number of cycles spent in the LDS scheduler due to bank conflicts (as determined by the conflict resolution hardware) to the base number of cycles that would be spent in the LDS scheduler in a completely uncontended case. This is the unnormalized form of the Bank Conflict Rate.	Conflicts/Access
Index Accesses	The total number of cycles spent in the LDS scheduler over all operations per normalization unit.	Cycles per normalization unit
Atomic Return Cycles	The total number of cycles spent on LDS atomics with return per normalization unit.	Cycles per normalization unit
Bank Conflicts	The total number of cycles spent in the LDS scheduler due to bank conflicts (as determined by the conflict resolution hardware) per normalization unit.	Cycles per normalization unit
Address Conflicts	The total number of cycles spent in the LDS scheduler due to address conflicts (as determined by the conflict resolution hardware) per normalization unit.	Cycles per normalization unit
Unaligned Stall	The total number of cycles spent in the LDS scheduler due to stalls from non-dword aligned addresses per normalization unit.	Cycles per normalization unit
Memory Violations	The total number of out-of-bounds accesses made to the LDS, per normalization unit. This is unused and expected to be zero in most configurations for modern CDNA™ accelerators.	Accesses per normalization unit