Local data share (LDS)
LDS Speed-of-Light
Warning
The theoretical maximum throughput for some metrics in this section is
currently computed with the maximum achievable clock frequency, as reported
by rocminfo, for an accelerator. This may not be realistic for all
workloads.
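The effect described in this warning can be illustrated with a minimal sketch. It assumes that a peak value scales linearly with clock frequency. The per-cycle width, CU count, clock frequencies, and achieved throughput below are hypothetical placeholders chosen for the arithmetic, not values reported by rocminfo or by the profiler.

```python
def peak_bytes_per_second(bytes_per_cycle_per_cu: float, num_cus: int, clock_hz: float) -> float:
    """Peak throughput under the assumption that it scales linearly with clock."""
    return bytes_per_cycle_per_cu * num_cus * clock_hz


def utilization_percent(achieved: float, peak: float) -> float:
    """Achieved throughput expressed as a percentage of a given peak."""
    return 100.0 * achieved / peak


# All values below are hypothetical and only serve the illustration.
BYTES_PER_CYCLE_PER_CU = 128   # assumed per-CU width
NUM_CUS = 100                  # assumed number of compute units
MAX_CLOCK_HZ = 2.0e9           # assumed maximum clock
SUSTAINED_CLOCK_HZ = 1.6e9     # assumed clock actually sustained by a workload
ACHIEVED_BYTES_PER_SECOND = 8.0e12

peak_at_max = peak_bytes_per_second(BYTES_PER_CYCLE_PER_CU, NUM_CUS, MAX_CLOCK_HZ)
peak_at_sustained = peak_bytes_per_second(BYTES_PER_CYCLE_PER_CU, NUM_CUS, SUSTAINED_CLOCK_HZ)

util_vs_max = utilization_percent(ACHIEVED_BYTES_PER_SECOND, peak_at_max)
util_vs_sustained = utilization_percent(ACHIEVED_BYTES_PER_SECOND, peak_at_sustained)

print(f"utilization vs. peak at max clock:       {util_vs_max:.2f}%")
print(f"utilization vs. peak at sustained clock: {util_vs_sustained:.2f}%")
```

With these placeholder numbers, the same achieved throughput looks lower as a percentage when the peak is derived from the maximum clock than when it is derived from the clock the workload actually sustained.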
The LDS speed-of-light chart compares key LDS metrics against the peak achievable values of those metrics.
| Metric | Description | Unit |
|---|---|---|
| Utilization | Indicates what percent of the kernel’s duration the LDS was actively executing instructions (including, but not limited to, load, store, atomic and HIP’s | Percent |
| Access Rate | Indicates the percentage of SIMDs in the VALU [1] actively issuing LDS instructions, averaged over the lifetime of the kernel. Calculated as the ratio of the total number of cycles spent by the scheduler issuing LDS instructions to the total CU cycles. | Percent |
| Theoretical Bandwidth Utilization | Indicates the maximum number of bytes that could have been loaded from, stored to, or atomically updated in the LDS, expressed as a percentage of the theoretical peak. Does not take into account the execution mask of the wavefront when the instruction was executed. See the LDS bandwidth example for more detail. | Percent |
| Bank Conflict Rate | Indicates the percentage of active LDS cycles that were spent servicing bank conflicts. Calculated as the ratio of LDS cycles spent servicing bank conflicts to the number of LDS cycles that would have been required to move the same amount of data in an uncontended access. [2] | Percent |
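The two ratio-based metrics in this table can be sketched as follows. This is a minimal illustration of the ratios as described in the table, not the profiler's implementation. The function and parameter names are hypothetical, and the cycle counts are made-up inputs rather than real hardware counter values.

```python
def percent(numerator: float, denominator: float) -> float:
    """Return 100 * numerator / denominator, or 0.0 if the denominator is zero."""
    if denominator == 0:
        return 0.0
    return 100.0 * numerator / denominator


def access_rate(lds_issue_cycles: float, total_cu_cycles: float) -> float:
    """Cycles the scheduler spent issuing LDS instructions, relative to total CU cycles."""
    return percent(lds_issue_cycles, total_cu_cycles)


def bank_conflict_rate(conflict_cycles: float, uncontended_cycles: float) -> float:
    """Cycles spent servicing bank conflicts, relative to the cycles an
    uncontended access would have needed to move the same data."""
    return percent(conflict_cycles, uncontended_cycles)


# Hypothetical cycle counts for a single kernel.
rate = access_rate(lds_issue_cycles=250_000, total_cu_cycles=1_000_000)
conflicts = bank_conflict_rate(conflict_cycles=30_000, uncontended_cycles=200_000)

print(f"Access Rate:        {rate:.1f}%")
print(f"Bank Conflict Rate: {conflicts:.1f}%")
```

The zero-denominator guard is a design choice for the sketch, so that a kernel with no LDS activity reports 0 rather than raising an error.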
Footnotes
Statistics
The LDS statistics panel gives a more detailed view of the hardware:
| Metric | Description | Unit |
|---|---|---|
| Theoretical Bandwidth | Indicates the maximum number of bytes that could have been loaded from, stored to, or atomically updated in the LDS, divided by the total duration. Does not take into account the execution mask of the wavefront when the instruction was executed. See the LDS bandwidth example for more detail. | Gbps |
| LDS LOAD Bandwidth | The effective bandwidth of LDS load operations, accounting for the work-items executed (execution mask). Calculated as the total bytes loaded from LDS divided by the kernel duration. | Gbps |
| LDS STORE Bandwidth | The effective bandwidth of LDS store operations, accounting for the work-items executed (execution mask). Calculated as the total bytes stored to LDS divided by the kernel duration. | Gbps |
| LDS ATOMIC Bandwidth | The effective bandwidth of LDS atomic operations, accounting for the work-items executed (execution mask). Calculated as the total bytes accessed by LDS atomic operations divided by the kernel duration. | Gbps |
| LDS LOAD | The total number of LDS load instructions issued per normalization unit. | Instructions per Normalization Unit |
| LDS STORE | The total number of LDS store instructions issued per normalization unit. | Instructions per Normalization Unit |
| LDS ATOMIC | The total number of LDS atomic instructions issued per normalization unit. | Instructions per Normalization Unit |
| LDS Instructions | The total number of LDS instructions (including, but not limited to, read/write/atomics and HIP’s | Instructions per Normalization Unit |
| LDS Latency | The average number of round-trip cycles (i.e., from issue to data-return acknowledgment) required for an LDS instruction to complete. | Cycles |
| Bank Conflicts/Access | The ratio of the number of cycles spent in the LDS scheduler due to bank conflicts (as determined by the conflict resolution hardware) to the base number of cycles that would be spent in the LDS scheduler in a completely uncontended case. This is the unnormalized form of the Bank Conflict Rate. | Conflicts per Access |
| Index Accesses | The total number of cycles spent in the LDS scheduler over all operations per normalization unit. | Cycles per Normalization Unit |
| Atomic Return Cycles | The total number of cycles spent on LDS atomics with return per normalization unit. | Cycles per Normalization Unit |
| Bank Conflict | The total number of cycles spent in the LDS scheduler due to bank conflicts (as determined by the conflict resolution hardware) per normalization unit. | Cycles per Normalization Unit |
| Addr Conflict | The total number of cycles spent in the LDS scheduler due to address conflicts (as determined by the conflict resolution hardware) per normalization unit. | Cycles per Normalization Unit |
| Unaligned Stall | The total number of cycles spent in the LDS scheduler due to stalls from non-dword aligned addresses per normalization unit. | Cycles per Normalization Unit |
| Mem Violations | The total number of out-of-bounds accesses made to the LDS, per normalization unit. This is unused and expected to be zero in most configurations for modern CDNA™ accelerators. | Accesses per Normalization Unit |
| LDS Command FIFO Full Rate | The number of cycles where the LDS command FIFO was full, per normalization unit. High values indicate backpressure in LDS instruction dispatch, which may stall wavefronts waiting to issue LDS operations. | Cycles per Normalization Unit |
| LDS Data FIFO Full Rate | The number of cycles where the LDS data FIFO was full, per normalization unit. High values indicate backpressure in the LDS data return path, which may stall wavefronts waiting for LDS read/atomic results. | Cycles per Normalization Unit |
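The difference between the theoretical bandwidth (which ignores the execution mask) and the effective load, store, and atomic bandwidths (which account for it) can be sketched as follows. This is an illustration under stated assumptions, not the profiler's implementation: it assumes a 64-wide wavefront, a fixed number of bytes per work-item for each instruction, and execution masks represented as Python integers. All names and input values are hypothetical.

```python
WAVEFRONT_SIZE = 64  # assumption: 64 work-items per wavefront


def theoretical_bytes(num_instructions: int, bytes_per_lane: int,
                      wavefront_size: int = WAVEFRONT_SIZE) -> int:
    """Bytes that could have moved if every lane were active (mask ignored)."""
    return num_instructions * wavefront_size * bytes_per_lane


def effective_bytes(exec_masks: list[int], bytes_per_lane: int) -> int:
    """Bytes moved by active lanes only; one execution mask per instruction."""
    active_lanes = sum(bin(mask).count("1") for mask in exec_masks)
    return active_lanes * bytes_per_lane


def bandwidth(total_bytes: int, duration_seconds: float) -> float:
    """Bytes per second over the kernel duration."""
    return total_bytes / duration_seconds


# Two hypothetical LDS loads of 4 bytes per work-item:
# one with all 64 lanes active, one with only the lower 32 lanes active.
masks = [2**64 - 1, 2**32 - 1]
duration = 1.0e-6  # hypothetical kernel duration in seconds

theo = theoretical_bytes(num_instructions=len(masks), bytes_per_lane=4)
eff = effective_bytes(masks, bytes_per_lane=4)

print(f"theoretical: {theo} bytes, {bandwidth(theo, duration):.3e} B/s")
print(f"effective:   {eff} bytes, {bandwidth(eff, duration):.3e} B/s")
```

In this example the theoretical figure counts 512 bytes while the effective figure counts 384 bytes, because the second instruction ran with half of its lanes masked off.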