Introduction to the RDC tool#
The ROCm Data Center tool (RDC) simplifies the administration and addresses key infrastructure challenges in AMD GPUs in cluster and datacenter environments. The main features are:
- GPU telemetry 
- GPU statistics for jobs 
- Integration with third-party tools 
- Open source 
You can use the RDC tool in standalone mode if all components are installed. However, the existing management tools can use the same set of features available in a library format.
For details on different modes of operation, refer to Starting RDC in Installing and running RDC.
Target Audience#
The audience for the AMD RDC tool consists of:
- Administrators: RDC provides the cluster administrator with the capability of monitoring, validating, and configuring policies. 
- HPC Users: Provides GPU-centric feedback for their workload submissions. 
- OEM: Add GPU information to their existing cluster management software. 
- Open source Contributors: RDC is open source and accepts contributions from the community. 
Objective#
This documentation will:
- Introduce the tool features in RDC tool feature overview 
- Describe integration with external tools in Third party integration 
- Provide an open source handbook in Building and testing RDC 
- Introduce elements of the tool API in Introduction to RDC API 
Terminology#
| Terms | Description | 
| RDC | ROCm Data Center tool | 
| Compute node (CN) | One of many nodes containing one or more GPUs in the Data Center on which compute jobs are run | 
| Management node (MN) or Main console | A machine running system administration applications to administer and manage the Data Center | 
| GPU Groups | Logical grouping of one or more GPUs in a compute node | 
| Fields | A metric that can be monitored by the RDC, such as GPU temperature, memory usage, and power usage | 
| Field Groups | Logical grouping of multiple fields | 
| Job | A workload that is submitted to one or more compute nodes |