LLM inference with PyTorch + Hugging Face Transformers#
Install Hugging Face Transformers#
Follow these steps to install Hugging Face Transformers.
Prerequisites#
Python 3.12 is installed.
Graphics driver 25.20.01.14 is installed. Refer to Install PyTorch for Radeon GPUs for more information.
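The Python prerequisite can be checked before the virtual environment is created. The following is a minimal sketch rather than part of the official procedure; the helper name `is_supported_python` is hypothetical. It relies on the `cp312` tag in the wheel file names used in the installation steps, which restricts those wheels to CPython 3.12.

```python
import sys

REQUIRED = (3, 12)  # matches the cp312 tag in the wheel file names


def is_supported_python(version_info=sys.version_info, required=REQUIRED):
    """Return True when the interpreter's major.minor version equals `required`."""
    return tuple(version_info[:2]) == tuple(required)


if __name__ == "__main__":
    found = ".".join(str(part) for part in sys.version_info[:3])
    if is_supported_python():
        print(f"Python {found} matches the required 3.12 series.")
    else:
        print(f"Python {found} found; the cp312 wheels need Python 3.12.")
```

Run the script with the same `python` executable that will create the virtual environment, because the environment inherits that interpreter's version.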
Installation#
Follow these steps to install Transformers with PowerShell.
Create and activate a Python virtual environment in a directory of your choice.
python -m venv llm-venv
llm-venv\Scripts\activate
Install the custom PyTorch packages in the virtual environment. The following commands install the torch, torchaudio, and torchvision builds with ROCm support for AMD GPUs.
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-6.4.4/torch-2.8.0a0%2Bgitfc14c65-cp312-cp312-win_amd64.whl
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-6.4.4/torchaudio-2.6.0a0%2B1a8f621-cp312-cp312-win_amd64.whl
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-6.4.4/torchvision-0.24.0a0%2Bc85f008-cp312-cp312-win_amd64.whl
Note
This may take several minutes.
For more information, see Install PyTorch for Radeon GPUs.
Install transformers.
Install the latest release with the following command:
pip install transformers
(Optional) Install huggingface_hub, the Python client for downloading models from and uploading models to Hugging Face.
pip install huggingface-hub
hf auth login # login if desired
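Once the steps above are complete, the environment can be sanity-checked from Python. The sketch below is an illustration under stated assumptions, not an official verification tool: the helper names `installed_versions` and `gpu_summary` are hypothetical, and the script only reports what it finds. It uses the standard-library `importlib.metadata` to read package versions and, if torch is present, queries GPU visibility through the `torch.cuda` API, which ROCm builds of PyTorch also use.

```python
import importlib.util
from importlib import metadata

PACKAGES = ("torch", "torchvision", "torchaudio", "transformers", "huggingface-hub")


def installed_versions(packages=PACKAGES):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def gpu_summary():
    """Describe GPU visibility; returns a message instead of raising if torch is absent."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed in this environment"
    import torch

    if not torch.cuda.is_available():
        return "torch is installed, but no GPU is visible"
    # ROCm builds of PyTorch expose the GPU through the torch.cuda API.
    hip = getattr(torch.version, "hip", None)
    return f"GPU: {torch.cuda.get_device_name(0)} (HIP runtime: {hip})"


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
    print(gpu_summary())
```

Run it inside the activated `llm-venv` environment. A missing package or an invisible GPU points back to the corresponding installation step or to the driver prerequisite.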
LLM inference#
Python
import torch
from transformers import pipeline
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a helpful technology enthusiast."},
    {"role": "user", "content": "What is AMD Radeon?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])
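When the pipeline is called with a list of chat messages, `generated_text` holds the whole conversation, so the `[-1]` element printed above is the new assistant message as a dictionary with `role` and `content` keys. The following sketch shows one way to pull out just the answer text. It assumes the model wraps its reasoning in `<think>` tags, as DeepSeek-R1 distilled models typically do; the helper names `extract_reply` and `strip_reasoning` are hypothetical, and the sample data is a hand-written stand-in rather than real model output.

```python
import re


def extract_reply(outputs):
    """Return the text of the last message in the first generated conversation."""
    conversation = outputs[0]["generated_text"]
    return conversation[-1]["content"]


def strip_reasoning(text):
    """Drop everything up to and including the first </think> tag, if one is present."""
    return re.sub(r"^.*?</think>", "", text, count=1, flags=re.DOTALL).strip()


if __name__ == "__main__":
    # Hand-written stand-in for real pipeline output (not actual model text).
    fake_outputs = [{
        "generated_text": [
            {"role": "user", "content": "What is AMD Radeon?"},
            {"role": "assistant",
             "content": "<think>\nBrand question.\n</think>\n\nA GPU brand."},
        ]
    }]
    print(strip_reasoning(extract_reply(fake_outputs)))
```

With the real pipeline, `strip_reasoning(extract_reply(outputs))` would take the place of the final `print` argument. Text without a `</think>` tag passes through unchanged apart from whitespace trimming, so the helper is also safe for models that emit no reasoning block.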
Model support matrix#
| Model | Link | Supported |
|---|---|---|
| Llama-3.2-1B-Instruct | https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct | Yes |
| Llama-3.2-3B-Instruct | https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct | Yes |
| DeepSeek-R1-Distill-Qwen-1.5B | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Yes |
Note
Access to the Llama models must be granted by Meta.
Alternatively, open-source versions can be found here: