LLM inference with PyTorch + Hugging Face Transformers#
Install Hugging Face Transformers#
Follow these steps to install Hugging Face Transformers.
Prerequisites#
Python 3.12 is installed.
The 25.20.01.14 graphics driver is installed. Refer to Install PyTorch for Radeon GPUs for more information.
Installation#
Follow these steps to install Transformers with PowerShell.
Create and activate a Python virtual environment in a directory of your choice.
python -m venv llm-venv
llm-venv\Scripts\activate
Install the custom PyTorch packages in the virtual environment you created. Enter the following commands to install torch, torchaudio, and torchvision with ROCm support for AMD GPUs.
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-6.4.4/torch-2.8.0a0%2Bgitfc14c65-cp312-cp312-win_amd64.whl
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-6.4.4/torchaudio-2.6.0a0%2B1a8f621-cp312-cp312-win_amd64.whl
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-6.4.4/torchvision-0.24.0a0%2Bc85f008-cp312-cp312-win_amd64.whl
Note
This may take several minutes.
For more information, see Install PyTorch for Radeon GPUs.
Install transformers.
Install the latest release with the following command. To pin a specific release instead, append ==<version> to the package name.
pip install transformers
(Optional) Install huggingface_hub, the Python client for downloading models from and uploading models to Hugging Face.
pip install huggingface-hub
hf auth login # login if desired
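Before running inference, it can help to confirm that the steps above took effect. The following is a minimal sketch using only the standard library plus an optional torch import. The helper names (`in_virtualenv`, `installed_version`, `gpu_summary`) are illustrative and not part of any package, and the GPU check assumes the ROCm build exposes the device through the `torch.cuda` API, as ROCm builds of PyTorch generally do.

```python
import importlib.metadata
import importlib.util
import sys
from typing import Optional


def in_virtualenv() -> bool:
    """Return True when the interpreter is running inside a virtual environment."""
    return sys.prefix != sys.base_prefix


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return importlib.metadata.version(package)
    except importlib.metadata.PackageNotFoundError:
        return None


def gpu_summary() -> str:
    """Describe GPU visibility without failing when torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return "torch is not installed"
    import torch  # imported lazily so the script still runs without torch

    # Assumption: the ROCm build reports the Radeon GPU through torch.cuda.
    if torch.cuda.is_available():
        return f"GPU visible: {torch.cuda.get_device_name(0)}"
    return "torch is installed, but no GPU is visible"


print("virtual environment active:", in_virtualenv())
for name in ("torch", "torchvision", "torchaudio", "transformers", "huggingface-hub"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
print(gpu_summary())
```

If a package is reported as not installed, check that the virtual environment is activated in the current PowerShell session before repeating the corresponding `pip install` command.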
LLM inference#
Python
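import torch
from transformers import pipeline

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

# Build a text-generation pipeline; device_map="auto" places the model on the GPU when one is available.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,  # half precision to reduce memory use
    device_map="auto",
)

# Chat-style input: a system prompt followed by the user question.
messages = [
    {"role": "system", "content": "You are a helpful technology enthusiast."},
    {"role": "user", "content": "What is AMD Radeon?"},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)

# The last entry of generated_text is the reply appended by the model.
print(outputs[0]["generated_text"][-1])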
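When the pipeline receives a list of chat messages, `generated_text` holds the conversation with the model's reply appended as a final message, so the `print` call above displays that message as a dictionary rather than as plain text. The sketch below shows one way to pull out only the reply text. The helper `last_reply` and the `sample_outputs` data are hypothetical stand-ins that mimic this structure so the example runs without downloading a model.

```python
def last_reply(outputs):
    """Return the text of the final message in the first generated conversation."""
    conversation = outputs[0]["generated_text"]
    return conversation[-1]["content"]


# Hypothetical data shaped like the pipeline's chat output (not real model output).
sample_outputs = [
    {
        "generated_text": [
            {"role": "system", "content": "You are a helpful technology enthusiast."},
            {"role": "user", "content": "What is AMD Radeon?"},
            {"role": "assistant", "content": "(model reply)"},
        ]
    }
]

print(last_reply(sample_outputs))
```

With the real pipeline, `last_reply(outputs)` would be called on the `outputs` value returned by `pipe(...)`.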
Model support matrix#
Model | Link | Supported |
---|---|---|
Llama-3.2-1B-Instruct | https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct | Yes |
Llama-3.2-3B-Instruct | https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct | Yes |
DeepSeek-R1-Distill-Qwen-1.5B | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Yes |
Note
The Llama models are gated: access must be requested from Meta on Hugging Face before they can be downloaded.
Alternatively, open-source versions can be found here: