LLM inference with PyTorch + Hugging Face Transformers#

Install Hugging Face Transformers#

Follow these steps to install Hugging Face Transformers.

Prerequisites#

These instructions assume 64-bit Windows with PowerShell, a ROCm-supported AMD Radeon GPU, and Python 3.12 (the PyTorch wheels below are built for CPython 3.12 on Windows).
Installation#

Follow these steps to install Transformers with PowerShell.

  1. Create and activate a Python virtual environment in a directory of your choice.

    python -m venv llm-venv
    llm-venv\Scripts\activate
    
  2. Enter the following commands to set up the ROCm environment.

    pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/rocm_sdk_core-0.1.dev0-py3-none-win_amd64.whl
    pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/rocm_sdk_devel-0.1.dev0-py3-none-win_amd64.whl
    pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/rocm_sdk_libraries_custom-0.1.dev0-py3-none-win_amd64.whl
    pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/rocm-0.1.dev0.tar.gz
    
  3. Install the custom PyTorch packages in the virtual environment you created. Enter the following commands to install torch, torchvision, and torchaudio with ROCm AMD GPU support. (A quick sanity check for this step is shown after the list.)

    pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/torch-2.9.0+rocmsdk20251116-cp312-cp312-win_amd64.whl
    pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/torchaudio-2.9.0+rocmsdk20251116-cp312-cp312-win_amd64.whl
    pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/torchvision-0.24.0+rocmsdk20251116-cp312-cp312-win_amd64.whl
    

    Note
    This may take several minutes.
    For more information, see Install PyTorch for Radeon GPUs.

  4. Install transformers.

    Install the latest release with the following command:

    pip install transformers
    
  5. (Optional) Install huggingface_hub, the Python client for downloading models from and uploading models to Hugging Face.

    pip install huggingface-hub
    hf auth login # login if desired
    
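After the installation completes, you can confirm that the ROCm build of PyTorch detects the GPU. This is a minimal sanity check, not part of the official steps; the device name printed depends on your hardware.

Python
import torch

print(torch.__version__)              # should report the 2.9.0+rocmsdk build installed above
print(torch.cuda.is_available())      # ROCm builds of PyTorch expose the GPU through the torch.cuda API
print(torch.cuda.get_device_name(0))  # name of the detected AMD GPU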

LLM inference#

Python
import torch
from transformers import pipeline

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

# Build a text-generation pipeline; device_map="auto" places the model on the
# GPU, and torch.float16 halves the memory footprint versus float32.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Chat-style input: a system prompt followed by the user's question.
messages = [
    {"role": "system", "content": "You are a helpful technology enthusiast."},
    {"role": "user", "content": "What is AMD Radeon?"},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)

# The pipeline returns the full conversation; the last message is the model's reply.
print(outputs[0]["generated_text"][-1])
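
The pipeline helper wraps tokenization, generation, and decoding. If you need finer control, for example over the chat template or sampling, the same generation can be written against the tokenizer and model directly. This is a minimal sketch, assuming the same model and messages as above:

Python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful technology enthusiast."},
    {"role": "user", "content": "What is AMD Radeon?"},
]

# Apply the model's chat template and move the token IDs to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))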

Model support matrix#

Model                         | Link                                                             | Supported
------------------------------|------------------------------------------------------------------|----------
Llama-3.2-1B-Instruct         | https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct          | Yes
Llama-3.2-3B-Instruct         | https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct          | Yes
DeepSeek-R1-Distill-Qwen-1.5B | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Yes

Note
Access to the meta-llama repositories is gated and requires approval from Meta.
Alternatively, open-source versions can be found here:
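
Once Meta has granted access, the gated models can be fetched with the huggingface_hub client installed in step 5, after running hf auth login. A minimal sketch, using a repository ID from the matrix above:

Python
from huggingface_hub import snapshot_download

# Requires a prior `hf auth login` with a token that has been granted
# access to the gated meta-llama repositories.
local_path = snapshot_download("meta-llama/Llama-3.2-1B-Instruct")

# The model is cached locally; the repository ID can also be passed
# directly to pipeline() as in the inference example above.
print(local_path)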