LLM inference with PyTorch + Hugging Face Transformers

Install Hugging Face Transformers

Follow these steps to install Hugging Face Transformers.

Prerequisites

Python 3.12 is installed.
The 25.20.01.14 graphics driver is installed. Refer to Install PyTorch for Ryzen APUs for more information.

Installation

Follow these steps to install Transformers using PowerShell.

Create and activate a Python virtual environment in a directory of your choice.
```
python -m venv llm-venv
llm-venv\Scripts\activate
```
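The wheels used in this guide carry a `cp312` tag, so they install only on Python 3.12. As a quick sanity check, the sketch below reports the interpreter version and whether a virtual environment is active. The helper names `in_virtualenv` and `python_matches` are hypothetical and not part of any AMD or Hugging Face tooling.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv.

    Inside a venv, sys.prefix points at the environment while
    sys.base_prefix still points at the base Python installation.
    """
    return sys.prefix != sys.base_prefix


def python_matches(major: int = 3, minor: int = 12) -> bool:
    """Return True when the running interpreter is exactly major.minor."""
    return sys.version_info[:2] == (major, minor)


print(
    f"Python {sys.version_info.major}.{sys.version_info.minor}, "
    f"venv active: {in_virtualenv()}, "
    f"matches 3.12: {python_matches()}"
)
```

Run it with the `python` from the activated environment. If the last value is `False`, the `cp312` wheels are expected to be rejected by pip.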

Install the custom PyTorch packages in the virtual environment you created. Run the following commands to install torch, torchaudio, and torchvision with ROCm support for AMD GPUs.

```
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-6.4.4/torch-2.8.0a0%2Bgitfc14c65-cp312-cp312-win_amd64.whl
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-6.4.4/torchaudio-2.6.0a0%2B1a8f621-cp312-cp312-win_amd64.whl
pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-6.4.4/torchvision-0.24.0a0%2Bc85f008-cp312-cp312-win_amd64.whl
```

Note
This may take several minutes.
For more information, see Install PyTorch for Ryzen APUs.
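After the wheels are installed, it can help to confirm that PyTorch sees the GPU before moving on. The following is a minimal sketch, assuming a ROCm build of PyTorch, which exposes the GPU through the `torch.cuda` API and reports its HIP version in `torch.version.hip`. The function name `describe_torch_backend` is hypothetical.

```python
def describe_torch_backend() -> dict:
    """Collect basic facts about the installed PyTorch build."""
    try:
        import torch  # imported lazily so a missing install is reported, not raised
    except ImportError:
        return {"installed": False}

    info = {
        "installed": True,
        "torch_version": torch.__version__,
        # torch.version.hip is a version string on ROCm builds and None otherwise.
        "hip_version": getattr(torch.version, "hip", None),
        # ROCm builds answer through the torch.cuda namespace.
        "gpu_available": torch.cuda.is_available(),
    }
    if info["gpu_available"]:
        info["device_name"] = torch.cuda.get_device_name(0)
    return info


print(describe_torch_backend())
```

If `gpu_available` is `False`, recheck the driver prerequisite and confirm that the wheels were installed into the active virtual environment.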

Install transformers and required packages.

```
pip install transformers
pip install accelerate
```

(Optional) Install huggingface-hub, the Python client for downloading models from and uploading models to Hugging Face, then log in to your account.
```
pip install huggingface-hub
hf auth login
```
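With the client installed, a model can be downloaded ahead of time so that the first inference run does not wait on the network. The sketch below wraps `snapshot_download` from `huggingface_hub`. The wrapper name `prefetch_model` is hypothetical, and the call itself is left commented out because it needs network access.

```python
from typing import Optional


def prefetch_model(repo_id: str, token: Optional[str] = None) -> str:
    """Download all files of a Hub repository into the local cache.

    Returns the path of the local snapshot directory. When token is None,
    the credentials stored by `hf auth login` are used if present.
    """
    # Imported lazily so this file loads even before huggingface-hub is installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, token=token)


# Usage (requires network access; gated repositories also require approved access):
# local_dir = prefetch_model("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
# print(local_dir)
```

Later calls that load the same model ID through Transformers should then find the files in the cache instead of downloading them again.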

LLM inference

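```
import torch
from transformers import pipeline

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

# Build a text-generation pipeline. device_map="auto" relies on accelerate
# and places the model on the GPU when one is available.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,  # load the weights in half precision
    device_map="auto",
)

# Chat-style input: a list of messages, each with a role and its content.
messages = [
    {"role": "system", "content": "You are a helpful technology enthusiast."},
    {"role": "user", "content": "What is AMD Radeon?"},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)

# generated_text holds the whole conversation; the last entry is the model's reply.
print(outputs[0]["generated_text"][-1])
```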
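The value printed above is a message dictionary with `role` and `content` keys. DeepSeek-R1 distilled models typically emit their reasoning before the answer and close it with a `</think>` tag; this is an assumption about the model's output format rather than something the pipeline guarantees. Under that assumption, the sketch below separates the reasoning from the final answer. The helper `split_reasoning` and the sample reply are hypothetical and do not come from a real run.

```python
from typing import Tuple


def split_reasoning(content: str, end_tag: str = "</think>") -> Tuple[str, str]:
    """Split a reply into (reasoning, answer) at the first end_tag.

    If the tag is absent, the whole text is treated as the answer.
    """
    reasoning, sep, answer = content.partition(end_tag)
    if not sep:
        return "", content.strip()
    # Drop the opening tag if the model included it in the generated text.
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()


# Hypothetical reply, shaped like outputs[0]["generated_text"][-1].
reply = {
    "role": "assistant",
    "content": "<think>\nThe user asks about a GPU brand.\n</think>\n\n"
               "AMD Radeon is a brand of graphics products.",
}

reasoning, answer = split_reasoning(reply["content"])
print(answer)
```

If generation stops at `max_new_tokens` before the closing tag appears, the helper returns the whole text as the answer, so a larger token budget may be needed for long reasoning traces.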

Model support matrix

Model	Link	Supported
Llama-3.2-1B-Instruct	https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct	Yes
Llama-3.2-3B-Instruct	https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct	Yes
DeepSeek-R1-Distill-Qwen-1.5B	https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B	Yes

Note
The Llama models are gated: you must request access from Meta before you can download them.
Alternatively, openly available versions can be found here:

unsloth/Llama-3.2-1B · Hugging Face

unsloth/Llama-3.2-3B · Hugging Face