LLM inference with PyTorch + Hugging Face Transformers#
Install Hugging Face Transformers#
Follow these steps to install Hugging Face Transformers.
Prerequisites#
ROCm is installed. For instructions, refer to Install Radeon Software for Linux with ROCm.
Installation#
Follow these steps to install Transformers and its dependencies.
Install the Python venv package for the applicable Python version. For example:
sudo apt install python3.12-venv   # Python 3.12
sudo apt install python3.10-venv   # Python 3.10
Create a Python virtual environment.
python3 -m venv llm-venv
source llm-venv/bin/activate
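Optionally, confirm the virtual environment is active; with the environment activated, the interpreter prefix should point into llm-venv:
# Optional check: sys.prefix should point into the llm-venv directory.
import sys
print(sys.prefix)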
Install the latest PyTorch ROCm wheels in the environment you just created.
Note
Refer to Install PyTorch for Radeon GPUs for more comprehensive installation instructions. If the wheels are already downloaded to the system, install them within the environment. Example command:
pip3 install <torch wheel> <torchaudio wheel> <torchvision wheel> <triton wheel>
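After the wheels are installed, you can optionally verify that the ROCm build of PyTorch can see the GPU. A minimal check, relying on the fact that ROCm devices are exposed through PyTorch's torch.cuda API:
# Optional sanity check: ROCm builds of PyTorch expose AMD GPUs via the torch.cuda API.
import torch
print(torch.__version__)              # ROCm wheels typically carry a "+rocm" version suffix
print(torch.cuda.is_available())      # expect True when the Radeon GPU is visible
print(torch.cuda.get_device_name(0))  # name of the first detected device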
Install transformers and required packages.
pip install transformers
pip install accelerate
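To confirm that both packages installed correctly, an optional quick check:
# Optional check: print the installed package versions.
import transformers
import accelerate
print(transformers.__version__)
print(accelerate.__version__)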
(Optional) Install huggingface_hub, the Python client for downloading models from and uploading models to Hugging Face, then log in:
pip install huggingface-hub
hf auth login
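As an illustration, the client can pre-download the model used in the inference example below; a minimal sketch using huggingface_hub's snapshot_download:
# Sketch: pre-fetch a model snapshot into the local Hugging Face cache.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B")
print(local_dir)  # local path of the downloaded snapshot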
LLM inference#
The following example runs a chat-style prompt through the Transformers pipeline API:
import torch
from transformers import pipeline

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

# Build a text-generation pipeline; device_map="auto" places the model on the available GPU.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Chat-style prompt: a system message followed by a user question.
messages = [
    {"role": "system", "content": "You are a helpful technology enthusiast."},
    {"role": "user", "content": "What is AMD Radeon?"},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)

# The pipeline returns the whole chat transcript; print the final (assistant) turn.
print(outputs[0]["generated_text"][-1])
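For finer-grained control over tokenization and decoding, the same chat can be run through the tokenizer and model directly instead of through pipeline(). The following is a minimal sketch, assuming the model_id and messages defined above:
# Sketch: run the same chat without pipeline(), using the tokenizer and model directly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Render the chat messages into input ids using the model's chat template.
input_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))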
Model support matrix#
| Model | Link | Supported |
|---|---|---|
| Llama-3.2-1B-Instruct | https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct | Yes |
| Llama-3.2-3B-Instruct | https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct | Yes |
| DeepSeek-R1-Distill-Qwen-1.5B | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Yes |
Note
Access to the Llama models is gated and must be requested from Meta. Alternatively, openly redistributed versions of these models can be found on Hugging Face.