LLM inference with PyTorch + Hugging Face Transformers#

Install Hugging Face Transformers#

Follow these steps to install Hugging Face Transformers.

Prerequisites#

These instructions assume 64-bit Windows with PowerShell, a ROCm-supported AMD Radeon GPU, and Python 3.12 (the PyTorch wheels below are built for CPython 3.12 on Windows).
Installation#

Follow these steps to install Transformers with PowerShell.

  1. Create and activate a Python virtual environment in a directory of your choice.

    python -m venv llm-venv
    llm-venv\Scripts\activate
    
  2. Enter the following commands to set up the ROCm environment.

    pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/rocm_sdk_core-0.1.dev0-py3-none-win_amd64.whl
    pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/rocm_sdk_devel-0.1.dev0-py3-none-win_amd64.whl
    pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/rocm_sdk_libraries_custom-0.1.dev0-py3-none-win_amd64.whl
    pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/rocm-0.1.dev0.tar.gz
    
  3. Install the custom PyTorch packages in the virtual environment you created. Enter the following commands to install torch, torchvision, and torchaudio with ROCm AMD GPU support. (A quick sanity check for this step is shown after the list.)

    pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/torch-2.9.0+rocmsdk20251116-cp312-cp312-win_amd64.whl
    pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/torchaudio-2.9.0+rocmsdk20251116-cp312-cp312-win_amd64.whl
    pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-7.1.1/torchvision-0.24.0+rocmsdk20251116-cp312-cp312-win_amd64.whl
    

    Note
    This may take several minutes.
    For more information, see Install PyTorch for Radeon GPUs.

  4. Install transformers.

    Install the latest release with the following command:

    pip install transformers
    
  5. (Optional) Install huggingface_hub, the Python client for downloading models from and uploading models to Hugging Face.

    pip install huggingface-hub
    hf auth login # login if desired
    
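After the installation completes, you can confirm that the ROCm build of PyTorch detects the GPU. This is a minimal sanity check, not part of the official steps; the device name printed depends on your hardware.

Python
import torch

print(torch.__version__)              # should report the 2.9.0+rocmsdk build installed above
print(torch.cuda.is_available())      # ROCm builds of PyTorch expose the GPU through the torch.cuda API
print(torch.cuda.get_device_name(0))  # name of the detected AMD GPU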

LLM inference#

Python
import torch
from transformers import pipeline

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

# Build a text-generation pipeline; device_map="auto" places the model on the
# GPU, and torch.float16 halves the memory footprint versus float32.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Chat-style input: a system prompt followed by the user's question.
messages = [
    {"role": "system", "content": "You are a helpful technology enthusiast."},
    {"role": "user", "content": "What is AMD Radeon?"},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)

# The pipeline returns the full conversation; the last message is the model's reply.
print(outputs[0]["generated_text"][-1])
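
The pipeline helper wraps tokenization, generation, and decoding. If you need finer control, for example over the chat template or sampling, the same generation can be written against the tokenizer and model directly. This is a minimal sketch, assuming the same model and messages as above:

Python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

messages = [
    {"role": "system", "content": "You are a helpful technology enthusiast."},
    {"role": "user", "content": "What is AMD Radeon?"},
]

# Apply the model's chat template and move the token IDs to the model's device.
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)

# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))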

Model support matrix#

Model                         | Link                                                             | Supported
------------------------------|------------------------------------------------------------------|----------
Llama-3.2-1B-Instruct         | https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct          | Yes
Llama-3.2-3B-Instruct         | https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct          | Yes
DeepSeek-R1-Distill-Qwen-1.5B | https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B | Yes

Note
Access to the meta-llama repositories is gated and requires approval from Meta.
Alternatively, open-source versions can be found here:
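
Once Meta has granted access, the gated models can be fetched with the huggingface_hub client installed in step 5, after running hf auth login. A minimal sketch, using a repository ID from the matrix above:

Python
from huggingface_hub import snapshot_download

# Requires a prior `hf auth login` with a token that has been granted
# access to the gated meta-llama repositories.
local_path = snapshot_download("meta-llama/Llama-3.2-1B-Instruct")

# The model is cached locally; the repository ID can also be passed
# directly to pipeline() as in the inference example above.
print(local_path)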