LLM inference with PyTorch + Huggingface transformers

LLM inference with PyTorch + Huggingface transformers#

Install Huggingface transformers#

Follow these steps to install Huggingface transformers.

Prerequisites#

Installation#

Follow these steps to install Transformers with Powershell.

  1. Create and activate a Python virtual environment in a directory of your choice.

    python -m venv llm-venv
    llm-venv\Scripts\activate
    
  2. Enter the commands to set up ROCm environment.

    pip install --no-cache-dir ^
        https://repo.radeon.com/rocm/windows/rocm-rel-7.2/rocm_sdk_core-7.2.0.dev0-py3-none-win_amd64.whl ^
        https://repo.radeon.com/rocm/windows/rocm-rel-7.2/rocm_sdk_devel-7.2.0.dev0-py3-none-win_amd64.whl ^
        https://repo.radeon.com/rocm/windows/rocm-rel-7.2/rocm_sdk_libraries_custom-7.2.0.dev0-py3-none-win_amd64.whl ^
        https://repo.radeon.com/rocm/windows/rocm-rel-7.2/rocm-7.2.0.dev0.tar.gz
    
    pip install --no-cache-dir `
        https://repo.radeon.com/rocm/windows/rocm-rel-7.2/rocm_sdk_core-7.2.0.dev0-py3-none-win_amd64.whl `
        https://repo.radeon.com/rocm/windows/rocm-rel-7.2/rocm_sdk_devel-7.2.0.dev0-py3-none-win_amd64.whl `
        https://repo.radeon.com/rocm/windows/rocm-rel-7.2/rocm_sdk_libraries_custom-7.2.0.dev0-py3-none-win_amd64.whl `
        https://repo.radeon.com/rocm/windows/rocm-rel-7.2/rocm-7.2.0.dev0.tar.gz
    
  3. Enter the commands to install torch, torchvision and torchaudio for ROCm AMD GPU support.

    Note
    This may take several minutes. See Compatibility matrices for support information.

    pip install --no-cache-dir ^
        https://repo.radeon.com/rocm/windows/rocm-rel-7.2/torch-2.9.1%2Brocmsdk20260116-cp312-cp312-win_amd64.whl ^
        https://repo.radeon.com/rocm/windows/rocm-rel-7.2/torchaudio-2.9.1%2Brocmsdk20260116-cp312-cp312-win_amd64.whl ^
        https://repo.radeon.com/rocm/windows/rocm-rel-7.2/torchvision-0.24.1%2Brocmsdk20260116-cp312-cp312-win_amd64.whl
    
    pip install --no-cache-dir `
        https://repo.radeon.com/rocm/windows/rocm-rel-7.2/torch-2.9.1%2Brocmsdk20260116-cp312-cp312-win_amd64.whl `
        https://repo.radeon.com/rocm/windows/rocm-rel-7.2/torchaudio-2.9.1%2Brocmsdk20260116-cp312-cp312-win_amd64.whl `
        https://repo.radeon.com/rocm/windows/rocm-rel-7.2/torchvision-0.24.1%2Brocmsdk20260116-cp312-cp312-win_amd64.whl
    
  4. Install transformers and required packages.

    pip install transformers
    pip install accelerate   
    
  5. (Optional) Install HuggingFaceHub, which is the Python client to download, and upload models to Hugging Face.

    pip install huggingface-hub
    hf auth login
    

LLM inference#

import torch
from transformers import pipeline
model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)
messages = [
    {"role": "system", "content": "You are a helpful technology enthusiast."},
    {"role": "user", "content": "What is AMD Radeon?"},
]
outputs = pipe(
    messages,
    max_new_tokens=256,
)
print(outputs[0]["generated_text"][-1])

Model support matrix#

Model

Link

Supported

Llama-3.2-1B-Instruct

https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct

Yes

Llama-3.2-3B-Instruct

https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct

Yes

DeepSeek-R1-Distill-Qwen-1.5B

https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B

Yes

Note
Proprietary Meta access is required for Llama models.
Alternatively, open-source versions can be found here: