LLM inference with PyTorch + Hugging Face Transformers


Install Hugging Face Transformers

Follow these steps to install Hugging Face Transformers.

Prerequisites

Installation

Run the following steps in PowerShell.

  1. Create and activate a Python virtual environment in a directory of your choice. Use Python 3.12, because the PyTorch wheels in the next step are built for CPython 3.12 (cp312).

    python -m venv llm-venv
    llm-venv\Scripts\activate
    
  2. Install the ROCm-enabled PyTorch packages into the virtual environment. Run the following commands to install torch, torchaudio, and torchvision with ROCm support for AMD GPUs.

    pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-6.4.4/torch-2.8.0a0%2Bgitfc14c65-cp312-cp312-win_amd64.whl
    pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-6.4.4/torchaudio-2.6.0a0%2B1a8f621-cp312-cp312-win_amd64.whl
    pip install --no-cache-dir https://repo.radeon.com/rocm/windows/rocm-rel-6.4.4/torchvision-0.24.0a0%2Bc85f008-cp312-cp312-win_amd64.whl
    

    Note
    This may take several minutes.
    For more information, see Install PyTorch for Radeon GPUs.

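  3. Install Transformers.

    Install the latest release with the following command (to pin a specific release, use pip install transformers==<version> instead):

    pip install transformers

    The inference example below passes device_map="auto", which also requires the accelerate package:

    pip install accelerate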
    
  4. (Optional) Install huggingface_hub, the Python client for downloading models from and uploading models to the Hugging Face Hub.

    pip install huggingface-hub
    hf auth login # log in if needed (required for gated models such as Llama)
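After completing the steps above, a quick sanity check can confirm that the environment is set up as intended. The sketch below is a hypothetical helper: check_environment and PACKAGES are illustrative names, not part of any library. It uses only the standard library, plus torch if it imports. It relies on the fact that ROCm builds of PyTorch report AMD GPUs through the torch.cuda API and set torch.version.hip. Save it to a file and run it with python from the activated virtual environment.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Hypothetical list of pip distributions installed by the steps above.
PACKAGES = ["torch", "torchaudio", "torchvision", "transformers", "huggingface-hub"]


def check_environment() -> dict:
    """Report Python version, venv status, package versions and GPU visibility."""
    report = {
        # The PyTorch wheels above are tagged cp312 (built for CPython 3.12).
        "python_3_12": sys.version_info[:2] == (3, 12),
        # Inside a virtual environment, sys.prefix differs from sys.base_prefix.
        "in_venv": sys.prefix != sys.base_prefix,
        "packages": {},
        "hip": None,
        "gpu": None,
    }
    for name in PACKAGES:
        try:
            report["packages"][name] = version(name)
        except PackageNotFoundError:
            report["packages"][name] = None  # not installed

    try:
        import torch
    except (ImportError, OSError):
        return report  # torch is missing or failed to load, so skip GPU checks

    # torch.version.hip is a version string on ROCm builds and None otherwise.
    report["hip"] = getattr(torch.version, "hip", None)
    # ROCm builds of PyTorch expose AMD GPUs through the torch.cuda API.
    if torch.cuda.is_available():
        report["gpu"] = torch.cuda.get_device_name(0)
    return report


if __name__ == "__main__":
    for key, value in check_environment().items():
        print(f"{key}: {value}")
```

If gpu is reported as None, PyTorch cannot see a GPU, which suggests revisiting step 2.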
    

LLM inference

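The following Python example generates a reply from DeepSeek-R1-Distill-Qwen-1.5B with the Transformers text-generation pipeline: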
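import torch
from transformers import pipeline

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B"

# Build a text-generation pipeline. float16 halves weight memory versus float32;
# device_map="auto" (requires accelerate) places the model on the available GPU.
pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto",
)

# Chat-style input; the pipeline applies the model's chat template.
messages = [
    {"role": "system", "content": "You are a helpful technology enthusiast."},
    {"role": "user", "content": "What is AMD Radeon?"},
]

outputs = pipe(
    messages,
    max_new_tokens=256,
)

# generated_text holds the whole conversation; the last message is the model's reply.
print(outputs[0]["generated_text"][-1]["content"])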
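DeepSeek-R1 distilled models typically write their reasoning first and close it with a </think> tag before giving the final answer, so the printed reply can be long. The following is a minimal sketch for separating the two, assuming the model follows that tag format. split_reasoning is a hypothetical helper, not a Transformers API.

```python
def split_reasoning(text: str) -> tuple[str, str]:
    """Split a DeepSeek-R1-style reply into (reasoning, answer).

    Assumes the reasoning ends with a closing </think> tag; the opening
    <think> tag may or may not appear in the generated text.
    """
    head, sep, tail = text.partition("</think>")
    if not sep:
        # No closing tag found: return everything as the answer. This also
        # happens when generation stops (for example at max_new_tokens)
        # before the reasoning is finished.
        return "", text.strip()
    # Drop an optional opening tag and surrounding whitespace.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


if __name__ == "__main__":
    sample = "<think>\nThe user asks about Radeon.\n</think>\n\nAMD Radeon is a GPU brand."
    reasoning, answer = split_reasoning(sample)
    print(answer)  # AMD Radeon is a GPU brand.
```

With the pipeline example above, pass outputs[0]["generated_text"][-1]["content"] to split_reasoning. Because max_new_tokens=256 caps the output length, the reasoning may be cut off before the closing tag. Raising max_new_tokens helps in that case.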

Model support matrix

Model                          Link                                                              Supported
Llama-3.2-1B-Instruct          https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct           Yes
Llama-3.2-3B-Instruct          https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct           Yes
DeepSeek-R1-Distill-Qwen-1.5B  https://huggingface.co/deepseek-ai/DeepSeek-R1-Distill-Qwen-1.5B  Yes
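For rough GPU memory planning with the models in the table, the weights alone take about (parameter count) × (bytes per parameter). float16, as used in the inference example, stores 2 bytes per parameter. The sketch below makes several assumptions: parameter counts are nominal values read from the model names, and activations, the KV cache, and runtime overhead are ignored, so actual memory use is higher. weight_memory_gb, BYTES_PER_PARAM, and MODELS are illustrative names.

```python
# Bytes per parameter for common PyTorch floating-point dtypes.
BYTES_PER_PARAM = {"float32": 4, "float16": 2, "bfloat16": 2}


def weight_memory_gb(num_params: float, dtype: str = "float16") -> float:
    """Approximate memory for the model weights only, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


# Nominal parameter counts taken from the model names (assumption, not exact).
MODELS = {
    "Llama-3.2-1B-Instruct": 1e9,
    "Llama-3.2-3B-Instruct": 3e9,
    "DeepSeek-R1-Distill-Qwen-1.5B": 1.5e9,
}

if __name__ == "__main__":
    for name, params in MODELS.items():
        print(f"{name}: ~{weight_memory_gb(params):.1f} GB of weights in float16")
```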

Note
The Llama models are gated on Hugging Face: you must request and be granted access under Meta's license before downloading them.
Alternatively, open-source versions can be found here: