Fine-tune Llama-3.1 8B with Llama-Factory#
This tutorial demonstrates how to fine-tune the Llama-3.1 8B large language model (LLM) on AMD ROCm GPUs using Llama-Factory. Efficient fine-tuning is vital for adapting LLMs to downstream tasks, but implementing these methods on different models requires non-trivial effort.
Llama-Factory is a unified framework that integrates a suite of cutting-edge, efficient training methods.
It provides a solution for flexibly customizing the fine-tuning of over 100 LLMs, without writing any code, through the built-in web UI, LlamaBoard.
Prerequisites#
This tutorial was developed and tested using the following setup.
Operating system#
Ubuntu 22.04: Ensure your system is running Ubuntu version 22.04.
Hardware#
AMD Instinct™ GPUs: This tutorial was tested on an AMD Instinct MI300X GPU. Ensure you are using an AMD Instinct GPU or compatible hardware with ROCm support and that your system meets the official requirements.
Software#
ROCm 6.3: Install and verify ROCm by following the ROCm install guide. After installation, confirm your setup using:
rocm-smi
This command lists your AMD GPUs with relevant details, similar to the image below.
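To confirm the exact GPU model, you can also ask rocm-smi for the product name (the flag is available in recent ROCm releases):
rocm-smi --showproductname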
Docker: Ensure Docker is installed and configured correctly. Follow the Docker installation guide for your operating system.
Note: Ensure the Docker permissions are correctly configured. To allow non-root access, run the following commands:
sudo usermod -aG docker $USER
newgrp docker
Verify Docker is working correctly:
docker run hello-world
Hugging Face API access#
Obtain an API token from Hugging Face for downloading models.
Ensure the Hugging Face API token has the necessary permissions and approval to access the Meta Llama checkpoints.
Data preparation#
This tutorial uses a sample dataset from Hugging Face, which is prepared during the setup steps.
Prepare the training environment#
1. Pull the Docker image#
Ensure your system meets the system requirements.
Pull the Docker image required for this tutorial:
docker pull rocm/pytorch-training:latest
2. Launch the Docker container#
Launch the Docker container and map the necessary directories. The command below mounts your current working directory to /workspace inside the container, so run it from the directory where your notebooks are stored.
docker run -it --rm \
--network=host \
--device=/dev/kfd \
--device=/dev/dri \
--group-add=video \
--ipc=host \
--cap-add=SYS_PTRACE \
--security-opt seccomp=unconfined \
--shm-size 8G \
-v $(pwd):/workspace \
-w /workspace/notebooks \
rocm/pytorch-training:latest
Note: This command mounts the current directory to the /workspace
directory in the container. Ensure the notebook file is either copied to this directory before running the Docker command or uploaded into the Jupyter Notebook environment after it starts. Save the token or URL provided in the terminal output to access the notebook from your web browser. You can download this notebook from the AI Developer Hub GitHub repository.
3. Install and launch Jupyter#
Inside the Docker container, install Jupyter using the following command:
pip install jupyter
Start the Jupyter server:
jupyter-lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root
Note: Ensure port 8888 is not already in use on your system before running the above command. If it is, specify a different port by replacing --port=8888 with another port number, for example, --port=8890.
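If you are unsure whether the port is free, a minimal Python check such as the following can tell you before you start the server:
import socket

# A port is in use if something already accepts a TCP connection on it
with socket.socket() as s:
    in_use = s.connect_ex(("127.0.0.1", 8888)) == 0
print(f"Port 8888 in use: {in_use}")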
4. Install the required libraries#
Install the libraries required for this tutorial. Run the following commands inside the Jupyter notebook running within the Docker container:
!pip3 install -U deepspeed==0.16.5
# Install Llama-Factory from the source
!git clone --depth 1 https://github.com/hiyouga/LLaMA-Factory.git && cd LLaMA-Factory && pip install -e ".[torch,metrics]"
# This notebook was verified with llamafactory==0.9.3.dev0 and deepspeed==0.16.5
Verify the installation:
# Verify the installation and version of the required libraries
!pip list | grep llamafactory
Here is the expected output:
llamafactory 0.9.3.dev0
5. Verify Llama-Factory for ROCm 6.3#
To confirm the package is installed correctly, run the following command:
!llamafactory-cli help
You should see the following output:
----------------------------------------------------------------------
| Usage: |
| llamafactory-cli api -h: launch an OpenAI-style API server |
| llamafactory-cli chat -h: launch a chat interface in CLI |
| llamafactory-cli eval -h: evaluate models |
| llamafactory-cli export -h: merge LoRA adapters and export model |
| llamafactory-cli train -h: train models |
| llamafactory-cli webchat -h: launch a chat interface in Web UI |
| llamafactory-cli webui: launch LlamaBoard |
| llamafactory-cli version: show version info |
----------------------------------------------------------------------
⚠️ Important: ensure the correct kernel is selected
If the verification process fails, ensure the correct Jupyter kernel is selected for your notebook. To change the kernel, follow these steps:
Go to the Kernel menu.
Select Change Kernel.
Select Python 3 (ipykernel) from the list.
Important: Failure to select the correct kernel can lead to unexpected issues when running the notebook.
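To double-check which interpreter the active kernel is using, you can run the following cell; the path should point at the Python inside the Docker container:
import sys

# Shows the interpreter backing the current kernel
print(sys.executable)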
6. Provide your Hugging Face token#
You’ll need a Hugging Face API token to access Llama-3.1. Generate your token at Hugging Face Tokens and request access to the Llama-3.1 8B model. Tokens typically start with “hf_”.
Run the following interactive block in your Jupyter notebook to set up the token:
Note: Uncheck the “Add token as Git credential” option.
from huggingface_hub import notebook_login
# Prompt the user to log in
notebook_login()
Verify that your token was accepted correctly:
from huggingface_hub import HfApi
try:
api = HfApi()
user_info = api.whoami()
print(f"Token validated successfully! Logged in as: {user_info['name']}")
except Exception as e:
print(f"Token validation failed. Error: {e}")
Fine-tuning the model#
This section covers the process of setting up and running fine-tuning for the Llama-3 model using Llama-Factory. The following steps describe how to set up GPUs, import the required libraries, configure the model and training parameters, and run the fine-tuning process.
⚠️ Important: ensure the correct kernel is selected
Ensure the correct Jupyter kernel is selected for your notebook. To change the kernel, follow these steps:
Go to the Kernel menu.
Select Change Kernel.
Select Python 3 (ipykernel) from the list.
Important: Failure to select the correct kernel can lead to unexpected issues when running the notebook.
Set and verify the GPU availability#
Begin by specifying the GPUs available for fine-tuning. Verify that they are properly detected by PyTorch.
import os
import torch
gpus = [0, 1]  # GPU 0 suffices for single-device MI300X fine-tuning; GPUs 0 and 1 are used for distributed full fine-tuning
os.environ.setdefault("CUDA_VISIBLE_DEVICES", ",".join(map(str, gpus)))
# Ensure PyTorch detects the GPUs correctly
print(f"PyTorch detected number of available devices: {torch.cuda.device_count()}")
Download the model#
Next, download the model weights, tokenizer, and responsible use guide to your local machine.
To download Llama-3 8B Instruct, use the following command:
!huggingface-cli download --resume-download meta-llama/Meta-Llama-3-8B-Instruct
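Alternatively, you can perform the same download from Python with huggingface_hub; a minimal sketch:
from huggingface_hub import snapshot_download

# Downloads (or resumes) the full model repository into the local Hugging Face cache
snapshot_download(repo_id="meta-llama/Meta-Llama-3-8B-Instruct")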
Run the fine-tuning recipes#
Now use the fine-tuning recipe llama3_full_sft.yaml for distributed training. This recipe is preconfigured to use the identity and alpaca_en_demo JSON datasets that ship with LLaMA-Factory.
Note: These datasets are for demonstration purposes only. For a real task, replace them with your own dataset, registered in LLaMA-Factory's data/dataset_info.json.
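For reference, the recipe is a plain YAML file of training options. The sketch below shows the kinds of fields it contains, with illustrative values only; consult examples/train_full/llama3_full_sft.yaml in the repository for the exact contents:
model_name_or_path: meta-llama/Meta-Llama-3-8B-Instruct
stage: sft
do_train: true
finetuning_type: full
deepspeed: examples/deepspeed/ds_z3_config.json
dataset: identity,alpaca_en_demo
template: llama3
cutoff_len: 2048
output_dir: saves/llama3-8b/full/sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 2
learning_rate: 1.0e-5
num_train_epochs: 3.0
bf16: true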
You can fine-tune Llama-3 8B on the GPUs you made visible above using the following command:
!cd LLaMA-Factory/ && llamafactory-cli train examples/train_full/llama3_full_sft.yaml
Monitoring GPU memory#
To monitor GPU memory during training, run the following command from another notebook cell (or drop the leading ! and run it in a terminal):
!rocm-smi
This command displays memory usage and other GPU metrics to ensure your hardware resources are being optimally used.
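For continuous monitoring in a separate terminal, you can wrap rocm-smi in watch, or query VRAM usage specifically:
# Refresh the full rocm-smi report every second
watch -n 1 rocm-smi
# Show VRAM usage only
rocm-smi --showmeminfo vram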