Deploy OpenClaw with Qwen 3.5 and vLLM on AMD GPUs#
Author: Rishi Sinha
Knowledge level: Beginner
Workshop Goal: Deploy and personalize a fully local AI agent on AMD GPUs.
What you’ll do:
Deploy Qwen3.5-122B-A10B-FP8 with vLLM on an AMD Instinct™ MI300X GPU.
Connect OpenClaw to the running model and use it as a personal AI agent with direct access to your local filesystem.
Customize the agent’s personality and behavior by editing its Markdown configuration files.
Build a “morning brief” generator for your favorite GitHub and news sites.
What you’ll learn:
How to provision GPU resources using the AMD Developer Cloud.
Serving large MOE (mixture-of-experts) models locally with an inference engine.
How Markdown configuration gives you fine-grained control over agent behavior.
A method for building automated agents to solve domain-specific problems.
Background#
Most AI assistants rely on the cloud, which means your prompts and data leave your machine. This tutorial shows how you can run everything locally on AMD hardware. OpenClaw is a local-first, self-hosted AI agent that runs entirely on your hardware. It can read your files, run commands, and automate tasks without using the cloud. You control which skills, tools, and integrations are enabled, making it a flexible foundation for personal automation with full privacy.
To power the agent, you will use Qwen3.5, a 122-billion-parameter mixture-of-experts model, which is especially good for reasoning and coding tasks. You will learn to serve the model through vLLM, a high-throughput inference engine with native ROCm support. Because MOE models activate only a subset of parameters for a token, this large model will comfortably fit on a single Instinct MI300X GPU.
This tutorial is based on the AMD technical article OpenClaw on AMD Developer Cloud: Free Deployment with Qwen 3.5 and SGLang.
Table of contents#
It’s time to get started!
Prerequisites#
This tutorial was developed and tested using the following setup.
Operating system#
Ubuntu 22.04: Ensure your system is running Ubuntu 22.04.
Hardware#
AMD Instinct GPUs: This tutorial was tested on a single AMD Instinct MI300X GPU (192 GB HBM3). Ensure you are using an AMD Instinct GPU or compatible hardware with ROCm support and that your system meets the official requirements.
Consider using the AMD Developer Cloud to obtain access to these GPUs. See the Developer Cloud details below.
Software#
ROCm 7.0: Install and verify ROCm by following the ROCm install guide. After installation, confirm your setup using:
amd-smi
This command lists your AMD GPUs with relevant details.
Note: For ROCm 6.4 and earlier, use the
rocm-smicommand instead.Docker: Ensure Docker is installed and configured correctly. Follow the Docker installation guide for your operating system.
Note: Ensure the Docker permissions are correctly configured. To configure permissions to allow non-root access, run the following commands:
sudo usermod -aG docker $USER newgrp docker
Verify Docker is working correctly:
docker run hello-world
AMD Developer Cloud credits#
The AMD AI Developer Program provides members with $100 of free AMD Developer Cloud credits, enough for approximately 50 hours of usage.
Note: Skip this cell if you already have access to a droplet or have a local GPU.
To get started:
Sign up for the AMD AI Developer Program. Existing AMD account holders can sign in and enroll directly.
Activate your credits from the AMD AI Developer Portal profile page.
Create a GPU Droplet by selecting a single Instinct MI300X instance with the ROCm software image and adding your SSH key.
After the droplet is created, connect to it via SSH:
ssh root@<YOUR_DROPLET_IP>
Program members who publicly showcase and share useful applications or projects can also apply for additional credits.
Setting up OpenClaw#
This section explains how to set up your environment, install OpenClaw, and configure the application.
1. Install and launch Jupyter#
In the droplet terminal, run these commands
python3 -m venv .venv
source .venv/bin/activate
pip install jupyterlab
After running those commands, start the Jupyter server in the same terminal:
jupyter-lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root
Note: Ensure port 8888 is not already in use on your system before running the above command. If it is, you can specify a different port by replacing --port=8888 with another port number, for example, --port=8890.
After running the jupyter-lab command, you can click on the link in the terminal to access the notebook. The link has the following format:
http://127.0.0.1:<PORT>/lab?token=<TOKEN_VALUE>
Note: Ensure the notebook file is either copied to the /workspace directory or uploaded into the Jupyter Notebook environment after it starts. You can download this notebook from the AI Developer Hub GitHub repository.
Note: The following steps should be executed within your Jupyter notebook after successfully launching the Jupyter server.
2. Launch vLLM#
Run the cell below to start a Docker container to serve the Qwen3.5-122B-A10B-FP8 model with the latest vLLM.
Note: Replace abc-123 in the command below with a secure, unique API key.
%%bash
docker run -d \
--name vllm_server \
--ipc=host \
--privileged \
--device=/dev/kfd \
--device=/dev/dri \
-p 8000:8000 \
vllm/vllm-openai-rocm:latest \
--model Qwen/Qwen3.5-122B-A10B-FP8 \
--served-model-name qwen3-5-122b \
--host 0.0.0.0 --port 8000 \
--api-key abc-123 \
--reasoning-parser qwen3 \
--enable-auto-tool-choice \
--tool-call-parser qwen3_coder \
--trust-remote-code
Verify the server is running by executing the cell below. It might take a few minutes for the model to load when you run it for the first time.
You can also monitor the loading progress in your terminal with:
docker logs -f vllm_server
Look for Uvicorn running on http://0.0.0.0:8000 to confirm that it’s ready.
import urllib.request, json, subprocess, time, pathlib, sys, os, shutil
# Checks for server to be ready (polls up to 5 minutes)
print("Waiting for server to be ready", end="", flush=True)
deadline = time.time() + 300
ready = False
while time.time() < deadline:
try:
req = urllib.request.Request(
"http://localhost:8000/v1/models",
headers={"Authorization": "Bearer abc-123"}
)
with urllib.request.urlopen(req, timeout=3) as r:
models = json.loads(r.read())
ready = True
break
except Exception:
print(".", end="", flush=True)
time.sleep(5)
if ready:
print("\n✅ Server is ready")
for m in models.get("data", []):
print(f" Model: {m['id']}")
else:
print("\n❌ Server did not become ready within 5 minutes")
Troubleshooting#
If the setup script fails, check for these common issues:
Container name already in use: A container with the same name exists from a previous run. Remove it and retry:
docker rm -f vllm_server
Out of memory (OOM): The model requires approximately 80 GB of GPU memory. Verify that your GPU has enough free memory:
amd-smi monitorHealth check times out: Inspect the container logs for errors:
docker logs vllm_server
3. Install and configure OpenClaw#
Before we start OpenClaw, here’s an overview of what you will configure:
Component |
Explanation |
Location |
|---|---|---|
Gateway |
Connects OpenClaw to the model running on AMD hardware |
Process on port |
Workspace |
The agent’s “brain” — the files it reads on every message |
|
Tools |
How the agent acts: reading files, running shell commands, writing edits |
Defined in |
Skills |
Saved, reusable workflows the agent can follow on demand |
|
Open a new terminal#
Open a new terminal in JupyterLab by clicking the + button to open a new launcher tab, then find the terminal icon (not a notebook cell). The following steps must run in a terminal because the OpenClaw gateway needs to persist as a background process.
Install OpenClaw#
Run the following command in your terminal:
curl -fsSL https://openclaw.ai/install.sh | bash
Onboard OpenClaw#
Run the following command in your terminal:
openclaw onboard
OpenClaw will walk you through the setup. Select these settings:
Security Continue? →
YesSetup mode →
QuickStartModel/auth provider →
vLLMvLLM base URL → Leave as
http://127.0.0.1:8000/v1vLLM API key →
abc-123vLLM model →
qwen3-5-122bDefault model →
Keep current (vllm/qwen3-5-122b)Select channel →
Skip for nowSearch provider →
Skip for nowConfigure skills now? →
NoEnable hooks? →
Skip for nowHow do you want to hatch your bot? →
Hatch in Terminal (recommended)
The OpenClaw interface will appear. When it says gateway connected, it’s ready. This is where you will chat with your OpenClaw agent whenever you see the 🦞 emoji.
You can now start chatting with your OpenClaw agent powered by Qwen3.5 on AMD hardware.
Say hello to your agent#
Send your first message! Try something like:
“Hey! Who are you and what can you do?”
The agent will introduce itself and tell you about its capabilities!
4. Understanding the OpenClaw agent configuration#
OpenClaw’s behavior is defined by plain Markdown files in its workspace. These files are read on every message, giving the agent its personality, policies, and context about you.
~/.openclaw/workspace/
├── SOUL.md ← WHO the agent is: values, personality, tone
├── AGENTS.md ← HOW it operates: startup rules, memory, red lines
├── IDENTITY.md ← WHAT others see: name, emoji, public metadata
├── USER.md ← WHO you are: context the agent reads about you
├── TOOLS.md ← local environment notes (SSH hosts, device names, and so forth)
├── MEMORY.md ← long-term curated memory across sessions
├── HEARTBEAT.md ← checklist for periodic background checks
├── BOOTSTRAP.md ← first-run ritual (deleted after onboarding)
└── memory/ ← daily session logs (yesterday and today autoloaded)
SOUL.md is its personality. AGENTS.md is its policy. Every time the agent receives a message, it reads both files again.
Ask the agent#
In your OpenClaw terminal, ask the agent about its own configuration. Here are some examples to get started:
“List all your MD files.”
“Show me what’s in your SOUL.md, IDENTITY.md, and BOOTSTRAP.md files.”
“What do you know about me?”
“What tools do you have access to?”
The agent will read its own workspace files and explain its personality, identity, and behavior.
Project: Building your own morning brief#
The Problem
As an AI/ML developer, you have to monitor so many different repositories, for example:
vLLM
OpenClaw
ROCm (the software stack for running programs on AMD GPUs that you should check out)
Transformers
SGLang
The Solution
In this project, you’ll build an AI agent that:
Checks your favorite repos for updates.
Filters based on your interests.
Generates a personalized brief.
Runs automatically every morning.
It’s time to build it with OpenClaw!
Part 1: Schedule it with cron#
Instead of writing a script, you’ll schedule an AI agent to check your repositories every morning.
Why use an agent instead of a static script?
It adapts if something breaks.
It improves over time.
It summarizes in natural language.
It handles edge cases intelligently.
Cron jobs have been around since the beginning of Unix. They run tasks on a repeating schedule.
The difference with OpenClaw is you can use natural language to create cron jobs that can spawn agents. It doesn’t require any code or complicated commands.
Step 1: Create a daily cron job#
First, schedule an AI agent to check your repositories every morning. Feel free to change the repositories or instructions to match your interests.
Ask your OpenClaw agent to do the following:
“Generate a morning brief for me every morning at 8 AM. Check repos sgl-project/sglang, vllm-project/vllm, huggingface/transformers, ROCm/ROCm, and openclaw/openclaw for performance, GPU, and breaking changes. Report on big PRs or details. Also, give me a summary of the latest news.”
Step 2: Verify the job#
Check that the job was created by opening the Cron Jobs tab on the dashboard or by running this command in your terminal:
openclaw cron list
Note: If you see a delivery error, it’s because you haven’t configured a chat channel. You can safely ignore this message.
Step 3: Test the cron job#
The cron job is scheduled for 8 AM UTC, but you can trigger it right now to see the agent in action.
Ask your OpenClaw agent:
“Can you run the cron job right now and give me the output?”
Part 1 summary#
What you built:
A cron job that runs an AI agent every morning at 8 AM UTC
A personalized brief covering your favorite repositories and news
What you learned:
How to schedule agents with natural language
How to trigger and monitor cron jobs
Next, you’ll teach the agent to remember your preferences.
Part 2: Teaching the agent to remember#
The brief runs every day, but right now, each run starts fresh. What if you want it to skip CI noise, focus on GPU changes, or group results differently? You could retype those instructions every time, or you could teach the agent to “remember”.
How OpenClaw’s memory works#
OpenClaw has a file-based memory system that persists across sessions:
Daily notes (
memory/YYYY-MM-DD.md): the agent logs conversations, events, and decisions each day.Long-term memory (
MEMORY.md): important context and preferences that carry forward indefinitely.Workspace files (
~/.openclaw/workspace/): the agent can read and write files here. Your morning briefs are saved here, too, so you can compare today’s brief to yesterday’s.
When you use the keyword “remember”, the agent writes your preferences into its memory files. The next cron run reads those files and automatically applies them.
Step 1: Tell the agent to remember your preferences#
By using the keyword “remember”, the agent will save your preferences, so they persist across sessions and future cron runs.
Ask your OpenClaw agent the following:
“Remember: for the morning brief, include only PRs and CI/infrastructure changes. Focus on contributors and performance over the last 90 days. Group results by repo.”
Step 2: Run the brief and see the changes#
Trigger the morning brief again. The agent reads your saved preferences and applies them to the output.
Ask your OpenClaw agent:
“Run the morning brief now and show me the result.”
Step 3: Keep refining#
Send more “remember” messages at any time to update your preferences:
“Remember to also include trending Hugging Face models.”
“Remember to focus more on ROCm compatibility issues.”
Each message updates the agent’s memory files, so the next cron run automatically picks up your latest preferences.
See what the agent remembers#
As a final step, ask the agent to show you its memory file. This is where all the preferences you taught it are stored:
“Show me your MEMORY.md file.”
You should see the preferences you set earlier, such as skipping documentation-only PRs and grouping results by repository. This is the file-based memory system that carries forward across sessions and cron runs, tying back to the workspace files you explored in Section 4.
Part 2 summary#
What you built:
Persistent preferences that carry across sessions and cron runs
A feedback loop driven entirely by natural language
What you learned:
How OpenClaw’s memory system (daily notes, MEMORY.md, and workspace files) works
How the “remember” keyword saves preferences to memory
How to refine agent behavior without editing config files
Want to go further? Check out the bonus ideas below.
Bonus: Advanced features#
Here are some ideas to enhance your morning brief.
Trend detection#
Compare today’s brief against previous ones to spot trends. (This requires a few days of saved briefs first.)
Ask your OpenClaw agent the following:
“How do today’s trends compare to the last 10 morning briefs?”
Webhook delivery#
Send the brief to Slack, Discord, or Telegram. This might require some extra configuration.
Ask your OpenClaw agent the following:
“Send my morning briefs to Slack at this webhook:
https://hooks.slack.com/services/YOUR/WEBHOOK/URL.”
Multi-agent orchestration#
Spawn separate agents for each repository for parallel processing.
Ask your OpenClaw agent:
“For the morning brief, spawn a separate agent for each repo. Each one fetches and summarizes its repo, then aggregate all results.”
Note: This might take some time.
Workshop summary#
You’ve now completed the workshop and built a functional morning brief.
What you built#
Natural-language scheduling: You created cron jobs by talking to the agent.
Automated morning briefs: The agent runs every morning without manual effort.
Persistent memory: You taught the agent to remember your preferences across sessions.
Conversational refinement: You tuned the output through messages, not config files.
Cleanup#
When you’re done, stop and remove the vLLM container:
!docker stop vllm_server && docker rm vllm_server
Conclusion#
In this tutorial, you deployed Qwen3.5-122B-A10B-FP8 on a single AMD Instinct MI300X GPU using vLLM and connected it to OpenClaw as a fully local AI agent.
You then customized your agent and used it to solve a problem that developers and engineers face every day. This is an illustration of the power of local agents, which can be powered locally through the AMD Developer Cloud!
See the “Further reading” section below for deeper dives.
Further reading#
OpenClaw on AMD Developer Cloud: Free Deployment with Qwen 3.5 and SGLang: The original AMD technical article this tutorial is based on.
OpenClaw: The official page for OpenClaw.
vLLM: The inference framework used in this tutorial.
Qwen3.5-122B-A10B-FP8 on Hugging Face: The model card.
AMD AI Developer Program: Sign up for free AMD Developer Cloud credits.
AMD Infinity Hub: Ready-made Docker images for AI with ROCm.