Deploy OpenClaw with Qwen 3.5 and vLLM on AMD GPUs#

Author: Rishi Sinha
Knowledge level: Beginner

Workshop Goal: Deploy and personalize a fully local AI agent on AMD GPUs.

What you’ll do:

  • Deploy Qwen3.5-122B-A10B-FP8 with vLLM on an AMD Instinct™ MI300X GPU.

  • Connect OpenClaw to the running model and use it as a personal AI agent with direct access to your local filesystem.

  • Customize the agent’s personality and behavior by editing its Markdown configuration files.

  • Build a “morning brief” generator for your favorite GitHub and news sites.

What you’ll learn:

  • How to provision GPU resources using the AMD Developer Cloud.

  • Serving large MOE (mixture-of-experts) models locally with an inference engine.

  • How Markdown configuration gives you fine-grained control over agent behavior.

  • A method for building automated agents to solve domain-specific problems.

Background#

Most AI assistants rely on the cloud, which means your prompts and data leave your machine. This tutorial shows how you can run everything locally on AMD hardware. OpenClaw is a local-first, self-hosted AI agent that runs entirely on your hardware. It can read your files, run commands, and automate tasks without using the cloud. You control which skills, tools, and integrations are enabled, making it a flexible foundation for personal automation with full privacy.

To power the agent, you will use Qwen3.5, a 122-billion-parameter mixture-of-experts model, which is especially good for reasoning and coding tasks. You will learn to serve the model through vLLM, a high-throughput inference engine with native ROCm support. Because MOE models activate only a subset of parameters for a token, this large model will comfortably fit on a single Instinct MI300X GPU.

This tutorial is based on the AMD technical article OpenClaw on AMD Developer Cloud: Free Deployment with Qwen 3.5 and SGLang.

Table of contents#

  1. Prerequisites

  2. AMD Developer Cloud credits

  3. Install and launch Jupyter

  4. Launch vLLM

  5. Install and configure OpenClaw

  6. Understanding the OpenClaw agent configuration

  7. Project: Building your own morning brief

  8. Cleanup

  9. Conclusion

  10. Further reading

It’s time to get started!

Prerequisites#

This tutorial was developed and tested using the following setup.

Operating system#

  • Ubuntu 22.04: Ensure your system is running Ubuntu 22.04.

Hardware#

  • AMD Instinct GPUs: This tutorial was tested on a single AMD Instinct MI300X GPU (192 GB HBM3). Ensure you are using an AMD Instinct GPU or compatible hardware with ROCm support and that your system meets the official requirements.

  • Consider using the AMD Developer Cloud to obtain access to these GPUs. See the Developer Cloud details below.

Software#

  • ROCm 7.0: Install and verify ROCm by following the ROCm install guide. After installation, confirm your setup using:

    amd-smi
    

    This command lists your AMD GPUs with relevant details.

    Note: For ROCm 6.4 and earlier, use the rocm-smi command instead.

  • Docker: Ensure Docker is installed and configured correctly. Follow the Docker installation guide for your operating system.

    Note: Ensure the Docker permissions are correctly configured. To configure permissions to allow non-root access, run the following commands:

    sudo usermod -aG docker $USER
    newgrp docker
    

    Verify Docker is working correctly:

    docker run hello-world
    

AMD Developer Cloud credits#

The AMD AI Developer Program provides members with $100 of free AMD Developer Cloud credits, enough for approximately 50 hours of usage.

Note: Skip this cell if you already have access to a droplet or have a local GPU.

To get started:

  1. Sign up for the AMD AI Developer Program. Existing AMD account holders can sign in and enroll directly.

  2. Activate your credits from the AMD AI Developer Portal profile page.

  3. Create a GPU Droplet by selecting a single Instinct MI300X instance with the ROCm software image and adding your SSH key.

After the droplet is created, connect to it via SSH:

ssh root@<YOUR_DROPLET_IP>

Program members who publicly showcase and share useful applications or projects can also apply for additional credits.

Setting up OpenClaw#

This section explains how to set up your environment, install OpenClaw, and configure the application.

1. Install and launch Jupyter#

In the droplet terminal, run these commands

python3 -m venv .venv
source .venv/bin/activate
pip install jupyterlab

After running those commands, start the Jupyter server in the same terminal:

jupyter-lab --ip=0.0.0.0 --port=8888 --no-browser --allow-root

Note: Ensure port 8888 is not already in use on your system before running the above command. If it is, you can specify a different port by replacing --port=8888 with another port number, for example, --port=8890.

After running the jupyter-lab command, you can click on the link in the terminal to access the notebook. The link has the following format:

http://127.0.0.1:<PORT>/lab?token=<TOKEN_VALUE>

Note: Ensure the notebook file is either copied to the /workspace directory or uploaded into the Jupyter Notebook environment after it starts. You can download this notebook from the AI Developer Hub GitHub repository.

Note: The following steps should be executed within your Jupyter notebook after successfully launching the Jupyter server.

2. Launch vLLM#

Run the cell below to start a Docker container to serve the Qwen3.5-122B-A10B-FP8 model with the latest vLLM.

Note: Replace abc-123 in the command below with a secure, unique API key.

%%bash
docker run -d \
    --name vllm_server \
    --ipc=host \
    --privileged \
    --device=/dev/kfd \
    --device=/dev/dri \
    -p 8000:8000 \
    vllm/vllm-openai-rocm:latest \
    --model Qwen/Qwen3.5-122B-A10B-FP8 \
    --served-model-name qwen3-5-122b \
    --host 0.0.0.0 --port 8000 \
    --api-key abc-123 \
    --reasoning-parser qwen3 \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder \
    --trust-remote-code

Verify the server is running by executing the cell below. It might take a few minutes for the model to load when you run it for the first time.

You can also monitor the loading progress in your terminal with:

docker logs -f vllm_server

Look for Uvicorn running on http://0.0.0.0:8000 to confirm that it’s ready.

import urllib.request, json, subprocess, time, pathlib, sys, os, shutil

# Checks for server to be ready (polls up to 5 minutes)
print("Waiting for server to be ready", end="", flush=True)
deadline = time.time() + 300
ready = False
while time.time() < deadline:
    try:
        req = urllib.request.Request(
            "http://localhost:8000/v1/models",
            headers={"Authorization": "Bearer abc-123"}
        )
        with urllib.request.urlopen(req, timeout=3) as r:
            models = json.loads(r.read())
        ready = True
        break
    except Exception:
        print(".", end="", flush=True)
        time.sleep(5)

if ready:
    print("\n✅ Server is ready")
    for m in models.get("data", []):
        print(f"   Model: {m['id']}")
else:
    print("\n❌ Server did not become ready within 5 minutes")

Troubleshooting#

If the setup script fails, check for these common issues:

  • Container name already in use: A container with the same name exists from a previous run. Remove it and retry:

    docker rm -f vllm_server
    
  • Out of memory (OOM): The model requires approximately 80 GB of GPU memory. Verify that your GPU has enough free memory:

    amd-smi monitor
    
  • Health check times out: Inspect the container logs for errors:

    docker logs vllm_server
    

3. Install and configure OpenClaw#

Before we start OpenClaw, here’s an overview of what you will configure:

Component

Explanation

Location

Gateway

Connects OpenClaw to the model running on AMD hardware

Process on port 18789

Workspace

The agent’s “brain” — the files it reads on every message

~/.openclaw/workspace/

Tools

How the agent acts: reading files, running shell commands, writing edits

Defined in TOOLS.md

Skills

Saved, reusable workflows the agent can follow on demand

workspace/skills/

Open a new terminal#

Open a new terminal in JupyterLab by clicking the + button to open a new launcher tab, then find the terminal icon (not a notebook cell). The following steps must run in a terminal because the OpenClaw gateway needs to persist as a background process.

Install OpenClaw#

Run the following command in your terminal:

curl -fsSL https://openclaw.ai/install.sh | bash

Onboard OpenClaw#

Run the following command in your terminal:

openclaw onboard

OpenClaw will walk you through the setup. Select these settings:

  1. Security Continue?Yes

  2. Setup modeQuickStart

  3. Model/auth providervLLM

  4. vLLM base URL → Leave as http://127.0.0.1:8000/v1

  5. vLLM API keyabc-123

  6. vLLM modelqwen3-5-122b

  7. Default modelKeep current (vllm/qwen3-5-122b)

  8. Select channelSkip for now

  9. Search providerSkip for now

  10. Configure skills now?No

  11. Enable hooks?Skip for now

  12. How do you want to hatch your bot?Hatch in Terminal (recommended)

The OpenClaw interface will appear. When it says gateway connected, it’s ready. This is where you will chat with your OpenClaw agent whenever you see the 🦞 emoji.

You can now start chatting with your OpenClaw agent powered by Qwen3.5 on AMD hardware.

Say hello to your agent#

Send your first message! Try something like:

“Hey! Who are you and what can you do?”

The agent will introduce itself and tell you about its capabilities!

4. Understanding the OpenClaw agent configuration#

OpenClaw’s behavior is defined by plain Markdown files in its workspace. These files are read on every message, giving the agent its personality, policies, and context about you.

~/.openclaw/workspace/
├── SOUL.md       ← WHO the agent is: values, personality, tone
├── AGENTS.md     ← HOW it operates: startup rules, memory, red lines
├── IDENTITY.md   ← WHAT others see: name, emoji, public metadata
├── USER.md       ← WHO you are: context the agent reads about you
├── TOOLS.md      ← local environment notes (SSH hosts, device names, and so forth)
├── MEMORY.md     ← long-term curated memory across sessions
├── HEARTBEAT.md  ← checklist for periodic background checks
├── BOOTSTRAP.md  ← first-run ritual (deleted after onboarding)
└── memory/       ← daily session logs (yesterday and today autoloaded)

SOUL.md is its personality. AGENTS.md is its policy. Every time the agent receives a message, it reads both files again.

Ask the agent#

In your OpenClaw terminal, ask the agent about its own configuration. Here are some examples to get started:

  • “List all your MD files.”

  • “Show me what’s in your SOUL.md, IDENTITY.md, and BOOTSTRAP.md files.”

  • “What do you know about me?”

  • “What tools do you have access to?”

The agent will read its own workspace files and explain its personality, identity, and behavior.

Project: Building your own morning brief#

The Problem

As an AI/ML developer, you have to monitor so many different repositories, for example:

  • vLLM

  • OpenClaw

  • ROCm (the software stack for running programs on AMD GPUs that you should check out)

  • Transformers

  • SGLang

The Solution

In this project, you’ll build an AI agent that:

  1. Checks your favorite repos for updates.

  2. Filters based on your interests.

  3. Generates a personalized brief.

  4. Runs automatically every morning.

It’s time to build it with OpenClaw!

Part 1: Schedule it with cron#

Instead of writing a script, you’ll schedule an AI agent to check your repositories every morning.

Why use an agent instead of a static script?

  • It adapts if something breaks.

  • It improves over time.

  • It summarizes in natural language.

  • It handles edge cases intelligently.

Cron jobs have been around since the beginning of Unix. They run tasks on a repeating schedule.

The difference with OpenClaw is you can use natural language to create cron jobs that can spawn agents. It doesn’t require any code or complicated commands.

Step 1: Create a daily cron job#

First, schedule an AI agent to check your repositories every morning. Feel free to change the repositories or instructions to match your interests.

Ask your OpenClaw agent to do the following:

“Generate a morning brief for me every morning at 8 AM. Check repos sgl-project/sglang, vllm-project/vllm, huggingface/transformers, ROCm/ROCm, and openclaw/openclaw for performance, GPU, and breaking changes. Report on big PRs or details. Also, give me a summary of the latest news.”

Step 2: Verify the job#

Check that the job was created by opening the Cron Jobs tab on the dashboard or by running this command in your terminal:

openclaw cron list

Note: If you see a delivery error, it’s because you haven’t configured a chat channel. You can safely ignore this message.

Step 3: Test the cron job#

The cron job is scheduled for 8 AM UTC, but you can trigger it right now to see the agent in action.

Ask your OpenClaw agent:

“Can you run the cron job right now and give me the output?”

Part 1 summary#

What you built:

  • A cron job that runs an AI agent every morning at 8 AM UTC

  • A personalized brief covering your favorite repositories and news

What you learned:

  • How to schedule agents with natural language

  • How to trigger and monitor cron jobs

Next, you’ll teach the agent to remember your preferences.

Part 2: Teaching the agent to remember#

The brief runs every day, but right now, each run starts fresh. What if you want it to skip CI noise, focus on GPU changes, or group results differently? You could retype those instructions every time, or you could teach the agent to “remember”.

How OpenClaw’s memory works#

OpenClaw has a file-based memory system that persists across sessions:

  • Daily notes (memory/YYYY-MM-DD.md): the agent logs conversations, events, and decisions each day.

  • Long-term memory (MEMORY.md): important context and preferences that carry forward indefinitely.

  • Workspace files (~/.openclaw/workspace/): the agent can read and write files here. Your morning briefs are saved here, too, so you can compare today’s brief to yesterday’s.

When you use the keyword “remember”, the agent writes your preferences into its memory files. The next cron run reads those files and automatically applies them.

Step 1: Tell the agent to remember your preferences#

By using the keyword “remember”, the agent will save your preferences, so they persist across sessions and future cron runs.

Ask your OpenClaw agent the following:

“Remember: for the morning brief, include only PRs and CI/infrastructure changes. Focus on contributors and performance over the last 90 days. Group results by repo.”

Step 2: Run the brief and see the changes#

Trigger the morning brief again. The agent reads your saved preferences and applies them to the output.

Ask your OpenClaw agent:

“Run the morning brief now and show me the result.”

Step 3: Keep refining#

Send more “remember” messages at any time to update your preferences:

  • “Remember to also include trending Hugging Face models.”

  • “Remember to focus more on ROCm compatibility issues.”

Each message updates the agent’s memory files, so the next cron run automatically picks up your latest preferences.

See what the agent remembers#

As a final step, ask the agent to show you its memory file. This is where all the preferences you taught it are stored:

“Show me your MEMORY.md file.”

You should see the preferences you set earlier, such as skipping documentation-only PRs and grouping results by repository. This is the file-based memory system that carries forward across sessions and cron runs, tying back to the workspace files you explored in Section 4.

Part 2 summary#

What you built:

  • Persistent preferences that carry across sessions and cron runs

  • A feedback loop driven entirely by natural language

What you learned:

  • How OpenClaw’s memory system (daily notes, MEMORY.md, and workspace files) works

  • How the “remember” keyword saves preferences to memory

  • How to refine agent behavior without editing config files

Want to go further? Check out the bonus ideas below.

Bonus: Advanced features#

Here are some ideas to enhance your morning brief.

Trend detection#

Compare today’s brief against previous ones to spot trends. (This requires a few days of saved briefs first.)

Ask your OpenClaw agent the following:

“How do today’s trends compare to the last 10 morning briefs?”

Webhook delivery#

Send the brief to Slack, Discord, or Telegram. This might require some extra configuration.

Ask your OpenClaw agent the following:

“Send my morning briefs to Slack at this webhook: https://hooks.slack.com/services/YOUR/WEBHOOK/URL.”

Multi-agent orchestration#

Spawn separate agents for each repository for parallel processing.

Ask your OpenClaw agent:

“For the morning brief, spawn a separate agent for each repo. Each one fetches and summarizes its repo, then aggregate all results.”

Note: This might take some time.

Workshop summary#

You’ve now completed the workshop and built a functional morning brief.

What you built#

  • Natural-language scheduling: You created cron jobs by talking to the agent.

  • Automated morning briefs: The agent runs every morning without manual effort.

  • Persistent memory: You taught the agent to remember your preferences across sessions.

  • Conversational refinement: You tuned the output through messages, not config files.

Cleanup#

When you’re done, stop and remove the vLLM container:

!docker stop vllm_server && docker rm vllm_server

Conclusion#

In this tutorial, you deployed Qwen3.5-122B-A10B-FP8 on a single AMD Instinct MI300X GPU using vLLM and connected it to OpenClaw as a fully local AI agent.

You then customized your agent and used it to solve a problem that developers and engineers face every day. This is an illustration of the power of local agents, which can be powered locally through the AMD Developer Cloud!

See the “Further reading” section below for deeper dives.

Further reading#