LLM Laboratory

Date: 26.08.2025

Vulkan llama.cpp in Docker Test

This article describes in detail how to run llama.cpp with Vulkan in a Docker container.
The LLM under test is Mathstral.

Requirements

My test environment: HP Z440 + AMD Instinct MI50 32 GB
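
Before touching Docker, it is worth confirming that the host's Vulkan stack sees the card at all. A quick sanity check, assuming the vulkan-tools package is installed on the host:

vulkaninfo --summary | grep -i deviceName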

Steps

Get the Mathstral GGUF for testing

git lfs install
git clone https://huggingface.co/lmstudio-community/mathstral-7B-v0.1-GGUF mathstral
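
Note that the git clone pulls every quantization in the repository. If you only need the Q4_K_M file used below, a lighter alternative (assuming the huggingface_hub CLI is installed) is:

pip install -U "huggingface_hub[cli]"
huggingface-cli download lmstudio-community/mathstral-7B-v0.1-GGUF \
  mathstral-7B-v0.1-Q4_K_M.gguf --local-dir mathstral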

Run llama.cpp with Vulkan in Docker Compose

Prepare docker-compose.yaml for the AMD GPU

To run llama.cpp in Docker we will use Docker Compose orchestration to keep the deployment clear.
The main Docker Compose configuration:

version: "3.3"

services:
  llamacpp-vulkan.local:
    image: ghcr.io/ggml-org/llama.cpp:full-vulkan
    entrypoint:
      - /bin/bash
      - -c
      - |
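        # Uncomment the next line to list the Vulkan devices llama.cpp can see: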
        #/app/llama-server --list-devices
        /app/llama-server -m /models/mathstral/mathstral-7B-v0.1-Q4_K_M.gguf \
          --chat-template llama2 --port 8080 --host 0.0.0.0 \
          --device Vulkan0 --n-gpu-layers 999
    ports:
      - "8080:8080"
    environment:
      TZ: "Etc/GMT"
      LANG: "C.UTF-8"
    ipc: host
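    # Expose the AMD GPU device nodes: /dev/kfd (ROCm compute interface)
    # and /dev/dri (render nodes used by the Vulkan driver)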
    devices:
      - /dev/kfd
      - /dev/dri
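    # Join the host's render and video groups so the container can open the
    # device nodes; the GIDs are supplied via the .env file created below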
    group_add:
      - "${RENDER_GID}"
      - "${VIDEO_GID}"
    volumes:
      - ../mathstral:/models/mathstral
    networks:
      - docker-compose-network

networks:
  docker-compose-network:
    ipam:
      config:
        - subnet: 172.24.24.0/24
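
Before the full launch you can check which Vulkan devices llama.cpp detects inside the container (the same thing the commented --list-devices line in the entrypoint does). One way, overriding the entrypoint after the .env file below has been created:

docker-compose run --rm --entrypoint /app/llama-server llamacpp-vulkan.local --list-devices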

Run Mathstral in Docker and make a test request

echo "RENDER_GID=$(getent group render | cut -d: -f3)" > .env
echo "VIDEO_GID=$(getent group video  | cut -d: -f3)" >> .env
docker-compose up
docker container logs llamacpp-vulkan_llamacpp-vulkan.local_1
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "",
    "messages": [{"role": "user", "content": "Continue this text: What you know about sun?"}],
    "max_tokens": 360,
    "temperature": 0.7,
    "top_p": 0.95,
    "stop": "eof"
  }' | jq
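
While the server is up, llama-server also exposes a /health endpoint that is handy for confirming the model has finished loading, and jq can strip the response down to just the generated text. A minimal sketch with a simplified request body:

curl http://localhost:8080/health
curl -s http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What do you know about the sun?"}], "max_tokens": 120}' \
  | jq -r '.choices[0].message.content'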

docker-compose down

Benchmark

To run the benchmark, replace the command in the entrypoint of the Docker Compose file with this:

/app/llama-bench -m /models/mathstral/mathstral-7B-v0.1-Q4_K_M.gguf

and bring the stack up again with docker-compose up.
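
llama-bench also takes flags to control the workload, for example prompt-processing length, generation length, and GPU offload (flag names as in current llama.cpp builds; verify against /app/llama-bench --help):

/app/llama-bench -m /models/mathstral/mathstral-7B-v0.1-Q4_K_M.gguf -p 512 -n 128 -ngl 999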

Enjoy the result

The full project is available on GitHub.