Date: 26.08.2025
Vulkan llama.cpp in Docker Test
This article describes in detail how to run llama.cpp with Vulkan in a Docker container.
The LLM used for testing is Mathstral.
Requirements
- AMD MI50/MI100 with 32 GB VRAM
- Workstation: 40 GB RAM, 500 GB SSD, 750 W power supply
- Ubuntu 24.04 LTS
- Docker CE
 
My test environment: HP Z440 + AMD MI50 32 GB
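Before starting, check that the host exposes the AMD devices and groups the container will need (a quick sanity check; /dev/kfd and /dev/dri are the standard amdgpu/ROCm device nodes):

ls -l /dev/kfd /dev/dri
getent group render video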
Steps
Get the Mathstral GGUF for testing
git lfs install
git clone https://huggingface.co/lmstudio-community/mathstral-7B-v0.1-GGUF mathstral
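The repository contains several quantizations. If you only need the Q4_K_M file used below, you can skip the other LFS blobs (a sketch using standard git-lfs filters):

GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/lmstudio-community/mathstral-7B-v0.1-GGUF mathstral
cd mathstral
git lfs pull --include="mathstral-7B-v0.1-Q4_K_M.gguf"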
Run llama.cpp with Vulkan in Docker Compose
Prepare docker-compose.yaml for AMD ROCm
To run llama.cpp in Docker, we will use Docker Compose orchestration to make the deployment clearer.
The main Docker Compose orchestration steps:
- Pull the llama.cpp Docker image for Vulkan
- Forward the application port to the Docker host
- Mount the AMD driver devices into the container
- Add the AMD render and video groups to the container user
- Mount the folder with the Mathstral model
- Create a local network, just in case
 
version: "3.3"
services:
  llamacpp-vulkan.local:
    image: ghcr.io/ggml-org/llama.cpp:full-vulkan
    entrypoint:
      - /bin/bash
      - -c
      - |
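        # uncomment to verify which devices Vulkan sees inside the container: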
        #/app/llama-server --list-devices
        /app/llama-server -m /models/mathstral/mathstral-7B-v0.1-Q4_K_M.gguf \
          --chat-template llama2 --port 8080 --host 0.0.0.0 \
          --device Vulkan0 --n-gpu-layers 999
    ports:
      - "8080:8080"
    environment:
      TZ: "Etc/GMT"
      LANG: "C.UTF-8"
    ipc: host
    devices:
      - /dev/kfd
      - /dev/dri
    group_add:
      - "${RENDER_GID}"
      - "${VIDEO_GID}"
    volumes:
      - ../mathstral:/models/mathstral
    networks:
      - docker-compose-network
networks:
  docker-compose-network:
    ipam:
      config:
        - subnet: 172.24.24.0/24
Run Mathstral in Docker and make a test request
- Deploy the Docker Compose stack (the .env file passes the host's render and video group IDs into the compose file)
 
echo "RENDER_GID=$(getent group render | cut -d: -f3)" > .env
echo "VIDEO_GID=$(getent group video  | cut -d: -f3)" >> .env
docker-compose up
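To confirm that Vulkan actually sees the MI50 inside the running container, you can run the commented --list-devices command from the entrypoint as a one-off (a sketch; the container name follows the docker-compose v1 naming used in the logs step below):

docker exec llamacpp-vulkan_llamacpp-vulkan.local_1 /app/llama-server --list-devices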
- Check the logs
 
docker container logs llamacpp-vulkan_llamacpp-vulkan.local_1
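Once the model has finished loading, llama-server's built-in health endpoint should report ready:

curl http://localhost:8080/health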
- Test request
 
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "",
    "messages": [{"role": "user", "content": "Continue this text: What you know about sun?"}],
    "max_tokens": 360,
    "temperature": 0.7,
    "top_p": 0.95,
    "stop": "eof"
  }' | jq
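The same endpoint also supports streaming. A minimal variant of the request above (llama-server implements the OpenAI-compatible "stream" flag; -N disables curl's output buffering):

curl -N http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "",
    "messages": [{"role": "user", "content": "Continue this text: What you know about sun?"}],
    "max_tokens": 360,
    "stream": true
  }'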
- Stop the Docker container
 
docker-compose down
Benchmark
To run the benchmark, replace the command in the entrypoint of the Docker Compose file with this:
/app/llama-bench -m /models/mathstral/mathstral-7B-v0.1-Q4_K_M.gguf
and deploy Docker Compose again.
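By default llama-bench runs a prompt-processing (pp512) and a token-generation (tg128) pass; you can vary the sizes with its standard flags, for example (a sketch):

/app/llama-bench -m /models/mathstral/mathstral-7B-v0.1-Q4_K_M.gguf -p 512 -n 128 -ngl 999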
Enjoy the result
The full project is available on GitHub.