Date: 04.09.2025
NVIDIA CUDA TensorFlow in Docker Test
This article describes in detail how to run TensorFlow with NVIDIA CUDA in a Docker container.
The setup is tested with the GPT2 LLM.
Test environment
- NVIDIA Tesla V100
- Workstation with 40 GB RAM, 500 GB SSD, 750 W power supply
- Ubuntu 24.04 LTS
- Docker CE
My test environment: HP Z440 + NVIDIA Tesla V100
Steps
Get GPT2 for testing
git lfs install
git clone https://huggingface.co/openai-community/gpt2 gpt2
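The large files in this repository come through Git LFS. A quick way to confirm the weights really downloaded (and are not small LFS pointer files) is to check the size of the TensorFlow checkpoint. A minimal sketch, assuming the clone above landed in ./gpt2; the check_model.py name is hypothetical:
# check_model.py - sanity check that the GPT2 weights were fetched via LFS
# (hypothetical helper, not part of the article's repository)
import os

path = "gpt2/tf_model.h5"  # the TensorFlow weights loaded later by TFAutoModelForCausalLM
size_mb = os.path.getsize(path) / 1e6
print(f"{path}: {size_mb:.1f} MB")
# a real checkpoint is hundreds of MB; an LFS pointer file is under 1 KB
assert size_mb > 100, "Pointer file detected - run 'git lfs pull' inside the clone"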
Prepare a Dockerfile to run GPT2
Dockerfile
There are a few important steps that we need to complete in the Dockerfile.
- Create application user
- Install tini to avoid zombie processes
- Install all necessary libraries for GPT2, such as transformers
- Put a simple web server into the Docker image, just for tests
FROM docker.io/tensorflow/tensorflow:2.17.0-gpu
USER root
# Create the application user and the app/model directories
# (plain sh does not expand braces, so /app and /llm are listed explicitly)
RUN groupadd -g 4001 appuser && \
    useradd -m -u 4001 -g 4001 appuser && \
    mkdir /app /llm && \
    chown appuser:appuser /app /llm
WORKDIR /app
RUN apt-get update && apt-get install -y --no-install-recommends \
tini && \
apt-get clean && rm -rf /var/lib/apt/lists/*
COPY requirements.txt ./requirements.txt
RUN pip3 install --upgrade pip && \
pip3 install -r requirements.txt
COPY run_gpt2.py ./run_gpt2.py
USER appuser
ENTRYPOINT ["/usr/bin/tini", "--"]
CMD ["python3", "/app/run_gpt2.py"]
Web server run_gpt2.py
The web server implementation:
- Loads GPT2
- Runs a web server with one test endpoint, /v1/completion, a legacy-style text completion endpoint
from flask import Flask, request, jsonify, Response
from transformers import AutoTokenizer, TFAutoModelForCausalLM
import tensorflow as tf
import time, uuid
print("TF GPUs:", tf.config.list_physical_devices("GPU"))
MODEL_PATH = "/llm/gpt2"
tok = AutoTokenizer.from_pretrained(MODEL_PATH)
tok.pad_token = tok.eos_token
model = TFAutoModelForCausalLM.from_pretrained(MODEL_PATH)
print("Model loaded.")
inputs = tok("The space stars is?", return_tensors="tf")
out = model.generate(**inputs, max_new_tokens=20)
print(tok.decode(out[0], skip_special_tokens=True))
app = Flask(__name__)
# -------- helpers --------
def _truncate_at_stop(text, stops):
    # Cut `text` at the earliest occurrence of any stop string and
    # report "stop" as the finish reason; otherwise return text unchanged.
    if not stops:
        return text, None
    cut_idx = None
    for s in stops:
        if not s:
            continue
        i = text.find(s)
        if i == 0:
            # a stop string at position 0 would produce an empty completion; skip it
            continue
        if i != -1 and (cut_idx is None or i < cut_idx):
            cut_idx = i
    if cut_idx is not None:
        return text[:cut_idx], "stop"
    return text, None

def _tok_count(s: str) -> int:
    # Token count without special tokens, used for the usage statistics
    return len(tok.encode(s, add_special_tokens=False))
# -------- endpoints --------
@app.get("/health")
def health():
return Response("ok", mimetype="text/plain")
@app.post("/v1/completion")
def completion():
"""
JSON:
{
"prompt": "string", # required
"max_tokens": 128, # optional
"temperature": 0.7, # optional
"top_p": 0.95, # optional
"stop": "\n\n" or ["###"] # optional
}
"""
data = request.get_json(force=True) or {}
prompt = data.get("prompt")
if not isinstance(prompt, str):
return jsonify({"error": {"message": "Field 'prompt' (string) is required"}}), 400
max_tokens = int(data.get("max_tokens", 128))
temperature = float(data.get("temperature", 0.7))
top_p = float(data.get("top_p", 0.95))
    stop = data.get("stop")
    # accept either a single stop string or a list of stop strings
    stops = [stop] if isinstance(stop, str) else [s for s in (stop or []) if isinstance(s, str)]
    # temperature 0 means greedy decoding, anything above enables sampling
    do_sample = temperature > 0.0
compl_id = f"cmpl-{uuid.uuid4().hex}"
t0 = time.time()
inputs = tok(prompt, return_tensors="tf")
output_ids = model.generate(
**inputs,
max_new_tokens=max_tokens,
do_sample=do_sample,
temperature=max(temperature, 1e-8),
top_p=top_p,
eos_token_id=tok.eos_token_id,
pad_token_id=tok.pad_token_id,
)
app.logger.info(f"[{compl_id}] {time.time()-t0:.2f}s for {max_tokens} tokens")
prompt_len = int(inputs["input_ids"].shape[1])
gen_ids = output_ids[0][prompt_len:]
text = tok.decode(gen_ids, skip_special_tokens=True)
text, finish_reason = _truncate_at_stop(text.lstrip(), stops)
if finish_reason is None:
finish_reason = "length" if _tok_count(text) >= max_tokens else "stop"
usage = {
"prompt_tokens": _tok_count(prompt),
"completion_tokens": _tok_count(text),
"total_tokens": _tok_count(prompt) + _tok_count(text),
}
resp = {
"id": compl_id,
"object": "text_completion",
"created": int(time.time()),
"model": "gpt2-tf-local",
"choices": [{
"index": 0,
"text": text,
"finish_reason": finish_reason
}],
"usage": usage
}
return jsonify(resp)
if __name__ == "__main__":
app.run(host="0.0.0.0", port=8080, threaded=True)
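For completeness, here is a small Python client for the endpoint above; a sketch assuming the container is already running with port 8080 forwarded, and that the requests package is installed on the host (it is not in requirements.txt). The curl equivalent appears in the test section below.
# client_example.py - minimal client for /v1/completion
import requests

resp = requests.post(
    "http://localhost:8080/v1/completion",
    json={"prompt": "The sun is", "max_tokens": 32, "temperature": 0.7, "top_p": 0.95},
    timeout=300,  # the first request can be slow while the model warms up
)
resp.raise_for_status()
body = resp.json()
print(body["choices"][0]["text"])
print(body["usage"])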
PIP install GPT2 dependencies
To run the LLM inside the TensorFlow GPU container from Docker Hub, we need to install several additional libraries.
- The file requirements.txt is required to install the pip dependencies
Flask==2.2.5
transformers==4.41.2
tokenizers==0.19.1
safetensors==0.4.3
huggingface-hub==0.23.4
sentencepiece==0.2.0
tf-keras==2.17
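After the image is built, you can confirm the pinned stack imports cleanly together (transformers needs the tf-keras package when TF_USE_LEGACY_KERAS=1, which is set in the Compose file below). A quick check you could run inside the container:
# deps_check.py - verify the pinned libraries load together
import tensorflow as tf
import transformers
import tf_keras

print("tensorflow  :", tf.__version__)
print("transformers:", transformers.__version__)
print("tf-keras    :", tf_keras.__version__)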
Run TensorFlow with CUDA in Docker Compose
Prepare docker-compose.yaml for NVIDIA CUDA
To run NVIDIA CUDA in Docker, we will use Docker Compose orchestration to make the deployment clearer.
Main Docker Compose orchestration steps
- Build a new image for the LLM, baking the libraries and application scripts inside
- Enable port forwarding from the application to the Docker host
- Set environment variables to run TensorFlow properly
- Reserve the NVIDIA GPU devices for the container
- Mount the folder with the GPT2 model
- Create a local network, just in case
version: "3.3"
services:
tensorflow-rocm.local:
image: tensorflow-rocm:latest
build:
context: ./
dockerfile: Dockerfile
ports:
- "8080:8080"
environment:
TZ: "Etc/GMT"
LANG: "C.UTF-8"
TF_CPP_MIN_LOG_LEVEL: "2"
TF_USE_LEGACY_KERAS: "1"
deploy:
resources:
reservations:
devices:
- driver: nvidia
count: all
capabilities: [gpu]
volumes:
- ../gpt2:/llm/gpt2
networks:
- docker-compose-network
networks:
docker-compose-network:
ipam:
config:
- subnet: 172.24.24.0/24
Run GPT2 in Docker and make a test request
- Deploy docker compose (a readiness-poll sketch follows these steps)
docker-compose up
- Check logs
docker container logs tensorflow-cuda_tensorflow-cuda.local_1
- Test request
curl -s http://localhost:8080/v1/completion \
-H "Content-Type: application/json" \
-d '{
"prompt": "What you know about sun?",
"max_tokens": 60,
"temperature": 0.7,
"top_p": 0.95,
"stop": "eof"
}' | jq
- Stop docker container
docker-compose down
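If you script these steps, note that the model load takes a while after docker-compose up. A small readiness poll against the /health endpoint helps; a sketch assuming port 8080 and the requests package on the host:
# wait_ready.py - poll /health until the server answers
import time
import requests

for attempt in range(60):
    try:
        if requests.get("http://localhost:8080/health", timeout=2).text == "ok":
            print("server is ready")
            break
    except requests.RequestException:
        pass  # container still starting
    time.sleep(5)
else:
    raise SystemExit("server did not come up in time")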
Enjoy the result
The whole project is available on GitHub.