LLM Laboratory

Date: 10.08.2025

llama.cpp - run LLMs everywhere

llama.cpp is a portable C/C++ runtime for running quantized LLMs (in the GGUF format) on a wide range of hardware, with no GPU required.

Run a Mistral-family LLM (Mathstral 7B) from scratch with the llama.cpp runtime

My test environment: HP ProDesk 405 G5

Requirements

Fetch the quantized Mathstral GGUF model from Hugging Face (Git LFS is required for the large model file):

sudo apt install -y git-lfs
mkdir llm && cd llm
git lfs install
git clone https://huggingface.co/lmstudio-community/mathstral-7B-v0.1-GGUF mistral
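
After cloning, it is worth checking that the large GGUF file was actually fetched by Git LFS and is not just a pointer stub:

ls -lh mistral/mathstral-7B-v0.1-Q4_K_M.gguf

A Q4_K_M quantization of a 7B model is several gigabytes in size; a file of only a few hundred bytes means LFS did not download it.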

On the host OS

Prepare the llama.cpp environment (build from source with OpenBLAS support):

sudo apt update
sudo apt install -y git build-essential cmake \
  libopenblas-dev libcurl4-openssl-dev pkg-config
git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
cmake -B build -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS
cmake --build build -j
cd ..
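
As a quick sanity check that the build succeeded, the binaries can print their version and build info (the --version flag is assumed from recent llama.cpp builds):

./llama.cpp/build/bin/llama-cli --version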

Run llama.cpp benchmark:

export cores=$(nproc)
./llama.cpp/build/bin/llama-bench -m \
  ./mistral/mathstral-7B-v0.1-Q4_K_M.gguf \
  -t $cores
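
llama-bench reports prompt-processing (pp) and token-generation (tg) throughput in tokens per second. To make runs comparable across machines, the prompt and generation lengths can be pinned explicitly; -p and -n below are standard llama-bench options (512 and 128 are also their defaults):

./llama.cpp/build/bin/llama-bench \
  -m ./mistral/mathstral-7B-v0.1-Q4_K_M.gguf \
  -p 512 -n 128 -t $cores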

Run llama.cpp generator:

export cores=$(nproc)
./llama.cpp/build/bin/llama-cli \
  -m ./mistral/mathstral-7B-v0.1-Q4_K_M.gguf \
  -p "What you know about sun?" \
  -n 2000 -t $cores -c 2048
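
For an interactive chat rather than a one-shot completion, llama-cli also has a conversation mode; the -cnv flag below is taken from recent llama.cpp builds and may already be the default behavior in newer ones:

./llama.cpp/build/bin/llama-cli \
  -m ./mistral/mathstral-7B-v0.1-Q4_K_M.gguf \
  -cnv -t $cores -c 2048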

Run llama.cpp server:

./llama.cpp/build/bin/llama-server \
  -m ./mistral/mathstral-7B-v0.1-Q4_K_M.gguf \
  --chat-template llama2 --port 8080 --host 0.0.0.0
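
llama-server exposes an OpenAI-compatible HTTP API. A minimal smoke test from another terminal (localhost assumed; use the machine's address if you connect remotely):

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "What do you know about the Sun?"}], "max_tokens": 200}'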

In Docker

Run llama.cpp benchmark:

docker run --rm -it -v ~/llm/mistral:/models \
  --entrypoint /bin/bash ghcr.io/ggml-org/llama.cpp:full \
  -c './llama-bench -m /models/mathstral-7B-v0.1-Q4_K_M.gguf \
  -t $(nproc)'

Run llama.cpp generator:

docker run --rm -it -v ~/llm/mistral:/models \
  --entrypoint /bin/bash ghcr.io/ggml-org/llama.cpp:full \
  -c './llama-cli -m /models/mathstral-7B-v0.1-Q4_K_M.gguf \
  -p "What do you know about the Sun?" -n 2000 -t $(nproc) -c 2048'

Run llama.cpp server:

docker run --rm -it -p 8080:8080 -v ~/llm/mistral:/models \
  --entrypoint /bin/bash ghcr.io/ggml-org/llama.cpp:full \
  -c './llama-server -m /models/mathstral-7B-v0.1-Q4_K_M.gguf \
  --chat-template llama2 --port 8080 --host 0.0.0.0'
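
Likewise, a dedicated :server image runs llama-server as its entrypoint, which drops the bash wrapper entirely:

docker run --rm -it -p 8080:8080 -v ~/llm/mistral:/models \
  ghcr.io/ggml-org/llama.cpp:server \
  -m /models/mathstral-7B-v0.1-Q4_K_M.gguf \
  --chat-template llama2 --port 8080 --host 0.0.0.0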

Enjoy the result!