LLM Laboratory

Date: 27.08.2025

PyTorch pp512 and tg128 LLM Benchmark

What is this project about?

A PyTorch benchmark that measures LLM prompt-processing (pp512) and token-generation (tg128) throughput, using the same test names as llama.cpp's llama-bench.

How the tests work (two test cases)

pp512 runs a prefill pass over a 512-token prompt and reports prompt-processing throughput; tg128 generates 128 new tokens and reports token-generation throughput. Both results are given in tokens per second (t/s).

Implementation notes:

  • Inputs for pp* are random token ids with special tokens filtered out; the first token is set to BOS for consistency.
  • Tests run multiple iterations after a warmup phase; the table shows t/s and the ± std over the measured iterations. A minimal sketch of this loop follows the list.
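
The notes above translate roughly into the following loop. This is a minimal sketch, assuming an fp16 model on a CUDA/ROCm device; build_prompt and bench_pp are illustrative names, not the actual internals of app.py.

import random
import time

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

def build_prompt(tokenizer, n_tokens=512, device="cuda"):
    # Random token ids with special tokens filtered out.
    special = set(tokenizer.all_special_ids)
    candidates = [i for i in range(tokenizer.vocab_size) if i not in special]
    ids = torch.tensor([random.choices(candidates, k=n_tokens)], device=device)
    ids[0, 0] = tokenizer.bos_token_id  # first token is always BOS
    return ids

def bench_pp(model, ids, warmup=3, iters=10):
    # Time `iters` prefill passes after `warmup` untimed ones.
    times = []
    with torch.inference_mode():
        for i in range(warmup + iters):
            torch.cuda.synchronize()
            t0 = time.perf_counter()
            model(input_ids=ids)  # one prompt-processing (prefill) pass
            torch.cuda.synchronize()
            if i >= warmup:
                times.append(time.perf_counter() - t0)
    tps = torch.tensor([ids.shape[1] / t for t in times])
    return tps.mean().item(), tps.std().item()

tok = AutoTokenizer.from_pretrained("./mistral")
model = AutoModelForCausalLM.from_pretrained("./mistral", torch_dtype=torch.float16).to("cuda")
mean_tps, std_tps = bench_pp(model, build_prompt(tok, 512))
print(f"pp512: {mean_tps:.2f} ± {std_tps:.2f} t/s")

A tg128-style measurement would instead time a model.generate(ids, max_new_tokens=128) call and divide 128 by the wall time.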

Applicability (what you can benchmark)

The harness loads models through transformers, so in principle any local Hugging Face causal LM checkpoint can be benchmarked; Mistral 7B is used as the reference model below.

Requires Mistral 7B

Test environment

Two GPU setups are covered below: AMD (PyTorch ROCm 6.0 build) and NVIDIA (PyTorch CUDA 12.4 build, with bitsandbytes for 4-bit quantization).

Preparation

Create a virtualenv

For AMD GPUs (ROCm):

mkdir -p ~/llm && cd ~/llm
python3 -m venv .venv_llm_bench
source ./.venv_llm_bench/bin/activate
python -m pip install --upgrade pip
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/rocm6.0
pip install "transformers>=4.41" accelerate einops rich

For NVIDIA GPUs (CUDA):

mkdir -p ~/llm && cd ~/llm
python3 -m venv .venv_llm_bench
source ./.venv_llm_bench/bin/activate
python -m pip install --upgrade pip
pip install "torch==2.5.0" "torchvision==0.20.0" "torchaudio==2.5.0" --index-url https://download.pytorch.org/whl/cu124
pip install "bitsandbytes==0.44.1"
pip install "transformers>=4.41" accelerate einops rich
Verify the installation:

python3 -c "import torch; print(torch.__version__); print(torch.cuda.is_available()); print(torch.cuda.get_device_name(0))"
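
Note: the ROCm build of PyTorch exposes AMD GPUs through the torch.cuda API, so torch.cuda.is_available() is expected to return True on AMD hardware as well.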

Get the Mistral

git lfs install
git clone https://huggingface.co/mistralai/Mistral-7B-v0.1 mistral
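
Note: the Mistral-7B-v0.1 repository may be gated on Hugging Face; if the clone asks for credentials, accept the license on the model page and authenticate first.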

Get benchmark source code

git clone https://github.com/llmlaba/pp512-tg128-bench.git

Run test

cd pp512-tg128-bench
python ./app.py -m ../mistral --tests pp512 tg128 --dtype fp16 --batch 1 --attn sdpa --warmup 3 --iters 10 --ubatch 128
python ./app.py -m ../mistral --tests pp512 tg128 --dtype fp16 --batch 1 --attn sdpa --warmup 3 --iters 10 --ubatch 128 --quant 4bit
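
Reading the flags by their names: --warmup and --iters are the untimed and measured iterations from the notes above, --ubatch is the prefill micro-batch size, --attn sdpa selects PyTorch's scaled-dot-product attention, and --quant 4bit loads the weights in 4-bit; run python ./app.py --help for the authoritative list. A 4-bit load with bitsandbytes presumably looks something like the sketch below (the exact config app.py uses is an assumption):

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Hypothetical 4-bit load; the compute dtype choice is an assumption.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.float16)
model = AutoModelForCausalLM.from_pretrained("../mistral", quantization_config=bnb, device_map="auto")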

Enjoy the results

The full project is available on GitHub: https://github.com/llmlaba/pp512-tg128-bench