LLM Laboratory

Date: 28.08.2025

Compiling PyTorch and BitsAndBytes for CUDA 11.4 (Kepler)

Requirements

Test environment

My test environment: HP Z440 + NVIDIA Tesla K80

Ubuntu preparation

sudo apt update
sudo apt dist-upgrade
sudo reboot

Driver setup and tools preparation

sudo apt install nvidia-driver-470 clinfo cmake-mozilla python3.8-venv python3.8-dev git
sudo reboot
wget https://developer.download.nvidia.com/compute/cuda/11.4.4/local_installers/cuda_11.4.4_470.82.01_linux.run
sudo sh cuda_11.4.4_470.82.01_linux.run --toolkit --samples

echo 'export PATH=/usr/local/cuda-11.4/bin:$PATH' | sudo tee /etc/profile.d/cuda.sh
echo 'export LD_LIBRARY_PATH=/usr/local/cuda-11.4/lib64:$LD_LIBRARY_PATH' | sudo tee -a /etc/profile.d/cuda.sh
echo 'export CUDA_HOME=/usr/local/cuda-11.4' | sudo tee -a /etc/profile.d/cuda.sh
source /etc/profile.d/cuda.sh
nvcc --version
nvidia-smi
clinfo
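
Before starting the long builds, it can help to confirm that the nvcc found on PATH is really the 11.4 toolkit and not an older one from a distribution package. The following is a minimal sketch; the helper names cuda_release and check_cuda_release are hypothetical, and the parsing assumes the usual "release X.Y" line in the nvcc --version output.

```shell
# Extract the "release X.Y" number from `nvcc --version` output read on stdin.
# Assumption: nvcc prints a line such as
#   "Cuda compilation tools, release 11.4, V11.4.152"
cuda_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p' | head -n 1
}

# Compare the toolkit on PATH with the expected version passed as $1.
# Returns non-zero when nvcc is missing or reports a different release.
check_cuda_release() {
    local expected="$1" found
    found="$(nvcc --version 2>/dev/null | cuda_release)"
    if [ "$found" = "$expected" ]; then
        echo "OK: nvcc reports CUDA $found"
    else
        echo "Mismatch: expected CUDA $expected, nvcc reports '${found:-nothing}'" >&2
        return 1
    fi
}

# Usage (run after sourcing /etc/profile.d/cuda.sh):
#   check_cuda_release 11.4
```

A mismatch here usually means the PATH entry from /etc/profile.d/cuda.sh is not active in the current shell.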

Build PyTorch

mkdir -p ~/llm && cd ~/llm
python3 -m venv .venv_llm
source ./.venv_llm/bin/activate
python -m pip install --upgrade pip
git clone -b release/2.2 https://github.com/pytorch/pytorch.git
cd ./pytorch
pip install -r requirements.txt
USE_CUDA=1 python setup.py install
cd ~/llm
python3 -c "import torch; print(torch.__version__); print(torch.cuda.is_available());print(torch.cuda.get_device_name(0));"
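
The build above relies on PyTorch detecting the GPU architecture on its own. If torch.cuda.is_available() prints True but kernels later fail with a "no kernel image" error, one option is to pin the architecture explicitly before running setup.py. This is a sketch under assumptions: the value 3.7 is the compute capability of the Tesla K80, and the job count of 4 is an arbitrary example that should be tuned to the available RAM and cores.

```shell
# Assumption: Tesla K80 (Kepler GK210) has compute capability 3.7.
export TORCH_CUDA_ARCH_LIST="3.7"

# Assumption: 4 parallel compile jobs; lower this if the build runs out of memory.
export MAX_JOBS=4

echo "arch=${TORCH_CUDA_ARCH_LIST} jobs=${MAX_JOBS}"

# Then rerun the build step from this section in the same shell:
#   USE_CUDA=1 python setup.py install
```

Both variables are read by the PyTorch build at configure time, so they must be set in the same shell session that starts the build.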

Build BitsAndBytes

git clone -b 0.44.1 https://github.com/bitsandbytes-foundation/bitsandbytes.git
cd ./bitsandbytes
cat <<EOF >> requirements-cus.txt
# Requirements used for local development
setuptools>=63
pytest~=8.3.3
einops~=0.8.0
wheel~=0.44.0
lion-pytorch~=0.2.2
scipy~=1.10.1
pandas~=2.0.2
matplotlib~=3.7.5
EOF
pip install -r requirements-cus.txt
cmake -DNO_CUBLASLT=true -DCOMPUTE_BACKEND=cuda -S .
make
pip install .
cd ~/llm
python -m bitsandbytes

It works!
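
If python -m bitsandbytes instead reports that no CUDA library could be loaded, a quick first check is whether the make step actually produced a CUDA shared library inside the package directory. The sketch below uses a hypothetical helper, find_bnb_cuda_libs, and assumes the library file name starts with libbitsandbytes_cuda; the exact suffix depends on the CUDA version and build flags.

```shell
# List CUDA-enabled bitsandbytes shared libraries in the directory given as $1.
# Assumption: the build names them libbitsandbytes_cuda<version>[...].so
find_bnb_cuda_libs() {
    local dir="$1"
    find "$dir" -maxdepth 1 -name 'libbitsandbytes_cuda*.so' 2>/dev/null
}

# Usage, with the paths from this guide:
#   find_bnb_cuda_libs ~/llm/bitsandbytes/bitsandbytes
```

An empty result suggests the cmake or make step did not build the CUDA backend, in which case the cmake output is the place to look next.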