LLM Laboratory

Date: 22.08.2025

LLM semantic search

Description

Principle of Operation and Implementation of Semantic Search

Building the search database

Split the corporate knowledge base into small chunks of ~600 characters. Ideally, each chunk should contain a whole paragraph (avoid splitting mid-paragraph). Clean the chunk text from noise (e.g., stray special characters).
Generate vector representations: pass the array of chunks to a specialized LLM model to obtain vector embeddings—mathematical representations of each chunk’s meaning.
Store in a vector database: write the mapping of (chunk metadata, chunk text, and its vector) into a vector database.

Search process

Embed the query: take the user’s search query and send it to the (same) LLM to obtain the query’s vector representation.
Retrieve nearest vectors: query the vector DB to find the most similar vectors. The DB does this very quickly because it doesn’t scan text; it compares numeric vector values (floating-point numbers).
Return results from the vector database (matching chunks and their metadata).

Advantages

Near-instant retrieval: semantic search over a 1,300-page book takes roughly ~1 second.
Fast index construction: building the vector database for a 1,300-page book takes about ~4 seconds.
Universal: the approach works with knowledge bases in virtually any format—as long as you can parse the source and split it into chunks.

Requirments

Python 3.11/3.12
Ubuntu 24.04
GPU 4Gb VRAM - CUDA 5, ROCm 6

Preparetion

Get the rag-searchkit source code

git clone https://github.com/llmlaba/rag_searchkit.git
cd ./rag_searchkit

Get the sentence-transformers llm model

git lfs install
git clone https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 st

Prepare data source

ePub book

Put ePub book to repo root, for example UpgradePC20.epub

Prepare python environment

For CPU

python3 -m venv .venv_llm
source ./.venv_llm/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt

For GPU AMD ROCm 6

python3 -m venv .venv_llm
source ./.venv_llm/bin/activate
python -m pip install --upgrade pip
pip install torch --index-url https://download.pytorch.org/whl/rocm6.0
pip install -r requirements.txt

Dry run

Load ePub to database

python app.py build --epub "Upgrading and Repairing PCs.epub"

Run qery

python app.py search --q "clear CMOS" --k 8 --format pretty