Date: 22.08.2025
LLM semantic search
Description
Principle of Operation and Implementation of Semantic Search
Building the search database
- Split the corporate knowledge base into small chunks of ~600 characters. Ideally, each chunk should contain a whole paragraph (avoid splitting mid-paragraph). Strip noise, such as stray special characters, from the chunk text.
- Generate vector representations: pass the array of chunks to a specialized LLM to obtain vector embeddings, which are mathematical representations of each chunk’s meaning.
- Store in a vector database: write the mapping of (chunk metadata, chunk text, and its vector) into a vector database.
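The chunking step above can be sketched as follows. This is a minimal illustration, not the code used in rag_searchkit: the function names clean_text and split_into_chunks are hypothetical, the 600-character limit comes from the description above, and the sketch assumes that paragraphs are separated by blank lines.

```python
import re


def clean_text(text: str) -> str:
    """Drop control characters and collapse runs of spaces and tabs."""
    text = re.sub(r"[\x00-\x08\x0b\x0c\x0e-\x1f]", "", text)
    return re.sub(r"[ \t]+", " ", text).strip()


def split_into_chunks(text: str, max_chars: int = 600) -> list[str]:
    """Greedily pack whole paragraphs into chunks of at most max_chars.

    Assumption: paragraphs are separated by blank lines. A single
    paragraph longer than max_chars is kept whole rather than split.
    """
    paragraphs = [clean_text(p) for p in re.split(r"\n\s*\n", text)]
    paragraphs = [p for p in paragraphs if p]

    chunks: list[str] = []
    current = ""
    for para in paragraphs:
        candidate = f"{current}\n{para}" if current else para
        if current and len(candidate) > max_chars:
            # Adding this paragraph would overflow: close the current chunk.
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Packing whole paragraphs keeps each chunk semantically complete, at the cost of uneven chunk sizes.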
Search process
- Embed the query: take the user’s search query and send it to the same LLM that was used to build the database to obtain the query’s vector representation.
- Retrieve nearest vectors: query the vector DB to find the most similar vectors. The DB does this very quickly because it doesn’t scan text; it compares numeric vector values (floating-point numbers).
- Return results from the vector database (matching chunks and their metadata).
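The retrieval step can be illustrated with a brute-force nearest-vector search. This is only a sketch of the idea: a real vector database typically uses an index instead of scanning every record, and the record layout, the three-dimensional toy vectors, and the names cosine_similarity and top_k are all hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], records: list[dict], k: int = 3) -> list[tuple[float, dict]]:
    """Return the k records most similar to query_vec, best match first."""
    scored = [(cosine_similarity(query_vec, r["vector"]), r) for r in records]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Hypothetical stored records: (chunk metadata, chunk text, vector).
records = [
    {"meta": {"chapter": 1}, "text": "Resetting the BIOS settings", "vector": [0.9, 0.1, 0.0]},
    {"meta": {"chapter": 2}, "text": "Installing a graphics card", "vector": [0.1, 0.9, 0.2]},
    {"meta": {"chapter": 3}, "text": "Replacing the CMOS battery", "vector": [0.8, 0.2, 0.1]},
]

# Hypothetical query vector; in practice it comes from the embedding model.
query_vec = [1.0, 0.0, 0.0]
results = top_k(query_vec, records, k=2)
for score, record in results:
    print(f"{score:.3f}  {record['text']}")
```

Only numbers are compared here, never the chunk text itself, which is why the lookup stays fast as the knowledge base grows.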
Advantages
- Near-instant retrieval: semantic search over a 1,300-page book takes roughly 1 second.
- Fast index construction: building the vector database for a 1,300-page book takes about 4 seconds.
- Universal: the approach works with knowledge bases in virtually any format—as long as you can parse the source and split it into chunks.
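As an example of the parsing step for one format, an ePub file is a ZIP archive of XHTML documents, so its text can be extracted with the Python standard library alone. The sketch below is an assumption-laden illustration, not the parser used in rag_searchkit: epub_to_text is a hypothetical name, and files are read in alphabetical order instead of the reading order defined in the book's package manifest.

```python
import io
import zipfile
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect visible text and emit a blank line after each block element."""

    BLOCK_TAGS = {"p", "div", "li", "h1", "h2", "h3", "h4", "h5", "h6"}
    SKIP_TAGS = {"head", "script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self.parts: list[str] = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCK_TAGS:
            self.parts.append("\n\n")

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.parts.append(data)


def epub_to_text(source) -> str:
    """Extract plain text from an ePub given a path or a binary file object.

    Simplification: documents are concatenated in alphabetical order.
    """
    texts = []
    with zipfile.ZipFile(source) as archive:
        for name in sorted(archive.namelist()):
            if name.lower().endswith((".xhtml", ".html", ".htm")):
                parser = _TextExtractor()
                parser.feed(archive.read(name).decode("utf-8", errors="replace"))
                parser.close()
                texts.append("".join(parser.parts).strip())
    return "\n\n".join(t for t in texts if t)
```

The output separates paragraphs with blank lines, which makes it straightforward to split into paragraph-aligned chunks afterwards.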
Requirements
- Python 3.11/3.12
- Ubuntu 24.04
- GPU with 4 GB VRAM - CUDA 5, ROCm 6
Preparation
Get the rag-searchkit source code
git clone https://github.com/llmlaba/rag_searchkit.git
cd ./rag_searchkit
Get the sentence-transformers model
git lfs install
git clone https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 st
Prepare data source
- ePub book
Place the ePub book in the repo root, for example UpgradePC20.epub
Prepare the Python environment
- For CPU
python3 -m venv .venv_llm
source ./.venv_llm/bin/activate
python -m pip install --upgrade pip
pip install -r requirements.txt
- For GPU AMD ROCm 6
python3 -m venv .venv_llm
source ./.venv_llm/bin/activate
python -m pip install --upgrade pip
pip install torch --index-url https://download.pytorch.org/whl/rocm6.0
pip install -r requirements.txt
Dry run
- Load the ePub into the database
python app.py build --epub "UpgradePC20.epub"
- Run a query
python app.py search --q "clear CMOS" --k 8 --format pretty