LLM Laboratory

Date: 22.08.2025

LLM semantic search

Description

Principle of Operation and Implementation of Semantic Search

Building the search database

  1. Split the corporate knowledge base into small chunks of ~600 characters. Ideally, each chunk should contain a whole paragraph (avoid splitting mid-paragraph). Clean each chunk of noise (e.g., stray special characters).
  2. Generate vector representations: pass the array of chunks to a specialized LLM model to obtain vector embeddings—mathematical representations of each chunk’s meaning.
  3. Store in a vector database: write the mapping of (chunk metadata, chunk text, and its vector) into a vector database.
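Step 1 above can be sketched in plain Python. The helpers below are illustrative only (they are not the repo's actual code): they split on paragraph boundaries toward the ~600-character target and strip stray special characters.

```python
import re

CHUNK_SIZE = 600  # target chunk size in characters, per step 1 above

def chunk_text(text: str, max_chars: int = CHUNK_SIZE) -> list[str]:
    """Split text into ~max_chars chunks on paragraph boundaries.

    A single paragraph longer than max_chars becomes its own chunk
    rather than being split mid-paragraph.
    """
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
    chunks, current = [], ""
    for para in paragraphs:
        # Keep whole paragraphs together whenever possible.
        if current and len(current) + len(para) + 2 > max_chars:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks

def clean_chunk(chunk: str) -> str:
    """Strip noise: stray special characters and runs of whitespace."""
    chunk = re.sub(r"[^\w\s.,;:!?()\"'%/-]", " ", chunk)
    return re.sub(r"\s+", " ", chunk).strip()

# Toy document standing in for the corporate knowledge base:
doc = ("First paragraph about vacation policy.\n\n" * 3 +
       "Second topic: expense reports §§ and approvals.\n\n" * 3)
chunks = [clean_chunk(c) for c in chunk_text(doc)]
```

The resulting chunks are then passed to the embedding model (step 2) and written to the vector database together with their metadata (step 3).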

Search process

  1. Embed the query: take the user’s search query and send it to the same LLM used for indexing to obtain the query’s vector representation.
  2. Retrieve nearest vectors: query the vector DB to find the most similar vectors. The DB does this very quickly because it doesn’t scan text; it compares numeric vector values (floating-point numbers).
  3. Return results from the vector database (matching chunks and their metadata).
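In production the vector database performs step 2, but the core ranking operation — comparing the query vector against stored vectors by cosine similarity — can be illustrated with toy data (the vectors and records below are hypothetical, not the repo's code):

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "vector DB": (metadata, chunk text, vector) records, as written in
# step 3 of the build phase. Real embeddings have hundreds of dimensions.
db = [
    ({"doc": "hr.md"},  "Vacation policy ...", [0.9, 0.1, 0.0]),
    ({"doc": "fin.md"}, "Expense reports ...", [0.1, 0.8, 0.2]),
    ({"doc": "it.md"},  "VPN setup guide ...", [0.0, 0.2, 0.9]),
]

def search(query_vec, top_k=2):
    """Return the top_k records whose vectors are nearest to the query."""
    scored = sorted(db, key=lambda rec: cosine(query_vec, rec[2]), reverse=True)
    return scored[:top_k]

# A query vector that lies close to the "VPN setup" record:
hits = search([0.05, 0.1, 0.95])
```

Because the comparison is pure floating-point arithmetic over fixed-size vectors, real vector databases accelerate it further with approximate nearest-neighbor indexes rather than scanning every record.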

Advantages

Requirements

Preparation

Get the rag-searchkit source code

git clone https://github.com/llmlaba/rag_searchkit.git
cd ./rag_searchkit

Get the sentence-transformers LLM model

git lfs install
git clone https://huggingface.co/sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2 st

Prepare data source
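The input format rag_searchkit expects is not specified here; as an assumption for illustration, a directory of plain-text documents can serve as the source (the `data/` path and file are hypothetical):

```shell
# Hypothetical layout: a directory of plain-text documents.
# Adjust the path and format to whatever rag_searchkit is configured to read.
mkdir -p data
cat > data/vacation_policy.txt <<'EOF'
Employees accrue 2 vacation days per month.

Requests must be approved by a manager.
EOF
```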

Prepare Python environment
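A typical setup is an isolated virtual environment; the dependency list below is an assumption (check the repo for a `requirements.txt` with the authoritative list):

```shell
# Create and activate an isolated environment (Python 3 assumed).
python3 -m venv .venv
. .venv/bin/activate
pip install --upgrade pip
# sentence-transformers is needed to load the embedding model cloned above;
# the exact dependency set may differ — prefer the repo's requirements file.
pip install sentence-transformers
```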

Dry run