Unleashing the power of Databricks Vector Search | Part 2/3
Apr 16, 2024


Bharathi A



Large Language Models (LLMs) stand as powerful tools capable of generating responses that are not only accurate but also tailored to specific contexts. Their ability to understand and generate human-like text has revolutionized various applications, from chatbots to language translation services. However, as with any advanced technology, they aren't without their hurdles.

One of the primary challenges faced by LLMs lies in their ability to provide precise and context-specific answers. This difficulty arises from constraints placed on the number of tokens these models can process, ultimately limiting the amount of context they can take into account when generating responses. Despite these challenges, there's a glimmer of hope on the horizon in the form of Retrieval-Augmented Generation (RAG). By integrating retrieval mechanisms into the generation process, RAG enables LLMs to access a broader range of context, enhancing the accuracy and relevance of their responses.

Enhancing LLMs through Contextual Retrieval

RAG employs a vector database to store a repository of documents and leverages a retriever to query them. The retriever selects the most relevant information from the document store, which is then passed as context to the LLM. However, the effectiveness of RAG relies heavily on the underlying indexing mechanism and retriever, and this becomes increasingly challenging when dealing with documents of diverse structures such as tables, PDFs, and XML files. Several factors, such as the similarity measure, the quality of the data in the underlying document store, and the embeddings used in the vector store, determine the quality of the retriever and, consequently, of the overall RAG application.

This article explores the process of developing a RAG system using Databricks. Databricks hosts a variety of tools to support the development of RAG applications on both structured and unstructured data, such as PDFs, website content, and Word documents.

RAG Architecture

Let's take a closer look at the RAG architecture and understand its inner workings. A RAG application typically consists of the following four stages:

  1. Indexing – This phase involves ingesting the source data and indexing it using embeddings.
  2. Retrieval – During this step, the system takes the user's query and retrieves relevant data from the previously created index. This retrieval is based on the similarity between the query and the indexed data.
  3. Augmentation – The retrieved information is then combined with the initial prompt and passed on to the LLM for post-processing and answer generation.
  4. Generation – The LLM responds with an answer to the user’s query based on the context-rich information obtained through the retrieval stage.
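The four stages above can be illustrated with a minimal, self-contained sketch. Plain word-count vectors stand in for real embeddings here, and all names are illustrative, not part of any Databricks API:

```python
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy "embedding": a bag-of-words count vector.
    # A real RAG system would call an embedding model instead.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two sparse count vectors.
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Indexing: embed each document in the knowledge base.
docs = [
    "Delta tables store data chunks and embeddings",
    "Model Serving hosts embedding models and LLMs",
]
index = [(doc, embed(doc)) for doc in docs]

# 2. Retrieval: embed the query and rank documents by similarity.
query = "where are embeddings stored"
ranked = sorted(index, key=lambda d: cosine(embed(query), d[1]), reverse=True)
context = ranked[0][0]

# 3. Augmentation: combine the retrieved context with the prompt.
prompt = f"Answer using this context:\n{context}\n\nQuestion: {query}"
# 4. Generation: `prompt` would now be sent to the LLM.
```

In a production system, the toy `embed` and the in-memory list are replaced by an embedding model endpoint and a vector index, but the four-stage flow is the same.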

What value does Databricks bring to RAG?

Databricks Vector Search, now part of the Databricks Data Intelligence Platform, makes it easy to create vector indices for your proprietary data stored in the Data Lakehouse. Delta tables can be used to store data chunks and embeddings, and Vector Search turns them into a queryable vector database whose embedding vectors can be set up to sync automatically with your knowledge base.
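As a rough sketch, creating such a Delta-sync index with the `databricks-vectorsearch` client looks like the following. The catalog, endpoint, and column names are placeholders, and the live call is left commented out since it requires a workspace:

```python
def delta_sync_index_config(source_table: str, index_name: str,
                            embedding_endpoint: str) -> dict:
    # Arguments for a Delta-sync index: Vector Search computes embeddings
    # from `embedding_source_column` via the named model serving endpoint
    # and keeps the index in sync with the source Delta table.
    return {
        "endpoint_name": "rag_endpoint",       # placeholder endpoint name
        "source_table_name": source_table,
        "index_name": index_name,
        "pipeline_type": "TRIGGERED",          # or "CONTINUOUS" for auto-sync
        "primary_key": "id",
        "embedding_source_column": "chunk_text",
        "embedding_model_endpoint_name": embedding_endpoint,
    }

cfg = delta_sync_index_config(
    "main.rag.chunks", "main.rag.chunks_index", "databricks-bge-large-en")

# In a workspace, the index would be created with (requires credentials):
# from databricks.vector_search.client import VectorSearchClient
# index = VectorSearchClient().create_delta_sync_index(**cfg)
```

With `pipeline_type="CONTINUOUS"`, new rows in the source Delta table are embedded and indexed automatically; `"TRIGGERED"` syncs on demand.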

Databricks also offers model serving capabilities for deploying Large Language Models (LLMs) and hosting RAG chains. This includes the configuration of dedicated endpoints for accessing state-of-the-art open LLMs through Foundation Model APIs, as well as integration with third-party models. The platform leverages MLflow to track the development of RAG chains and evaluate the performance of LLMs.

For RAG over structured data, feature engineering and feature serving come into play. Additionally, online tables can be served as a low-latency API to incorporate data into RAG applications.

Databricks has also introduced the AI Playground as a chat-based user interface, facilitating the testing and comparison of Large Language Models.

Vector Indexing through Databricks Vector Search

The following steps illustrate the process of preparing and indexing proprietary data in Databricks as part of the RAG workflow.

  1. The data is first ingested from a proprietary source and stored in a Delta Table.
  2. To create a knowledge base for the LLM, the data is then parsed and split into chunks that fit the context window of the base LLM.
  3. An embedding model then consumes the parsed and chunked data to generate vector embeddings. Databricks Model Serving can provide an embedding model endpoint that computes embeddings for the data.
  4. After computing the embeddings, they are stored in a Delta table.

Vector Search then indexes the embeddings and metadata, storing them in a vector database that the RAG chain can query. It automatically computes embeddings for any new data in the source Delta table and updates the vector search index accordingly.
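Step 2 above, splitting documents into chunks that fit the model's context window, can be sketched with a simple fixed-size splitter with overlap. This is a minimal stand-in for the tokenizer-aware splitters typically used in practice:

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    # Split `text` into chunks of at most `chunk_size` characters,
    # overlapping by `overlap` characters so that sentences cut at a
    # boundary still appear intact in one of the neighboring chunks.
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

chunks = chunk_text("x" * 500, chunk_size=200, overlap=50)
```

Each chunk, paired with a primary key and any metadata, becomes one row of the Delta table that feeds the vector index.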

Retrieval and Generation

After the index is ready, the RAG chain can handle incoming queries, and the following sequence of actions is taken to process a user query.

  1. The question is embedded using the same model that was employed for embedding data in the knowledge base.
  2. The embedded question undergoes a similarity search via Vector Search, which compares it to the embedded data chunks stored in the vector database.
  3. The retrieved data chunks are combined with the prompt template and provided as context for the LLM to generate an appropriate response to the query.
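The query-time flow above can be sketched as follows. The `similarity_search` call is left commented out since it needs a live index, and the template wording is illustrative; the prompt assembly itself is plain Python:

```python
PROMPT_TEMPLATE = (
    "Answer the question using only the context below.\n\n"
    "Context:\n{context}\n\nQuestion: {question}\nAnswer:"
)

def build_prompt(question: str, retrieved_chunks: list[str]) -> str:
    # Step 3: combine the retrieved chunks with the prompt template
    # before handing the result to the LLM for generation.
    return PROMPT_TEMPLATE.format(
        context="\n---\n".join(retrieved_chunks), question=question)

# Steps 1-2 against a live Vector Search index (requires a workspace):
# results = index.similarity_search(
#     query_text=question, columns=["chunk_text"], num_results=3)
# retrieved = [row[0] for row in results["result"]["data_array"]]

prompt = build_prompt(
    "Where are embeddings stored?",
    ["Embeddings are stored in a Delta table.",
     "Vector Search indexes the embeddings."])
```

The assembled `prompt` is then sent to the LLM serving endpoint, whose completion is returned to the user as the answer.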

Final thoughts

Overall, Databricks emerges as a robust platform, offering a suite of tools to streamline the development of RAG applications. Databricks Vector Search offers high performance, security, and user-friendliness. It utilizes Unity Catalog-based security and data governance tools, which streamline policies for organizational data. Additionally, the release of Lakehouse Monitoring adds another layer of security and oversight, enabling organizations to monitor RAG applications and proactively prevent the generation of harmful content.

