Building a Multidocument Chatbot using Mistral 7B, Qdrant, and Langchain

Published in

Stackademic

10 min read · Nov 6, 2023

Introduction

A multi-document chatbot is basically a robot friend that can read lots of different stories or articles and then chat with you about them, giving you the scoop on everything it has learned. It doesn’t get tired, even if it has to go through a whole pile of papers to find the information. It’s like a personal homework helper that does all the reading for you and then tells you what you need to know in a nice and easy way. In this blog post, we’ll walk through the process of creating a multidocument chatbot using Mistral 7B, Qdrant, and LangChain.

Understanding Mistral 7B

Mistral 7B is a 7-billion-parameter language model released by the startup Mistral AI, which raised $113 million in seed funding to build and openly share AI models. It offers strong language comprehension and text generation, and it can be adapted to specialized tasks. Since Mistral 7B generates the chatbot’s answers in this post, it is worth understanding what the model can do before building on it.

Vector Database

A vector database is a specialized type of database that is designed to store and manage vector data, which are arrays of numbers that represent various features of high-dimensional data points. In the context of machine learning and artificial intelligence, these vectors often come from the embeddings of more complex data types, such as images, text, or audio.

The vectors are a way to translate this complex data into a language that a computer can more easily work with for tasks such as similarity search. For instance, in a vector database, you could ask it to find images similar to a given image, and the database would search through its stored vectors — numerical representations of images — to find those that are closest to your input vector in terms of distance in the multi-dimensional space they exist in.

The “closeness” or “similarity” is determined by distance metrics like Euclidean distance, cosine similarity, or others, depending on the database and the specific application. Vector databases are optimized to perform these types of high-speed similarity searches and are crucial for supporting various AI-driven applications, including recommendation systems, search engines, and data analysis tools.
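To make the idea of “closeness” concrete, here is a minimal plain-Python sketch of the two metrics mentioned above and a brute-force nearest-neighbor lookup. The three-dimensional vectors and their labels are made up for illustration; a real vector database works with much larger vectors and uses indexes rather than comparing against every stored item.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors (smaller = closer)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (closer to 1.0 = more similar)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def nearest(query, stored, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    ranked = sorted(
        stored,
        key=lambda name: cosine_similarity(query, stored[name]),
        reverse=True,
    )
    return ranked[:k]

# Hypothetical 3-dimensional "embeddings"; real ones have hundreds of dimensions.
stored_vectors = {
    "cat photo": [0.9, 0.1, 0.0],
    "dog photo": [0.8, 0.3, 0.1],
    "car photo": [0.0, 0.2, 0.9],
}
query_vector = [1.0, 0.2, 0.0]

print(nearest(query_vector, stored_vectors))  # ['cat photo', 'dog photo']
```

The query vector points in almost the same direction as the “cat photo” and “dog photo” vectors, so those two come back first.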

Qdrant: An Introduction

Qdrant is a vector database that specializes in storing and searching through high-dimensional vectors — the kind of data that represents complex items like images, text, or user preferences in a form that AI can understand. It provides an API service that developers use to add search functionality to their applications, allowing them to quickly find the most relevant pieces of information based on their similarity. This is particularly useful for building smarter search engines, recommendation systems, and other applications that need to sift through large amounts of data to find patterns or connections.

Implementing Langchain for Language Workflows

Using Langchain with Language Models: When you’re working with language models, sometimes you need specific information that the model wasn’t trained on. Retrieval Augmented Generation (RAG) is one way to solve this problem. It involves finding outside information and then giving it to the language model to help it answer questions. Langchain has many tools to help with this, from easy to advanced. In this part of the guide, we’ll talk about how to get the information you need, including finding documents, preparing them, and then using that information to help the language model.

Getting Documents: LangChain can load documents from lots of different places. It offers more than 100 document loaders and integrates with other major providers such as AirByte and Unstructured. Whether you’re working with web pages, PDFs, or code, and whether your documents are stored privately (like in S3 buckets) or publicly, LangChain can help you load them up.
Preparing Documents: It’s important to only fetch the parts of documents you really need. Langchain has tools to make this easier by breaking down big documents into smaller pieces that are easier to handle. It has different ways to do this, and some are even tailored for specific kinds of documents, like code or markdown files.
Understanding Text: To make sure we can find documents that are similar to each other, we need to turn them into embeddings, which are like numerical fingerprints of text. Langchain lets you use more than 25 different services to create these embeddings. You can pick from free options or paid services, depending on what you need. Plus, Langchain makes it simple to switch from one service to another.
Storing Embeddings: As embeddings become more common, we need special databases that can store and find them quickly. Langchain has options for this, too, with over 50 different databases you can use, from free local ones to paid cloud services. No matter which one you choose, Langchain keeps the way you interact with them consistent, so it’s easy to change databases if you need to.
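The “Preparing Documents” step can be pictured with a small sketch. The function below is a simplified stand-in written for this post, not LangChain’s own splitter: it cuts text into fixed-size character chunks that overlap slightly, whereas LangChain’s splitters also try to break at natural boundaries such as paragraphs. The name split_text and the sample string are assumptions for illustration.

```python
def split_text(text, chunk_size=500, chunk_overlap=20):
    """Cut text into chunks of at most chunk_size characters.

    Each chunk starts chunk_size - chunk_overlap characters after the
    previous one, so neighbouring chunks share chunk_overlap characters.
    """
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once a chunk has reached the end of the text
        if start + chunk_size >= len(text):
            break
    return chunks

sample = "abcdefghijklmnopqrstuvwxyz"
for chunk in split_text(sample, chunk_size=10, chunk_overlap=3):
    print(chunk)
```

The overlap means a sentence that straddles a chunk boundary still appears, at least partly, in both chunks, which helps retrieval find it.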

Retrievers

When managing data within your database for a multidocument chatbot, the effectiveness of retrieving the right information is crucial. Langchain steps up to this challenge with a suite of retrieval tools. These include:

The Parent Document Retriever, which allows for the generation of multiple data ‘fingerprints’ for a larger document. This method enables the chatbot to search through smaller sections of text while still understanding the context of the entire document.
The Self Query Retriever, which is designed to tackle queries that aren’t just about semantic meaning but also include specific user details or conditions. It separates the meaningful content of a user’s question from other non-semantic information to ensure the response is as relevant as possible.
The Ensemble Retriever, which is especially useful when your chatbot needs to pull information from a variety of databases or when you want to blend different retrieval methods for a more robust search.

Integrating these sophisticated retrieval capabilities with Langchain can significantly enhance your chatbot’s ability to access and utilize multiple documents, delivering precise and contextually rich interactions.
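One common way to blend results from different retrieval methods is reciprocal rank fusion. The sketch below is a plain-Python illustration of that idea, not the Ensemble Retriever’s actual code; the document ids, the two example rankings, and the constant k=60 are assumptions chosen for the example.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one.

    Each document scores 1 / (k + rank) for every list it appears in
    (rank starts at 1); documents are returned by descending total score.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical results from a keyword search and a vector search
keyword_hits = ["doc_a", "doc_b", "doc_c"]
vector_hits = ["doc_b", "doc_d", "doc_a"]

print(reciprocal_rank_fusion([keyword_hits, vector_hits]))
# ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

Documents that rank well in both lists (doc_b and doc_a here) rise to the top, while documents found by only one method still make it into the merged result.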

Building the Multidocument Chatbot

Now that we understand Mistral 7B, Qdrant, and LangChain, we can begin building the multidocument chatbot. This entails loading a quantized model, splitting and embedding the documents, storing the embeddings in a vector store, and connecting a retriever to the model so that the chatbot can answer from the documents.

Running LangChain and RAG for Text Generation and Retrieval

In this section, we’ll go through the steps to use LangChain and the Retrieval-Augmented Generation (RAG) technique to perform text generation and information retrieval tasks.

Running With LangChain

Setting Up the Environment

!pip install -q -U bitsandbytes
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
!pip install -q -U einops
!pip install -q -U safetensors
!pip install -q -U torch
!pip install -q -U xformers
!pip install -q -U langchain
!pip install -q -U ctransformers[cuda]
!pip install qdrant_client
!pip install sentence-transformers

Authenticate with Hugging Face

To authenticate with Hugging Face, you’ll need an access token. Here’s how to get it:

Go to your Hugging Face account.
Navigate to ‘Settings’ and click on ‘Access Tokens’.
Create a new token or copy an existing one.

!pip install huggingface_hub
from huggingface_hub import notebook_login
notebook_login()

The model we want to use is “mistralai/Mistral-7B-Instruct-v0.1.”
Before loading it, we define a quantization configuration so that the weights are loaded in 4-bit precision, which reduces the GPU memory the model needs.

import torch
from transformers import BitsAndBytesConfig
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
)
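To see why 4-bit loading matters, here is a rough back-of-the-envelope calculation of the memory needed for the weights alone. It assumes a round 7 billion parameters (the exact count is somewhat higher) and ignores quantization overhead, activations, and the generation cache, so treat the numbers as estimates. The helper weight_memory_gb is hypothetical and only exists for this example.

```python
def weight_memory_gb(num_parameters, bits_per_parameter):
    """Approximate memory for the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_parameters * bits_per_parameter / 8 / 1e9

# Assumption: a round 7 billion parameters
num_parameters = 7_000_000_000

print(f"16-bit weights: {weight_memory_gb(num_parameters, 16):.1f} GB")  # 14.0 GB
print(f" 4-bit weights: {weight_memory_gb(num_parameters, 4):.1f} GB")   # 3.5 GB
```

Under these assumptions, 4-bit weights take about a quarter of the memory of 16-bit weights.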

LangChain Setup

First we load the quantized model and its tokenizer and build a Transformers text-generation pipeline.
Then we wrap that pipeline in LangChain’s HuggingFacePipeline so it can be used as an LLM in chains.

from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

model_id = "mistralai/Mistral-7B-Instruct-v0.1"

# Load the model with the 4-bit quantization config defined above
model_4bit = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    quantization_config=quantization_config,
)
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Use a new name so the imported pipeline() function is not overwritten
text_pipeline = pipeline(
    "text-generation",
    model=model_4bit,
    tokenizer=tokenizer,
    use_cache=True,
    max_new_tokens=500,  # counts generated tokens only; max_length would also count the prompt
    do_sample=True,
    top_k=5,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    pad_token_id=tokenizer.eos_token_id,
)

# Wrap the Transformers pipeline so LangChain can use it as an LLM
llm = HuggingFacePipeline(pipeline=text_pipeline)

Generating Text

We define a template for generating responses that includes context and a question.
We provide a specific question and context for the model to generate a response.
The response variable now contains the generated response.

# The tokenizer adds the <s> start token itself, and </s> marks the end of the
# model's answer, so neither belongs in the prompt text
template = """[INST] You are a helpful, respectful and honest assistant. Answer in a few words, using only the context below.
Context: {context}
Question: {question} [/INST]"""
question_p = """Which companies announced their merger?"""
context_p = """In a landmark fusion of tech titans, Cybervine and QuantumNet announced their merger today, promising to redefine the digital frontier with their combined innovation powerhouse, now known as CyberQuantum."""
prompt = PromptTemplate(template=template, input_variables=["question", "context"])
llm_chain = LLMChain(prompt=prompt, llm=llm)
response = llm_chain.run({"question": question_p, "context": context_p})
response

# Reuse the prompt and chain defined above with a new question and context
question_p = """How many shares did the investor buy?"""
context_p = """Amidst a flurry of market activity, investor Jordan McHale made headlines by confidently scooping up 50,000 shares of the burgeoning tech firm Solarity Innovations, signaling a bullish stance on the company's green energy prospects."""
response = llm_chain.run({"question": question_p, "context": context_p})
response
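Under the hood, a prompt template simply substitutes the placeholders with the values you pass in. The sketch below shows that substitution with plain str.format; build_prompt is a hypothetical helper written for this post, and the wording of the instruction is an assumption rather than a required format.

```python
def build_prompt(context, question):
    """Fill the instruction template with a context passage and a question."""
    template = (
        "[INST] You are a helpful, respectful and honest assistant. "
        "Answer in a few words, using only the context below.\n"
        "Context: {context}\n"
        "Question: {question} [/INST]"
    )
    # strip() removes stray leading/trailing whitespace from the inputs
    return template.format(context=context.strip(), question=question.strip())

prompt_text = build_prompt(
    context=" Cybervine and QuantumNet announced their merger today. ",
    question="Which companies announced their merger?",
)
print(prompt_text)
```

Printing the filled-in prompt like this is a quick way to check exactly what text the model receives.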

Retrieval Augmented Generation (RAG)

Setting Up RAG

We start by importing the necessary modules for RAG setup.

from qdrant_client import QdrantClient
from langchain.llms import HuggingFacePipeline
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.chains import RetrievalQA
from langchain.vectorstores import Qdrant

Providing Document Context

We provide a sample document context, which is a news article in this case.

mna_news = """Vectora, a forward-thinking player in the tech startup ecosystem, has ushered in a new chapter by naming Priyanka Desai as its Chief Executive Officer. Desai, a renowned figure in the tech community for her groundbreaking work at Nexus Energy Solutions, takes the reins at Vectora to propel the company into a leader in sustainable technology. With an expansive vision and a stellar record, Desai emerged as the chosen leader after an extensive international search, reflecting the board's confidence in her innovative approach and strategic acumen.
This strategic appointment coincides with Vectora's most recent milestone--securing a transformative $200 million in funding aimed at accelerating its growth. Desai's illustrious career, highlighted by her success in scaling Nexus Energy Solutions to an industry vanguard, speaks to her exceptional leadership. "Priyanka is the embodiment of leadership with purpose, and her alignment with our core values is what makes this appointment a perfect match," expressed Anil Mehta, Vectora's co-founder. Desai's plans for Vectora not only encompass financial growth but also reinforce the company's pledge to environmental innovation.
Addressing the company after her appointment, Desai unveiled an ambitious roadmap to expand Vectora's R&D efforts and introduce groundbreaking products to reduce industrial carbon emissions. "I am energized to lead a company that is as committed to sustainability as it is to technological innovation," Desai shared, underscoring her commitment to combating the urgent challenges posed by climate change.
Desai's leadership style, characterized by her emphasis on inclusive growth and collaborative innovation, has been met with resounding approval from within Vectora's ranks and across the tech sector. Her drive for fostering a workplace where diverse ideas flourish has drawn particular admiration. "Priyanka brings a dynamic perspective to Vectora that will undoubtedly spark creativity and drive," commented Anjali Vaidya, a prominent technology sector analyst. "Her track record of empowering her teams speaks volumes about her potential impact on Vectora's trajectory."
As Desai takes charge, industry observers are keenly awaiting the rollout of Vectora's most ambitious endeavor yet--an AI-driven toolset designed to optimize energy management for a global clientele. With Desai at the wheel, Vectora stands on the precipice of not just market success, but also contributing a significant handprint to the global sustainability effort. The tech world is abuzz as Desai is set to officially step into her new role next week, marking a potentially transformative era for Vectora and the industry at large.

"""

Setting Up RAG Components

Let’s configure the components: we split the document into chunks and load an embedding model.
Then we create a Qdrant vector store from the chunks and their embeddings.
Finally, we create a retriever from the vector store and set up the RetrievalQA chain.

from langchain.schema.document import Document

# Wrap the raw article text in a LangChain Document
documents = [Document(page_content=mna_news, metadata={"source": "local"})]

# Split the document into chunks of up to 500 characters with a 20-character overlap
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=20)
all_splits = text_splitter.split_documents(documents)

# Load the sentence-transformers embedding model on the GPU
model_name = "sentence-transformers/all-mpnet-base-v2"
model_kwargs = {"device": "cuda"}
embeddings = HuggingFaceEmbeddings(model_name=model_name, model_kwargs=model_kwargs)

# Embed the chunks and store them in Qdrant (local mode with in-memory storage only)
vectordb = Qdrant.from_documents(
    documents=all_splits,
    embedding=embeddings,
    location=":memory:",
    prefer_grpc=True,
    collection_name="my_documents",
)

# Expose the vector store as a retriever
retriever = vectordb.as_retriever()

# Build a question-answering chain that passes the retrieved chunks to the LLM
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    verbose=True,
)

def run_my_rag(qa, query):
    """Print the query, run it through the RetrievalQA chain, and print the answer."""
    print(f"Query: {query}\n")
    result = qa.run(query)
    print("\nResult: ", result)

Running a Query

query = """Who has been named Chief Executive Officer of Vectora?"""
run_my_rag(qa, query)
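To show what happens inside a retrieve-then-answer chain with chain_type="stuff", here is a toy version in plain Python. It is only a sketch under simplifying assumptions: embed uses word counts instead of a real embedding model, retrieve compares the query against every chunk instead of using a vector database, and all function names are hypothetical. The chunk texts are shortened facts from the sample article.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words count vector (stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[word] * b[word] for word in a if word in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    query_vec = embed(query)
    return sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)[:k]

def stuff_prompt(query, retrieved):
    """'Stuff' every retrieved chunk into a single prompt ahead of the question."""
    context = "\n\n".join(retrieved)
    return f"Use the context to answer the question.\n\n{context}\n\nQuestion: {query}"

chunks = [
    "Vectora has named Priyanka Desai as its Chief Executive Officer.",
    "Vectora secured $200 million in funding to accelerate its growth.",
    "Desai is known for her work at Nexus Energy Solutions.",
]
query = "Who is the new Chief Executive Officer of Vectora?"

top_chunks = retrieve(query, chunks, k=1)
print(stuff_prompt(query, top_chunks))
```

The chunk about the CEO appointment shares the most words with the question, so it is the one placed in the prompt; the language model then only has to read that passage to answer.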

Real-World Applications

Chatbots that can read and understand information from many documents can be really helpful for different jobs. They can talk to customers to help solve their problems, help people find information they need for research, and pick out the best content for different topics.

Conclusion

The development of multidocument chatbots is an exciting frontier in the field of AI-powered conversational agents. By combining Mistral 7B’s language understanding, Qdrant’s vector database, and LangChain’s document loading and retrieval tooling, developers can create chatbots that give context-aware answers grounded in their own documents. This article serves as a starting point for anyone interested in building chatbots using these technologies. With the right tools and techniques, you can create chatbots that are more informative and engaging than ever before.

References

Qdrant: https://qdrant.tech/documentation/overview/

Langchain documentation: https://python.langchain.com/docs/modules/data_connection/

Mistral 7B Research Paper: https://arxiv.org/pdf/2310.06825.pdf



Written by Vardhanam Daga