Creating a Cutting-Edge Research Chatbot Using AWS and LangChain

Chapter 1: Introduction to Chatbot Development

Recently, I set out to create a basic custom chatbot that would operate solely on my CPU. Unfortunately, the experience was quite disappointing, as the application frequently crashed. This outcome was hardly surprising; running a 13B parameter model on a $600 machine is akin to asking a toddler to climb a mountain.

This time, I've taken a more robust approach: building a research-oriented chatbot as a full project that uses AWS to host and access the models. This article outlines my journey in employing RAG (Retrieval-Augmented Generation) to create an efficient research chatbot capable of answering questions drawn from academic papers.

Chapter 2: Project Objectives

The main goal of this project is to construct a QA chatbot leveraging the RAG framework, which will respond to inquiries based on content from PDF documents sourced from the arXiv repository.

Before diving into the implementation, let's explore the architecture, technology stack, and the step-by-step process involved in building the chatbot.

Section 2.1: Chatbot Architecture

The following diagram illustrates the workflow of the LLM application:

Workflow diagram of the research chatbot

When a user submits a question through the interface, an embedding model converts it into a vector. The vector database then retrieves the document chunks most similar to that vector, and those chunks are passed to the LLM (Large Language Model) along with the original question. The LLM uses this context to formulate an accurate response, which is then displayed to the user.

Section 2.2: Technology Stack

To build the RAG application as depicted in the architecture, several critical tools are required:

  1. Amazon Bedrock

    Amazon Bedrock is a serverless service that provides API access to foundation models with pay-as-you-go pricing based on token usage, making it both convenient and cost-effective for developers. It will be used to access both the embedding model and the LLM. Configuration requires creating an IAM user with the necessary service permissions; a minimal client-setup sketch follows this list.

  2. FAISS

    FAISS is a widely used library in data science, serving as the vector database for this project. It allows for efficient document retrieval based on similarity metrics and is available free of charge.

  3. LangChain

    The LangChain framework aids in creating and managing RAG components such as the vector store and LLM.

  4. Chainlit

    Chainlit will be utilized to develop the chatbot's user interface, enabling the creation of visually appealing front ends with minimal coding, along with features tailored for chatbot applications.

  5. Docker

    For ease of deployment and portability, the application will be containerized using Docker.
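
Before wiring up the individual steps, here is a minimal sketch of the shared Bedrock client used throughout the later snippets; the region name is an assumption and should match the account where model access has been granted.

    import boto3

    # Runtime client used to invoke Bedrock models; the region is an assumption.
    bedrock_runtime = boto3.client(
        service_name="bedrock-runtime",
        region_name="us-east-1",
    )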

Chapter 3: Implementation Steps

The development of the LLM application involves several stages, each of which will be examined in detail.

Section 3.1: Load PDF Documents

The arXiv repository offers an extensive array of free, open-access articles on subjects ranging from economics to engineering. The chatbot's knowledge base will consist of selected papers on LLMs from this repository. After storing these documents in a directory, they are loaded and split into text chunks using LangChain's PyPDFDirectoryLoader and RecursiveCharacterTextSplitter, as sketched below.
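
A minimal ingestion sketch, assuming the PDFs sit in a local data/ directory and a recent LangChain package layout (import paths differ slightly between versions); the chunk sizes are illustrative:

    from langchain_community.document_loaders import PyPDFDirectoryLoader
    from langchain.text_splitter import RecursiveCharacterTextSplitter

    # Load every PDF in the directory of downloaded arXiv papers.
    loader = PyPDFDirectoryLoader("data")
    documents = loader.load()

    # Split pages into overlapping chunks so retrieved context stays coherent.
    splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=100)
    chunks = splitter.split_documents(documents)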

Section 3.2: Build the Vector Store

The text chunks created previously will be embedded using Amazon's Titan Text Embeddings model, accessible via the boto3 SDK for Python. These embedded chunks will be stored in a FAISS vector store, saved locally as "faiss_index."
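
A sketch of this step, assuming the Bedrock client from the earlier snippet and the Titan Text Embeddings model ID published by AWS:

    from langchain_community.embeddings import BedrockEmbeddings
    from langchain_community.vectorstores import FAISS

    # Titan Text Embeddings served through the Bedrock runtime client.
    embeddings = BedrockEmbeddings(
        client=bedrock_runtime,
        model_id="amazon.titan-embed-text-v1",
    )

    # Embed the chunks, index them with FAISS, and persist the index locally.
    vector_store = FAISS.from_documents(chunks, embeddings)
    vector_store.save_local("faiss_index")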

Section 3.3: Load the LLM

For the application, the chosen LLM is Meta's 13B Llama 2 model, which will also be accessed through Amazon Bedrock. One important parameter to consider is temperature, which affects the randomness of the output; for research purposes, it will be set to 0 to minimize variability.
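
A sketch of loading the model, assuming LangChain's Bedrock LLM wrapper and the Llama 2 13B Chat model ID; max_gen_len is an illustrative cap on response length:

    from langchain_community.llms import Bedrock

    # Llama 2 13B Chat via Bedrock, with temperature 0 for deterministic answers.
    llm = Bedrock(
        client=bedrock_runtime,
        model_id="meta.llama2-13b-chat-v1",
        model_kwargs={"temperature": 0.0, "max_gen_len": 512},
    )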

Section 3.4: Create the Retrieval Chain

In LangChain, a "chain" orchestrates a series of tasks in a specific order. In this RAG application, the chain takes the user's query, retrieves the most relevant chunks from the vector store, and passes the query together with those chunks to the LLM, which generates a response grounded in that context. The chain also incorporates ConversationBufferMemory, allowing the chatbot to remember previous exchanges when answering follow-up questions.
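
One way to assemble this is sketched below using LangChain's ConversationalRetrievalChain; the exact chain class is an assumption, and the retriever's k=3 mirrors the three cited sources shown later:

    from langchain.chains import ConversationalRetrievalChain
    from langchain.memory import ConversationBufferMemory

    # Memory keeps prior turns so follow-up questions have context;
    # output_key is needed because source documents are returned with answers.
    memory = ConversationBufferMemory(
        memory_key="chat_history",
        return_messages=True,
        output_key="answer",
    )

    qa_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vector_store.as_retriever(search_kwargs={"k": 3}),
        memory=memory,
        return_source_documents=True,
    )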

Section 3.5: Design the User Interface

Now that the backend components are in place, it's time to focus on the frontend. Chainlit simplifies the creation of user interfaces for LangChain applications, requiring only minor modifications to existing code with additional Chainlit commands.

Chainlit's decorators define what happens when a chat session starts and when a user submits a query, with both handled asynchronously. Additionally, responses generated by the LLM will include citations to the retrieved documents to enhance credibility.
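
A sketch of that glue code is shown below; cl.on_chat_start and cl.on_message are Chainlit's own decorators, while build_chain() is a hypothetical helper standing in for the chain assembly above:

    import chainlit as cl

    @cl.on_chat_start
    async def start():
        # build_chain() is a hypothetical helper wrapping the RAG setup above.
        chain = build_chain()
        cl.user_session.set("chain", chain)
        await cl.Message(content="Ask me about the loaded arXiv papers.").send()

    @cl.on_message
    async def answer(message: cl.Message):
        chain = cl.user_session.get("chain")
        result = await chain.ainvoke({"question": message.content})
        # Append the retrieved chunks' source files as lightweight citations.
        sources = ", ".join(
            doc.metadata.get("source", "unknown") for doc in result["source_documents"]
        )
        await cl.Message(content=f"{result['answer']}\n\nSources: {sources}").send()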

Section 3.6: Run the Chatbot Application

With all components assembled, the application can be run and tested. A straightforward command in Chainlit initiates a session, and the chatbot is ready to interact.
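
Assuming the entry point is saved as app.py (the filename is an assumption), the session starts with Chainlit's CLI; the -w flag enables auto-reload during development:

chainlit run app.py -w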

Let's test it with a simple query:

Upon submitting a query, the response is clear and informative, and it cites the three document chunks retrieved from the vector store to generate the answer.

To assess memory retention, we can pose a follow-up query:

Here, we ask for "another example" without further context, and the bot recalls the previous discussion about pretrained LLMs.

Overall, the application performs admirably. One benefit that isn't visible in the screenshots is the greatly reduced computational demand: because AWS hosts the embedding model and the LLM, there is no risk of crashes from high CPU usage on the local machine.

Section 3.7: Containerize the Application

With the chatbot operational, the final step is to containerize the application with Docker for better portability and versioning. This starts with a Dockerfile that defines a Python base image, sets up AWS credentials, installs dependencies, and runs the Chainlit application.
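
A minimal sketch of such a Dockerfile, assuming a requirements.txt and an app.py entry point (the filenames and Python version are assumptions); note that credentials passed as build arguments are baked into the image, which is only acceptable for local use:

    FROM python:3.11-slim

    # Credentials supplied at build time (see the docker build command below).
    ARG AWS_ACCESS_KEY_ID
    ARG AWS_SECRET_ACCESS_KEY
    ENV AWS_ACCESS_KEY_ID=${AWS_ACCESS_KEY_ID}
    ENV AWS_SECRET_ACCESS_KEY=${AWS_SECRET_ACCESS_KEY}
    ENV AWS_DEFAULT_REGION=us-east-1

    WORKDIR /app
    COPY requirements.txt .
    RUN pip install --no-cache-dir -r requirements.txt
    COPY . .

    EXPOSE 8000
    CMD ["chainlit", "run", "app.py", "--host", "0.0.0.0", "--port", "8000"]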

Subsequently, building the Docker image is straightforward:

docker build --build-arg AWS_ACCESS_KEY_ID=<your-access-key-id> --build-arg AWS_SECRET_ACCESS_KEY=<your-secret-access-key> -t chainlit_app .

This command creates an image named "chainlit_app," incorporating the AWS access key ID and secret key to access models in Amazon Bedrock via API.

Finally, the application can be executed in a Docker container:

docker run -d --name chainlit_app -p 8000:8000 chainlit_app

The application is now accessible on port 8000 at http://localhost:8000. Let's verify the functionality of the RAG components (including AWS Bedrock models) by submitting a query.

Chapter 4: Future Directions

The current chatbot efficiently handles queries with reasonable performance at a low cost; however, it operates locally with default settings. Therefore, several enhancements could be made to improve its capabilities:

  1. Conduct Thorough Testing

    While the LLM application appears to deliver concise and accurate responses, comprehensive testing is essential to ensure optimal performance and minimize inaccuracies.

  2. Explore Advanced RAG Techniques

    If the chatbot struggles with specific question types or exhibits consistently poor performance, it may be beneficial to adopt advanced RAG strategies to refine the content retrieval process from the vector database.

  3. Enhance User Interface

    Currently, the tool employs Chainlit's default UI. Customizing the front end could improve aesthetics and usability. Moreover, enhancing the citation feature by providing direct hyperlinks to sources would allow users to access information more readily.

  4. Deploy to Cloud Services

    To reach a broader audience, deploying the application on a cloud platform, such as Amazon EC2 or Amazon ECS, would be the next logical step. This transition offers scalability, availability, and enhanced performance, especially since the tool already integrates with AWS Bedrock.

Conclusion

Throughout this project, I have been amazed at the rapid advancements in the data science field. Just five years ago, constructing NLP applications utilizing generative AI would have required substantial resources and expertise.

Today, in 2024, such tools can be developed with minimal coding and costs (the entire project has incurred less than $1). The future possibilities are exciting to contemplate.

For those interested in exploring the codebase for this project, please visit the GitHub repository:

Thank you for reading!
