Project Case Study

GPT-Powered
Internal Chatbot
with RAG

A retrieval-augmented generation system built for a Malaysian tech startup — enabling employees to query internal PDF documentation through a context-aware, session-persistent chatbot backed by GPT-4.

LangChainChromaDBGPT-4FastAPIMongoDBOpenAI EmbeddingsPython 3.10+

Case Study

Problem & Solution

The Problem

Staff at a Malaysian tech startup spent significant time repeatedly explaining what the company does — to new hires, potential clients, and partners. There was no centralised, always-available source of truth for company FAQs and internal knowledge.

The Solution

A session-persistent chatbot backed by the company's internal documentation. Employees and clients can query it naturally, receive grounded answers from the knowledge base, and follow up with full context preserved — eliminating repetitive manual explanations and onboarding friction.

Architecture

System Diagram

RAG Chatbot Architecture

← scroll to pan →

How It Works

The RAG Pipeline

PDF Files

Knowledge Base

Text Chunking

300 tok · 100 overlap

Embeddings

OpenAI API

ChromaDB

Vector Store

Retrieval

History-Aware

GPT-4

Response Generation

Ingestion
Retrieval
Generation

Capabilities

Key Features

Persistent Session Memory

Every conversation is tied to a UUID session stored in MongoDB. Multi-turn context is preserved across API calls with full history retrieval.

History-Aware Retrieval

Follow-up questions are reformulated by the LLM before hitting the vector store, ensuring retrieved chunks are always contextually relevant.

Persistent Vector Embeddings

ChromaDB persists the embedding index to disk. The knowledge base survives restarts — no re-indexing required unless documents change.

Full Session CRUD API

REST endpoints for creating sessions, continuing chats, fetching history, listing all sessions, and deleting sessions cleanly.

Document-Grounded Answers

System prompt constrains the LLM strictly to indexed document content, eliminating hallucination outside the knowledge base.

Zero-Friction Knowledge Updates

Add PDFs to the /data directory and restart — chunking, embedding, and indexing happen automatically on server startup.

Interface

REST API

FastAPI — RAGA v1
POST/chatCreate a new chat session and get the first response
POST/chat/{session_id}Continue an existing conversation
GET/session/{session_id}Retrieve the full message history for a session
GET/sessionsList all active session IDs and descriptions
DELETE/chat/{session_id}Delete a session and its chat history

Stack

Technology

API Layer

FastAPIPython 3.10+Uvicorn

RAG Framework

LangChainLangChain-OpenAILangChain-Community

Vector Database

ChromaDB 0.5OpenAI Embeddings

Language Model

GPT-4OpenAI APITemperature 0.8

Session Storage

MongoDB 7+PyMongoMongoDBChatMessageHistory

Document Processing

UnstructuredDirectoryLoaderRecursiveTextSplitter