Project Case Study

Semantic API
Discovery Platform

A semantic search system that lets developers find API documentation by describing what they need — built for a major Malaysian financial institution.

FastAPI · MongoDB · OpenAI Embeddings · ChromaDB · HuggingFace · GPT-4o · Redis

Case Study

Problem & Solution

The Problem

Developers navigating a large financial institution's API catalogue had to scroll through pages of documentation or rely on exact keyword matches to find what they needed. This created friction, slowed development velocity, and led to APIs being underutilised simply because they were hard to discover.

The Solution

A semantic search system that understands developer intent. Ask “I want to find Authentication APIs” and the system returns a ranked recommendation with alternatives and a plain-English justification explaining why each result relates to the query — no exact keywords needed.

Architecture

System Diagram

Smart API Search Architecture


How It Works

The Search Pipeline

MongoDB: API data source
OpenAI Embeddings: text-embedding-3-large
ChromaDB: vector store
HuggingFace Reranker: 25 → top 5 results
GPT-4o: response generation
Redis Cache: 10-min TTL

Pipeline phases: Indexing → Retrieval → Generation
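The indexing and retrieval phases can be sketched in pure Python. This is a toy stand-in, not the production code: a fixed-vocabulary bag-of-words counter plays the role of text-embedding-3-large, and an in-memory dict plays the role of ChromaDB. All names and the tiny catalogue are illustrative.

```python
# Toy sketch of the indexing + retrieval phases. The fixed-vocabulary
# "embedding" stands in for OpenAI text-embedding-3-large, and the
# in-memory dict stands in for ChromaDB.

VOCAB = ["authentication", "login", "token", "payment", "transfer", "apis"]

def embed(text: str) -> list[float]:
    # Count vocabulary words in the text -- a crude stand-in for a real embedding.
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb) if na and nb else 0.0

# Indexing: embed each API description once, up front.
catalogue = {
    "AuthAPI": "authentication login token apis",
    "PaymentsAPI": "payment transfer apis",
}
index = {name: embed(desc) for name, desc in catalogue.items()}

# Retrieval: rank the catalogue by cosine similarity to the developer's query.
def search(query: str, top_k: int = 1) -> list[str]:
    qv = embed(query)
    ranked = sorted(index, key=lambda name: cosine(qv, index[name]), reverse=True)
    return ranked[:top_k]

print(search("I want to find Authentication APIs"))  # ['AuthAPI']
```

The point of the sketch is the shape of the flow: descriptions are embedded once at indexing time, and each query is embedded and compared against the stored vectors at retrieval time, so no exact keyword match is required.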

Capabilities

Key Features

Semantic Intent Search

Find APIs by describing what you need in plain English, not exact keywords. The system vectorizes the query and performs similarity search across the full API catalogue.

Contextual Recommendations

Returns a main recommendation plus alternative options, each with a plain-English justification explaining exactly why it matches the query.

HuggingFace Reranking

Vector search retrieves 25 candidates; a HuggingFace reranking model re-scores them and keeps the 5 most relevant results before passing them to the LLM.
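The two-stage retrieve-then-rerank pattern can be shown with stubs. Here a cheap first stage over-fetches 25 candidates and a simple word-overlap scorer stands in for the HuggingFace cross-encoder; the scoring function and corpus are illustrative, not the real model.

```python
# Two-stage retrieval sketch: over-fetch 25 candidates cheaply, then let a
# more expensive scorer (standing in for a HuggingFace reranker) keep the
# top 5. The word-overlap score below is a stub, not a real model.

def vector_search(query: str, corpus: dict[str, str], k: int = 25) -> list[str]:
    # First stage: in this toy, every document is a candidate, capped at k.
    return list(corpus)[:k]

def rerank(query: str, candidates: list[str],
           corpus: dict[str, str], top_k: int = 5) -> list[str]:
    # Second stage: score each candidate against the query and keep top_k.
    q_words = set(query.lower().split())
    def score(name: str) -> int:
        return len(q_words & set(corpus[name].lower().split()))
    return sorted(candidates, key=score, reverse=True)[:top_k]

corpus = {f"API-{i}": f"service number {i}" for i in range(24)}
corpus["AuthAPI"] = "authentication and token apis"

candidates = vector_search("authentication token", corpus)  # 25 candidates
top5 = rerank("authentication token", candidates, corpus)   # best 5 survive
print(top5[0])  # AuthAPI
```

The design choice this illustrates: the first stage is tuned for recall (cast a wide net), the second for precision (spend compute only on 25 items instead of the whole catalogue).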

Live Data Sync

A MongoDB replica set with change streams enabled keeps ChromaDB automatically in sync as API documentation is updated — no manual re-indexing.

Redis Response Caching

Identical queries are served from cache for 10 minutes, reducing LLM calls and improving response latency for repeated searches.

GPT-4o Grounded Generation

Responses are grounded strictly in retrieved API data. The LLM formats a tailored answer with ranked suggestions and rationale, not hallucinated content.
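One way grounding is commonly enforced is at the prompt level: the model only ever sees the retrieved API records and is told to answer from them alone. The sketch below shows that construction; the field names and prompt wording are illustrative, not the production prompt.

```python
# Hedged sketch of grounded prompt construction: the LLM's context contains
# only the retrieved API records, plus an instruction to answer from them
# alone. Field names and wording here are illustrative.

def build_grounded_prompt(query: str, retrieved: list[dict]) -> str:
    context = "\n".join(
        f"- {api['name']}: {api['description']}" for api in retrieved
    )
    return (
        "Answer ONLY from the API records below. If none match, say so.\n\n"
        f"API records:\n{context}\n\n"
        f"Developer query: {query}\n"
        "Return a main recommendation, alternatives, and a one-line "
        "justification for each."
    )

retrieved = [
    {"name": "AuthAPI", "description": "OAuth2 token issuance and login"},
    {"name": "UserAPI", "description": "User profile lookup"},
]
prompt = build_grounded_prompt("I want to find Authentication APIs", retrieved)
print("AuthAPI" in prompt)  # True
```

Because the context is built exclusively from records that exist in the catalogue, any API the model recommends can be traced back to a retrieved document rather than invented.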

Stack

Technology

API Layer

FastAPI · Python

Database

MongoDB · Replica set · Change streams

Embedding

OpenAI text-embedding-3-large

Vector Store

ChromaDB · similarity_top_k=25

Reranking

HuggingFace Reranking Model · top_k=5

LLM & Cache

GPT-4o · Redis · 10-min TTL