rag_architecture

by @DonggangChen in AI & LLM

# Install this skill:

npx skills add DonggangChen/antigravity-agentic-skills --skill "rag_architecture"

Install specific skill from multi-skill repository

# Description

Design LLM applications using the LangChain framework with agents, memory, and tool integration patterns. Use when building LangChain applications, implementing AI agents, or creating complex LLM workflows.

# SKILL.md

name: rag_architecture
router_kit: AIKit
description: Design LLM applications using the LangChain framework with agents, memory, and tool integration patterns. Use when building LangChain applications, implementing AI agents, or creating complex LLM workflows.
metadata:
skillport:
category: auto-healed
tags: [agents, algorithms, artificial intelligence, automation, chatbots, cognitive services, deep learning, embeddings, frameworks, generative ai, inference, large language models, llm, machine learning, model fine-tuning, natural language processing, neural networks, nlp, openai, prompt engineering, rag, rag architecture, retrieval augmented generation, tools, vector databases, workflow automation] - rag_architecture

LangChain Architecture

Master the LangChain framework for building sophisticated LLM applications with agents, chains, memory, and tool integration.

When to Use This Skill

Building autonomous AI agents with tool access
Implementing complex multi-step LLM workflows
Managing conversation memory and state
Integrating LLMs with external data sources and APIs
Creating modular, reusable LLM application components
Implementing document processing pipelines
Building production-grade LLM applications

Core Concepts

1. Agents

Autonomous systems that use LLMs to decide which actions to take.

Agent Types:
- ReAct: Interleaves reasoning and acting steps
- OpenAI Functions: Leverages function calling API
- Structured Chat: Handles multi-input tools
- Conversational: Optimized for chat interfaces
- Self-Ask with Search: Decomposes complex queries
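
To make the ReAct idea concrete, here is a minimal plain-Python sketch of the reason/act loop, independent of LangChain. The function names (`run_react_loop`, `make_scripted_llm`, `multiply`) and the scripted replies are hypothetical and used only for illustration; a real agent would call an actual model and use a more robust output parser.

```python
import re
from typing import Callable, Dict, List

# Matches the "Action: <tool>\nAction Input: <input>" format used by ReAct-style prompts.
ACTION_RE = re.compile(r"Action:\s*(.+?)\s*\nAction Input:\s*(.*)", re.DOTALL)


def run_react_loop(
    llm: Callable[[str], str],
    tools: Dict[str, Callable[[str], str]],
    question: str,
    max_steps: int = 5,
) -> str:
    """Alternate between model output and tool calls until a final answer appears."""
    scratchpad = f"Question: {question}\n"
    for _ in range(max_steps):
        output = llm(scratchpad)
        if "Final Answer:" in output:
            return output.split("Final Answer:", 1)[1].strip()
        match = ACTION_RE.search(output)
        if match is None:
            # No parseable action: treat the raw text as the answer.
            return output.strip()
        tool_name, tool_input = match.group(1).strip(), match.group(2).strip()
        tool = tools.get(tool_name)
        observation = tool(tool_input) if tool else f"Unknown tool: {tool_name}"
        # Feed the observation back so the next model call can reason over it.
        scratchpad += f"{output}\nObservation: {observation}\n"
    return "Stopped: step limit reached"


def make_scripted_llm(replies: List[str]) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: returns canned replies in order."""
    remaining = iter(replies)
    return lambda prompt: next(remaining)


def multiply(expression: str) -> str:
    """Toy calculator tool that only understands 'a * b'."""
    left, right = expression.split("*")
    return str(int(left) * int(right))


scripted = make_scripted_llm([
    "Thought: I should multiply.\nAction: calculator\nAction Input: 25 * 4",
    "Thought: I have the result.\nFinal Answer: 100",
])
answer = run_react_loop(scripted, {"calculator": multiply}, "What is 25 * 4?")
print(answer)
```

The step limit matters in practice: without it, a model that never emits a final answer would loop indefinitely.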

2. Chains

Sequences of calls to LLMs or other utilities.

Chain Types:
- LLMChain: Basic prompt + LLM combination
- SequentialChain: Multiple chains in sequence
- RouterChain: Routes inputs to specialized chains
- TransformChain: Data transformations between steps
- MapReduceChain: Parallel processing with aggregation

3. Memory

Systems for maintaining context across interactions.

Memory Types:
- ConversationBufferMemory: Stores all messages
- ConversationSummaryMemory: Summarizes older messages
- ConversationBufferWindowMemory: Keeps last N messages
- ConversationEntityMemory: Tracks information about entities
- VectorStoreRetrieverMemory: Semantic similarity retrieval
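
The buffer and window memories differ mainly in how much history they keep. As a rough illustration of bounding history, the sketch below trims a message list to a token budget in plain Python. It is not LangChain's implementation; `approx_tokens` is an assumed, deliberately crude estimate (one token per whitespace-separated word), and a real application would use the model's tokenizer.

```python
from typing import List, Tuple

Message = Tuple[str, str]  # (role, text)


def approx_tokens(text: str) -> int:
    """Crude token estimate (assumption: one token per whitespace-separated word)."""
    return len(text.split())


def trim_history(messages: List[Message], max_tokens: int) -> List[Message]:
    """Keep the most recent messages whose combined estimate fits the budget."""
    kept: List[Message] = []
    used = 0
    # Walk backwards so the newest messages are kept first.
    for role, text in reversed(messages):
        cost = approx_tokens(text)
        if used + cost > max_tokens:
            break
        kept.append((role, text))
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = [
    ("human", "Hi there"),
    ("ai", "Hello! How can I help you today?"),
    ("human", "Summarize our refund policy"),
    ("ai", "Refunds are available within 30 days"),
]
recent = trim_history(history, max_tokens=10)
print(recent)
```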

4. Document Processing

Loading, transforming, and storing documents for retrieval.

Components:
- Document Loaders: Load from various sources
- Text Splitters: Chunk documents intelligently
- Vector Stores: Store and retrieve embeddings
- Retrievers: Fetch relevant documents
- Indexes: Organize documents for efficient access
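
As one possible reading of "chunk documents intelligently", the sketch below packs whole sentences into chunks and repeats the last sentence of each chunk at the start of the next. It is a plain-Python illustration, not a LangChain splitter; the function names and the regex-based sentence splitter are assumptions, and the regex will mishandle abbreviations such as "e.g.".

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def chunk_by_sentence(text: str, max_chars: int = 200, overlap_sentences: int = 1) -> List[str]:
    """Pack whole sentences into chunks, targeting max_chars per chunk.

    The target is soft: a single long sentence, or the carried-over overlap,
    can push a chunk past max_chars.
    """
    chunks: List[str] = []
    current: List[str] = []
    for sentence in split_sentences(text):
        candidate = " ".join(current + [sentence])
        if current and len(candidate) > max_chars:
            chunks.append(" ".join(current))
            # Carry the tail of the finished chunk forward as overlap.
            current = current[-overlap_sentences:] if overlap_sentences > 0 else []
        current.append(sentence)
    if current:
        chunks.append(" ".join(current))
    return chunks


text = "Cats sleep a lot. Dogs like walks. Birds sing at dawn. Fish swim in schools."
chunks = chunk_by_sentence(text, max_chars=40, overlap_sentences=1)
for chunk in chunks:
    print(chunk)
```

Because chunks are built only from whole sentences, none of them ends mid-sentence, at the cost of less uniform chunk sizes.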

5. Callbacks

Hooks for logging, monitoring, and debugging.

Use Cases:
- Request/response logging
- Token usage tracking
- Latency monitoring
- Error handling
- Custom metrics collection

Quick Start

from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory

# Initialize LLM
llm = OpenAI(temperature=0)

# Load tools
tools = load_tools(["serpapi", "llm-math"], llm=llm)

# Add memory
memory = ConversationBufferMemory(memory_key="chat_history")

# Create agent
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION,
    memory=memory,
    verbose=True
)

# Run agent
result = agent.run("What's the weather in SF? Then calculate 25 * 4")

Architecture Patterns

Pattern 1: RAG with LangChain

from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

# Load and process documents
loader = TextLoader('documents.txt')
documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

# Create vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(texts, embeddings)

# Create retrieval chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
    return_source_documents=True
)

# Query
result = qa_chain({"query": "What is the main topic?"})

Pattern 2: Custom Agent with Tools

from langchain.agents import AgentType, initialize_agent
from langchain.tools import tool

@tool
def search_database(query: str) -> str:
    """Search internal database for information."""
    # Your database search logic
    return f"Results for: {query}"

@tool
def send_email(recipient: str, content: str) -> str:
    """Send an email to specified recipient."""
    # Email sending logic
    return f"Email sent to {recipient}"

tools = [search_database, send_email]

# send_email takes two arguments, so use the structured-chat agent:
# ZERO_SHOT_REACT_DESCRIPTION only accepts single-input tools.
# `llm` is the model created in the Quick Start.
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

Pattern 3: Multi-Step Chain

from langchain.chains import LLMChain, SequentialChain
from langchain.prompts import PromptTemplate

# Step 1: Extract key information
extract_prompt = PromptTemplate(
    input_variables=["text"],
    template="Extract key entities from: {text}\n\nEntities:"
)
extract_chain = LLMChain(llm=llm, prompt=extract_prompt, output_key="entities")

# Step 2: Analyze entities
analyze_prompt = PromptTemplate(
    input_variables=["entities"],
    template="Analyze these entities: {entities}\n\nAnalysis:"
)
analyze_chain = LLMChain(llm=llm, prompt=analyze_prompt, output_key="analysis")

# Step 3: Generate summary
summary_prompt = PromptTemplate(
    input_variables=["entities", "analysis"],
    template="Summarize:\nEntities: {entities}\nAnalysis: {analysis}\n\nSummary:"
)
summary_chain = LLMChain(llm=llm, prompt=summary_prompt, output_key="summary")

# Combine into sequential chain
overall_chain = SequentialChain(
    chains=[extract_chain, analyze_chain, summary_chain],
    input_variables=["text"],
    output_variables=["entities", "analysis", "summary"],
    verbose=True
)

Memory Management Best Practices

Choosing the Right Memory Type

# For short conversations (< 10 messages)
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory()

# For long conversations (summarize old messages)
from langchain.memory import ConversationSummaryMemory
memory = ConversationSummaryMemory(llm=llm)

# For sliding window (last N messages)
from langchain.memory import ConversationBufferWindowMemory
memory = ConversationBufferWindowMemory(k=5)

# For entity tracking
from langchain.memory import ConversationEntityMemory
memory = ConversationEntityMemory(llm=llm)

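# For semantic retrieval of relevant history
# (assumes `vectorstore` is the Chroma store built in Pattern 1)
from langchain.memory import VectorStoreRetrieverMemory
retriever = vectorstore.as_retriever()
memory = VectorStoreRetrieverMemory(retriever=retriever)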

Callback System

Custom Callback Handler

from langchain.callbacks.base import BaseCallbackHandler

class CustomCallbackHandler(BaseCallbackHandler):
    def on_llm_start(self, serialized, prompts, **kwargs):
        print(f"LLM started with prompts: {prompts}")

    def on_llm_end(self, response, **kwargs):
        print(f"LLM ended with response: {response}")

    def on_llm_error(self, error, **kwargs):
        print(f"LLM error: {error}")

    def on_chain_start(self, serialized, inputs, **kwargs):
        print(f"Chain started with inputs: {inputs}")

    def on_agent_action(self, action, **kwargs):
        print(f"Agent taking action: {action}")

# Use callback
agent.run("query", callbacks=[CustomCallbackHandler()])

Testing Strategies

from langchain.agents import AgentType, initialize_agent
from langchain.llms.fake import FakeListLLM
from langchain.memory import ConversationBufferMemory

def test_agent_tool_selection():
    # A bare Mock is not accepted as a LangChain LLM, so script the
    # model's replies with FakeListLLM instead
    fake_llm = FakeListLLM(responses=[
        "Action: search_database\nAction Input: test query",
        "Final Answer: done",
    ])

    # search_database is the single-input tool defined in Pattern 2
    agent = initialize_agent(
        [search_database],
        fake_llm,
        agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
        return_intermediate_steps=True,
    )

    result = agent({"input": "test query"})

    # Verify correct tool was selected
    action, observation = result["intermediate_steps"][0]
    assert action.tool == "search_database"

def test_memory_persistence():
    memory = ConversationBufferMemory()

    memory.save_context({"input": "Hi"}, {"output": "Hello!"})

    history = memory.load_memory_variables({})['history']
    assert "Hi" in history
    assert "Hello!" in history

Performance Optimization

1. Caching

from langchain.cache import InMemoryCache
import langchain

langchain.llm_cache = InMemoryCache()

2. Batch Processing

# Process multiple documents in parallel
from langchain.document_loaders import DirectoryLoader
from concurrent.futures import ThreadPoolExecutor

loader = DirectoryLoader('./docs')
docs = loader.load()

def process_doc(doc):
    return text_splitter.split_documents([doc])

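with ThreadPoolExecutor(max_workers=4) as executor:
    # executor.map yields one list of chunks per document; flatten into a single list
    split_docs = [chunk for chunks in executor.map(process_doc, docs) for chunk in chunks]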

3. Streaming Responses

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()])

Resources

references/agents.md: Deep dive on agent architectures
references/memory.md: Memory system patterns
references/chains.md: Chain composition strategies
references/document-processing.md: Document loading and indexing
references/callbacks.md: Monitoring and observability
assets/agent-template.py: Production-ready agent template
assets/memory-config.yaml: Memory configuration examples
assets/chain-example.py: Complex chain examples

Common Pitfalls

Memory Overflow: Not managing conversation history length
Tool Selection Errors: Poor tool descriptions confuse agents
Context Window Exceeded: Exceeding LLM token limits
No Error Handling: Not catching and handling agent failures
Inefficient Retrieval: Not optimizing vector store queries
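
For the "No Error Handling" pitfall, one common shape is retry, then fallback, then a fixed reply. The sketch below is a plain-Python illustration under that assumption; `run_with_fallback` and `flaky_agent` are hypothetical names, and in a LangChain application `primary` could be something like `agent.run`.

```python
import time
from typing import Callable, Optional


def run_with_fallback(
    primary: Callable[[str], str],
    query: str,
    fallback: Optional[Callable[[str], str]] = None,
    retries: int = 2,
    base_delay: float = 0.0,
) -> str:
    """Try primary up to retries + 1 times, then fallback, then a fixed apology."""
    for attempt in range(retries + 1):
        try:
            return primary(query)
        except Exception as exc:
            print(f"attempt {attempt + 1} failed: {exc}")
            # Exponential backoff; base_delay=0.0 keeps the demo instant.
            time.sleep(base_delay * (2 ** attempt))
    if fallback is not None:
        try:
            return fallback(query)
        except Exception as exc:
            print(f"fallback failed: {exc}")
    return "Sorry, I could not complete that request."


# Hypothetical agent that fails twice (e.g. rate limiting) before succeeding.
calls = {"n": 0}


def flaky_agent(query: str) -> str:
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("rate limited")
    return f"answer to: {query}"


result = run_with_fallback(flaky_agent, "status of order 42", retries=2)
print(result)
```

Catching `Exception` broadly is a simplification; production code would usually retry only on errors known to be transient.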

Production Checklist

[ ] Implement proper error handling
[ ] Add request/response logging
[ ] Monitor token usage and costs
[ ] Set timeout limits for agent execution
[ ] Implement rate limiting
[ ] Add input validation
[ ] Test with edge cases
[ ] Set up observability (callbacks)
[ ] Implement fallback strategies

RAG Architecture v1.1 - Enhanced

🔄 Workflow

Source: Azure AI Search - RAG & LangChain RAG Concepts

Phase 1: Retrieval Strategy Design

[ ] Chunking: Prefer "Semantic Chunking" or "Parent-Child Indexing" over fixed-size character splitting, so each chunk keeps its surrounding context.
[ ] Hybrid Search: Vector search alone is often not enough. Combine keyword search (BM25) with vector search (cosine similarity) and merge the two rankings with Reciprocal Rank Fusion (RRF).
[ ] Query Transformation: Do not search with the raw user query. Enrich it first with "Hypothetical Document Embeddings" (HyDE) or "Multi-query" rewriting.
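
Reciprocal Rank Fusion from the hybrid search item can be sketched in a few lines. Each document scores 1 / (k + rank) in every ranking where it appears, and the scores are summed. The constant k = 60 is a commonly used default rather than something this skill prescribes, and the document IDs below are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse several ranked lists of document IDs into one list, best first."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_ranking = ["doc_a", "doc_b", "doc_c"]    # keyword search results, best first
vector_ranking = ["doc_c", "doc_a", "doc_d"]  # vector search results, best first
fused = reciprocal_rank_fusion([bm25_ranking, vector_ranking])
print(fused)
```

RRF uses only rank positions, so the BM25 and cosine scores never need to be normalized onto a common scale.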

Phase 2: Generation Architecture

[ ] Context Window: Check that the retrieved documents fit in the LLM's context window, and mitigate the "Lost in the Middle" problem by placing the most important information at the start or end.
[ ] System Prompt: Strictly instruct the model to use only the provided context and not add outside information (grounding).
[ ] Citation: Ensure each answer shows which document it is based on, using inline references (source_id).
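
The three items above can be combined in a small prompt-building step. The sketch below reorders retrieved documents so the most relevant sit at the start and end, then builds a grounded prompt that tags each passage with its source_id. The function names, prompt wording, and the `kb-*` IDs are assumptions made for illustration.

```python
from typing import List, Tuple

Doc = Tuple[str, str]  # (source_id, text)


def reorder_for_long_context(docs_by_relevance: List[Doc]) -> List[Doc]:
    """Put the most relevant docs at the start and end, the least relevant in the middle.

    Input must be sorted most relevant first.
    """
    front: List[Doc] = []
    back: List[Doc] = []
    for i, doc in enumerate(docs_by_relevance):
        (front if i % 2 == 0 else back).append(doc)
    return front + back[::-1]


def build_grounded_prompt(question: str, docs: List[Doc]) -> str:
    """Build a prompt that restricts the model to the context and asks for citations."""
    context = "\n".join(f"[{source_id}] {text}" for source_id, text in docs)
    return (
        "Answer using only the context below. "
        "Cite the source_id in square brackets after each claim. "
        'If the context does not contain the answer, reply "I don\'t know".\n\n'
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


ranked = [
    ("kb-1", "Refunds are available within 30 days."),
    ("kb-2", "Refunds go back to the original payment method."),
    ("kb-3", "Gift cards are not refundable."),
    ("kb-4", "Shipping fees are refunded for damaged items."),
    ("kb-5", "Our office is closed on public holidays."),
]
ordered = reorder_for_long_context(ranked)
prompt = build_grounded_prompt("How long do I have to request a refund?", ordered)
print(prompt)
```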

Phase 3: Evaluation & Feedback

[ ] RAG Triad: Measure Context Relevance, Groundedness, and Answer Relevance (for example with Ragas or TruLens).
[ ] Feedback Loop: When a user marks an answer "Dislike", save the question and the retrieved chunks as negative examples.
[ ] Fine-tuning (Optional): Fine-tune the embedding model on domain data; generic models may be weak on technical terms.
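
The feedback loop can start as something as simple as an append-only log. The sketch below stores each disliked answer with its question and retrieved chunk IDs in a JSON Lines file. The record fields, file name, and function names are hypothetical; a real system would likely write to a database and attach user and session metadata.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def record_dislike(log_path: Path, question: str, chunk_ids: List[str], answer: str) -> None:
    """Append one negative example to a JSON Lines log."""
    record = {
        "question": question,
        "chunk_ids": chunk_ids,
        "answer": answer,
        "label": "negative",
    }
    with log_path.open("a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def load_negative_examples(log_path: Path) -> List[Dict]:
    """Read back all logged negative examples (empty list if no log exists yet)."""
    if not log_path.exists():
        return []
    with log_path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Demo in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    log = Path(tmp) / "negative_examples.jsonl"
    record_dislike(log, "What is the refund window?", ["chunk-12", "chunk-40"], "90 days")
    negatives = load_negative_examples(log)

print(negatives)
```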

Checkpoints

Phase	Verification
1	Is retrieval time under 200 ms? (Is the vector DB index optimized?)
2	Does the model know how to say "I don't know", or is it making things up?
3	Are chunks split logically, or are there cuts in the middle of a sentence?
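
Checkpoints 1 and 3 can be partly automated. The sketch below times a retriever callable against the 200 ms budget and flags chunks that do not end on sentence-final punctuation. The helper names and the in-memory `fake_index` are assumptions made for illustration; a real check would call the actual vector store and use a larger query set.

```python
import time
from typing import Callable, List


def retrieval_latencies_ms(retrieve: Callable[[str], object], queries: List[str]) -> List[float]:
    """Time each retrieval call and return the latencies in milliseconds."""
    timings: List[float] = []
    for query in queries:
        start = time.perf_counter()
        retrieve(query)
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def chunks_cut_mid_sentence(chunks: List[str]) -> List[int]:
    """Return indexes of chunks that do not end with '.', '!' or '?'."""
    return [
        i for i, chunk in enumerate(chunks)
        if not chunk.rstrip().endswith((".", "!", "?"))
    ]


# Hypothetical in-memory retriever standing in for a vector store query.
fake_index = {"refunds": ["Refunds are available within 30 days."]}
timings = retrieval_latencies_ms(lambda q: fake_index.get(q, []), ["refunds", "shipping"])
slow = [t for t in timings if t > 200.0]  # checkpoint 1: anything over the 200 ms budget
bad = chunks_cut_mid_sentence(["Refunds take 30 days.", "Contact support if the"])
print(f"slow queries: {len(slow)}, chunks cut mid-sentence: {bad}")
```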
