DonggangChen

rag_architecture

# Install this skill:
npx skills add DonggangChen/antigravity-agentic-skills --skill "rag_architecture"

Install specific skill from multi-skill repository

# Description

Design LLM applications using the LangChain framework with agents, memory, and tool integration patterns. Use when building LangChain applications, implementing AI agents, or creating complex LLM workflows.

# SKILL.md


name: rag_architecture
router_kit: AIKit
description: Design LLM applications using the LangChain framework with agents, memory, and tool integration patterns. Use when building LangChain applications, implementing AI agents, or creating complex LLM workflows.
metadata:
skillport:
category: auto-healed
tags: [agents, algorithms, artificial intelligence, automation, chatbots, cognitive services, deep learning, embeddings, frameworks, generative ai, inference, large language models, llm, machine learning, model fine-tuning, natural language processing, neural networks, nlp, openai, prompt engineering, rag, rag architecture, retrieval augmented generation, tools, vector databases, workflow automation] - rag_architecture


LangChain Architecture

Master the LangChain framework for building sophisticated LLM applications with agents, chains, memory, and tool integration.

When to Use This Skill

  • Building autonomous AI agents with tool access
  • Implementing complex multi-step LLM workflows
  • Managing conversation memory and state
  • Integrating LLMs with external data sources and APIs
  • Creating modular, reusable LLM application components
  • Implementing document processing pipelines
  • Building production-grade LLM applications

Core Concepts

1. Agents

Autonomous systems that use LLMs to decide which actions to take.

Agent Types:
- ReAct: Interleaves reasoning and acting steps
- OpenAI Functions: Leverages function calling API
- Structured Chat: Handles multi-input tools
- Conversational: Optimized for chat interfaces
- Self-Ask with Search: Decomposes complex queries

2. Chains

Sequences of calls to LLMs or other utilities.

Chain Types:
- LLMChain: Basic prompt + LLM combination
- SequentialChain: Multiple chains in sequence
- RouterChain: Routes inputs to specialized chains
- TransformChain: Data transformations between steps
- MapReduceChain: Parallel processing with aggregation

3. Memory

Systems for maintaining context across interactions.

Memory Types:
- ConversationBufferMemory: Stores all messages
- ConversationSummaryMemory: Summarizes older messages
- ConversationBufferWindowMemory: Keeps last N messages
- ConversationEntityMemory: Tracks information about entities
- VectorStoreRetrieverMemory: Retrieves relevant history by semantic similarity

4. Document Processing

Loading, transforming, and storing documents for retrieval.

Components:
- Document Loaders: Load from various sources
- Text Splitters: Chunk documents intelligently
- Vector Stores: Store and retrieve embeddings
- Retrievers: Fetch relevant documents
- Indexes: Organize documents for efficient access

5. Callbacks

Hooks for logging, monitoring, and debugging.

Use Cases:
- Request/response logging
- Token usage tracking
- Latency monitoring
- Error handling
- Custom metrics collection

Quick Start

from langchain.agents import AgentType, initialize_agent, load_tools
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory

# Initialize LLM
llm = OpenAI(temperature=0)

# Load tools
tools = load_tools(["serpapi", "llm-math"], llm=llm)

# Add memory
memory = ConversationBufferMemory(memory_key="chat_history")

# Create agent
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.CONVERSATIONAL_REACT_DESCRIPTION,
    memory=memory,
    verbose=True
)

# Run agent
result = agent.run("What's the weather in SF? Then calculate 25 * 4")

Architecture Patterns

Pattern 1: RAG with LangChain

from langchain.chains import RetrievalQA
from langchain.document_loaders import TextLoader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import Chroma
from langchain.embeddings import OpenAIEmbeddings

# Load and process documents
loader = TextLoader('documents.txt')
documents = loader.load()

text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=200)
texts = text_splitter.split_documents(documents)

# Create vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(texts, embeddings)

# Create retrieval chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=vectorstore.as_retriever(),
    return_source_documents=True
)

# Query
result = qa_chain({"query": "What is the main topic?"})

Pattern 2: Custom Agent with Tools

from langchain.agents import AgentType, initialize_agent
from langchain.tools import tool

@tool
def search_database(query: str) -> str:
    """Search internal database for information."""
    # Your database search logic
    return f"Results for: {query}"

@tool
def send_email(recipient: str, content: str) -> str:
    """Send an email to specified recipient."""
    # Email sending logic
    return f"Email sent to {recipient}"

tools = [search_database, send_email]

# send_email takes two arguments, and ZERO_SHOT_REACT_DESCRIPTION only accepts
# single-input tools, so use the structured-chat agent type instead.
# `llm` is the model created in Quick Start.
agent = initialize_agent(
    tools,
    llm,
    agent=AgentType.STRUCTURED_CHAT_ZERO_SHOT_REACT_DESCRIPTION,
    verbose=True
)

Pattern 3: Multi-Step Chain

from langchain.chains import LLMChain, SequentialChain
from langchain.prompts import PromptTemplate

# Step 1: Extract key information
extract_prompt = PromptTemplate(
    input_variables=["text"],
    template="Extract key entities from: {text}\n\nEntities:"
)
extract_chain = LLMChain(llm=llm, prompt=extract_prompt, output_key="entities")

# Step 2: Analyze entities
analyze_prompt = PromptTemplate(
    input_variables=["entities"],
    template="Analyze these entities: {entities}\n\nAnalysis:"
)
analyze_chain = LLMChain(llm=llm, prompt=analyze_prompt, output_key="analysis")

# Step 3: Generate summary
summary_prompt = PromptTemplate(
    input_variables=["entities", "analysis"],
    template="Summarize:\nEntities: {entities}\nAnalysis: {analysis}\n\nSummary:"
)
summary_chain = LLMChain(llm=llm, prompt=summary_prompt, output_key="summary")

# Combine into sequential chain
overall_chain = SequentialChain(
    chains=[extract_chain, analyze_chain, summary_chain],
    input_variables=["text"],
    output_variables=["entities", "analysis", "summary"],
    verbose=True
)

Memory Management Best Practices

Choosing the Right Memory Type

# For short conversations (< 10 messages)
from langchain.memory import ConversationBufferMemory
memory = ConversationBufferMemory()

# For long conversations (summarize old messages)
from langchain.memory import ConversationSummaryMemory
memory = ConversationSummaryMemory(llm=llm)

# For sliding window (last N messages)
from langchain.memory import ConversationBufferWindowMemory
memory = ConversationBufferWindowMemory(k=5)

# For entity tracking
from langchain.memory import ConversationEntityMemory
memory = ConversationEntityMemory(llm=llm)

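# For semantic retrieval of relevant history
# (`vectorstore` is the Chroma store built in Pattern 1)
from langchain.memory import VectorStoreRetrieverMemory
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
memory = VectorStoreRetrieverMemory(retriever=retriever)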

Callback System

Custom Callback Handler

from langchain.callbacks.base import BaseCallbackHandler

class CustomCallbackHandler(BaseCallbackHandler):
    def on_llm_start(self, serialized, prompts, **kwargs):
        print(f"LLM started with prompts: {prompts}")

    def on_llm_end(self, response, **kwargs):
        print(f"LLM ended with response: {response}")

    def on_llm_error(self, error, **kwargs):
        print(f"LLM error: {error}")

    def on_chain_start(self, serialized, inputs, **kwargs):
        print(f"Chain started with inputs: {inputs}")

    def on_agent_action(self, action, **kwargs):
        print(f"Agent taking action: {action}")

# Use callback
agent.run("query", callbacks=[CustomCallbackHandler()])

Testing Strategies

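import pytest
from langchain.agents import AgentType, initialize_agent
from langchain.llms.fake import FakeListLLM
from langchain.memory import ConversationBufferMemory

def test_agent_tool_selection():
    # A plain Mock does not pass LangChain's type validation, so script the
    # LLM output with FakeListLLM: one tool call, then a final answer.
    fake_llm = FakeListLLM(responses=[
        "Action: search_database\nAction Input: test query",
        "Final Answer: done",
    ])

    # `search_database` is the single-input tool defined in Pattern 2
    agent = initialize_agent(
        [search_database],
        fake_llm,
        agent=AgentType.ZERO_SHOT_REACT_DESCRIPTION,
        return_intermediate_steps=True,
    )

    result = agent({"input": "test query"})

    # Verify correct tool was selected
    action, _observation = result["intermediate_steps"][0]
    assert action.tool == "search_database"

def test_memory_persistence():
    memory = ConversationBufferMemory()

    memory.save_context({"input": "Hi"}, {"output": "Hello!"})

    history = memory.load_memory_variables({})["history"]
    assert "Hi" in history
    assert "Hello!" in history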

Performance Optimization

1. Caching

from langchain.cache import InMemoryCache
import langchain

langchain.llm_cache = InMemoryCache()

2. Batch Processing

# Split multiple documents concurrently
from itertools import chain
from concurrent.futures import ThreadPoolExecutor
from langchain.document_loaders import DirectoryLoader

loader = DirectoryLoader('./docs')
docs = loader.load()

def process_doc(doc):
    # `text_splitter` is the splitter created in Pattern 1
    return text_splitter.split_documents([doc])

with ThreadPoolExecutor(max_workers=4) as executor:
    # executor.map yields one list of chunks per document; flatten them
    split_docs = list(chain.from_iterable(executor.map(process_doc, docs)))

3. Streaming Responses

from langchain.callbacks.streaming_stdout import StreamingStdOutCallbackHandler

llm = OpenAI(streaming=True, callbacks=[StreamingStdOutCallbackHandler()])

Resources

  • references/agents.md: Deep dive on agent architectures
  • references/memory.md: Memory system patterns
  • references/chains.md: Chain composition strategies
  • references/document-processing.md: Document loading and indexing
  • references/callbacks.md: Monitoring and observability
  • assets/agent-template.py: Production-ready agent template
  • assets/memory-config.yaml: Memory configuration examples
  • assets/chain-example.py: Complex chain examples

Common Pitfalls

  1. Memory Overflow: Not managing conversation history length
  2. Tool Selection Errors: Poor tool descriptions confuse agents
  3. Context Window Exceeded: Exceeding LLM token limits
  4. No Error Handling: Not catching and handling agent failures
  5. Inefficient Retrieval: Not optimizing vector store queries
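
Pitfalls 1 and 3 both come down to bounding what is sent to the model. The following is a minimal sketch of trimming history to a token budget. It assumes a rough estimate of four characters per token (a real application would use the model's own tokenizer), and `estimate_tokens` and `trim_history` are hypothetical helper names, not LangChain APIs.

```python
def estimate_tokens(text: str) -> int:
    """Very rough token estimate: about four characters per token (assumption)."""
    return max(1, len(text) // 4)


def trim_history(messages: list[str], max_tokens: int) -> list[str]:
    """Keep the most recent messages whose combined estimate fits the budget."""
    kept: list[str] = []
    used = 0
    # Walk backwards so the newest messages are considered first.
    for message in reversed(messages):
        cost = estimate_tokens(message)
        if used + cost > max_tokens:
            break
        kept.append(message)
        used += cost
    kept.reverse()  # restore chronological order
    return kept


history = ["a" * 40, "b" * 40, "c" * 40]  # roughly 10 estimated tokens each
trimmed = trim_history(history, max_tokens=20)
print(len(trimmed))
```

With a budget of 20 estimated tokens, only the two newest messages survive. ConversationBufferWindowMemory and ConversationSummaryMemory, shown earlier, address the same problem inside LangChain.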

Production Checklist

  • [ ] Implement proper error handling
  • [ ] Add request/response logging
  • [ ] Monitor token usage and costs
  • [ ] Set timeout limits for agent execution
  • [ ] Implement rate limiting
  • [ ] Add input validation
  • [ ] Test with edge cases
  • [ ] Set up observability (callbacks)
  • [ ] Implement fallback strategies
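
The error-handling and fallback items above can be sketched in plain Python. This is an illustration under stated assumptions, not a LangChain feature: `call_with_fallback`, `flaky_primary`, and `canned_fallback` are hypothetical names, and the primary and fallback callables stand in for whatever agent or chain the application wraps.

```python
import time
from typing import Callable


def call_with_fallback(
    primary: Callable[[str], str],
    fallback: Callable[[str], str],
    query: str,
    retries: int = 2,
    backoff_seconds: float = 0.0,
) -> str:
    """Try `primary` up to `retries` times, then return `fallback(query)`."""
    for attempt in range(retries):
        try:
            return primary(query)
        except Exception as exc:  # real code should catch narrower exception types
            print(f"attempt {attempt + 1} failed: {exc}")
            # Exponential backoff between attempts (no delay when backoff is 0).
            time.sleep(backoff_seconds * (2 ** attempt))
    return fallback(query)


attempts = {"count": 0}


def flaky_primary(query: str) -> str:
    """Hypothetical stand-in for an agent call that always fails."""
    attempts["count"] += 1
    raise RuntimeError("upstream timeout")


def canned_fallback(query: str) -> str:
    """Hypothetical degraded answer used when the primary path is down."""
    return f"Sorry, I could not answer: {query}"


answer = call_with_fallback(flaky_primary, canned_fallback, "status?")
print(answer)
```

The sketch covers retries and fallback only. Timeouts and rate limiting still need to be enforced around the wrapped call, for example through the executor's own iteration or time limits if the installed LangChain version exposes them.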

RAG Architecture v1.1 - Enhanced

🔄 Workflow

Source: Azure AI Search - RAG & LangChain RAG Concepts

Phase 1: Retrieval Strategy Design

  • [ ] Chunking: Prefer "Semantic Chunking" or "Parent-Child Indexing" over fixed-size character splitting, so that each chunk preserves its context.
  • [ ] Hybrid Search: Vector search alone is often not enough. Combine keyword search (BM25) with vector search (cosine similarity) and merge the two rankings with Reciprocal Rank Fusion (RRF).
  • [ ] Query Transformation: Do not search with the raw user query. Enrich it first with "Hypothetical Document Embeddings" (HyDE) or "Multi-query" expansion.
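
Reciprocal Rank Fusion from the Hybrid Search item can be sketched in a few lines of plain Python. Each document scores the sum of 1 / (k + rank) over the rankings it appears in. The function name, the document IDs, and the two example result lists are hypothetical, and k = 60 is the commonly used default rather than a value taken from this skill.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists it appears in,
    with rank starting at 1.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by doc_id so the order is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


bm25_hits = ["doc_a", "doc_b", "doc_c"]    # hypothetical keyword (BM25) results
vector_hits = ["doc_b", "doc_d", "doc_a"]  # hypothetical vector-search results
fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused)
```

Documents that rank well in both lists (here `doc_b` and `doc_a`) rise above documents found by only one retriever. RRF uses ranks only, so BM25 and cosine scores never need to be normalized against each other.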

Phase 2: Generation Architecture

  • [ ] Context Window: Check that the retrieved documents fit into the LLM's context window, and mitigate the "Lost in the Middle" problem by placing the most important information at the start or end of the context.
  • [ ] System Prompt: Strictly instruct the model to use only the provided context and not to add external information (grounding).
  • [ ] Citation: Ensure each answer shows which document it is based on, using inline references (source_id).
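
The three Phase 2 items can be combined in one small sketch: reorder chunks so the strongest sit at the edges of the context, then build a grounded prompt that labels each chunk with its source_id. The chunk dictionaries, the helper names, and the prompt wording are illustrative assumptions, not a prescribed template.

```python
def reorder_for_long_context(chunks: list[dict]) -> list[dict]:
    """Place the most relevant chunks at the start and end of the context.

    Input must be sorted by relevance, best first. Chunks are dealt
    alternately to the front and the back, so the weakest end up in the middle.
    """
    front: list[dict] = []
    back: list[dict] = []
    for index, chunk in enumerate(chunks):
        if index % 2 == 0:
            front.append(chunk)
        else:
            back.append(chunk)
    return front + back[::-1]


def build_grounded_prompt(question: str, chunks: list[dict]) -> str:
    """Build a prompt that restricts the model to the given context."""
    context = "\n".join(f"[{c['source_id']}] {c['text']}" for c in chunks)
    return (
        "Answer using only the context below. "
        "Cite the source_id in square brackets after each claim. "
        "If the context does not contain the answer, say \"I don't know\".\n\n"
        f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"
    )


# Hypothetical retrieval results, already sorted best first.
ranked = [{"source_id": f"s{i}", "text": f"fact {i}"} for i in range(1, 6)]
ordered = reorder_for_long_context(ranked)
prompt = build_grounded_prompt("What is fact 1?", ordered)
print([c["source_id"] for c in ordered])
```

The best chunk ends up first and the second best last, with the weakest in the middle. The explicit "I don't know" instruction gives the model a sanctioned way out when the context is insufficient.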

Phase 3: Evaluation & Feedback

  • [ ] RAG Triad: Measure Context Relevance, Groundedness, and Answer Relevance (for example with Ragas or TruLens).
  • [ ] Feedback Loop: When a user gives a "Dislike", save the question and the retrieved chunks as a negative example.
  • [ ] Fine-tuning (Optional): Fine-tune the embedding model on domain data, since generic models may be weak on technical terms.
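
The Feedback Loop item needs little more than an append-only log. A minimal sketch, assuming a JSON Lines file as the store; the file name, record fields, and function names are hypothetical choices, and a production system would more likely write to a database.

```python
import json
import tempfile
from pathlib import Path


def record_negative_example(path: Path, question: str, chunk_ids: list[str]) -> None:
    """Append one disliked interaction to a JSON Lines file."""
    record = {"question": question, "chunk_ids": chunk_ids, "label": "negative"}
    with path.open("a", encoding="utf-8") as handle:
        handle.write(json.dumps(record) + "\n")


def load_negative_examples(path: Path) -> list[dict]:
    """Read all logged negative examples, skipping blank lines."""
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# Demo in a temporary directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "negative_examples.jsonl"
    record_negative_example(log_path, "What is RRF?", ["chunk_12", "chunk_40"])
    examples = load_negative_examples(log_path)
    missing = load_negative_examples(Path(tmp) / "absent.jsonl")

print(examples)
```

Logged examples of this kind can later feed retrieval evaluation or serve as hard negatives when fine-tuning the embedding model.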

Checkpoints

| Phase | Verification |
|-------|--------------|
| 1 | Is retrieval time under 200 ms? (Is the vector DB index optimized?) |
| 2 | Does the model know how to say "I don't know"? (Or is it making things up?) |
| 3 | Are chunks split logically? (Is any chunk cut in the middle of a sentence?) |
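
For the Phase 3 checkpoint, one way to guarantee that no chunk is cut mid-sentence is to pack whole sentences into chunks. This is a simplified sketch and not the "Semantic Chunking" method itself: it assumes a naive sentence boundary (`.`, `!` or `?` followed by whitespace), and `split_into_sentence_chunks` is a hypothetical helper name.

```python
import re


def split_into_sentence_chunks(text: str, max_chars: int = 200) -> list[str]:
    """Pack whole sentences into chunks of at most max_chars characters.

    A single sentence longer than max_chars becomes its own oversized chunk
    rather than being cut.
    """
    # Naive sentence boundary: ., ! or ? followed by whitespace (assumption).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            # Adding this sentence would overflow, so close the current chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks


sample = (
    "RAG retrieves documents. It then generates an answer. "
    "Chunks should end on sentence boundaries."
)
chunks = split_into_sentence_chunks(sample, max_chars=60)
for chunk in chunks:
    print(repr(chunk))
```

Every chunk ends on sentence-final punctuation, which makes the checkpoint easy to assert in a test. Abbreviations and decimal numbers will fool the naive boundary rule, so real pipelines usually rely on a proper sentence tokenizer.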
