Install this skill from the multi-skill repository:

```bash
npx skills add Mindrally/skills --skill "llamaindex-development"
```
# Description
Expert guidance for LlamaIndex development including RAG applications, vector stores, document processing, query engines, and building production AI applications.
# SKILL.md
```yaml
---
name: llamaindex-development
description: Expert guidance for LlamaIndex development including RAG applications, vector stores, document processing, query engines, and building production AI applications.
---
```

## LlamaIndex Development
You are an expert in LlamaIndex for building RAG (Retrieval-Augmented Generation) applications, data indexing, and LLM-powered applications with Python.
### Key Principles
- Write concise, technical responses with accurate Python examples
- Use functional, declarative programming; avoid classes where possible
- Prioritize code quality, maintainability, and performance
- Use descriptive variable names that reflect their purpose
- Follow PEP 8 style guidelines
### Code Organization

#### Directory Structure

```
project/
├── data/             # Source documents and data
├── indexes/          # Persisted index storage
├── loaders/          # Custom document loaders
├── retrievers/       # Custom retriever implementations
├── query_engines/    # Query engine configurations
├── prompts/          # Custom prompt templates
├── transformations/  # Document transformations
├── callbacks/        # Custom callback handlers
├── utils/            # Utility functions
├── tests/            # Test files
└── config/           # Configuration files
```
#### Naming Conventions
- Use snake_case for files, functions, and variables
- Use PascalCase for classes
- Prefix private functions with underscore
- Use descriptive names (e.g., `create_vector_index`, `build_query_engine`)
### Document Loading

#### Using Document Loaders

```python
from llama_index.core import SimpleDirectoryReader
from llama_index.readers.file import PDFReader, DocxReader

# Load from a directory
documents = SimpleDirectoryReader(
    input_dir="./data",
    recursive=True,
    required_exts=[".pdf", ".txt", ".md"],
).load_data()

# Load a specific file type
pdf_reader = PDFReader()
documents = pdf_reader.load_data(file="document.pdf")
```
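`SimpleDirectoryReader` also accepts a `file_metadata` callback that attaches per-file metadata to every loaded `Document`; a minimal sketch (the metadata keys are illustrative):

```python
from llama_index.core import SimpleDirectoryReader

# file_metadata receives each file path and returns a dict that is
# attached to every Document loaded from that file
documents = SimpleDirectoryReader(
    input_dir="./data",
    file_metadata=lambda path: {"source": path},
).load_data()
```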
#### Custom Loaders

```python
from llama_index.core import Document
from llama_index.core.readers.base import BaseReader


class CustomLoader(BaseReader):
    def load_data(self, file_path: str) -> list[Document]:
        # Custom loading logic
        with open(file_path, "r") as f:
            content = f.read()
        return [Document(
            text=content,
            metadata={"source": file_path},
        )]
```
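Usage mirrors the built-in readers (the file path is illustrative):

```python
documents = CustomLoader().load_data("./data/notes.txt")
```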
### Text Splitting and Processing

#### Node Parsing

```python
from llama_index.core.node_parser import (
    SentenceSplitter,
    SemanticSplitterNodeParser,
    MarkdownNodeParser,
)
from llama_index.embeddings.openai import OpenAIEmbedding

# Simple sentence splitting
splitter = SentenceSplitter(
    chunk_size=1024,
    chunk_overlap=200,
)
nodes = splitter.get_nodes_from_documents(documents)

# Semantic splitting (splits at semantic breakpoints to preserve meaning)
semantic_splitter = SemanticSplitterNodeParser(
    embed_model=OpenAIEmbedding(),
    breakpoint_percentile_threshold=95,
)

# Markdown-aware splitting (respects headings and sections)
markdown_splitter = MarkdownNodeParser()
```
#### Best Practices for Chunking
- Choose chunk size based on your embedding model's context window
- Use overlap to maintain context between chunks
- Preserve document structure when possible
- Include metadata for filtering and retrieval
- Use semantic splitting for better coherence
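A minimal sketch tying these practices together: a chunk size within the embedding model's limits, overlap for continuity, and document metadata that nodes inherit for later filtering (the text and metadata values are illustrative):

```python
from llama_index.core import Document
from llama_index.core.node_parser import SentenceSplitter

doc = Document(
    text=report_text,  # assumed to hold the source text
    metadata={"source": "report.pdf", "year": 2024},
)

splitter = SentenceSplitter(chunk_size=512, chunk_overlap=64)
nodes = splitter.get_nodes_from_documents([doc])

# Nodes inherit the parent document's metadata, so it stays available
# for metadata filtering at retrieval time
print(nodes[0].metadata)
```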
### Vector Stores and Indexing

#### Creating Indexes

```python
import chromadb
from llama_index.core import VectorStoreIndex, StorageContext
from llama_index.vector_stores.chroma import ChromaVectorStore

# In-memory index
index = VectorStoreIndex.from_documents(documents)

# With a persistent vector store
chroma_client = chromadb.PersistentClient(path="./chroma_db")
chroma_collection = chroma_client.get_or_create_collection("my_collection")
vector_store = ChromaVectorStore(chroma_collection=chroma_collection)
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(
    documents,
    storage_context=storage_context,
)
```
#### Supported Vector Stores
- Chroma (local development)
- Pinecone (production, managed)
- Weaviate (production, self-hosted or managed)
- Qdrant (production, self-hosted or managed)
- PostgreSQL with pgvector
- MongoDB Atlas Vector Search
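Switching between these stores usually only changes how the vector store is constructed; the `StorageContext` pattern stays the same. A sketch with Qdrant, assuming the llama-index-vector-stores-qdrant package is installed:

```python
import qdrant_client
from llama_index.core import StorageContext, VectorStoreIndex
from llama_index.vector_stores.qdrant import QdrantVectorStore

# Local, file-backed Qdrant instance for development
client = qdrant_client.QdrantClient(path="./qdrant_db")
vector_store = QdrantVectorStore(client=client, collection_name="my_collection")
storage_context = StorageContext.from_defaults(vector_store=vector_store)
index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)
```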
#### Index Persistence

```python
from llama_index.core import StorageContext, load_index_from_storage

# Persist index
index.storage_context.persist(persist_dir="./storage")

# Load index
storage_context = StorageContext.from_defaults(persist_dir="./storage")
index = load_index_from_storage(storage_context)
```
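A common pattern builds the index once and reloads it on subsequent runs; a minimal sketch:

```python
import os
from llama_index.core import (
    StorageContext,
    VectorStoreIndex,
    load_index_from_storage,
)

PERSIST_DIR = "./storage"

if os.path.exists(PERSIST_DIR):
    # Reuse the persisted index
    storage_context = StorageContext.from_defaults(persist_dir=PERSIST_DIR)
    index = load_index_from_storage(storage_context)
else:
    # Build once, then persist for later runs
    index = VectorStoreIndex.from_documents(documents)
    index.storage_context.persist(persist_dir=PERSIST_DIR)
```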
### Query Engines

#### Basic Query Engine

```python
from llama_index.core import VectorStoreIndex

index = VectorStoreIndex.from_documents(documents)
query_engine = index.as_query_engine(
    similarity_top_k=5,
    response_mode="compact",
)

response = query_engine.query("What is the main topic?")
print(response.response)
```
#### Response Modes

- `refine`: Iteratively refine the answer through each node
- `compact`: Combine chunks before sending to the LLM
- `tree_summarize`: Build a tree and summarize
- `simple_summarize`: Truncate and summarize
- `accumulate`: Accumulate responses from each node
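The mode is passed when building the engine; for summary-style questions over many nodes, `tree_summarize` is a common choice:

```python
summary_engine = index.as_query_engine(response_mode="tree_summarize")
response = summary_engine.query("Summarize the key findings.")
```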
#### Advanced Query Engine

```python
from llama_index.core.query_engine import RetrieverQueryEngine
from llama_index.core.postprocessor import SimilarityPostprocessor

query_engine = RetrieverQueryEngine.from_args(
    retriever=index.as_retriever(similarity_top_k=10),
    node_postprocessors=[
        SimilarityPostprocessor(similarity_cutoff=0.7)
    ],
    response_mode="compact",
)
```
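To confirm the similarity cutoff is filtering as intended, the retrieved sources and scores can be inspected on the response object:

```python
response = query_engine.query("What is the main topic?")
for source in response.source_nodes:
    # Each retrieved node carries its similarity score and metadata
    print(source.score, source.node.metadata.get("source"))
```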
### Retrievers

#### Custom Retrievers

```python
from llama_index.core.retrievers import VectorIndexRetriever

# Basic retriever
retriever = VectorIndexRetriever(
    index=index,
    similarity_top_k=10,
)

# Retrieve nodes
nodes = retriever.retrieve("search query")
```
#### Hybrid Search

```python
from llama_index.core.retrievers import QueryFusionRetriever

# Combine multiple retrieval strategies
retriever = QueryFusionRetriever(
    [
        index.as_retriever(similarity_top_k=5),
        bm25_retriever,  # Keyword-based (see sketch below)
    ],
    num_queries=4,
    use_async=True,
)
```
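The `bm25_retriever` above has to be built first; a minimal sketch, assuming the llama-index-retrievers-bm25 package is installed:

```python
from llama_index.retrievers.bm25 import BM25Retriever

# Keyword (BM25) retriever over the same nodes as the vector index
bm25_retriever = BM25Retriever.from_defaults(
    nodes=nodes,
    similarity_top_k=5,
)
```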
### Embeddings

#### Embedding Models

```python
from llama_index.core import Settings
from llama_index.embeddings.openai import OpenAIEmbedding
from llama_index.embeddings.huggingface import HuggingFaceEmbedding

# OpenAI embeddings
Settings.embed_model = OpenAIEmbedding(
    model="text-embedding-3-small",
    dimensions=512,  # Optional dimension reduction
)

# Local embeddings
Settings.embed_model = HuggingFaceEmbedding(
    model_name="BAAI/bge-small-en-v1.5"
)
```
### LLM Configuration

#### Setting Up LLMs

```python
from llama_index.core import Settings
from llama_index.llms.openai import OpenAI
from llama_index.llms.anthropic import Anthropic

# OpenAI
Settings.llm = OpenAI(
    model="gpt-4o",
    temperature=0.1,
)

# Anthropic
Settings.llm = Anthropic(
    model="claude-sonnet-4-20250514",
    temperature=0.1,
)
```
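`Settings` applies globally; most components also accept their own `llm`, which is useful for routing cheap and expensive models. A hedged sketch (the model split is illustrative):

```python
from llama_index.llms.openai import OpenAI

# Override the global Settings.llm for this engine only
fast_engine = index.as_query_engine(llm=OpenAI(model="gpt-4o-mini"))
```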
### Agents

#### Building Agents

```python
from llama_index.core import Settings
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool, ToolMetadata

# Create tools from existing query engines
# (documents_query_engine and code_query_engine are assumed to be
# query engines built as in the sections above)
tools = [
    QueryEngineTool(
        query_engine=documents_query_engine,
        metadata=ToolMetadata(
            name="documents",
            description="Search through documents",
        ),
    ),
    QueryEngineTool(
        query_engine=code_query_engine,
        metadata=ToolMetadata(
            name="codebase",
            description="Search through code",
        ),
    ),
]

# Create agent (uses the globally configured LLM)
agent = ReActAgent.from_tools(
    tools,
    llm=Settings.llm,
    verbose=True,
)

response = agent.chat("Find information about X")
```
### Performance Optimization

#### Caching

LlamaIndex's built-in caching targets the ingestion pipeline: with a cache attached, repeated runs skip transformations (chunking, embedding) whose results were already computed.

```python
from llama_index.core.ingestion import IngestionCache, IngestionPipeline
from llama_index.core.node_parser import SentenceSplitter
from llama_index.embeddings.openai import OpenAIEmbedding

pipeline = IngestionPipeline(
    transformations=[
        SentenceSplitter(chunk_size=1024, chunk_overlap=200),
        OpenAIEmbedding(),
    ],
    cache=IngestionCache(),  # repeated runs reuse cached results
)
nodes = pipeline.run(documents=documents)
```
#### Async Operations

```python
import asyncio

# Use async for better performance (await must run inside an event
# loop, e.g. within an async function)
response = await query_engine.aquery("question")

# Batch processing: run several queries concurrently
responses = await asyncio.gather(*[
    query_engine.aquery(q) for q in questions
])
```
#### Embedding Optimization
- Batch embeddings when possible
- Use smaller embedding dimensions when accuracy allows
- Cache embeddings for repeated documents
- Use local models for cost-sensitive applications
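Batching in practice: `get_text_embedding_batch` embeds many texts per request instead of one call per text:

```python
from llama_index.embeddings.openai import OpenAIEmbedding

embed_model = OpenAIEmbedding(model="text-embedding-3-small")
texts = [node.get_content() for node in nodes]

# One batched request for many texts
embeddings = embed_model.get_text_embedding_batch(texts, show_progress=True)
```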
### Error Handling

```python
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager, LlamaDebugHandler

# Debug handler for troubleshooting: traces LLM calls, retrievals,
# and other events across the pipeline
debug_handler = LlamaDebugHandler()
callback_manager = CallbackManager([debug_handler])
Settings.callback_manager = callback_manager
```
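For runtime failures such as rate limits or timeouts, wrapping queries in a retry helper is a common approach; a minimal sketch (the backoff policy is an assumption, not a library feature):

```python
import time

def query_with_retry(query_engine, question: str, retries: int = 3):
    # Simple retry wrapper; narrow the exception types to your providers'
    for attempt in range(retries):
        try:
            return query_engine.query(question)
        except Exception:
            if attempt == retries - 1:
                raise
            time.sleep(2 ** attempt)  # exponential backoff
```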
### Testing
- Unit test document loaders and transformations
- Test retrieval quality with known queries
- Validate index persistence and loading
- Test query engine responses
- Monitor retrieval metrics (precision, recall)
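A minimal pytest-style sketch for the retrieval checks above; the fixture, query, and expected source name are hypothetical:

```python
def test_retriever_surfaces_known_source(retriever):
    # "retriever" is assumed to be a fixture built over known test documents
    nodes = retriever.retrieve("What is the refund policy?")
    assert nodes, "expected at least one retrieved node"
    sources = [n.node.metadata.get("source", "") for n in nodes]
    assert any("refund_policy" in s for s in sources)
```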
### Dependencies
- llama-index
- llama-index-embeddings-openai
- llama-index-llms-openai
- llama-index-vector-stores-chroma
- chromadb
- python-dotenv
- pydantic
# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents. Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.