Woody-Hu / ragflow

# Install this skill:
npx skills add Woody-Hu/agent_skills --skill "ragflow"

Installs a specific skill from a multi-skill repository.

# Description

RAGFlow integration toolkit for interacting with RAGFlow's RESTful API. When Claude needs to manage datasets, upload documents, perform chat completions, or work with knowledge graphs using RAGFlow's API.

# SKILL.md


---
name: ragflow
description: RAGFlow integration toolkit for interacting with RAGFlow's RESTful API. When Claude needs to manage datasets, upload documents, perform chat completions, or work with knowledge graphs using RAGFlow's API.
license: Proprietary. See LICENSE.txt for complete terms.
---


RAGFlow Integration Guide

Overview

This guide provides comprehensive documentation for integrating with RAGFlow's RESTful API. RAGFlow is a powerful Retrieval-Augmented Generation (RAG) framework that enables you to manage datasets, upload documents, perform AI-powered chat completions, and work with knowledge graphs.

Core Concepts

Authentication

RAGFlow API uses API key authentication. You must include your API key in the Authorization header of all requests:

Authorization: Bearer <YOUR_API_KEY>

API Endpoints

RAGFlow provides several key API categories:
- OpenAI-Compatible API: Chat completions for chats and agents
- Dataset Management: Create, update, delete, and list datasets
- File Management: Upload and manage documents within datasets
- Knowledge Graph: Construct and retrieve knowledge graphs

Authentication Setup

import requests

# Set up API key and base URL (adjust host and port to your RAGFlow deployment)
API_KEY = "<YOUR_API_KEY>"
BASE_URL = "http://localhost:8000/v1"

# Create session with authentication
headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Content-Type": "application/json"
}

session = requests.Session()
session.headers.update(headers)

OpenAI-Compatible API

Chat Completion

def create_chat_completion(chat_id, messages, stream=False, reference=True, metadata_condition=None):
    """
    Create a chat completion using RAGFlow's OpenAI-compatible API.

    Assumes stream=False; with stream=True the server returns Server-Sent
    Events rather than a single JSON body (see the streaming sketch under
    Best Practices).
    """
    url = f"{BASE_URL}/chats_openai/{chat_id}/chat/completions"

    payload = {
        "model": "model",  # Placeholder; the server resolves the bound model automatically
        "messages": messages,
        "stream": stream,
        # reference toggles citation data; metadata_condition filters retrieval
        "extra_body": {
            "reference": reference,
            "metadata_condition": metadata_condition
        }
    }

    response = session.post(url, json=payload)
    response.raise_for_status()

    return response.json()

# Example usage
messages = [{"role": "user", "content": "Explain RAGFlow's main features"}]
result = create_chat_completion("chat_id_123", messages)
print(result["choices"][0]["message"]["content"])
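
Because the endpoint speaks the OpenAI protocol, the official openai Python client can also be pointed at it. A minimal sketch, assuming the openai package is installed and that the chat-scoped base_url works as described in RAGFlow's OpenAI-compatible docs; chat_id_123 is a placeholder:

from openai import OpenAI

client = OpenAI(api_key=API_KEY, base_url=f"{BASE_URL}/chats_openai/chat_id_123")
completion = client.chat.completions.create(
    model="model",  # Placeholder; the server resolves the bound model
    messages=[{"role": "user", "content": "Explain RAGFlow's main features"}],
)
print(completion.choices[0].message.content)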

Agent Completion

def create_agent_completion(agent_id, messages, stream=False, session_id=None):
    """
    Create an agent completion using RAGFlow's API
    """
    url = f"{BASE_URL}/agents_openai/{agent_id}/chat/completions"

    payload = {
        "model": "model",
        "messages": messages,
        "stream": stream
    }

    if session_id:
        payload["session_id"] = session_id

    response = session.post(url, json=payload)
    response.raise_for_status()

    return response.json()

# Example usage
messages = [{"role": "user", "content": "Help me analyze this document"}]
result = create_agent_completion("agent_id_123", messages)
print(result["choices"][0]["message"]["content"])

Dataset Management

Create Dataset

def create_dataset(name, embedding_model="BAAI/bge-large-zh-v1.5@BAAI", 
                  permission="me", chunk_method="naive", parser_config=None):
    """
    Create a new dataset
    """
    url = f"{BASE_URL}/datasets"

    payload = {
        "name": name,
        "embedding_model": embedding_model,
        "permission": permission,
        "chunk_method": chunk_method
    }

    if parser_config:
        payload["parser_config"] = parser_config

    response = session.post(url, json=payload)
    response.raise_for_status()

    return response.json()

# Example usage with naive chunk method
parser_config = {
    "chunk_token_num": 512,
    "delimiter": "\n",
    "auto_keywords": 5,
    "auto_questions": 3
}

result = create_dataset("my_dataset", parser_config=parser_config)
dataset_id = result["data"]["id"]
print(f"Created dataset with ID: {dataset_id}")

List Datasets

def list_datasets(page=1, page_size=30, orderby="create_time", desc=True, name=None, id=None):
    """
    List datasets with optional filters
    """
    params = {
        "page": page,
        "page_size": page_size,
        "orderby": orderby,
        "desc": desc
    }

    if name:
        params["name"] = name
    if id:
        params["id"] = id

    url = f"{BASE_URL}/datasets"
    response = session.get(url, params=params)
    response.raise_for_status()

    return response.json()

# Example usage
result = list_datasets(page=1, page_size=10)
print(f"Found {len(result['data'])} datasets on this page")
for dataset in result['data']:
    print(f"- {dataset['name']} (ID: {dataset['id']})")
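
To walk through more datasets than fit on one page, a simple client-side pagination loop works. A sketch, assuming a page shorter than page_size marks the end (a common convention, not confirmed by the API reference here):

def iter_all_datasets(page_size=30):
    """
    Yield every dataset, page by page
    """
    page = 1
    while True:
        batch = list_datasets(page=page, page_size=page_size)["data"]
        yield from batch
        # A short page signals the end (assumed convention)
        if len(batch) < page_size:
            break
        page += 1

# Example usage
for dataset in iter_all_datasets():
    print(dataset["name"])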

Update Dataset

def update_dataset(dataset_id, updates):
    """
    Update dataset configuration
    """
    url = f"{BASE_URL}/datasets/{dataset_id}"

    response = session.put(url, json=updates)
    response.raise_for_status()

    return response.json()

# Example usage
updates = {
    "name": "updated_dataset_name",
    "description": "Updated dataset description"
}

update_dataset(dataset_id, updates)

Delete Datasets

def delete_datasets(dataset_ids):
    """
    Delete one or more datasets
    """
    url = f"{BASE_URL}/datasets"

    payload = {
        "ids": dataset_ids
    }

    response = session.delete(url, json=payload)
    response.raise_for_status()

    return response.json()

# Example usage - delete a single dataset
delete_datasets([dataset_id])

# Example usage - delete multiple datasets
delete_datasets(["dataset_id_1", "dataset_id_2"])

File Management

Upload Documents

def upload_documents(dataset_id, file_paths):
    """
    Upload one or more documents to a dataset as multipart form data
    """
    url = f"{BASE_URL}/datasets/{dataset_id}/documents"

    # "file" is the form field name; open every file in binary mode
    files = [("file", (file_path.split("/")[-1], open(file_path, "rb")))
             for file_path in file_paths]

    try:
        # Setting Content-Type to None removes the session's JSON header so
        # requests can set the multipart boundary itself; copying the session
        # headers without the key would not override the session-level value
        response = session.post(url, files=files, headers={"Content-Type": None})
        response.raise_for_status()
    finally:
        # Close file handles even if the request fails
        for _, (_, file_obj) in files:
            file_obj.close()

    return response.json()

# Example usage
file_paths = ["./document1.txt", "./document2.pdf"]
result = upload_documents(dataset_id, file_paths)
print(f"Uploaded {len(result['data'])} documents")
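
For many files, the batching advice under Performance Considerations can be applied client-side by splitting one large upload into several smaller multipart requests. A sketch; the batch size of 8 is illustrative:

def upload_in_batches(dataset_id, file_paths, batch_size=8):
    """
    Split a large upload into several smaller multipart requests
    """
    uploaded = []
    for i in range(0, len(file_paths), batch_size):
        result = upload_documents(dataset_id, file_paths[i:i + batch_size])
        uploaded.extend(result["data"])
    return uploaded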

Update Document

def update_document(dataset_id, document_id, updates):
    """
    Update document configuration
    """
    url = f"{BASE_URL}/datasets/{dataset_id}/documents/{document_id}"

    response = session.put(url, json=updates)
    response.raise_for_status()

    return response.json()

# Example usage
updates = {
    "name": "renamed_document.txt",
    "chunk_method": "naive",
    "parser_config": {"chunk_token_num": 256}
}

update_document(dataset_id, "document_id_123", updates)

Knowledge Graph Operations

Construct Knowledge Graph

def construct_knowledge_graph(dataset_id):
    """
    Construct a knowledge graph for a dataset using GraphRAG
    """
    url = f"{BASE_URL}/datasets/{dataset_id}/run_graphrag"

    response = session.post(url)
    response.raise_for_status()

    return response.json()

# Example usage
result = construct_knowledge_graph(dataset_id)
graphrag_task_id = result["data"]["graphrag_task_id"]
print(f"Started knowledge graph construction with task ID: {graphrag_task_id}")

Get Knowledge Graph

def get_knowledge_graph(dataset_id):
    """
    Retrieve the knowledge graph for a dataset
    """
    url = f"{BASE_URL}/datasets/{dataset_id}/knowledge_graph"

    response = session.get(url)
    response.raise_for_status()

    return response.json()

# Example usage
graph = get_knowledge_graph(dataset_id)
print(f"Knowledge graph has {len(graph['data']['graph']['nodes'])} nodes")

Get Graph Construction Status

def get_graphrag_status(dataset_id):
    """
    Get the status of knowledge graph construction
    """
    url = f"{BASE_URL}/datasets/{dataset_id}/trace_graphrag"

    response = session.get(url)
    response.raise_for_status()

    return response.json()

# Example usage
status = get_graphrag_status(dataset_id)
print(f"Graph construction progress: {status['data']['progress'] * 100}%")
print(f"Status message: {status['data']['progress_msg']}")
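
Construction runs asynchronously, so a small polling loop is useful. A sketch, assuming progress is a float that reaches 1.0 on completion (inferred from the status fields above, not confirmed by the API reference); the same helper works for the RAPTOR status endpoint below:

import time

def wait_for_task(fetch_status, poll_seconds=5, timeout_seconds=3600):
    """
    Poll a status function until progress reaches 1.0 or the timeout expires
    """
    deadline = time.time() + timeout_seconds
    while time.time() < deadline:
        data = fetch_status()["data"]
        if data.get("progress", 0) >= 1.0:
            return data
        print(f"Progress: {data.get('progress', 0) * 100:.0f}% - {data.get('progress_msg', '')}")
        time.sleep(poll_seconds)
    raise TimeoutError("Task did not finish in time")

# Example usage
wait_for_task(lambda: get_graphrag_status(dataset_id))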

Delete Knowledge Graph

def delete_knowledge_graph(dataset_id):
    """
    Delete the knowledge graph for a dataset
    """
    url = f"{BASE_URL}/datasets/{dataset_id}/knowledge_graph"

    response = session.delete(url)
    response.raise_for_status()

    return response.json()

# Example usage
delete_knowledge_graph(dataset_id)

RAPTOR Operations

Construct RAPTOR

def construct_raptor(dataset_id):
    """
    Construct a RAPTOR index for a dataset
    """
    url = f"{BASE_URL}/datasets/{dataset_id}/run_raptor"

    response = session.post(url)
    response.raise_for_status()

    return response.json()

# Example usage
result = construct_raptor(dataset_id)
raptor_task_id = result["data"]["raptor_task_id"]
print(f"Started RAPTOR construction with task ID: {raptor_task_id}")

Get RAPTOR Status

def get_raptor_status(dataset_id):
    """
    Get the status of RAPTOR construction
    """
    url = f"{BASE_URL}/datasets/{dataset_id}/trace_raptor"

    response = session.get(url)
    response.raise_for_status()

    return response.json()

# Example usage
status = get_raptor_status(dataset_id)
print(f"RAPTOR construction progress: {status['data']['progress'] * 100}%")
print(f"Status message: {status['data']['progress_msg']}")
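
The wait_for_task polling helper sketched in the knowledge graph section applies here unchanged:

wait_for_task(lambda: get_raptor_status(dataset_id))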

Error Handling

def handle_ragflow_error(response):
    """
    Extract a readable error message from a RAGFlow API response
    """
    try:
        error_data = response.json()
        error_code = error_data.get("code")
        error_message = error_data.get("message", "Unknown error")
        return f"RAGFlow Error {error_code}: {error_message}"
    except ValueError:
        # Body was not valid JSON; fall back to the raw HTTP response
        return f"HTTP Error {response.status_code}: {response.text}"

# Example usage
try:
    result = create_dataset("existing_dataset")
except requests.exceptions.HTTPError as e:
    error_msg = handle_ragflow_error(e.response)
    print(f"Failed to create dataset: {error_msg}")

Best Practices

Chunk Methods

RAGFlow supports various chunk methods for different content types:

| Method       | Use Case                   |
|--------------|----------------------------|
| naive        | General content (default)  |
| book         | Book content with chapters |
| email        | Email content              |
| laws         | Legal documents            |
| manual       | Manual content             |
| paper        | Research papers            |
| presentation | PowerPoint presentations   |
| qa           | Question-answer pairs      |
| table        | Tabular data               |
| tag          | Content with tags          |
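
For instance, a dataset holding FAQ-style content can be created with the qa method via the create_dataset helper above:

result = create_dataset("faq_dataset", chunk_method="qa")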

Parser Configuration

When using the naive chunk method, you can configure the following options (an illustrative example follows the list):
- chunk_token_num: Token count per chunk (default: 512)
- delimiter: Delimiter for splitting text
- auto_keywords: Number of auto-generated keywords (0-32)
- auto_questions: Number of auto-generated questions (0-10)
- layout_recognize: Whether to use layout recognition for PDFs
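
A naive-method configuration combining these options might look like this (values are examples, not tuned recommendations):

parser_config = {
    "chunk_token_num": 256,   # tokens per chunk
    "delimiter": "\n",        # split on newlines
    "auto_keywords": 5,       # 0-32
    "auto_questions": 3,      # 0-10
    "layout_recognize": True  # layout recognition for PDFs
}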

Performance Considerations

  1. Batch Operations: When uploading multiple documents, upload them in batches
  2. Chunk Size: Adjust chunk size based on your model's context window
  3. Metadata: Use metadata conditions to filter retrieval results
  4. Streaming: Use streaming for long responses to improve user experience (see the sketch after this list)
  5. Caching: Cache frequently accessed results to reduce API calls
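
A streaming sketch for the chat endpoint, assuming the server emits OpenAI-style Server-Sent Events (lines prefixed with "data: ", terminated by "[DONE]"):

import json

def stream_chat_completion(chat_id, messages):
    """
    Print content deltas as they arrive from the chat endpoint
    """
    url = f"{BASE_URL}/chats_openai/{chat_id}/chat/completions"
    payload = {"model": "model", "messages": messages, "stream": True}

    with session.post(url, json=payload, stream=True) as response:
        response.raise_for_status()
        for line in response.iter_lines():
            # SSE payload lines look like: data: {...}  or  data: [DONE]
            if not line or not line.startswith(b"data: "):
                continue
            chunk = line[len(b"data: "):]
            if chunk.strip() == b"[DONE]":
                break
            delta = json.loads(chunk)["choices"][0].get("delta", {})
            print(delta.get("content") or "", end="", flush=True)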

API Reference

Error Codes

| Code | Message               | Description                |
|------|-----------------------|----------------------------|
| 400  | Bad Request           | Invalid request parameters |
| 401  | Unauthorized          | Invalid API key            |
| 403  | Forbidden             | Access denied              |
| 404  | Not Found             | Resource not found         |
| 500  | Internal Server Error | Server error               |
| 1001 | Invalid Chunk ID      | Invalid chunk identifier   |
| 1002 | Chunk Update Failed   | Failed to update chunk     |

Common Request Headers

Authorization: Bearer <YOUR_API_KEY>
Content-Type: application/json

Common Response Format

{
  "code": 0,
  "data": { /* response data */ },
  "message": "success"
}
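
Because the API-level code field signals success (0) or failure independently of the HTTP status, it can be worth checking the envelope explicitly. A small sketch based on the format above:

def unwrap(response_json):
    """
    Return the data payload, raising if the API-level code signals failure
    """
    if response_json.get("code", 0) != 0:
        raise RuntimeError(
            f"RAGFlow Error {response_json['code']}: "
            f"{response_json.get('message', 'Unknown error')}"
        )
    return response_json.get("data")

# Example usage
dataset = unwrap(create_dataset("another_dataset"))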

Troubleshooting

Connection Issues

  • Error 10061 (connection refused): Check that the RAGFlow service is running and the base URL is correct
  • Error 401: Verify API key is correct
  • Error 403: Ensure you have permission to access the resource
  • Error 404: Check if dataset/document ID is valid

Performance Issues

  • For large datasets, increase page size in list operations
  • Reduce chunk token size for faster processing
  • Use smaller batch sizes when uploading documents

Knowledge Graph Issues

  • Ensure dataset has documents before constructing knowledge graph
  • Check GraphRAG task status for detailed progress
  • Verify dataset has sufficient content for meaningful graph construction

Next Steps

  1. API Key: Obtain your RAGFlow API key from the RAGFlow UI
  2. Service Setup: Ensure RAGFlow service is running and accessible
  3. Start Small: Begin with simple operations like listing datasets
  4. Experiment: Try different chunk methods and parser configurations
  5. Build Workflows: Create end-to-end workflows for your use cases

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.