Install a specific skill from the multi-skill repository:

```shell
npx skills add synaptiai/agent-capability-standard --skill "integrate"
```
# Description
Combine heterogeneous data sources into a unified model with conflict resolution, schema alignment, and provenance tracking. Use when merging data from multiple systems, consolidating information, or building comprehensive views.
# SKILL.md

```yaml
name: integrate
description: Combine heterogeneous data sources into a unified model with conflict resolution, schema alignment, and provenance tracking. Use when merging data from multiple systems, consolidating information, or building comprehensive views.
argument-hint: "[sources] [target_schema] [conflict_strategy]"
disable-model-invocation: false
user-invocable: true
allowed-tools: Read, Grep
context: fork
agent: explore
```
## Intent
Perform data integration to combine information from multiple heterogeneous sources into a unified, coherent model. This handles schema differences, resolves conflicts, and preserves provenance.
Success criteria:
- All sources are incorporated or explicitly excluded
- Schema mismatches are resolved consistently
- Conflicts are documented with resolution rationale
- Provenance traces every value to its source
- Information loss is minimized and documented
World Modeling Context:
Integrate is essential for building comprehensive world-state from multiple observation sources. It combines outputs from retrieve, search, and inspect into unified models, and depends on identity-resolution for entity deduplication.
Hard dependencies:
- Requires identity-resolution when sources contain overlapping entities
## Inputs

| Parameter | Required | Type | Description |
|---|---|---|---|
| sources | Yes | array | Data sources to integrate (files, API outputs, retrieved data) |
| target_schema | No | object | Schema the integrated data should conform to |
| conflict_strategy | No | string | `prefer_recent`, `prefer_authoritative`, `merge`, or `manual` (default: `prefer_authoritative`) |
| constraints | No | object | Required fields, validation rules |
## Procedure
1) Analyze sources: Understand what each source contains
- Inventory entities and attributes per source
- Identify source schemas (explicit or inferred)
- Note data freshness and authority
- Detect overlapping vs. unique content
2) Align schemas: Map source schemas to target
- Identify equivalent fields (same meaning, different names)
- Map types (handle type conversions)
- Handle missing fields (default values, nulls)
- Document unmappable fields (information loss)
3) Resolve entity identity: Determine which records refer to same entity
- Apply identity-resolution for overlapping entities
- Assign canonical IDs
- Track all source IDs as aliases
4) Merge attributes: For each entity, combine attributes from sources
- Non-overlapping: Simply include from source
- Overlapping, matching: Confirm consistency
- Overlapping, conflicting: Apply conflict strategy
5) Apply conflict strategy:
For prefer_recent:
- Use the most recently updated value
- Requires reliable timestamps
For prefer_authoritative:
- Rank sources by authority
- Use value from most authoritative source
For merge:
- Combine values where possible (lists, sets)
- For scalars, may require manual resolution
For manual:
- Flag all conflicts for human review
- Do not auto-resolve
6) Validate integrated data: Check against target schema
- Required fields present
- Type constraints satisfied
- Relationship integrity maintained
- Custom invariants hold
7) Document integration: Record what happened
- Per-field provenance (which source)
- Conflict resolutions with rationale
- Information loss (unmapped fields)
- Transformations applied
8) Produce output: Generate integrated model
- Conforms to target schema
- Includes provenance metadata
- Documents confidence levels
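Steps 4-5 can be sketched as a small conflict resolver. The helper below is illustrative only; its name, input shape, and `last_updated` timestamp field are assumptions, not part of the contract:

```python
from datetime import datetime

def resolve_conflict(values, strategy="prefer_authoritative", authority_rank=None):
    """Resolve one field-level conflict among candidate values.

    `values` is a list of dicts: {"source", "value", "last_updated" (ISO 8601, optional)}.
    `authority_rank` maps source name -> rank (lower = more authoritative).
    Returns (resolved_value, rationale); `manual` never auto-resolves.
    """
    if strategy == "prefer_recent":
        # Requires reliable timestamps on every candidate value.
        best = max(values, key=lambda v: datetime.fromisoformat(
            v["last_updated"].replace("Z", "+00:00")))
        return best["value"], f"prefer_recent ({best['source']} most recent)"
    if strategy == "prefer_authoritative":
        rank = authority_rank or {}
        best = min(values, key=lambda v: rank.get(v["source"], float("inf")))
        return best["value"], f"prefer_authoritative ({best['source']})"
    if strategy == "merge":
        if all(isinstance(v["value"], list) for v in values):
            merged = []
            for v in values:  # list union, first-seen order preserved
                merged.extend(x for x in v["value"] if x not in merged)
            return merged, "merge (list union)"
        raise ValueError("merge of scalar values requires manual resolution")
    raise ValueError("manual strategy: flag conflict for human review")
```

With `prefer_recent`, a Datadog reading timestamped five minutes after a Kubernetes reading would win, matching the status resolution shown in Example 1 below.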
## Output Contract

Return a structured object:

```yaml
integrated_data:
  schema: string | object          # Target schema reference or inline
  entities:
    - id: string                   # Canonical ID
      type: string                 # Entity type
      attributes: object           # Integrated attributes
      source_map:                  # Where each attribute came from
        attribute_name:
          source: string
          original_value: any | null   # If transformed
          confidence: number
  relationships:
    - source: string
      target: string
      type: string
      provenance: string
integration_report:
  sources_used: array[string]
  sources_excluded: array[string]
  entity_count:
    total: integer
    from_single_source: integer
    merged: integer
  field_mapping:
    - source_field: string
      target_field: string
      transformation: string | null
  conflicts:
    - entity_id: string
      field: string
      values:
        - source: string
          value: any
      resolution: string
      resolved_value: any
      confidence: number
  information_loss:
    - source: string
      field: string
      reason: string               # unmappable, filtered, etc.
      impact: low | medium | high
  validation_results:
    passed: boolean
    errors: array[string]
    warnings: array[string]
provenance:
  sources: array[object]           # Source metadata
  integration_timestamp: string
  confidence: number
confidence: number                 # 0.0-1.0 overall
evidence_anchors: array[string]
assumptions: array[string]
```
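A minimal validation pass over an object shaped like this contract might look as follows. The helper is a sketch, not part of the skill; it checks only canonical IDs, per-attribute provenance, required fields, and conflict resolutions:

```python
def validate_integrated(result, required_attrs=()):
    """Check an integrated result against basic contract invariants.

    Returns (passed, errors): every entity needs a canonical id, every
    attribute must be traceable through source_map, and every recorded
    conflict must carry a resolution.
    """
    errors = []
    for entity in result.get("integrated_data", {}).get("entities", []):
        if not entity.get("id"):
            errors.append("entity missing canonical id")
        source_map = entity.get("source_map", {})
        for attr in entity.get("attributes", {}):
            if attr not in source_map:
                errors.append(f"{entity.get('id')}: no provenance for '{attr}'")
        for attr in required_attrs:
            if attr not in entity.get("attributes", {}):
                errors.append(f"{entity.get('id')}: required field '{attr}' missing")
    for conflict in result.get("integration_report", {}).get("conflicts", []):
        if not conflict.get("resolution"):
            errors.append(f"unresolved conflict on {conflict.get('field')}")
    return (not errors), errors
```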
## Field Definitions

| Field | Type | Description |
|---|---|---|
| integrated_data | object | The unified data model |
| source_map | object | Per-attribute source tracking |
| integration_report.conflicts | array | All conflicts and resolutions |
| information_loss | array | Data that couldn't be integrated |
## Examples

### Example 1: Integrating Service Data from Multiple Sources
Input:
```yaml
sources:
  - name: "kubernetes"
    type: "api"
    data:
      - id: "api-gateway-deployment"
        status: "Running"
        replicas: 3
        last_updated: "2025-01-24T10:00:00Z"
  - name: "datadog"
    type: "api"
    data:
      - service_name: "api-gateway"
        health: "healthy"
        latency_p99: 45
        last_updated: "2025-01-24T10:05:00Z"
  - name: "package.json"
    type: "file"
    data:
      - name: "api-gateway"
        version: "2.3.1"
        dependencies: ["express", "cors"]
target_schema: "microservices_v1"
conflict_strategy: prefer_recent
```
Output:
```yaml
integrated_data:
  schema: "microservices_v1"
  entities:
    - id: "api-gateway"
      type: "Service"
      attributes:
        name: "api-gateway"
        version: "2.3.1"
        status: "healthy"
        replicas: 3
        latency_p99: 45
        dependencies: ["express", "cors"]
      source_map:
        name:
          source: "package.json"
          original_value: null
          confidence: 1.0
        version:
          source: "package.json"
          original_value: null
          confidence: 1.0
        status:
          source: "datadog"
          original_value: "healthy"
          confidence: 0.9
        replicas:
          source: "kubernetes"
          original_value: null
          confidence: 1.0
        latency_p99:
          source: "datadog"
          original_value: null
          confidence: 0.95
        dependencies:
          source: "package.json"
          original_value: null
          confidence: 1.0
  relationships: []
integration_report:
  sources_used: ["kubernetes", "datadog", "package.json"]
  sources_excluded: []
  entity_count:
    total: 1
    from_single_source: 0
    merged: 1
  field_mapping:
    - source_field: "kubernetes.status"
      target_field: "status"
      transformation: "Mapped 'Running' to 'healthy'"
    - source_field: "datadog.service_name"
      target_field: "name"
      transformation: null
  conflicts:
    - entity_id: "api-gateway"
      field: "status"
      values:
        - source: "kubernetes"
          value: "Running"
        - source: "datadog"
          value: "healthy"
      resolution: "prefer_recent (datadog more recent)"
      resolved_value: "healthy"
      confidence: 0.85
  information_loss:
    - source: "kubernetes"
      field: "labels"
      reason: "No target schema field for Kubernetes labels"
      impact: low
  validation_results:
    passed: true
    errors: []
    warnings:
      - "Service has no APIs defined"
provenance:
  sources:
    - name: "kubernetes"
      authority: high
      freshness: "2025-01-24T10:00:00Z"
    - name: "datadog"
      authority: medium
      freshness: "2025-01-24T10:05:00Z"
    - name: "package.json"
      authority: high
      freshness: "2025-01-20T00:00:00Z"
  integration_timestamp: "2025-01-24T12:00:00Z"
  confidence: 0.9
confidence: 0.9
evidence_anchors:
  - "kubernetes_api:deployments/api-gateway-deployment"
  - "datadog:services/api-gateway"
  - "package.json:1-20"
assumptions:
  - "All sources refer to the same service"
  - "Datadog health status more accurate than K8s Running state"
```
Evidence pattern: Mapped fields from three sources to unified schema, resolved status conflict using recency.
### Example 2: Integrating Documentation with Code Analysis
Input:
```yaml
sources:
  - name: "code_analysis"
    data:
      functions:
        - name: "processOrder"
          file: "src/orders.ts"
          parameters: ["orderId", "options"]
  - name: "api_docs"
    data:
      endpoints:
        - path: "/orders/process"
          method: "POST"
          description: "Process a pending order"
          parameters:
            - name: "order_id"
              required: true
target_schema: "api_documentation"
conflict_strategy: merge
```
Output:
```yaml
integrated_data:
  schema: "api_documentation"
  entities:
    - id: "endpoint_process_order"
      type: "Endpoint"
      attributes:
        path: "/orders/process"
        method: "POST"
        description: "Process a pending order"
        implementation: "src/orders.ts:processOrder"
        parameters:
          - name: "order_id"
            code_name: "orderId"
            required: true
            source: "body"
      source_map:
        path: { source: "api_docs" }
        method: { source: "api_docs" }
        description: { source: "api_docs" }
        implementation: { source: "code_analysis" }
        parameters: { source: "merged" }
integration_report:
  sources_used: ["code_analysis", "api_docs"]
  entity_count:
    total: 1
    merged: 1
  conflicts:
    - entity_id: "endpoint_process_order"
      field: "parameters[0].name"
      values:
        - source: "code_analysis"
          value: "orderId"
        - source: "api_docs"
          value: "order_id"
      resolution: "merge - both names preserved (code vs API)"
      resolved_value: { name: "order_id", code_name: "orderId" }
      confidence: 0.95
  information_loss: []
confidence: 0.95
evidence_anchors:
  - "src/orders.ts:processOrder"
  - "api_docs:endpoints/process"
```
## Verification
- [ ] All source entities are accounted for (integrated or documented as excluded)
- [ ] Conflicts have documented resolutions
- [ ] Output validates against target schema
- [ ] Provenance traces every attribute to source
- [ ] Information loss is documented
Verification tools: Schema validators, provenance checkers
## Safety Constraints

```yaml
mutation: false
requires_checkpoint: false
requires_approval: false
risk: low
```
Capability-specific rules:
- Never silently drop data - document all information loss
- When confidence < 0.5 on a merge, flag for review
- Preserve source identifiers for traceability
- Apply identity-resolution before merging overlapping entities
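The confidence-gating rule can be sketched as follows; the helper name and input shape are illustrative, not prescribed by the skill:

```python
def review_queue(merges, threshold=0.5):
    """Split merge decisions into auto-accepted and flagged-for-review.

    Each merge is a dict with at least "field" and "confidence";
    anything below the threshold is never applied silently.
    """
    accepted, flagged = [], []
    for m in merges:
        (accepted if m["confidence"] >= threshold else flagged).append(m)
    return accepted, flagged
```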
## Composition Patterns
Commonly follows:
- retrieve - Get data from sources before integrating
- search - Discover relevant sources to integrate
- identity-resolution - Resolve entity identity before merge
Commonly precedes:
- world-state - Integration produces unified state
- grounding - Integrated data needs grounding
- diff-world-state - Compare integrated states over time
Anti-patterns:
- Never integrate without understanding source schemas
- Avoid ignoring conflicts - always resolve or flag
Workflow references:
- See composition_patterns.md#world-model-build for integration in model construction
- See composition_patterns.md#digital-twin-sync-loop for ongoing integration
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents that support it.