Install this specific skill from the multi-skill repository:

```bash
npx skills add streamingfast/substreams-skills --skill "substreams-dev"
```
# Description
Expert knowledge for developing, building, and debugging Substreams projects on any blockchain. Use when working with substreams.yaml manifests, Rust modules, protobuf schemas, or blockchain data processing.
# SKILL.md
```yaml
---
name: substreams-dev
description: Expert knowledge for developing, building, and debugging Substreams projects on any blockchain. Use when working with substreams.yaml manifests, Rust modules, protobuf schemas, or blockchain data processing.
license: Apache-2.0
compatibility:
  platforms: [claude-code, cursor, vscode, windsurf]
metadata:
  version: 1.0.0
  author: StreamingFast
  documentation: https://substreams.streamingfast.io
---
```
# Substreams Development Expert

Expert assistant for building Substreams projects: high-performance blockchain data indexing and transformation.
## Core Concepts

### What is Substreams?
Substreams is a powerful blockchain indexing technology that enables:
- Parallel processing of blockchain data with high performance
- Composable modules written in Rust (map, store, index types)
- Protobuf schemas for typed data structures
- Streaming-first architecture with cursor-based reorg handling
### Key Components

- **Manifest** (`substreams.yaml`): Defines modules, networks, and dependencies
- **Modules**: Map (transform), Store (aggregate), Index (filter)
- **Protobuf**: Type-safe schemas for inputs and outputs
- **WASM**: Rust code compiled to WebAssembly for execution
### Project Structure

```
my-substreams/
├── substreams.yaml   # Manifest
├── proto/
│   └── events.proto  # Schema definitions
├── src/
│   └── lib.rs        # Rust module code
├── Cargo.toml        # Rust dependencies
└── build/            # Generated files (gitignored)
```
## Prerequisites

### Required CLI Tools

- `substreams`: Core CLI for building, running, and deploying
- `buf`: Required by `substreams build` for protobuf code generation
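If either tool is missing, Homebrew is one common install path (adjust for your platform):

```bash
brew install streamingfast/tap/substreams   # Substreams CLI (StreamingFast tap)
brew install bufbuild/buf/buf               # buf CLI for protobuf generation

substreams --version   # verify both are on PATH
buf --version
```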
### Authentication

Running `substreams run` against hosted endpoints requires authentication:

```bash
substreams auth   # Interactive authentication
# Or set the SUBSTREAMS_API_TOKEN environment variable
```
## Common Workflows

### Creating a New Project

1. **Initialize**: Use `substreams init` or create the manifest manually
2. **Define schema**: Create `.proto` files for your data structures
3. **Implement modules**: Write Rust handlers in `src/lib.rs`
4. **Build**: Run `substreams build` to compile to an `.spkg` package
5. **Test**: Run `substreams run` with a small block range (recommended: 1000 blocks)
6. **Deploy**: Publish to the registry or deploy as a service (see the command sketch below)
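End to end, that loop looks like this (package file name, module name, and block range are placeholders):

```bash
substreams init     # scaffold a new project interactively
substreams build    # generate protobuf bindings and compile to an .spkg
substreams run ./my-substreams-v1.0.0.spkg map_events -s 17000000 -t +1000
```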
### Module Types

**Map Module** transforms an input into an output:

```yaml
- name: map_events
  kind: map
  inputs:
    - source: sf.ethereum.type.v2.Block
  output:
    type: proto:my.types.Events
```
**Store Module** aggregates data across blocks:

```yaml
- name: store_totals
  kind: store
  updatePolicy: add
  valueType: int64
  inputs:
    - map: map_events
```
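A downstream module can then consume the store either as point-in-time values (`mode: get`, the default) or as per-block changes (`mode: deltas`). A sketch, with a hypothetical `map_enriched` module and output type:

```yaml
- name: map_enriched
  kind: map
  inputs:
    - map: map_events
    - store: store_totals
      mode: deltas        # receive only the keys that changed in each block
  output:
    type: proto:my.types.EnrichedEvents
```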
**Index Module** filters blocks for efficient querying:

```yaml
- name: index_transfers
  kind: index
  inputs:
    - map: map_events
  output:
    type: proto:sf.substreams.index.v1.Keys
```
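On its own an index does nothing; a downstream module opts in with a `blockFilter` so blocks whose keys don't match are skipped entirely. A sketch (module name and query string are illustrative):

```yaml
- name: map_transfers_only
  kind: map
  blockFilter:
    module: index_transfers
    query:
      string: "transfer"   # run only on blocks the index tagged with this key
  inputs:
    - map: map_events
  output:
    type: proto:my.types.Events
```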
## Debugging Checklist

When modules produce unexpected results:

- **Validate manifest**: Run `substreams graph` to visualize dependencies (see below)
- **Test small range**: Run 100-1000 blocks and inspect outputs carefully
- **Check logs**: Look for WASM panics and protobuf decode errors
- **Verify schema**: Ensure proto types match the expected data
- **Review inputs**: Confirm input modules produce correct data
- **Initial block**: Check that `initialBlock` is set appropriately
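Both inspection subcommands take the local manifest as their argument:

```bash
substreams info substreams.yaml    # module list, hashes, initial blocks
substreams graph substreams.yaml   # Mermaid diagram of the module DAG
```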
## Performance Optimization

- Use indexes to skip irrelevant blocks
- Minimize store size by storing only necessary data
- Enable parallel execution for large ranges with `--production-mode`
- **Module granularity**: Smaller, focused modules perform better
- **Avoid deep nesting**: Flatten module dependencies when possible
## Manifest Reference

See references/manifest-spec.md for the complete specification.

### Key Sections

**Package metadata:**

```yaml
specVersion: v0.1.0
package:
  name: my-substreams
  version: v1.0.0
  description: Description of what this substreams does
```

**Protobuf imports:**

```yaml
protobuf:
  files:
    - events.proto
  importPaths:
    - ./proto
```

**Binary reference (WASM code):**

```yaml
binaries:
  default:
    type: wasm/rust-v1
    file: ./target/wasm32-unknown-unknown/release/my_substreams.wasm
```

**Network configuration:**

```yaml
network: mainnet
```

Supported networks: see references/networks.md.
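Put together, a minimal manifest combining the sections above looks like this (using the same placeholder names as the rest of this document):

```yaml
specVersion: v0.1.0

package:
  name: my-substreams
  version: v1.0.0

protobuf:
  files:
    - events.proto
  importPaths:
    - ./proto

binaries:
  default:
    type: wasm/rust-v1
    file: ./target/wasm32-unknown-unknown/release/my_substreams.wasm

network: mainnet

modules:
  - name: map_events
    kind: map
    inputs:
      - source: sf.ethereum.type.v2.Block
    output:
      type: proto:my.types.Events
```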
## Rust Module Development

### Map Handler Example

```rust
use substreams::errors::Error;
use substreams::prelude::*;
use substreams_ethereum::pb::eth::v2::Block;

#[substreams::handlers::map]
pub fn map_events(block: Block) -> Result<Events, Error> {
    let mut events = Events::default();
    for trx in block.transactions() {
        for (log, _call) in trx.logs_with_calls() {
            // Process logs, extract events
            if is_transfer_event(log) {
                events.transfers.push(extract_transfer(log));
            }
        }
    }
    Ok(events)
}
```

The `is_transfer_event` and `extract_transfer` helpers are defined in the Performance section below.
### Store Handler Example

```rust
#[substreams::handlers::store]
pub fn store_totals(events: Events, store: StoreAddInt64) {
    for transfer in events.transfers {
        store.add(0, &transfer.token, transfer.amount as i64);
    }
}
```
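Stores support more update policies than `add`. A sketch of a `set` policy with string values, assuming the manifest declares `updatePolicy: set` / `valueType: string` and hypothetical `Events`/`Transfer` types with `token` and `to` string fields:

```rust
use substreams::prelude::*;

// Keeps the latest owner seen for each token key
#[substreams::handlers::store]
pub fn store_owners(events: Events, store: StoreSetString) {
    for transfer in events.transfers {
        // `set` overwrites any previous value stored at this key
        store.set(0, &transfer.token, &transfer.to);
    }
}
```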
## Best Practices

- **Handle errors gracefully**: Return `Result<T, Error>` from handlers
- **Log sparingly**: Excessive logging impacts performance
- **Validate inputs**: Check for null/empty data before processing (see the sketch below)
- **Use substreams helpers**: Leverage the `substreams-ethereum` crate
- **Test locally first**: Always test with `substreams run` before deploying
- **Avoid excessive cloning**: Use ownership transfer (see the Performance section below)
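A minimal sketch of the error-handling and input-validation points, using the Ethereum block model from the example above (the `Events` output type is the same hypothetical protobuf type):

```rust
use substreams::errors::Error;
use substreams_ethereum::pb::eth::v2::Block;

#[substreams::handlers::map]
pub fn map_events(block: Block) -> Result<Events, Error> {
    // Validate inputs: fail with a descriptive error instead of panicking.
    // substreams::errors::Error re-exports anyhow's error type, so
    // anyhow-style constructors like Error::msg work here (an assumption
    // worth verifying against your crate version).
    let header = block
        .header
        .as_ref()
        .ok_or_else(|| Error::msg("block is missing its header"))?;

    substreams::log::info!("processing block {}", header.number);
    Ok(Events::default())
}
```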
## Performance: Avoiding Excessive Cloning

**CRITICAL**: One of the biggest performance costs in Substreams is excessive cloning of data structures.

### The Problem

Cloning large data structures is expensive:

- ❌ Cloning a `Transaction`: Copies all fields, logs, and traces
- ❌ Cloning a `Block`: Copies the entire block including all transactions (EXTREMELY expensive)
- ❌ Cloning in loops: Multiplies the cost by the number of iterations

### The Solution: Ownership Transfer

Use Rust's ownership system to transfer or borrow data instead of cloning.
#### Bad Example (Excessive Cloning)

```rust
#[substreams::handlers::map]
pub fn map_events(block: Block) -> Result<Events, Error> {
    let mut events = Events::default();
    for trx in block.transactions() {
        // ❌ BAD: Cloning the entire transaction
        let transaction = trx.clone();
        for (log, _call) in transaction.logs_with_calls() {
            // ❌ BAD: Cloning the log
            let log_copy = log.clone();
            if is_transfer_event(&log_copy) {
                events.transfers.push(extract_transfer(&log_copy));
            }
        }
    }
    Ok(events)
}
```
#### Good Example (Ownership Transfer)

```rust
#[substreams::handlers::map]
pub fn map_events(block: Block) -> Result<Events, Error> {
    let mut events = Events::default();
    // ✅ GOOD: Iterate by reference
    for trx in block.transactions() {
        // ✅ GOOD: Borrow, don't clone
        for (log, _call) in trx.logs_with_calls() {
            if is_transfer_event(log) {
                // ✅ GOOD: Only extract what you need
                events.transfers.push(extract_transfer(log));
            }
        }
    }
    Ok(events)
}

fn is_transfer_event(log: &Log) -> bool {
    // Use a reference, no cloning
    !log.topics.is_empty() && log.topics[0] == TRANSFER_EVENT_SIGNATURE
}

fn extract_transfer(log: &Log) -> Transfer {
    // Extract only the fields you need; don't copy the entire log
    Transfer {
        from: Hex::encode(&log.topics[1]),
        to: Hex::encode(&log.topics[2]),
        amount: Hex::encode(&log.data),
    }
}
```
### When Cloning is Acceptable

Clone only small, necessary data:

```rust
// ✅ OK: Small owned values (Hex::encode already returns an owned String,
// so no extra .clone() is needed)
let token_address = Hex::encode(&log.address);

// ✅ OK: Primitive types (u64 is Copy; cloning is harmless but unnecessary)
let block_number = block.number;

// ❌ BAD: Cloning entire structures
let block_copy = block.clone();      // Never do this!
let trx_copy = transaction.clone();  // Avoid this!
```
### Performance Tips

1. **Use `logs_with_calls()`**: Iterate logs without cloning

   ```rust
   for (log, _call) in trx.logs_with_calls() { }              // Good
   for log in trx.receipt.as_ref().unwrap().logs.clone() { }  // Bad
   ```

2. **Use references when appropriate**: Pass references to avoid unnecessary cloning

   ```rust
   fn process_log(log: &Log) { }  // Good for read-only access
   fn process_log(log: Log) { }   // Good when consuming/transforming data
   ```

3. **Extract minimal data**: Only copy what you actually need

   ```rust
   // Good: Extract only needed fields
   let amount = parse_amount(&log.data);

   // Bad: Copy the entire log just to get one field
   let log_copy = log.clone();
   let amount = parse_amount(&log_copy.data);
   ```

4. **Use `into()` for consumption**: When you truly need to take ownership

   ```rust
   events.transfers.push(Transfer {
       from: topics[1].into(),  // Consumes the data
       to: topics[2].into(),
   });
   ```
### Common Pitfalls

**Pitfall #1: Cloning in filters**

```rust
// ❌ BAD
block.transactions()
    .filter(|trx| trx.clone().to == target)  // Clones every transaction!

// ✅ GOOD
block.transactions()
    .filter(|trx| trx.to == target)  // Just compare
```

**Pitfall #2: Unnecessary defensive copies**

```rust
// ❌ BAD
let block_copy = block.clone();
for trx in block_copy.transactions() { }  // Why clone the whole block?

// ✅ GOOD
for trx in block.transactions() { }  // Use the block directly
```

**Pitfall #3: Cloning for mutation**

```rust
// ❌ BAD
let mut trx_copy = trx.clone();
trx_copy.value = process(trx_copy.value);  // Clone just to mutate

// ✅ GOOD
let new_value = process(&trx.value);  // Process by reference, create a new value
```
### Measuring Impact

Use `substreams run` with timing to measure performance; run the same command before and after removing clones:

```bash
# Before the refactor (with cloning)
time substreams run -s 17000000 -t +1000 map_events

# After the refactor (clones removed)
time substreams run -s 17000000 -t +1000 map_events

# Avoiding clones typically yields a significant speedup (2-10x)
```
### Remember

- 🎯 **Measure performance impact**: Use timing with `substreams run` to identify bottlenecks
- 🎯 **Clone only when necessary**: Most of the time, borrowing is sufficient
- 🎯 **Block cloning is almost never needed**: This is the #1 performance killer
- 🎯 **Transaction cloning should be rare**: Extract only the data you need
## Common Patterns

See references/patterns.md for detailed examples:
- Event extraction from logs
- Store aggregation patterns
- Multi-module composition
- Parameterized modules
- Dynamic data sources
- Database sink patterns (delta updates, composite keys, sink SQL workflow)
## Querying Chain Head Block

To get the current head block of a chain (useful for determining the latest block number):

**Using Substreams:**

```bash
# Quick head block lookup for a network
substreams run common@latest -s -1 --network mainnet

# Or with an explicit endpoint
substreams run common@latest -e=<network-id-alias-or-host> -s -1 -o jsonl
```

Read the first line of output to get the head block information. The `-s -1` flag starts from the latest block.
**Using firecore:**

```bash
# JSON output (use jq for further processing if available)
firecore tools firehose-client <network-id-alias-or-host> -o json -- -1

# Text output (less detail); the first line looks like:
# Block #24327807 (14b58bd3fa091c05a46d084bba1e78090d52556d29f4312da77b7aa3220423f4)
firecore tools firehose-client <network-id-alias-or-host> -o text -- -1
```

Read the first line of output to get the head block information.
## Development Tips

- **Start small**: Begin with a 1000-block range for testing
- **Use the GUI**: `substreams gui` for visual debugging (when available)
- **Version control**: Commit `.spkg` files for reproducibility
- **Document modules**: Add `doc:` fields in the manifest for clarity
## Troubleshooting

**Build fails:**

- Check the Rust toolchain: `rustup target add wasm32-unknown-unknown`
- Ensure the `buf` CLI is installed (required for proto generation)
- Verify proto imports are correct
- Add `protobuf.excludePaths` with `sf/substreams` and `google` when importing spkgs
- Ensure the binary path in the manifest matches the build output
**Empty output:**

- Confirm `initialBlock` is before the first relevant block (see the manifest sketch below)
- Check the module isn't filtered out by an upstream index
- Verify input data exists in the block range
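`initialBlock` is set per module in the manifest; a sketch with a hypothetical contract deployment block:

```yaml
- name: map_events
  kind: map
  initialBlock: 12369621   # hypothetical deployment block; nothing relevant exists before it
  inputs:
    - source: sf.ethereum.type.v2.Block
  output:
    type: proto:my.types.Events
```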
**Performance issues:**

- Add indexes to skip irrelevant blocks
- Use `--production-mode` for large ranges (example below)
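For example (package file and module name are placeholders):

```bash
substreams run ./my-substreams-v1.0.0.spkg map_events \
  -s 17000000 -t +100000 \
  --production-mode
```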
## Resources

- Documentation: https://substreams.streamingfast.io

### Getting Help

For questions beyond this skill, start from the official documentation linked above.
# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.