Install this specific skill from the multi-skill repository: `npx skills add williamzujkowski/cognitive-toolworks --skill "RabbitMQ Architecture Designer"`
# Description
Design RabbitMQ architectures with exchanges, quorum queues, routing patterns, clustering, dead letter exchanges, and AMQP best practices.
# SKILL.md
name: RabbitMQ Architecture Designer
slug: messaging-rabbitmq-architect
description: Design RabbitMQ architectures with exchanges, quorum queues, routing patterns, clustering, dead letter exchanges, and AMQP best practices.
capabilities:
- Exchange type selection and routing pattern design
- Queue type selection (classic, quorum, streams)
- Publisher confirms and consumer acknowledgments
- Clustering topology with quorum queue replication
- Dead letter exchange (DLX) error handling patterns
- Message durability and persistence strategies
- Prefetch tuning and consumer concurrency
- AMQP protocol best practices
inputs:
- Message flow requirements (publishers, consumers, routing logic)
- Exchange types (direct, topic, fanout, headers, consistent hashing)
- Queue requirements (durability, replication, ordering, priority)
- Clustering needs (high availability, replication factor)
- Error handling patterns (DLX, retry, TTL)
- Performance requirements (throughput, latency, consumer count)
outputs:
- RabbitMQ topology design (exchanges, queues, bindings)
- Queue type recommendations (classic vs quorum vs streams)
- Publisher/consumer configuration (confirms, acks, prefetch)
- Clustering configuration (node count, replication)
- DLX error handling setup
- AMQP connection and channel management patterns
keywords:
- rabbitmq
- amqp
- message-queue
- exchange
- routing
- quorum-queues
- clustering
- dead-letter-exchange
- publisher-confirms
- consumer-acknowledgments
- message-broker
- event-driven
version: "1.0.0"
owner: cognitive-toolworks
license: MIT
security: "Never include credentials in topology definitions. Use environment variables for AMQP URIs. Enable TLS for production. Avoid hardcoding queue/exchange names with sensitive data."
links:
- title: "RabbitMQ 4.1.0 Release (April 2025)"
url: "https://www.rabbitmq.com/blog/2025/04/15/rabbitmq-4.1.0-is-released"
accessed: "2025-10-26"
- title: "Quorum Queues Documentation"
url: "https://www.rabbitmq.com/docs/quorum-queues"
accessed: "2025-10-26"
- title: "RabbitMQ Exchanges"
url: "https://www.rabbitmq.com/docs/exchanges"
accessed: "2025-10-26"
- title: "Clustering Guide"
url: "https://www.rabbitmq.com/docs/clustering"
accessed: "2025-10-26"
- title: "Dead Letter Exchanges"
url: "https://www.rabbitmq.com/docs/dlx"
accessed: "2025-10-26"
# RabbitMQ Architecture Designer

## Purpose & When-To-Use
Trigger conditions:
- You need to design a RabbitMQ topology with exchanges, queues, and routing patterns
- You need to select queue types (classic, quorum, streams) based on durability and replication needs
- You need to configure publisher confirms or consumer acknowledgments for reliability
- You need to set up clustering for high availability with quorum queue replication
- You need to implement dead letter exchange (DLX) error handling or retry patterns
- You need to optimize consumer prefetch or concurrent processing
Complements:
- integration-messagequeue-designer: generic message queue pattern selection (RabbitMQ vs Kafka vs SQS)
- messaging-kafka-architect: Kafka-specific event streaming architectures
- microservices-pattern-architect: saga, CQRS, and event sourcing patterns that use RabbitMQ
Out of scope:
- RabbitMQ installation and OS-level configuration (use infrastructure automation)
- Monitoring and alerting setup (use observability-stack-configurator)
- Client library integration code (use language-specific AMQP client docs)
- Long-term message retention (use Kafka Streams or a database for archival)
## Pre-Checks
Time normalization:
- Compute `NOW_ET` using NIST/time.gov semantics (America/New_York, ISO-8601)
- Use `NOW_ET` for all access dates in citations
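For instance, a minimal way to compute `NOW_ET` in Python (assuming the local clock is NTP-synced):

```python
from datetime import datetime
from zoneinfo import ZoneInfo

# NOW_ET: current time in America/New_York, ISO-8601
NOW_ET = datetime.now(ZoneInfo("America/New_York")).isoformat()
print(NOW_ET)  # e.g., 2025-10-26T09:15:00-04:00
```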
Verify inputs:
- ✅ Required: At least one message flow definition (publisher → exchange → queue → consumer)
- ✅ Required: RabbitMQ version specified (recommend 4.2+ for Khepri metadata store and quorum queue enhancements)
- ⚠️ Optional: Exchange type preferences (default to topic for flexibility)
- ⚠️ Optional: Queue type (classic, quorum, streams) - default to quorum for durability
- ⚠️ Optional: Clustering requirements (node count, replication factor)
- ⚠️ Optional: Error handling strategy (DLX, retry backoff, TTL)
Validate requirements:
- If high availability needed → quorum queues (since RabbitMQ 3.8, replicated via Raft)
- If ordering required → single active consumer (SAC) or stream queues
- If priorities needed → quorum queues support 2 priorities (high/normal) in RabbitMQ 4.0+
- If broadcasting → fanout exchange
- If complex routing → topic exchange with wildcards
Source freshness:
- RabbitMQ 4.2 latest (2025, Khepri metadata store, stream filters) (accessed `NOW_ET`)
- Quorum queues introduced in 3.8, enhanced in 4.0 (priorities, consumer priority for SAC)
- Classic mirrored queues removed in 4.0 (replaced by quorum queues)
Abort if:
- No message flow specified → EMIT TODO: "Define at least one publisher → exchange → queue → consumer flow"
- Queue type unclear → EMIT TODO: "Specify queue requirements: durability (classic/quorum), replication (quorum), or high throughput (streams)"
- Clustering without quorum queues → EMIT TODO: "Use quorum queues for replicated, highly available queues (classic queues are single-node in RabbitMQ 4.x)"
## Procedure

### T1: Basic RabbitMQ Topology (≤2k tokens, 80% use case)
Scenario: Single exchange, single queue, direct routing, no clustering, basic error handling.
Steps:
- Exchange Type Selection:
  * Direct: exact routing key match (e.g., `order.created` → `order-processing-queue`)
  * Topic: pattern matching with `*` (exactly one word) and `#` (zero or more words); e.g., `audit.events.#` matches `audit.events.users.signup`
  * Fanout: broadcast to all bound queues (routing key ignored)
  * T1 recommendation: use a topic exchange for flexibility, even if only exact routing is needed initially
- Queue Type Selection:
  * Classic: single node, non-replicated (use only for dev/test)
  * Quorum: replicated (Raft consensus), durable, data-safe (use for production)
  * Streams: high throughput, append-only log (use for event streaming)
  * T1 recommendation: use a quorum queue with the `x-queue-type=quorum` argument
- Publisher Configuration:
  * Enable publisher confirms for reliability (wait for broker acknowledgment)
  * Set delivery mode = 2 for persistent messages (survive broker restart)
  * T1 recommendation: use streaming confirms (handle confirms as they arrive)
- Consumer Configuration:
  * Use manual acknowledgments (ack after successful processing)
  * Set prefetch count = 10 (balance throughput against backpressure)
  * T1 recommendation: ack after processing; nack + requeue on transient errors
- Basic Topology:
  * 1 topic exchange (`events`)
  * 1 quorum queue (`order-processing-queue`)
  * 1 binding (`order.created` → `order-processing-queue`)
  * Publisher → `events` exchange with routing key `order.created`
  * Consumer → `order-processing-queue` with manual ack + prefetch=10
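A minimal pika sketch of this T1 topology (host and payload are illustrative assumptions; note that `confirm_delivery()` on a `BlockingConnection` confirms per publish rather than streaming confirms):

```python
import pika

# In production, pull the AMQP URI from an environment variable and enable TLS
# (see the security note in the skill metadata).
connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# 1 topic exchange + 1 quorum queue + 1 binding, per the basic topology above
channel.exchange_declare(exchange="events", exchange_type="topic", durable=True)
channel.queue_declare(
    queue="order-processing-queue",
    durable=True,
    arguments={"x-queue-type": "quorum"},  # quorum queue (T1 recommendation)
)
channel.queue_bind(
    queue="order-processing-queue", exchange="events", routing_key="order.created"
)

# Publisher confirms + persistent delivery
channel.confirm_delivery()  # basic_publish now raises on broker nack
channel.basic_publish(
    exchange="events",
    routing_key="order.created",
    body=b'{"order_id": 1}',
    properties=pika.BasicProperties(delivery_mode=2),  # persistent
)
connection.close()
```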
Output:
- Topology diagram: 1 exchange, 1 queue, 1 binding
- Publisher config: confirms enabled, persistent messages
- Consumer config: manual ack, prefetch=10
Token budget: ≤2000 tokens
### T2: Multi-Exchange Routing + DLX Error Handling (≤6k tokens)
Scenario: Multiple exchanges with complex routing, dead letter exchange for errors, quorum queues, retry with backoff.
Steps:
- Multi-Exchange Topology:
Pattern: Separate exchanges for different message types or bounded contexts
* Example:
* orders-exchange (topic) → routes to order-processing-queue, order-audit-queue
* payments-exchange (topic) → routes to payment-processing-queue
* notifications-exchange (fanout) → broadcasts to all notification queues
- Topic Exchange Routing Patterns:
Wildcards:
* * matches exactly one word (e.g., order.*.created matches order.online.created but not order.created)
* # matches zero or more words (e.g., audit.# matches audit.users, audit.users.signup, audit)
Example bindings:
* order.created → order-processing-queue (exact match)
* order.# → order-audit-queue (all order events)
* payment.processed → payment-processing-queue
* notification.* → notification-email-queue, notification-sms-queue (broadcast via topic)
- Dead Letter Exchange (DLX) Setup:
Use cases:
* Handle messages rejected by consumers (nack without requeue)
* Handle messages exceeding TTL (time-to-live)
* Handle messages exceeding delivery limit (quorum queues default limit=20)
Configuration via policy (recommended):
```json
{
  "pattern": "order-processing-queue",
  "definition": {
    "dead-letter-exchange": "dlx-exchange",
    "dead-letter-routing-key": "order.processing.failed",
    "message-ttl": 86400000,
    "delivery-limit": 20
  }
}
```
DLX topology:
* Main queue: order-processing-queue (quorum)
* Dead letter exchange: dlx-exchange (topic)
* Dead letter queue: dlx-order-processing-queue (quorum, for manual inspection)
* Binding: order.processing.failed → dlx-order-processing-queue
- Retry with Backoff Pattern:
Pattern: Use TTL + DLX to implement delayed retries
* Step 1: Consumer nacks message without requeue → DLX routes to retry-queue-5s (TTL=5s)
* Step 2: After 5s, message expires → routes back to main queue via DLX
* Step 3: Repeated failures trigger delivery limit → routes to final DLX for manual handling
Example:
* Main queue: order-processing-queue
* Retry queue 1: retry-order-5s (TTL=5s, DLX=orders-exchange)
* Retry queue 2: retry-order-30s (TTL=30s, DLX=orders-exchange)
* Final DLX: dlx-order-processing-queue (manual inspection)
- Quorum Queue Configuration:
Arguments:
* x-queue-type=quorum (replicated queue)
* x-quorum-initial-group-size=3 (replication factor, odd number for Raft consensus)
* x-delivery-limit=20 (max redeliveries before DLX, default in RabbitMQ 4.0+)
* Message priority: no `x-max-priority` argument needed; RabbitMQ 4.0+ quorum queues support exactly 2 priorities (normal and high) without upfront declaration
Publisher priority:
* Publish with priority=5 (high priority, delivered 2:1 ratio vs normal)
* Publish with priority=0 or no priority (normal priority)
- Consumer Acknowledgment Strategies:
Manual ack (recommended):
* Process message → basic.ack (remove from queue)
* Transient error (network timeout) → basic.nack + requeue=true (redelivery)
* Permanent error (invalid data) → basic.nack + requeue=false (send to DLX)
Prefetch tuning:
* Low prefetch (1-10): Better fairness, lower throughput
* High prefetch (50-100): Higher throughput, risk of consumer overload
* Recommendation: Start with prefetch=10, tune based on processing time and consumer count
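A pika sketch wiring the first retry hop of the DLX pattern above (names mirror the examples; in production prefer policies over hardcoded queue arguments, and note that routing delivery-limit-exhausted messages to the final inspection queue needs a distinct dead-letter routing key, which this sketch omits):

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

channel.exchange_declare(exchange="orders-exchange", exchange_type="topic", durable=True)
channel.exchange_declare(exchange="dlx-exchange", exchange_type="topic", durable=True)

# Main queue: rejected messages (nack without requeue) dead-letter to dlx-exchange
channel.queue_declare(
    queue="order-processing-queue",
    durable=True,
    arguments={
        "x-queue-type": "quorum",
        "x-dead-letter-exchange": "dlx-exchange",
        "x-dead-letter-routing-key": "order.processing.failed",
        "x-delivery-limit": 20,
    },
)
channel.queue_bind(queue="order-processing-queue", exchange="orders-exchange",
                   routing_key="order.created")

# Retry queue: no consumers; messages sit for 5s, then expire and
# dead-letter back to orders-exchange, re-entering the main queue
channel.queue_declare(
    queue="retry-order-5s",
    durable=True,
    arguments={
        "x-queue-type": "quorum",
        "x-message-ttl": 5000,
        "x-dead-letter-exchange": "orders-exchange",
        "x-dead-letter-routing-key": "order.created",
    },
)
channel.queue_bind(queue="retry-order-5s", exchange="dlx-exchange",
                   routing_key="order.processing.failed")
```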
Output:
- Multi-exchange topology (orders, payments, notifications)
- Topic routing patterns with wildcards
- DLX error handling with retry backoff
- Quorum queue configuration
- Publisher/consumer config (confirms, acks, prefetch)
Token budget: ≤6000 tokens
### T3: Clustering + Streams + Advanced Patterns (≤12k tokens)
Scenario: Multi-node cluster with quorum queue replication, stream queues for high throughput, federation for multi-DC.
Steps:
- Clustering Topology:
Best practices:
* Odd number of nodes: 3, 5, or 7 nodes (Raft consensus requires majority)
* Equal peers: All nodes are equal (no leader/follower at cluster level, but quorum queues use Raft leader election)
* Network requirements: Nodes must resolve hostnames, ports 4369 (epmd), 25672 (inter-node), 5672 (AMQP) open
* Avoid 2-node clusters: No clear majority during network partitions
Example 3-node cluster:
* Node 1: rabbit@node1
* Node 2: rabbit@node2
* Node 3: rabbit@node3
* Erlang cookie: same on all nodes (authentication)
Quorum queue replication:
* Quorum queues replicate across 3 nodes (configurable via x-quorum-initial-group-size)
* Raft leader elected automatically (handles publishes and deliveries)
* Followers replicate data (a follower is promoted to leader if the leader fails)
* Survives minority node failures (e.g., 1 node down in a 3-node cluster)
- Stream Queues for High Throughput:
Use case: Event streaming, audit logs, high-volume data ingestion (millions of messages/sec)
Characteristics:
* Append-only log (like Kafka topics)
* Multiple consumers can read from same offset
* Retention based on size or time (not per-consumer)
* RabbitMQ 4.2: SQL filter expressions (4M+ msg/sec filtering with Bloom filters)
Configuration:
```json
{
  "x-queue-type": "stream",
  "x-max-age": "7D",
  "x-stream-max-segment-size-bytes": 500000000
}
```
Consumer offset tracking:
* Consumer specifies offset: first, last, next, or timestamp
* Offset stored server-side (like Kafka consumer groups)
- Consistent Hashing Exchange (Plugin):
Use case: Shard messages across multiple queues for horizontal scaling
Pattern:
* Consistent hashing exchange routes based on routing key hash
* Messages with same routing key always go to same queue (ordering guarantee)
* Add/remove queues with minimal redistribution
Example:
* Exchange: sharded-orders (type=x-consistent-hash)
* Queues: orders-shard-0, orders-shard-1, orders-shard-2
* Routing key: user-123 → always routes to same shard
- Federation for Multi-DC:
Use case: Replicate messages across datacenters without clustering (clusters require low-latency networks)
Pattern:
* Upstream (DC1): orders-exchange
* Downstream (DC2): orders-exchange-federated (receives messages from DC1)
* Federation link: DC2 pulls messages from DC1 orders-exchange
Benefits:
* Survives WAN latency and network partitions (unlike clustering)
* Independent RabbitMQ clusters in each DC
* Messages flow one-way (upstream → downstream)
- Advanced Publisher Patterns:
Transactional publishing (avoid, heavyweight):
* AMQP transactions (tx.select, tx.commit) → very slow, blocks channel
* Use publisher confirms instead (asynchronous, higher throughput)
Batch publishing:
* Publish multiple messages, then wait for confirms in batch
* Higher throughput than individual confirms
* Risk: larger batch = longer recovery time on failure
- Single Active Consumer (SAC) for Ordering:
Use case: Ensure messages processed in order by allowing only one consumer at a time
Configuration:
* Queue argument: x-single-active-consumer=true
* RabbitMQ selects one consumer as active, others wait
* Automatic failover to standby consumer if active consumer dies
* RabbitMQ 4.0+: Consumer priority for SAC (higher priority consumers selected first)
- Message Priority in Quorum Queues:
RabbitMQ 4.0+ feature:
* Quorum queues support exactly 2 priorities: high and normal
* No upfront declaration needed (unlike classic queues)
* Consumers receive 2:1 ratio of high to normal priority messages (avoid starvation)
* Publish with priority=5 (high) or priority=0/unset (normal)
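Illustrating the stream queue step above, a pika consumer sketch (queue name and payload handling are illustrative; stream consumption over AMQP 0-9-1 requires manual acks and a nonzero prefetch):

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

# Declare the stream (idempotent); retention by age, per the configuration above
channel.queue_declare(
    queue="audit-events-stream",
    durable=True,
    arguments={"x-queue-type": "stream", "x-max-age": "7D"},
)

# Streams require a prefetch limit when consumed over AMQP 0-9-1
channel.basic_qos(prefetch_count=100)

def on_event(ch, method, properties, body):
    print(body)
    ch.basic_ack(delivery_tag=method.delivery_tag)  # acks advance delivery, not retention

# x-stream-offset selects where to start reading: "first", "last", "next", or a timestamp
channel.basic_consume(
    queue="audit-events-stream",
    on_message_callback=on_event,
    arguments={"x-stream-offset": "first"},
)
channel.start_consuming()
```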
Output:
- 3-node cluster topology with quorum queue replication
- Stream queue configuration for high-throughput use cases
- Consistent hashing exchange for sharding
- Federation setup for multi-DC replication
- SAC and message priority patterns
Token budget: ≤12000 tokens
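Rounding out this tier, a sketch of the consistent-hashing exchange pattern described above (requires the `rabbitmq_consistent_hash_exchange` plugin; names are illustrative, and note the binding key is a numeric weight, not a routing pattern):

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()

channel.exchange_declare(exchange="sharded-orders", exchange_type="x-consistent-hash",
                         durable=True)

# Each shard binds with a weight ("1" = equal share of the hash ring)
for shard in ("orders-shard-0", "orders-shard-1", "orders-shard-2"):
    channel.queue_declare(queue=shard, durable=True,
                          arguments={"x-queue-type": "quorum"})
    channel.queue_bind(queue=shard, exchange="sharded-orders", routing_key="1")

# The routing key is hashed, so user-123 always lands on the same shard
channel.basic_publish(exchange="sharded-orders", routing_key="user-123",
                      body=b'{"order_id": 1}')
```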
## Decision Rules
Exchange type selection:
- Direct: Exact routing, one-to-one (e.g., task queues, RPC)
- Topic: Pattern matching, one-to-many with hierarchical routing (e.g., event bus, audit logs)
- Fanout: Broadcast, one-to-all (e.g., notifications, cache invalidation)
- Headers: Route by message headers (rare, use topic instead)
Queue type selection:
- Classic: Dev/test only (single node, non-replicated in RabbitMQ 4.x)
- Quorum: Production (replicated, durable, Raft consensus, survives node failures)
- Streams: High throughput + retention (append-only, multi-consumer reads, event streaming)
Clustering decisions:
- Single node: Dev/test, <1000 msg/sec
- 3-node cluster: Production, high availability, survives 1 node failure
- 5-node cluster: Mission-critical, survives 2 node failures
- 7+ node cluster: Rare (Raft consensus overhead increases, consider federation instead)
Prefetch tuning:
- 1-10: Low throughput, fair distribution, consumer processing time >100ms
- 10-50: Medium throughput, balanced, consumer processing time 10-100ms
- 50-100: High throughput, consumer processing time <10ms
Error handling strategy:
- Transient errors: `nack + requeue=true` (network timeout, downstream unavailable)
- Permanent errors: `nack + requeue=false` → DLX (invalid data, schema mismatch)
- Retry with backoff: DLX → TTL queue → re-route to main queue after delay
- Poison messages: delivery limit (default=20) → DLX for manual inspection
Abort conditions:
- Quorum queue replication factor >cluster size → reduce to match node count
- Prefetch >1000 → risk of consumer memory exhaustion
- Classic queues in production → migrate to quorum queues for durability
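These abort checks can be mechanized; a hypothetical validation helper (function name, input shape, and thresholds mirror the rules above and are illustrative only):

```python
def validate_topology(queues: list[dict], cluster_size: int) -> list[str]:
    """Return TODO strings for any abort conditions the topology violates."""
    todos = []
    for q in queues:
        args = q.get("arguments", {})
        if args.get("x-quorum-initial-group-size", 1) > cluster_size:
            todos.append(f"{q['name']}: replication factor exceeds cluster size {cluster_size}")
        if q.get("prefetch", 0) > 1000:
            todos.append(f"{q['name']}: prefetch >1000 risks consumer memory exhaustion")
        if q.get("type") == "classic" and q.get("environment") == "production":
            todos.append(f"{q['name']}: migrate classic queue to quorum for durability")
    return todos
```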
## Output Contract

Topology schema:

```yaml
exchanges:
  - name: <exchange_name>
    type: direct|topic|fanout|headers
    durable: true|false
    auto_delete: true|false
queues:
  - name: <queue_name>
    type: classic|quorum|stream
    durable: true|false
    arguments:
      x-queue-type: quorum
      x-quorum-initial-group-size: 3
      x-delivery-limit: 20
      x-max-priority: 2  # classic queues only; quorum queues need no priority declaration (RabbitMQ 4.0+)
      x-single-active-consumer: true|false
bindings:
  - exchange: <exchange_name>
    queue: <queue_name>
    routing_key: <pattern>  # e.g., order.created, order.#, *
policies:
  - name: <policy_name>
    pattern: <queue_regex>
    definition:
      dead-letter-exchange: <dlx_exchange>
      dead-letter-routing-key: <dlx_routing_key>
      message-ttl: <milliseconds>
      delivery-limit: 20
```
Publisher config:

```python
import pika

connection = pika.BlockingConnection(pika.ConnectionParameters(host="localhost"))
channel = connection.channel()
message = b'{"order_id": 1}'  # example payload

# Publisher confirms
channel.confirm_delivery()

# Persistent messages
channel.basic_publish(
    exchange='orders-exchange',
    routing_key='order.created',
    body=message,
    properties=pika.BasicProperties(
        delivery_mode=2,  # persistent
        priority=5        # high priority (RabbitMQ 4.0+)
    )
)
```
Consumer config:

```python
# Manual ack + prefetch
channel.basic_qos(prefetch_count=10)

def callback(ch, method, properties, body):
    try:
        process(body)
        ch.basic_ack(delivery_tag=method.delivery_tag)
    except TransientError:   # placeholder exception, e.g., network timeout
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
    except PermanentError:   # placeholder exception, e.g., invalid data
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=False)  # → DLX

channel.basic_consume(queue='order-processing-queue', on_message_callback=callback)
channel.start_consuming()
```
Required fields:
- Topology: `exchanges[]`, `queues[]`, `bindings[]`
- Exchange: `name`, `type`
- Queue: `name`, `type` (classic/quorum/stream)
- Binding: `exchange`, `queue`, `routing_key`
## Examples

### Example: E-commerce Order Processing with DLX
Topology:
- Exchange: `orders-exchange` (topic)
- Queue: `order-processing-queue` (quorum, `x-quorum-initial-group-size=3`)
- DLX: `dlx-exchange` (topic)
- DLX queue: `dlx-order-processing-queue` (quorum, manual inspection)
- Binding: `order.created` → `order-processing-queue`
- DLX binding: `order.processing.failed` → `dlx-order-processing-queue`
Policy (DLX config):

```json
{
  "pattern": "order-processing-queue",
  "definition": {
    "dead-letter-exchange": "dlx-exchange",
    "dead-letter-routing-key": "order.processing.failed",
    "delivery-limit": 20
  }
}
```
Publisher:

```python
channel.basic_publish(
    exchange='orders-exchange',
    routing_key='order.created',
    body=json.dumps(order),
    properties=pika.BasicProperties(delivery_mode=2)
)
```
Consumer:

```python
def process_order(ch, method, properties, body):
    try:
        order = json.loads(body)
        # Process order (may fail)
        charge_payment(order)
        ch.basic_ack(delivery_tag=method.delivery_tag)
    except PaymentGatewayDown:       # transient error
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=True)
    except InvalidPaymentMethod:     # permanent error
        ch.basic_nack(delivery_tag=method.delivery_tag, requeue=False)  # → DLX

channel.basic_consume(queue='order-processing-queue', on_message_callback=process_order)
```
## Quality Gates
Token budgets:
- T1: ≤2000 tokens (single exchange + queue + basic config)
- T2: ≤6000 tokens (multi-exchange + DLX + routing patterns)
- T3: ≤12000 tokens (clustering + streams + federation)
Safety:
- ❌ Never: Hardcode credentials in topology definitions
- ❌ Never: Use classic queues for production (single node, no replication)
- ✅ Always: Enable publisher confirms for reliability
- ✅ Always: Use manual acks for consumers (process then ack)
- ✅ Always: Use quorum queues for durability (replicated, Raft consensus)
Auditability:
- All topology definitions in version control (Git)
- Policies defined via management API or config (not hardcoded queue arguments)
- DLX queues monitored for poison messages
- Consumer ack/nack rates tracked (avoid excessive requeues)
Determinism:
- Same topology definition = same exchange/queue/binding creation
- Quorum queue leader election deterministic (Raft)
- Topic routing deterministic (same routing key → same queue)
Performance:
- Prefetch tuned for consumer processing time (avoid memory exhaustion)
- Quorum queue replication factor ≤ node count
- Stream queues for >10k msg/sec throughput
- Publisher confirms in batches for higher throughput (not individual)
## Resources
Official Documentation:
- RabbitMQ 4.1.0 release (Khepri metadata store, quorum queue enhancements): https://www.rabbitmq.com/blog/2025/04/15/rabbitmq-4.1.0-is-released (accessed `NOW_ET`)
- Quorum queues: https://www.rabbitmq.com/docs/quorum-queues (accessed `NOW_ET`)
- Exchanges and routing: https://www.rabbitmq.com/docs/exchanges (accessed `NOW_ET`)
- Clustering: https://www.rabbitmq.com/docs/clustering (accessed `NOW_ET`)
- Dead letter exchanges: https://www.rabbitmq.com/docs/dlx (accessed `NOW_ET`)
- Publishers: https://www.rabbitmq.com/docs/publishers (accessed `NOW_ET`)
- Consumers: https://www.rabbitmq.com/docs/consumers (accessed `NOW_ET`)
Client Libraries:
- Python: pika (AMQP 0-9-1 client)
- Java: amqp-client (official Java client)
- Node.js: amqplib
- Go: amqp091-go
Related Skills:
- integration-messagequeue-designer: generic message queue pattern selection
- messaging-kafka-architect: Kafka-specific event streaming
- microservices-pattern-architect: saga, CQRS, event sourcing with RabbitMQ
- observability-stack-configurator: monitoring RabbitMQ with Prometheus + Grafana
# Supported AI Coding Agents
This skill follows the SKILL.md standard and works with all major AI coding agents that support it. Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.