# Message Queue Pattern Designer

```sh
# Install this skill:
npx skills add williamzujkowski/cognitive-toolworks --skill "Message Queue Pattern Designer"
```

Installs a specific skill from the multi-skill repository.

# Description

Design message queue patterns for RabbitMQ, Kafka, SQS, Azure Service Bus with dead-letter queues, idempotency, ordering guarantees, and backpressure.

# SKILL.md


```yaml
---
name: Message Queue Pattern Designer
slug: integration-messagequeue-designer
description: Design message queue patterns for RabbitMQ, Kafka, SQS, Azure Service Bus with dead-letter queues, idempotency, ordering guarantees, and backpressure
capabilities:
  - Queue/topic topology design for distributed systems
  - Producer/consumer pattern implementation
  - Exactly-once and at-least-once semantics configuration
  - Dead-letter queue and retry strategies
  - Partition/shard strategy for ordering and throughput
  - Backpressure and consumer scaling patterns
inputs:
  queue_system:
    type: enum
    values: [rabbitmq, kafka, sqs, azure-servicebus, pubsub]
    required: true
  pattern:
    type: enum
    values: [publish-subscribe, work-queue, request-reply, saga]
    required: true
  guarantees:
    type: enum
    values: [at-least-once, at-most-once, exactly-once]
    required: true
  throughput_estimate:
    type: string
    description: "Expected message rate (e.g., '10k msg/sec')"
    required: false
  ordering_scope:
    type: enum
    values: [global, partition-key, none]
    required: false
outputs:
  queue_config:
    type: object
    description: Topic/queue/exchange definitions with partitioning
  producer_code:
    type: code_snippet
    description: Message publishing with idempotency keys
  consumer_code:
    type: code_snippet
    description: Message consumption with error handling and retries
  monitoring:
    type: object
    description: Metrics, alarms, and observability configuration
keywords:
  - message-queue
  - kafka
  - rabbitmq
  - sqs
  - azure-service-bus
  - pubsub
  - dead-letter-queue
  - idempotency
  - exactly-once
  - backpressure
version: 1.0.0
owner: william
license: MIT
security:
  risk_level: low
  notes: "Outputs infrastructure code; user must secure credentials separately"
links:
  - https://kafka.apache.org/documentation/
  - https://www.rabbitmq.com/getstarted.html
  - https://www.enterpriseintegrationpatterns.com/patterns/messaging/
  - https://aws.amazon.com/sqs/
  - https://learn.microsoft.com/en-us/azure/service-bus-messaging/
---
```



## Purpose & When-To-Use

Use this skill when:

  • Designing event-driven architectures with publish-subscribe or work-queue patterns
  • Implementing saga orchestration, CQRS event sourcing, or async request-reply
  • Ensuring message delivery guarantees (at-least-once, exactly-once) across distributed services
  • Configuring dead-letter queues, retry policies, and idempotency for resilience
  • Optimizing throughput and ordering via partitioning or consumer groups
  • Migrating between queue systems (e.g., RabbitMQ → Kafka) or multi-cloud setups

Do not use for:

  • Synchronous RPC (use gRPC, REST)
  • In-memory queues (use language-native channels)
  • Real-time streaming analytics requiring sub-100ms latency (consider Apache Flink or Spark Streaming)

## Pre-Checks

1. Time normalization: Compute NOW_ET using NIST/time.gov semantics (America/New_York, ISO-8601).
2. Input validation:
   - queue_system must be one of: rabbitmq, kafka, sqs, azure-servicebus, pubsub
   - pattern must be one of: publish-subscribe, work-queue, request-reply, saga
   - guarantees must be one of: at-least-once, at-most-once, exactly-once
   - If guarantees = exactly-once, verify the target system supports it (Kafka, SQS FIFO, Azure Service Bus sessions)
3. Source freshness: Confirm documentation links resolve and match claimed semantics (accessed NOW_ET).

## Procedure

### Tier 1 (≤2k tokens): Basic Queue/Topic Configuration

Scope: Single queue/topic with default reliability; the common 80% case.

1. Select topology:
   - Publish-Subscribe: Topic/fanout exchange with multiple subscribers
   - Work-Queue: Single queue with competing consumers for load balancing
   - Request-Reply: Temporary reply-to queues or correlation IDs
   - Saga: Choreography via an event topic or orchestration via command queues
2. Configure the queue/topic:
   - Kafka: Create a topic with num.partitions sized by throughput and ordering scope (see the AdminClient sketch at the end of this tier)
   - RabbitMQ: Declare an exchange type (fanout, topic, direct) plus queues with bindings
   - SQS: Standard queue (best-effort ordering) or FIFO queue (ordering + deduplication)
   - Azure Service Bus: Queue (point-to-point) or Topic/Subscription (pub-sub)
   - Google Pub/Sub: Topic with push/pull subscriptions
3. Set retention/TTL:
   - Kafka: retention.ms (default 7 days)
   - RabbitMQ: x-message-ttl and x-expires on queues
   - SQS: MessageRetentionPeriod (1 minute to 14 days)
   - Azure Service Bus: DefaultMessageTimeToLive
   - Pub/Sub: messageRetentionDuration (default 7 days)
4. Emit basic producer/consumer snippets (idiomatic SDK usage).

Output: Minimal config + code for immediate deployment.
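
For the Kafka path, here is a minimal sketch of steps 2-3 using Kafka's Java AdminClient. The broker address, topic name, and config values are illustrative assumptions, not part of the skill's contract:

```java
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class CreateOrdersTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        try (AdminClient admin = AdminClient.create(props)) {
            // Step 2: partition count sized by throughput and ordering scope.
            // Step 3: retention.ms set explicitly (7 days, the documented Kafka default).
            NewTopic topic = new NewTopic("orders.events", 12, (short) 3)
                    .configs(Map.of(
                            "retention.ms", "604800000",
                            "min.insync.replicas", "2"));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}
```

Kafka allows adding partitions later but not removing them, so size the count from the throughput estimate and ordering scope up front.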


### Tier 2 (≤6k tokens): Advanced Patterns (DLQ, Idempotency, Ordering, Retry)

Scope: Production-grade reliability with error handling.

1. Dead-Letter Queue (DLQ):
   - Kafka: No native DLQ; implement via an exception handler that writes to a separate topic
   - RabbitMQ: Set x-dead-letter-exchange on the queue
   - SQS: Configure a RedrivePolicy with maxReceiveCount
   - Azure Service Bus: Enable DeadLetteringOnMessageExpiration and set MaxDeliveryCount
   - Pub/Sub: Set deadLetterPolicy on the subscription
2. Idempotency (see the consumer sketch after this list):
   - Producer: Include a unique message-id or idempotency-key header
   - Consumer: Store processed IDs in a fast cache (Redis) or a database with a TTL
   - Exactly-once (Kafka): Enable enable.idempotence=true plus a transactional producer (transactional.id)
   - SQS FIFO: Use MessageDeduplicationId (5-minute deduplication window)
3. Ordering guarantees:
   - Global ordering: Kafka single partition, SQS FIFO, Azure Service Bus sessions
   - Partition-key ordering: Kafka partitioning by key, Pub/Sub ordering key
   - None: Parallel consumers for maximum throughput
4. Retry strategy (also covered in the sketch after this list):
   - Exponential backoff: 1s → 2s → 4s → 8s → DLQ
   - Visibility timeout (SQS): Extend it during processing to prevent duplicate delivery
   - RabbitMQ: Use a retry/delayed-message plugin or a TTL + DLX loop
   - Kafka: Commit manually after success; rewind the offset on transient errors
5. Cite sources (accessed NOW_ET):
   - Kafka idempotence: https://kafka.apache.org/documentation/#producerconfigs_enable.idempotence (accessed 2025-10-26T02:31:20)
   - RabbitMQ DLQ: https://www.rabbitmq.com/dlx.html (accessed 2025-10-26T02:31:20)
   - Enterprise Integration Patterns: https://www.enterpriseintegrationpatterns.com/patterns/messaging/DeadLetterChannel.html (accessed 2025-10-26T02:31:20)
   - AWS SQS FIFO: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues.html (accessed 2025-10-26T02:31:20)
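
A minimal sketch combining items 1, 2, and 4 on the Kafka path: a consumer that deduplicates on a message ID via Redis SET NX (idempotency), retries with exponential backoff, and hands persistent failures to a separate DLQ topic. The topic names, group ID, Redis key scheme, and backoff schedule are illustrative assumptions; the Kafka and Jedis calls are standard client APIs.

```java
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringDeserializer;
import org.apache.kafka.common.serialization.StringSerializer;
import redis.clients.jedis.Jedis;
import redis.clients.jedis.params.SetParams;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class IdempotentRetryConsumer {
    static final int MAX_ATTEMPTS = 4; // 1s -> 2s -> 4s -> 8s, then DLQ

    public static void main(String[] args) {
        Properties cp = new Properties();
        cp.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed broker
        cp.put(ConsumerConfig.GROUP_ID_CONFIG, "orders-workers");          // assumed group
        cp.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");         // manual commit after success
        cp.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        cp.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(cp);

        Properties pp = new Properties();
        pp.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        pp.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        pp.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        KafkaProducer<String, String> dlq = new KafkaProducer<>(pp);

        Jedis redis = new Jedis("localhost", 6379); // assumed cache for processed IDs
        consumer.subscribe(Collections.singletonList("orders.events"));
        while (true) {
            for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofMillis(500))) {
                String id = rec.key(); // assumes the producer set a unique idempotency key
                // SET NX with a 24h TTL: a null reply means this ID was already processed -> skip
                if (redis.set("processed:" + id, "1", SetParams.setParams().nx().ex(86400)) == null) continue;
                long backoffMs = 1000;
                for (int attempt = 1; ; attempt++) {
                    try { process(rec.value()); break; }
                    catch (Exception e) {
                        if (attempt >= MAX_ATTEMPTS) { // exhausted: hand off to the DLQ topic
                            dlq.send(new ProducerRecord<>("orders.events.dlq", id, rec.value()));
                            break;
                        }
                        try { Thread.sleep(backoffMs); } catch (InterruptedException ie) { Thread.currentThread().interrupt(); }
                        backoffMs *= 2; // exponential backoff: 1s, 2s, 4s
                    }
                }
            }
            consumer.commitSync(); // at-least-once: commit only after the batch is handled
        }
    }

    static void process(String payload) { /* business logic */ }
}
```

Marking the ID in Redis before processing keeps the sketch short; a stricter design records it only after success and accepts occasional duplicate processing.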

Output: Config + code with DLQ, idempotency, retry, and ordering.


### Tier 3 (≤12k tokens): Complex Scenarios (Backpressure, Consumer Scaling, Monitoring)

Scope: High-throughput, multi-region, or regulated environments.

1. Backpressure handling (see the prefetch sketch after this list):
   - Kafka: Tune fetch.min.bytes and fetch.max.wait.ms to batch efficiently; monitor consumer lag
   - RabbitMQ: Use prefetch_count to limit unacknowledged messages per consumer
   - SQS: Implement exponential backoff when ReceiveMessage returns empty
   - Azure Service Bus: Set MaxConcurrentCalls on the message receiver
   - Pub/Sub: Use flow-control settings (maxMessages, maxBytes)
2. Consumer scaling:
   - Kafka: Add consumers to the group up to num.partitions; beyond that, add partitions
   - RabbitMQ: Scale horizontally with multiple workers on the same queue
   - SQS: Spawn workers based on the ApproximateNumberOfMessages metric
   - Azure Service Bus: Scale subscription consumers independently
   - Autoscaling: Trigger on queue depth or consumer lag (e.g., KEDA for Kubernetes)
3. Monitoring and alerting:
   - Metrics to track:
     - Queue depth / consumer lag (critical)
     - Message throughput (published vs. consumed)
     - DLQ message count
     - Processing latency (end-to-end)
     - Error rate (exceptions, retries)
   - Tools:
     - Kafka: Burrow, Confluent Control Center, Prometheus JMX exporter
     - RabbitMQ: Management UI, Prometheus plugin
     - SQS: CloudWatch metrics (ApproximateNumberOfMessagesVisible, ApproximateAgeOfOldestMessage)
     - Azure Service Bus: Azure Monitor metrics
     - Pub/Sub: Cloud Monitoring (formerly Stackdriver)
   - Alerts (see the CloudWatch sketch at the end of this tier):
     - Consumer lag > 10k messages for > 5 minutes
     - DLQ depth > 100
     - Processing latency p95 > 30s
4. Multi-region / disaster recovery:
   - Kafka: MirrorMaker 2 or cluster linking for replication
   - RabbitMQ: Federation or Shovel plugin
   - SQS: Cross-region replication is not native; use Lambda or a custom replicator
   - Azure Service Bus: Geo-disaster recovery pairing
   - Pub/Sub: Regional topics; replicate via Dataflow
5. Security and compliance:
   - Encryption in transit: TLS/SSL for all systems
   - Encryption at rest: Kafka encrypted disks, SQS KMS, Azure Service Bus customer-managed keys
   - Access control: Kafka ACLs, RabbitMQ user permissions, IAM policies (SQS, Pub/Sub), Azure RBAC
   - Audit logging: Enable broker audit logs and consumer access logs
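
A short sketch of item 1 on the RabbitMQ path using the Java amqp-client; the host, queue name, and prefetch value are illustrative assumptions. basicQos caps in-flight (unacknowledged) deliveries per consumer, which is RabbitMQ's primary backpressure lever:

```java
import com.rabbitmq.client.Channel;
import com.rabbitmq.client.Connection;
import com.rabbitmq.client.ConnectionFactory;
import java.nio.charset.StandardCharsets;

public class BackpressureWorker {
    public static void main(String[] args) throws Exception {
        ConnectionFactory factory = new ConnectionFactory();
        factory.setHost("localhost"); // assumed broker host
        Connection conn = factory.newConnection();
        Channel channel = conn.createChannel();
        channel.queueDeclare("work.queue", true, false, false, null); // durable; name illustrative

        // Backpressure: the broker stops pushing once this consumer holds
        // 50 unacknowledged messages, so a slow worker cannot be overwhelmed.
        channel.basicQos(50);

        channel.basicConsume("work.queue", false /* manual acks */, (tag, delivery) -> {
            try {
                process(new String(delivery.getBody(), StandardCharsets.UTF_8));
                channel.basicAck(delivery.getEnvelope().getDeliveryTag(), false);
            } catch (Exception e) {
                // requeue=false dead-letters the message if x-dead-letter-exchange is set
                channel.basicNack(delivery.getEnvelope().getDeliveryTag(), false, false);
            }
        }, tag -> { });
    }

    static void process(String body) { /* business logic */ }
}
```

A prefetch of 50 is only a starting point: a value of 1 serializes the consumer, while very large values defeat the backpressure, so tune it against measured per-message processing time.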

Output: Full architecture with scaling, monitoring, DR, and compliance.
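
To make the "DLQ depth > 100" alert concrete on the SQS path, a hedged sketch with the AWS SDK for Java v2; the queue name, alarm name, and period are assumptions, while the metric name is the CloudWatch metric listed above:

```java
import software.amazon.awssdk.services.cloudwatch.CloudWatchClient;
import software.amazon.awssdk.services.cloudwatch.model.ComparisonOperator;
import software.amazon.awssdk.services.cloudwatch.model.Dimension;
import software.amazon.awssdk.services.cloudwatch.model.PutMetricAlarmRequest;
import software.amazon.awssdk.services.cloudwatch.model.Statistic;

public class DlqDepthAlarm {
    public static void main(String[] args) {
        try (CloudWatchClient cw = CloudWatchClient.create()) {
            cw.putMetricAlarm(PutMetricAlarmRequest.builder()
                    .alarmName("orders-dlq-depth-high")            // illustrative name
                    .namespace("AWS/SQS")
                    .metricName("ApproximateNumberOfMessagesVisible")
                    .dimensions(Dimension.builder()
                            .name("QueueName").value("orders-dlq") // assumed DLQ name
                            .build())
                    .statistic(Statistic.MAXIMUM)
                    .period(300)                                   // 5-minute window
                    .evaluationPeriods(1)
                    .threshold(100.0)                              // fires on DLQ depth > 100
                    .comparisonOperator(ComparisonOperator.GREATER_THAN_THRESHOLD)
                    .build());
        }
    }
}
```

The same pattern applies to ApproximateAgeOfOldestMessage for the latency-style alerts.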


## Decision Rules

- If guarantees = exactly-once and the target system does not support it: Warn the user and downgrade to at-least-once with an idempotency implementation (see the guard sketch below).
- If ordering_scope = global and throughput_estimate > 10k msg/sec: Suggest partition-key ordering to parallelize.
- If pattern = saga and no compensation logic is provided: Emit skeleton event handlers with TODO: implement compensation.
- If DLQ depth exceeds its threshold: Alert and recommend manual review or automated replay with fixes.
- Abort if:
  - a required input (queue_system, pattern, guarantees) is missing or invalid;
  - source documentation is unreachable or contradicts the claimed semantics.
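
A minimal sketch of the first rule as a validation guard, assuming the input enums above; the system-to-capability mapping mirrors the pre-check list (Kafka transactions, SQS FIFO, Azure Service Bus sessions) and is illustrative:

```java
import java.util.Set;

final class GuaranteeGuard {
    // Systems the pre-checks treat as supporting exactly-once
    // (Kafka transactions, SQS FIFO dedup, Azure Service Bus sessions).
    private static final Set<String> EXACTLY_ONCE_CAPABLE =
            Set.of("kafka", "sqs", "azure-servicebus");

    static String effectiveGuarantee(String queueSystem, String requested) {
        if ("exactly-once".equals(requested) && !EXACTLY_ONCE_CAPABLE.contains(queueSystem)) {
            System.err.printf("warn: %s lacks native exactly-once; downgrading to at-least-once with idempotency%n",
                    queueSystem);
            return "at-least-once";
        }
        return requested;
    }
}
```

The guard slots into the pre-check flow, before any topology is emitted.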

## Output Contract

```typescript
{
  queue_config: {
    system: string,               // e.g., "kafka"
    topology: {
      topics?: Array<{
        name: string,
        partitions: number,
        replication_factor: number,
        config: Record<string, any>
      }>,
      queues?: Array<{
        name: string,
        durable: boolean,
        dlq?: string,
        config: Record<string, any>
      }>,
      exchanges?: Array<{        // RabbitMQ
        name: string,
        type: string,
        bindings: Array<any>
      }>
    }
  },
  producer_code: string,          // Language-agnostic or specified language
  consumer_code: string,          // Includes error handling, retries, idempotency
  monitoring: {
    metrics: Array<string>,       // e.g., ["consumer_lag", "dlq_depth"]
    alerts: Array<{
      condition: string,
      threshold: number,
      action: string
    }>
  },
  sources: Array<{
    title: string,
    url: string,
    accessed: string              // ISO-8601 timestamp = NOW_ET
  }>
}
```

Required fields:
- queue_config.system
- queue_config.topology (non-empty)
- producer_code
- consumer_code
- monitoring.metrics (≥2)
- sources (≥2 for T2+)


## Examples

Input:

```yaml
queue_system: kafka
pattern: publish-subscribe
guarantees: exactly-once
throughput_estimate: 50k msg/sec
ordering_scope: partition-key
```

Output (Kafka Topic Config with Partitioning):

```json
{
  "queue_config": {
    "system": "kafka",
    "topology": {
      "topics": [{
        "name": "orders.events",
        "partitions": 12,
        "replication_factor": 3,
        "config": {
          "min.insync.replicas": 2,
          "retention.ms": 604800000,
          "compression.type": "snappy"
        }
      }]
    }
  },
  "producer_code": "props.put(\"enable.idempotence\", true);\nprops.put(\"transactional.id\", \"order-producer-1\");\nproducer.initTransactions();\nproducer.beginTransaction();\nproducer.send(new ProducerRecord<>(\"orders.events\", order.getCustomerId(), order));\nproducer.commitTransaction();",
  "consumer_code": "props.put(\"isolation.level\", \"read_committed\");\nconsumer.subscribe(Collections.singletonList(\"orders.events\"));\nwhile (true) {\n  ConsumerRecords<String, Order> records = consumer.poll(Duration.ofMillis(100));\n  for (ConsumerRecord<String, Order> record : records) {\n    processOrder(record.value());\n  }\n  consumer.commitSync();\n}",
  "monitoring": {
    "metrics": ["consumer_lag", "partition_throughput"],
    "alerts": [{"condition": "lag > 10000", "threshold": 10000, "action": "scale_consumers"}]
  },
  "sources": [
    {"title": "Kafka producer idempotence", "url": "https://kafka.apache.org/documentation/#producerconfigs_enable.idempotence", "accessed": "2025-10-26T02:31:20"},
    {"title": "Exactly-once semantics in Apache Kafka (Confluent)", "url": "https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/", "accessed": "2025-10-26T02:31:20"}
  ]
}
```

## Quality Gates

1. Token budgets:
   - T1 response ≤ 2k tokens
   - T2 response ≤ 6k tokens
   - T3 response ≤ 12k tokens
2. Safety:
   - No hardcoded credentials in output
   - Warn if encryption at rest is disabled in production scenarios
3. Auditability:
   - All T2+ outputs include ≥2 sources with access dates = NOW_ET
   - Config includes version/schema metadata
4. Determinism:
   - Same inputs → same config structure (semantic equivalence)
   - Non-deterministic: partition count may vary with the throughput heuristic
5. Example constraints:
   - Example code ≤ 30 lines
   - Runnable, or clear pseudo-code with a language annotation

## Resources

  • Apache Kafka Documentation (official): https://kafka.apache.org/documentation/ (accessed 2025-10-26T02:31:20)
  • RabbitMQ Getting Started (official): https://www.rabbitmq.com/getstarted.html (accessed 2025-10-26T02:31:20)
  • Enterprise Integration Patterns (Hohpe & Woolf): https://www.enterpriseintegrationpatterns.com/patterns/messaging/ (accessed 2025-10-26T02:31:20)
  • AWS SQS Developer Guide (official): https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/ (accessed 2025-10-26T02:31:20)
  • Azure Service Bus Messaging (official): https://learn.microsoft.com/en-us/azure/service-bus-messaging/ (accessed 2025-10-26T02:31:20)
  • Google Cloud Pub/Sub Documentation (official): https://cloud.google.com/pubsub/docs (accessed 2025-10-26T02:31:20)
  • Kafka Idempotence and Transactions (Confluent): https://www.confluent.io/blog/exactly-once-semantics-are-possible-heres-how-apache-kafka-does-it/ (accessed 2025-10-26T02:31:20)
  • RabbitMQ Dead Letter Exchanges: https://www.rabbitmq.com/dlx.html (accessed 2025-10-26T02:31:20)
  • SQS FIFO Queues: https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/FIFO-queues.html (accessed 2025-10-26T02:31:20)

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.