Implement GitOps workflows with ArgoCD and Flux for automated, declarative Kubernetes...
npx skills add josavicentevw/ai-agent-skills --skill "architecture"
Install specific skill from multi-skill repository
# Description
Design, evaluate, and document software architectures including system design, design patterns, architecture patterns, scalability planning, and technology selection. Use when designing systems, choosing architectures, evaluating design decisions, or when user mentions architecture, system design, or scalability.
# SKILL.md
name: architecture
description: Design, evaluate, and document software architectures including system design, design patterns, architecture patterns, scalability planning, and technology selection. Use when designing systems, choosing architectures, evaluating design decisions, or when user mentions architecture, system design, or scalability.
Architecture
A comprehensive architecture skill that helps design, evaluate, and document software architectures for robust, scalable, and maintainable systems.
Quick Start
Basic architecture workflow:
# Understand requirements (functional + non-functional)
# Identify constraints and trade-offs
# Design system components and relationships
# Document architecture decisions
# Validate against requirements
Core Capabilities
1. System Architecture Design
Design complete system architectures:
- Monolithic: Single deployable unit
- Microservices: Distributed services architecture
- Serverless: Event-driven, function-based
- Event-Driven: Asynchronous message-based
- Layered: Separation of concerns in layers
- Hexagonal: Ports and adapters pattern
- CQRS: Command Query Responsibility Segregation
- Event Sourcing: State as sequence of events
2. Design Patterns
Apply proven design patterns:
Creational Patterns:
- Singleton, Factory, Builder, Prototype, Abstract Factory
Structural Patterns:
- Adapter, Bridge, Composite, Decorator, Facade, Proxy
Behavioral Patterns:
- Observer, Strategy, Command, State, Template Method, Chain of Responsibility
3. Architecture Quality Attributes
Evaluate and optimize for:
- Performance: Response time, throughput, resource usage
- Scalability: Horizontal and vertical scaling
- Availability: Uptime, fault tolerance, disaster recovery
- Security: Authentication, authorization, encryption, data protection
- Maintainability: Code quality, modularity, testability
- Reliability: Error handling, resilience, redundancy
- Usability: User experience, API design
- Observability: Logging, monitoring, tracing
4. Technology Selection
Evaluate and recommend technologies:
- Databases: SQL vs NoSQL, selection criteria
- Message Queues: Kafka, RabbitMQ, SQS
- Caching: Redis, Memcached, CDN
- API Protocols: REST, GraphQL, gRPC
- Cloud Platforms: AWS, Azure, GCP
- Containerization: Docker, Kubernetes
5. Architecture Documentation
Document architecture effectively:
- C4 Model: Context, Container, Component, Code diagrams
- Architecture Decision Records (ADRs): Document key decisions
- Data Flow Diagrams: How data moves through system
- Sequence Diagrams: Component interactions
- Deployment Diagrams: Infrastructure and deployment
Architecture Patterns
Microservices Architecture
┌─────────────────────────────────────────────────┐
│ API Gateway / BFF │
└────────┬──────────┬──────────┬──────────────────┘
│ │ │
┌────▼───┐ ┌───▼────┐ ┌──▼──────┐
│ User │ │ Order │ │ Payment │
│Service │ │Service │ │ Service │
└────┬───┘ └───┬────┘ └──┬──────┘
│ │ │
┌────▼───┐ ┌──▼─────┐ ┌──▼──────┐
│ User │ │ Order │ │ Payment │
│ DB │ │ DB │ │ DB │
└────────┘ └────────┘ └─────────┘
│ │ │
└────┴─────────┴──────────┴──────┘
Message Bus
Characteristics:
- Independent deployment and scaling
- Polyglot persistence
- Decentralized data management
- Resilience through isolation
- Technology diversity
Trade-offs:
- ✅ Independent scaling
- ✅ Technology flexibility
- ✅ Fault isolation
- ❌ Distributed system complexity
- ❌ Data consistency challenges
- ❌ Operational overhead
Event-Driven Architecture
┌──────────┐ ┌──────────────┐ ┌──────────┐
│ Producer │─────▶│ Event Bus │─────▶│Consumer 1│
└──────────┘ │ (Kafka/SNS) │ └──────────┘
└───────┬──────┘
│
┌────▼──────┐
│Consumer 2 │
└───────────┘
Characteristics:
- Asynchronous communication
- Loose coupling between components
- Scalable event processing
- Event replay capability
Use Cases:
- Real-time data processing
- Microservices integration
- IoT systems
- Activity tracking
Layered Architecture
┌─────────────────────────────────┐
│ Presentation Layer │ ← Controllers, Views
├─────────────────────────────────┤
│ Business Logic Layer │ ← Services, Domain
├─────────────────────────────────┤
│ Data Access Layer │ ← Repositories, DAOs
├─────────────────────────────────┤
│ Database Layer │ ← Database
└─────────────────────────────────┘
Characteristics:
- Clear separation of concerns
- Each layer has specific responsibility
- Dependencies flow downward
- Easy to understand and maintain
Hexagonal Architecture (Ports & Adapters)
┌─────────────────────────┐
│ External Systems │
└───────────┬─────────────┘
│
┌───────────▼─────────────┐
│ Adapters │ ← HTTP, CLI, Message Queue
└───────────┬─────────────┘
│
┌───────────▼─────────────┐
│ Ports │ ← Interfaces
└───────────┬─────────────┘
│
┌───────────▼─────────────┐
│ Domain Logic │ ← Core Business Logic
└───────────┬─────────────┘
│
┌───────────▼─────────────┐
│ Ports │ ← Interfaces
└───────────┬─────────────┘
│
┌───────────▼─────────────┐
│ Adapters │ ← Database, APIs, File System
└───────────┬─────────────┘
│
┌───────────▼─────────────┐
│ External Systems │
└─────────────────────────┘
Characteristics:
- Domain logic independent of external concerns
- Testable in isolation
- Flexible adapter implementation
- Clear boundaries
Scalability Patterns
Horizontal Scaling
┌─────────────┐
│Load Balancer│
└──────┬──────┘
┌────────┼────────┐
┌────▼───┐ ┌─▼────┐ ┌─▼─────┐
│Server 1│ │Server│ │Server │
│ │ │ 2 │ │ 3 │
└────┬───┘ └─┬────┘ └─┬─────┘
└───────┴────────┘
│
┌──────▼──────┐
│ Database │
└─────────────┘
Techniques:
- Load balancing
- Stateless services
- Shared data layer
- Session management
Caching Strategy
Client ──▶ CDN ──▶ API Server ──▶ Redis ──▶ Database
(Static) (Cache) (Cache) (Source)
Cache Levels:
1. CDN: Static assets
2. Application Cache: Query results, computed data
3. Database Cache: Query cache
Cache Patterns:
- Cache-Aside: Application manages cache
- Read-Through: Cache loads data automatically
- Write-Through: Write to cache and DB
- Write-Behind: Async writes to DB
Database Scaling
Vertical Scaling:
- Increase server resources
- Limited by hardware
Horizontal Scaling:
- Replication: Master-Slave, Multi-Master
- Sharding: Partition data across servers
- CQRS: Separate read and write databases
Architecture Decision Framework
Decision Template
# Decision: [Title]
## Context
- What problem are we solving?
- What are the constraints?
- What are the requirements?
## Options Considered
### Option 1: [Name]
**Pros:**
- Pro 1
- Pro 2
**Cons:**
- Con 1
- Con 2
**Estimated Effort:** [Low/Medium/High]
**Risk Level:** [Low/Medium/High]
### Option 2: [Name]
[Same structure]
## Decision
We chose [Option X] because [reasoning].
## Consequences
- Positive: [benefits]
- Negative: [trade-offs]
- Risks: [what could go wrong]
- Mitigation: [how to address risks]
## Validation
How will we validate this decision?
Common Architecture Patterns
API Gateway Pattern
"""
API Gateway centralizes external requests and routes to services.
"""
class APIGateway:
def __init__(self):
self.user_service = UserService()
self.order_service = OrderService()
self.auth_service = AuthService()
async def handle_request(self, request: Request) -> Response:
# Authentication
if not await self.auth_service.authenticate(request):
return Response(status=401)
# Rate limiting
if not await self.rate_limiter.check(request.user_id):
return Response(status=429)
# Route to appropriate service
if request.path.startswith('/users'):
return await self.user_service.handle(request)
elif request.path.startswith('/orders'):
return await self.order_service.handle(request)
return Response(status=404)
Circuit Breaker Pattern
"""
Circuit breaker prevents cascading failures.
"""
class CircuitBreaker:
def __init__(self, failure_threshold=5, timeout=60):
self.failure_threshold = failure_threshold
self.timeout = timeout
self.failures = 0
self.last_failure_time = None
self.state = 'CLOSED' # CLOSED, OPEN, HALF_OPEN
async def call(self, func, *args, **kwargs):
if self.state == 'OPEN':
if self._should_attempt_reset():
self.state = 'HALF_OPEN'
else:
raise CircuitBreakerOpen('Service unavailable')
try:
result = await func(*args, **kwargs)
self._on_success()
return result
except Exception as e:
self._on_failure()
raise
def _on_success(self):
self.failures = 0
self.state = 'CLOSED'
def _on_failure(self):
self.failures += 1
self.last_failure_time = time.time()
if self.failures >= self.failure_threshold:
self.state = 'OPEN'
def _should_attempt_reset(self):
return (time.time() - self.last_failure_time) >= self.timeout
Repository Pattern
"""
Repository pattern abstracts data access.
"""
from abc import ABC, abstractmethod
from typing import List, Optional
class UserRepository(ABC):
@abstractmethod
async def find_by_id(self, user_id: int) -> Optional[User]:
pass
@abstractmethod
async def find_by_email(self, email: str) -> Optional[User]:
pass
@abstractmethod
async def save(self, user: User) -> User:
pass
@abstractmethod
async def delete(self, user_id: int) -> bool:
pass
class SQLUserRepository(UserRepository):
def __init__(self, db_session):
self.db = db_session
async def find_by_id(self, user_id: int) -> Optional[User]:
result = await self.db.execute(
"SELECT * FROM users WHERE id = ?", (user_id,)
)
row = result.fetchone()
return User.from_row(row) if row else None
async def save(self, user: User) -> User:
if user.id:
await self.db.execute(
"UPDATE users SET name = ?, email = ? WHERE id = ?",
(user.name, user.email, user.id)
)
else:
result = await self.db.execute(
"INSERT INTO users (name, email) VALUES (?, ?)",
(user.name, user.email)
)
user.id = result.lastrowid
return user
System Design Process
1. Requirements Gathering
Functional Requirements:
- What should the system do?
- What features are needed?
- What are the use cases?
Non-Functional Requirements:
- Performance: Latency, throughput targets
- Scale: Expected users, data volume
- Availability: Uptime requirements
- Security: Compliance, data protection
- Cost: Budget constraints
2. Capacity Planning
Users: 10 million
Daily Active Users: 1 million
Requests per second: 1M users × 10 requests/day / 86400 seconds ≈ 116 RPS
Peak traffic (3x average): 350 RPS
Data:
- Per user: 1 KB metadata + 100 KB content
- Total: 10M × 101 KB ≈ 1 TB
Bandwidth:
- Request size: 1 KB
- Response size: 10 KB
- Bandwidth: 350 RPS × 11 KB ≈ 3.85 MB/s ≈ 30 Mbps
3. High-Level Design
┌─────────┐
│ Client │
└────┬────┘
│
┌────▼──────────┐
│ CDN │ (Static content)
└────┬──────────┘
│
┌────▼──────────┐
│ Load Balancer │
└────┬──────────┘
│
┌────▼──────────┐
│ Web Servers │ (3+ instances)
└────┬──────────┘
│
┌────▼──────────┐
│ App Servers │ (5+ instances)
└────┬──────────┘
│
┌────▼──────────┬────────────┐
│ │ │
│ Cache │ Database │ Message
│ (Redis) │ (Master/ │ Queue
│ │ Slaves) │ (Kafka)
└──────────────┴────────────┴─────────┘
4. Detailed Component Design
Design each component with:
- Inputs and outputs
- Data models
- API contracts
- Error handling
- Monitoring
5. Identify Bottlenecks
- Single points of failure
- Performance bottlenecks
- Scaling limitations
- Data consistency issues
6. Optimization
- Caching strategy
- Database indexing
- Load balancing
- CDN for static content
- Async processing
- Connection pooling
Best Practices
- Start Simple: Begin with simplest architecture that works
- Design for Failure: Assume components will fail
- Loose Coupling: Minimize dependencies between components
- High Cohesion: Group related functionality
- Separation of Concerns: Each component has single responsibility
- Document Decisions: Use ADRs for important choices
- Consider Trade-offs: Every decision has pros and cons
- Plan for Scale: Design with growth in mind
- Security First: Build security in from the start
- Measure Everything: Observability is crucial
When to Use This Skill
Use this skill when:
- Designing new systems
- Evaluating architecture options
- Planning system migrations
- Addressing scalability issues
- Making technology decisions
- Documenting architecture
- Conducting architecture reviews
- Planning for growth
- Solving system design problems
- Training team on architecture patterns
Examples
See EXAMPLES.md for complete architecture examples including:
- E-commerce system design
- Social media platform
- Video streaming service
- Real-time analytics system
- Multi-tenant SaaS application
For architecture templates, see templates/.
For architecture decision records, see adr/.
# Supported AI Coding Agents
This skill is compatible with the SKILL.md standard and works with all major AI coding agents:
Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.