sofka-architecture-tobe

by @javiermontano-sofka in Tools

# Install this skill:

npx skills add javiermontano-sofka/sdf --skill "sofka-architecture-tobe"

Install specific skill from multi-skill repository

# SKILL.md

---
name: sofka-architecture-tobe
description: >
  Target-state (TO-BE) architecture design: C4 L2 containers, ADRs, nightmare scenario mitigations, MVP component, phased Strangler Fig migration.
  Use when the user asks to "design the target architecture", "create a TO-BE architecture", "plan a migration strategy",
  "define ADRs for a new system", "mitigate nightmare scenarios", or mentions Strangler Fig, C4 diagrams, saga pattern, anti-corruption layer, or legacy modernization.
model: opus
context: fork
allowed-tools:
  - Read
  - Write
  - Edit
  - Glob
  - Grep
  - Bash
---

TO-BE Architecture Design

Designs a future-state architecture that enables gradual retirement of legacy systems while maintaining zero data loss and regulatory compliance. Produces a C4 L2 container diagram, 6+ ADRs, nightmare scenario mitigations, an MVP component design, and a phased Strangler Fig migration plan.

Guiding Principle

Designing the future without understanding the present is architectural fiction. The TO-BE architecture is built on AS-IS evidence, the decisions of the approved scenario, and constraints validated during feasibility. Every container, every service, and every pattern has a WHY documented in an ADR. The goal is not to impress; the goal is to migrate.

Target Architecture Philosophy

- Migration over revolution. Strangler Fig, not big-bang. Each migration phase is independently reversible and delivers value on its own.
- Nightmare-first design. Before celebrating the happy path, model the 5 worst scenarios and design mitigations for them. If the design does not survive the nightmare, it does not get built.
- The MVP proves the architecture. The first deployed component validates patterns, performance, and operational readiness. If the MVP fails, the architecture is adjusted before scaling.

Inputs

The user provides a system or project name as $ARGUMENTS. Parse $1 as the system/project name used throughout all output artifacts.

Parameters:
- {MODO}: piloto-auto (default) | desatendido | supervisado | paso-a-paso
  - piloto-auto: Automatic for C4 and trade-offs; human-in-the-loop (HITL) for ADRs and nightmare scenarios.
  - desatendido: Zero interruptions. The complete architecture is auto-generated.
  - supervisado: Autonomous, with checkpoints at the ADRs and the migration plan.
  - paso-a-paso: Confirms each layer, each ADR, each nightmare scenario, and the migration plan.
- {FORMATO}: markdown (default) | html | dual
- {VARIANTE}: ejecutiva (~40%: C4 diagram + trade-offs + migration phases) | técnica (full, default)

Before generating architecture, detect codebase context:

!find . -name "*.ts" -o -name "*.java" -o -name "*.py" -o -name "*.go" -o -name "*.cs" -o -name "*.tf" | head -30

Load reference materials for detailed ADR templates and nightmare scenario patterns:

Read ${CLAUDE_SKILL_DIR}/references/adr-templates.md
Read ${CLAUDE_SKILL_DIR}/references/nightmare-scenarios.md

Assumptions & Limits

Requires: Approved scenario from prior discovery/analysis. Architecture is tailored to selected scenario constraints (Conservative/Moderate/Aggressive). Without prior approval, architecture design is premature.

Cloud-eligible: Assumes workloads are cloud-ready. If on-prem only, adjust: replace Kubernetes with VM-based deployment; replace managed cloud services with self-hosted equivalents.

Blueprint-level scope: Architecture is L2 container diagram + design decisions. Does NOT include class diagrams, API implementation code, or data schema DDL. Those are detailed design deliverables.

Performance unvalidated: Architecture proposes patterns (Saga, CQRS, Event Sourcing) but cannot validate performance without PoC. Include spike tasks to validate latency, throughput, consistency guarantees.

Team skill availability: Assumes team can adopt microservices patterns. If team has no microservices experience, architecture risk increases (flag in ADRs; add training phase).

Legacy system data accessibility: Assumes legacy systems expose data via an API or a database connector. If a legacy system is a pure black box, the Sidecar pattern becomes mandatory, which adds complexity and cost.

Conditional Logic per Scenario

IF Conservative scenario selected:
- Monolith-first for new services (not microservices)
- 2-3 services max
- Strangler Fig velocity: 10-20% replacement over 18 months
- ADRs emphasize stability & risk mitigation

IF Moderate scenario selected:
- Hybrid: 3-5 microservices + legacy ACL
- Strangler Fig velocity: 40-60% replacement over 12 months
- Balanced trade-offs

IF Aggressive scenario selected:
- Full microservices: one per DDD domain (6-10 services)
- Event-driven + CQRS + Event Sourcing from day one
- Strangler Fig velocity: 60%+ replacement over 9 months
- Assumes experienced team

IF budget is constrained:
- Defer Event Sourcing; start with CRUD + immutable audit log
- Skip Istio; use a lighter service mesh (Linkerd) or CNI-only networking
- ADR trade-offs favor cost over operational sophistication

IF team expertise unavailable:
- Saga (Orchestration) instead of Choreography
- Training phase: +2-3 weeks
- Flag: "Team risk elevated; recommend external coaching"
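
The scenario rules above can be captured as data, so later steps read one profile instead of re-deriving limits. The following is a minimal Python sketch, not part of the skill itself: `ScenarioProfile`, `PROFILES`, and `select_profile` are hypothetical names, the numbers are copied from the rules above, and treating choreography as the default saga style is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class ScenarioProfile:
    style: str
    max_services: int
    replacement_pct: Tuple[int, Optional[int]]  # (min, max); None means open-ended
    horizon_months: int
    event_sourcing_day_one: bool
    saga_style: str = "choreography"            # assumed default, see lead-in
    extra_training_weeks: Tuple[int, int] = (0, 0)


# Values taken from the Conservative / Moderate / Aggressive rules.
PROFILES = {
    "conservative": ScenarioProfile("monolith-first", 3, (10, 20), 18, False),
    "moderate": ScenarioProfile("hybrid microservices + legacy ACL", 5, (40, 60), 12, False),
    "aggressive": ScenarioProfile("full microservices", 10, (60, None), 9, True),
}


def select_profile(scenario: str, budget_constrained: bool = False,
                   team_experienced: bool = True) -> ScenarioProfile:
    """Return the base profile, adjusted by the budget and team-expertise rules."""
    profile = PROFILES[scenario.lower()]
    if budget_constrained:
        # Budget rule: defer Event Sourcing (start with CRUD + immutable audit log).
        profile = replace(profile, event_sourcing_day_one=False)
    if not team_experienced:
        # Team rule: orchestration instead of choreography, plus 2-3 weeks of training.
        profile = replace(profile, saga_style="orchestration", extra_training_weeks=(2, 3))
    return profile
```

For example, `select_profile("aggressive", budget_constrained=True)` keeps the 9-month horizon but turns off Event Sourcing from day one.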

Edge Cases & Workarounds

Client mandates specific technology: Override choice; document constraint in relevant ADR with trade-off analysis.

No cloud allowed: Replace K8s with VM-based deployment (Terraform + Ansible) or on-prem K8s (OpenShift). Replace managed services with self-hosted. Operational overhead increases.

Team has no microservices experience: Start with a monolith with modular boundaries (DDD packages). Plan a 2-3 week training phase. Add risk flag: "Learning curve increases deployment risk by 20%."

Legacy system has no API: Sidecar becomes critical path. Wrap legacy in process bridge; expose via message queue or REST. Data extraction via database-level CDC if schema accessible. Cost & complexity increase significantly.

Multiple legacy systems with conflicting schemas: Federated data model; ACL per system maintains logical separation. EDW aggregates for analytics (separate from OLTP). Data consistency becomes eventual.

Regulatory mandate for on-prem data residency: Data layer stays on-prem. Application layer can be cloud. Encrypted tunnel (VPN/DX) between layers. Document in Zero Trust and Data Storage ADRs.

Architecture Decision Framework

When to split a service vs keep monolithic:
- Core domain (per DDD) -> Dedicated microservice (owns data + logic)
- Supporting domain -> Shared service or library module
- Generic domain -> Buy/SaaS (payment gateway, SMS, auth vendor)

Service boundary heuristics:
- Align with DDD bounded contexts
- Team ownership: one team per service (2-pizza rule)
- Data ownership: no shared databases (one data store per service; polyglot persistence where it is justified)
- Deploy independently: release without coordinating other services

When CQRS adds value vs over-engineering:
- USE CQRS IF: read/write ratio >10:1, strong consistency is NOT required for reads, an audit trail is a legal requirement
- SKIP CQRS IF: simple CRUD (ratio ~1:1), consistency is critical for all reads, the team is unfamiliar with event sourcing
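
The heuristics above do not say how the conditions combine, so the following Python sketch shows only one possible reading: any SKIP condition vetoes CQRS, and otherwise CQRS is recommended when the read/write ratio exceeds 10:1 or an audit trail is legally required. The function name and parameters are hypothetical.

```python
from typing import List, Tuple


def cqrs_recommendation(reads_per_write: float,
                        reads_need_strong_consistency: bool,
                        audit_trail_legally_required: bool,
                        team_knows_event_sourcing: bool) -> Tuple[str, List[str]]:
    """Return ("use", []) or ("skip", reasons) under the reading described above."""
    reasons = []
    if reads_need_strong_consistency:
        reasons.append("consistency is critical for reads")
    if not team_knows_event_sourcing:
        reasons.append("team is unfamiliar with event sourcing")
    if reads_per_write <= 10 and not audit_trail_legally_required:
        reasons.append("read/write ratio does not exceed 10:1 and no legal audit requirement")
    return ("skip", reasons) if reasons else ("use", [])
```

Returning the reasons alongside the verdict makes it straightforward to paste the outcome into the relevant ADR.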

Delivery Structure

1. C4 Level 2 Container Architecture

Produce a layered architecture diagram with these layers:

Consumer Layer: Web (SPA), Mobile, Partner APIs
API Gateway Layer: OAuth2/OIDC validation, rate limiting, mTLS, request routing/logging
Microservices Mesh: Domain services (Auth, Account, Transaction Orchestrator, Payment, Audit, Notification), service mesh (mTLS, circuit breaking), message bus (event streaming, saga choreography)
Anti-Corruption Layer: ACL adapters (protocol translation), sidecar pattern (legacy wrapping), data normalization
Legacy System Layer: Existing systems (gradual replacement via Strangler Fig)
Data Layer: Polyglot persistence (PostgreSQL for OLTP, Redis for cache/locks, Elasticsearch for audit/search, S3/Blob for data lake)
Infrastructure & Observability: Container orchestration (K8s), service mesh, logging/tracing/metrics, GitOps
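
To illustrate what the Anti-Corruption Layer does (protocol translation and data normalization), here is a minimal Python sketch of one adapter. Everything in it is an assumption made for illustration: the `Account` domain model, the legacy field names (`ACCT_NO`, `BAL_CENTS`, `OPEN_DT`, `STAT_CD`), and the legacy quirks it normalizes.

```python
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass(frozen=True)
class Account:
    """Domain model used by the new services (hypothetical)."""
    account_id: str
    balance: Decimal
    opened_on: date
    active: bool


class LegacyAccountAdapter:
    """Anti-corruption layer: translates legacy records into the domain model."""

    def to_domain(self, record: dict) -> Account:
        # Assumed legacy quirks: zero-padded IDs, balances in cents stored as
        # strings, YYYYMMDD dates, and "A"/"I" status flags.
        status = record["STAT_CD"].strip().upper()
        if status not in {"A", "I"}:
            # Reject unknown values instead of letting them leak into the new model.
            raise ValueError(f"unknown legacy status: {status!r}")
        return Account(
            account_id=record["ACCT_NO"].strip().lstrip("0"),
            balance=Decimal(record["BAL_CENTS"]) / 100,
            opened_on=datetime.strptime(record["OPEN_DT"], "%Y%m%d").date(),
            active=status == "A",
        )
```

The new services depend only on `Account`; if the legacy format changes, the adapter is the single place that has to change with it.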

2. Trade-off Matrix

Document trade-offs for each major decision:

| Decision | TO-BE Choice | Alternative 1 | Alternative 2 | Trade-off |
|---|---|---|---|---|
| Service Communication | Event-Driven + REST | Pure REST | Pure Event-Driven | Complexity vs Resilience |
| Consistency Model | Saga + Local Transactions | Distributed Transactions | 2PC | Latency vs Strong Consistency |
| Security | Zero Trust (mTLS + OAuth2) | Network Perimeter | API Keys Only | Ops Complexity vs Security |
| Legacy Integration | ACL + Sidecar + Strangler Fig | Big Bang | No Migration | Time-to-Modern vs Risk |
| Data Storage | Event Sourcing + CQRS | Traditional CRUD | Snapshot-only | Complexity vs Audit Trail |
| Deployment | Kubernetes + GitOps | Traditional VMs | Serverless | Ops Overhead vs Control |

3. ADRs (6+ minimum)

Each ADR includes:
- Decision: What was chosen
- Alternatives Considered: Pros, cons, why rejected for each
- Trade-off: What we gain, what we lose, what we assume
- Consequences: Positive, negative, neutral

Minimum ADR topics:
1. Distributed transaction strategy (Saga vs 2PC)
2. Legacy integration pattern (ACL + Sidecar)
3. Security model (Zero Trust, mTLS)
4. Data storage & audit trail (Event Sourcing + CQRS)
5. Deployment platform (Kubernetes + GitOps)
6. Caching & session management (Redis)

See ${CLAUDE_SKILL_DIR}/references/adr-templates.md for detailed ADR templates with banking/enterprise examples.

4. Nightmare Scenarios (5 minimum)

For each scenario, document: Problem, Trigger Conditions, Mitigations (Defense in Depth), Monitoring & Early Detection (alert definitions), Acceptance Criteria (Go/No-Go).

Minimum scenarios:
1. Ghost Transaction — Payment commits in one service, fails in another. Mitigations: Saga, Outbox, idempotency keys, reconciliation, DLQ, immutable audit log.
2. Schema Drift — Legacy schema changes unannounced. Mitigations: Consumer-driven contracts, schema registry, data fingerprinting, ACL validation.
3. Auth Service Unavailable — OAuth2 down, all requests rejected. Mitigations: Token caching + grace period, multi-replica, circuit breaker, emergency mode, identity provider redundancy.
4. Cascade Failure — One service crashes, ripple effect takes down others. Mitigations: Circuit breaker, bulkhead, rate limiting + backpressure, service mesh retry, Kafka consumer lag monitoring.
5. Legacy System Corruption — Buggy ACL sends malformed data to legacy. Mitigations: Pre-flight validation, dry-run mode, rollback from snapshots, change control, post-transfer data validation.

See ${CLAUDE_SKILL_DIR}/references/nightmare-scenarios.md for detailed monitoring alert definitions and acceptance criteria per scenario.
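
As an illustration of one Ghost Transaction mitigation, idempotency keys, the Python sketch below shows a retried payment request being applied only once. It is an in-memory toy under stated assumptions: `PaymentService` and its fields are hypothetical, and a real service would persist the key and the ledger entry in the same local transaction.

```python
from dataclasses import dataclass, field


@dataclass
class PaymentService:
    processed: dict = field(default_factory=dict)  # idempotency_key -> stored result
    ledger: list = field(default_factory=list)     # committed charges

    def charge(self, idempotency_key: str, account: str, amount_cents: int) -> dict:
        """Apply the charge once per idempotency key; replays return the stored result."""
        if idempotency_key in self.processed:
            return self.processed[idempotency_key]
        entry = {"account": account, "amount_cents": amount_cents, "status": "committed"}
        self.ledger.append(entry)
        self.processed[idempotency_key] = entry
        return entry
```

A client that times out and retries with the same key gets the original result back, so the retry cannot create a second ledger entry.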

5. MVP Component

Design the first deployable component (typically Authentication & Session Management):
- Architecture diagram (Client -> API Gateway -> AuthService -> Legacy Identity Provider)
- API contracts (OpenAPI): login, refresh, logout, verify
- Data model (PostgreSQL): users, user_roles, session_events
- Resilience patterns: idempotency, circuit breaker (LDAP fallback), saga (audit + session atomic), caching (Redis TTL)
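
The circuit-breaker-with-fallback pattern named above can be sketched in a few lines of Python. This is a simplified illustration, not the skill's prescribed implementation: the class, its thresholds, and the injectable `clock` are assumptions. The text does not specify which side of the breaker LDAP sits on, so `primary` and `fallback` are plain callables.

```python
import time


class CircuitBreaker:
    """Opens after repeated failures, serves the fallback while open, then retries."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, primary, fallback):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after_s:
                return fallback()  # open: do not touch the failing dependency
            # Reset window elapsed: fall through and allow one trial call (half-open).
        try:
            result = primary()
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()  # open, or re-open after a failed trial
            return fallback()
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Injecting the clock keeps the breaker testable without sleeping, which helps when checking an activation-time target such as the one in the Validation Gate.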

6. Phased Migration (Strangler Fig)

Phase 1: Assessment & Wrapping (Months 1-2)
- Document legacy integrations, data flows, implicit rules
- Build sidecar pattern and ACL adapters
- Establish monitoring/logging baseline

Phase 2: Modern Service Introduction (Months 3-5)
- Deploy MVP (Auth Service) with canary (10%)
- Maintain legacy as fallback; parallel processing + result comparison
- Shadow mode: process all requests, compare results, don't apply to production
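
Shadow mode as described above can be sketched as a small wrapper: the legacy result is always what the caller receives, while the modern service runs alongside and differences are recorded. The Python below is a minimal sketch with hypothetical names (`shadow_compare`, the handler callables, the `mismatches` list). It assumes the modern handler has no production side effects while in shadow mode.

```python
def shadow_compare(request, legacy_handler, modern_handler, mismatches):
    """Serve from legacy; run the modern service in shadow and record differences."""
    legacy_result = legacy_handler(request)
    try:
        modern_result = modern_handler(request)
    except Exception as exc:
        # A shadow failure must never affect the caller; record it and move on.
        mismatches.append({"request": request, "error": repr(exc)})
        return legacy_result
    if modern_result != legacy_result:
        mismatches.append(
            {"request": request, "legacy": legacy_result, "modern": modern_result}
        )
    return legacy_result
```

The mismatch rate collected this way is a natural input for deciding when to move from shadow traffic to the 10% canary.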

Phase 3: Capability Migration (Months 6-12)
- Migrate remaining domain services
- Deploy saga pattern for distributed transactions
- Implement event streaming (Kafka) for near-real-time event propagation (consistency across services remains eventual)
- Reconciliation service for daily batch validation
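
For the saga pattern introduced in this phase, the following Python sketch shows the orchestration variant: steps run in order, and if one fails, the compensations of the completed steps run in reverse. The `run_saga` function and the step tuple layout are hypothetical. A real orchestrator would also persist saga state so that compensation survives a crash.

```python
from typing import Callable, List, Tuple

# (step name, action, compensation); each callable receives a shared context dict.
Step = Tuple[str, Callable[[dict], None], Callable[[dict], None]]


def run_saga(steps: List[Step], ctx: dict) -> dict:
    completed = []
    for name, action, compensate in steps:
        try:
            action(ctx)
        except Exception:
            uncompensated = []
            # Undo completed steps in reverse order.
            for done_name, undo in reversed(completed):
                try:
                    undo(ctx)
                except Exception:
                    uncompensated.append(done_name)
            status = "orphaned" if uncompensated else "compensated"
            return {"status": status, "failed_step": name, "uncompensated": uncompensated}
        completed.append((name, compensate))
    return {"status": "committed"}
```

An `orphaned` result corresponds to the "orphaned sagas unable to compensate" blocker in the rejection criteria, so it should raise an alert, not only a log line.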

Phase 4: Legacy Sunset (Months 13+)
- Stop writing to legacy (read-only mode)
- Archive historical data to data lake
- Maintain ACL for compliance/audit queries only
- Decommission legacy infrastructure

Migration Risk Mitigation:

| Phase | Risk | Mitigation |
|---|---|---|
| 1 | Incorrect legacy understanding | Detailed discovery + technical archaeology |
| 2 | Canary impacts production | Shadow mode: compare but don't apply |
| 3 | Data inconsistency | Reconciliation service + nightly batch |
| 4 | Legacy data inaccessibility | Data lake backup + read-only ACL |

RACI Matrix

| Activity | Delivery Team | Client | Shared |
|---|---|---|---|
| Architecture Design & ADRs | R,A | C | I |
| Legacy System Documentation | C | R,A | I |
| Microservice Development | R | C | A |
| Testing & QA | C | R,A | I |
| Deployment & Infrastructure | R | C | A |
| Security Review & mTLS Setup | R | C | A |
| Migration Planning & Execution | A | R | C |
| Training & Knowledge Transfer | R | A | C |
| Production Support (6 months) | R | C | A |

Legend: R=Responsible, A=Accountable, C=Consulted, I=Informed

Validation Gate

Functional

- [ ] Auth Service: 5,000 concurrent logins with <100ms P99
- [ ] Transactions: <100ms P99, 99.9% success rate
- [ ] Sagas: compensation completes within 30 seconds
- [ ] ACL: >=99.5% data consistency validation
- [ ] MVP: zero data loss during PoC deployment

Resilience

- [ ] System operational during single microservice failure
- [ ] Circuit breakers activate within 5 seconds
- [ ] Auth failure triggers graceful degradation (cached tokens)
- [ ] Legacy failure isolated by ACL (no cascade)
- [ ] Message queue backlog clears within 5 minutes post-recovery

Security

- [ ] All service-to-service traffic encrypted via mTLS
- [ ] OAuth2 token validation at gateway + service boundary
- [ ] No plaintext passwords in logs/event streams
- [ ] Audit trail captures auth events
- [ ] Secrets via vault (no hardcoded credentials)

Operational

- [ ] All services deployable via GitOps
- [ ] Incident response time <15 minutes
- [ ] Observability operational (logs, metrics, traces)
- [ ] Runbook for top 10 failure scenarios
- [ ] On-call escalation playbook established

Rejection Criteria (Blockers)

- Any unaccounted data loss
- Service-to-service communication without mTLS
- Audit trail gaps
- Orphaned sagas unable to compensate
- ACL translation errors affecting >0.1% of transactions
- Authentication latency >500ms
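
The blockers above are mechanical enough to check in code. The Python sketch below is one way to do that, assuming the metrics have already been collected: `GateMetrics` and `blockers` are hypothetical names, and because the text does not say which latency percentile applies, the caller chooses what `auth_latency_ms` represents.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GateMetrics:
    unaccounted_data_loss_events: int
    non_mtls_service_calls: int
    audit_trail_gaps: int
    orphaned_sagas: int
    acl_translation_errors: int
    total_transactions: int
    auth_latency_ms: float  # percentile is chosen by the caller


def blockers(m: GateMetrics) -> List[str]:
    """Return the rejection criteria that the measured metrics violate."""
    found = []
    if m.unaccounted_data_loss_events > 0:
        found.append("unaccounted data loss")
    if m.non_mtls_service_calls > 0:
        found.append("service-to-service call without mTLS")
    if m.audit_trail_gaps > 0:
        found.append("audit trail gaps")
    if m.orphaned_sagas > 0:
        found.append("orphaned sagas unable to compensate")
    if m.total_transactions and m.acl_translation_errors / m.total_transactions > 0.001:
        found.append("ACL translation errors above 0.1%")
    if m.auth_latency_ms > 500:
        found.append("authentication latency above 500 ms")
    return found
```

An empty list means no blocker was hit; any entry is a No-Go regardless of how many checklist items passed.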

Output Format Protocol

| `{VARIANTE}` | `{FORMATO}` | Generated file | Content |
|---|---|---|---|
| técnica | markdown | `04_Arquitectura_TO-BE_Deep.md` | C4 L2, trade-offs, 6+ ADRs, 5+ nightmares, MVP, migration, RACI |
| técnica | html | `04_Arquitectura_TO-BE_Deep.html` | Same content, HTML with brand styles |
| técnica | dual | Both files | Markdown + HTML |
| ejecutiva | markdown | `04_Arquitectura_TO-BE_Ejecutiva.md` | C4 diagram, trade-off matrix, migration phases (~40%) |
| ejecutiva | html | `04_Arquitectura_TO-BE_Ejecutiva.html` | Same executive content, HTML with brand styles |
| ejecutiva | dual | Both files | Markdown + HTML |
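
The mapping in this table can be expressed as a small lookup. The Python sketch below is illustrative only: `output_files` is a hypothetical helper, and the defaults mirror the parameter defaults stated earlier (`técnica`, `markdown`).

```python
from typing import List

STEMS = {
    "técnica": "04_Arquitectura_TO-BE_Deep",
    "ejecutiva": "04_Arquitectura_TO-BE_Ejecutiva",
}
EXTENSIONS = {"markdown": [".md"], "html": [".html"], "dual": [".md", ".html"]}


def output_files(variante: str = "técnica", formato: str = "markdown") -> List[str]:
    """Return the file names generated for a {VARIANTE}/{FORMATO} combination."""
    if variante not in STEMS:
        raise ValueError(f"unknown VARIANTE: {variante!r}")
    if formato not in EXTENSIONS:
        raise ValueError(f"unknown FORMATO: {formato!r}")
    return [STEMS[variante] + ext for ext in EXTENSIONS[formato]]
```

With the defaults this yields a single file, `04_Arquitectura_TO-BE_Deep.md`; `dual` yields the Markdown and HTML files together.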

Cross-References

metodologia-software-architecture: Internal structure of each service in the TO-BE architecture
metodologia-solutions-architecture: Integration patterns, channel architecture, observability stack
metodologia-infrastructure-architecture: K8s clusters, network topology, HA/DR, cost optimization
metodologia-devsecops-architecture: Pipeline design, security gates, GitOps deployment
metodologia-enterprise-architecture: Capability mapping, technology radar alignment, governance

Output Artifact

Primary: 04_Arquitectura_TO-BE_Deep.md with the default {FORMATO} (04_Arquitectura_TO-BE_Deep.html when {FORMATO} is html). Contents: C4 L2 diagram, trade-off matrices, nightmare scenarios with mitigations, MVP component design, ADRs, migration roadmap, acceptance criteria, RACI matrix.

Author: Javier Montaño | Last updated: March 12, 2026

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

⚡ Amp 🚀 Antigravity 🤖 Claude Code 🦀 Clawdbot 📝 Codex ▶️ Cursor 🤖 Droid 💎 Gemini CLI 🐙 GitHub Copilot 🪿 Goose 📊 Kilo Code 🔧 Kiro CLI 💻 OpenCode 🦘 Roo Code 🌲 Trae 🏄 Windsurf
