sofka-bi-architecture

by @javiermontano-sofka in Tools

# Install this skill:

npx skills add javiermontano-sofka/sdf --skill "sofka-bi-architecture"

Install specific skill from multi-skill repository

# Description

# SKILL.md

name: sofka-bi-architecture
description: >
BI solution design — semantic layers, dashboard patterns, self-service analytics, KPI frameworks.
Use when the user asks to "design BI architecture", "build a KPI framework", "set up self-service analytics",
"design dashboard hierarchy", "create a semantic layer", or mentions metric trees, drill-down patterns, or reporting strategy.
model: opus
context: fork
allowed-tools:
- Read
- Write
- Edit
- Glob
- Grep
- Bash

BI Architecture: Business Intelligence Solution Design & Analytics Strategy

BI architecture defines how organizations consume data for decision-making — KPI frameworks, semantic layers, dashboard hierarchies, self-service analytics, and governance. This skill produces BI documentation that enables teams to deliver trustworthy, scalable, and accessible analytics.

Principio Rector

Un dashboard sin semántica es un gráfico bonito sin significado. La arquitectura BI diseña cómo las organizaciones consumen datos para decidir — desde el semantic layer que define qué significan las métricas hasta los dashboards que las presentan a la audiencia correcta.

Filosofía BI

Semantic layer = contrato de significado. Cada métrica tiene una definición única, una fórmula, y un owner. Sin semantic layer, cada equipo calcula revenue diferente.
Self-service con guardrails. Los usuarios deben poder explorar datos sin pedir tickets, pero dentro de un framework de gobernanza que asegure calidad y seguridad.
Menos dashboards, más decisiones. La meta no es 100 dashboards — es que los decisores actúen con confianza. KPI trees > dashboard sprawl.

Inputs

The user provides a system or project name as $ARGUMENTS. Parse $1 as the system/project name used throughout all output artifacts.

Parameters:
- {MODO}: piloto-auto (default) | desatendido | supervisado | paso-a-paso
- piloto-auto: Auto para KPI framework y semantic layer, HITL para dashboard hierarchy y self-service strategy.
- desatendido: Cero interrupciones. Arquitectura BI documentada automáticamente. Supuestos documentados.
- supervisado: Autónomo con checkpoint en semantic layer design y platform selection.
- paso-a-paso: Confirma cada KPI tree, metric definition, dashboard tier, y governance policy.
- {FORMATO}: markdown (default) | html | dual
- {VARIANTE}: ejecutiva (~40% — S1 KPI framework + S3 reporting + S6 governance) | técnica (full 6 sections, default)

Before generating architecture, detect the analytics context:

!find . -name "*.sql" -o -name "*.yml" -o -name "*.yaml" -o -name "*.lookml" -o -name "*.json" | head -30

Use detected tools (Looker, Tableau, Power BI, Metabase, dbt, etc.) to tailor recommendations.

If reference materials exist, load them:

Read ${CLAUDE_SKILL_DIR}/references/bi-patterns.md

When to Use

Designing KPI frameworks and metric hierarchies for an organization
Building semantic layers that define business metrics consistently
Architecting dashboard hierarchies from executive to operational levels
Enabling self-service analytics with governed data access
Establishing visualization standards and chart selection guidelines
Evaluating BI platform choices and migration strategies

When NOT to Use

Data pipeline design and orchestration → use data-engineering skill
dbt transformations and data modeling → use analytics-engineering skill
ML model serving and prediction systems → use data-science-architecture skill
Data quality rules and validation → use data-quality skill

Delivery Structure: 6 Sections

S1: KPI & Metric Framework

Defines the organization's metric hierarchy — what to measure, how metrics relate, ownership.

Includes:
- Metric tree decomposition (north star metric → supporting metrics → input metrics)
- Leading vs lagging indicator classification
- Metric ownership mapping (which team owns which metric)
- Metric definitions (formula, data source, grain, dimensions, filters)
- Target setting framework (benchmarks, thresholds, alerting rules)
- Metric lifecycle (proposed → active → deprecated → retired)

Key decisions:
- Top-down vs bottom-up metric design: executive alignment vs team autonomy
- Single source of truth: one definition per metric, enforced in semantic layer
- Metric granularity: too many metrics = noise; too few = blind spots — aim for 5-7 KPIs per department, 15-25 supporting metrics
- Alert thresholds: static vs dynamic baselines for anomaly detection

S2: Semantic Layer Design

Establishes the translation layer between raw data and business concepts.

Headless BI / metrics layer tool comparison:

Criterion	dbt Metrics / MetricFlow	Cube	Looker (LookML)	AtScale
Approach	Transformation-layer metrics	Headless BI API server	BI-native semantic model	Virtual OLAP cube
Best for	dbt-centric teams	Multi-tool API consumers	Looker-committed orgs	Enterprise, Excel users
Multi-tool serving	Via Semantic Layer API	Native REST/GraphQL/SQL API	Looker only (or via API)	XMLA, SQL, REST
Caching	Warehouse-native	Pre-aggregation engine	PDTs + in-memory	Aggregate tables + cache
AI/NL query readiness	Growing (dbt + LLM integrations)	Strong (structured API)	Looker + Gemini	AtScale + Copilot
Choose when	Already using dbt, want metrics-as-code	Need tool-agnostic API layer	Looker is primary BI tool	Large enterprise, heavy Excel

Includes:
- Metric definitions in semantic layer (measures, dimensions, filters)
- Dimension modeling (conformed dimensions, SCDs, hierarchies)
- Business glossary integration (term definitions, synonyms, ownership)
- Calculation logic (derived metrics, ratios, period-over-period, running totals)
- Access control at semantic layer (row-level security, column masking)

Key decisions:
- Tool-native vs tool-agnostic semantic layer: tool-agnostic preferred when multiple BI tools consume the same metrics
- Push-down vs materialize: compute on query for flexibility, precomputed aggregations for performance
- Metric drift prevention: define metrics once in the semantic layer, never in individual dashboards — duplicated definitions are the #1 cause of "my numbers don't match"
- Caching strategy: query cache for repeated queries, aggregate tables for heavy groupings, extract schedules for slow sources

S3: Reporting Architecture

Designs the dashboard hierarchy — from executive summaries to operational detail.

Dashboard performance budget (non-negotiable targets):
- Initial render: < 2 seconds (above 3s, users abandon)
- Query execution: < 5 seconds for interactive queries; < 30s for complex aggregations
- Data freshness SLA: define per dashboard tier — L1 executive: daily; L2 department: hourly; L3 operational: near-real-time; L4 ad-hoc: on-demand
- Embed load time: < 2 seconds for customer-facing embedded analytics

Includes:
- Dashboard hierarchy (L1: executive, L2: department, L3: operational, L4: ad-hoc)
- Drill-down patterns (summary to detail, cross-dashboard navigation)
- Refresh strategies (real-time, near-real-time, scheduled, on-demand)
- Report distribution (push: email/Slack alerts; pull: self-service portal)
- Embedded analytics patterns (JavaScript/React SDKs for in-app embedding; multi-tenant row-level security; white-label theming; SDK preferred over iframe for SSO and interactivity)
- Reverse ETL (sync churn scores to CRM, push segments to ad platforms, deliver metrics to Slack — define sync frequency, conflict resolution, idempotency per destination)

Key decisions:
- Push vs pull reporting: proactive alerts vs on-demand exploration
- Real-time dashboard viability: most BI tools are not built for sub-second refresh — use materialized views or streaming aggregates; reserve real-time for operations centers
- Dashboard count governance: audit quarterly, archive dashboards with zero views in 90 days
- Parameterized vs static: dynamic filters vs fixed executive views

S4: Self-Service Analytics

Enables business users to explore data independently with guardrails.

Self-service guardrails — governed vs ungoverned zones:

Zone	Access	Data	Compute	Governance
Certified / Governed	All users	Curated marts, semantic layer	Shared warehouse	Metric definitions locked, row-level security enforced
Exploratory / Sandbox	Analysts, power users	Staging + marts, limited raw	Dedicated warehouse with quotas	Labeled "exploratory", not for executive reporting
Raw / Ungoverned	Data engineers only	All sources	Isolated compute	Full audit logging, no self-service access

Includes:
- Data access tiers (curated datasets, governed sandbox, raw access for power users)
- User segmentation (consumer, explorer, analyst, data engineer)
- Training and enablement plan (onboarding, documentation, office hours)
- SQL access patterns (query editor, saved queries, shared snippets)
- Feedback loop (user requests, prioritization, data team response SLA: acknowledge in 1 business day, resolve in 5)

Key decisions:
- Openness vs governance: broad access increases adoption but risks misinterpretation
- Certified vs exploratory content: clearly labeled "trusted" dashboards vs experiments
- Compute guardrails: query timeouts (5 min default), row limits (1M rows for exports), cost quotas per user/group
- Data literacy investment: training budget correlates directly with self-service success

S5: Visualization Standards

Defines chart selection, accessibility, color palettes, and interactivity guidelines.

Includes:
- Chart selection guide (bar for comparison, line for trend, scatter for correlation, table for exact values — match chart type to analytical question)
- Color palette standards (categorical max 8 colors, sequential for continuous, diverging for positive/negative; WCAG 2.1 AA contrast ratio >= 4.5:1)
- Accessibility requirements (color-blind safe palettes, screen reader labels, alt text for all charts)
- Interactivity patterns (tooltips, filters, drill-down, cross-filtering)
- Layout templates (grid systems, information hierarchy, white space)
- Anti-patterns: pie charts for 10+ categories, 3D charts, truncated axes, dual axes without clear labeling, too many KPIs on one screen
- Information hierarchy: lead with big-picture KPIs, then detailed breakdowns; pair every metric with a target, prior period, or benchmark; always display active date range and units

Key decisions:
- Standardization vs flexibility: enforced templates vs creative freedom — enforce for L1-L2 dashboards, allow flexibility for L3-L4
- Dark mode support: design token approach for theme switching
- Print/export: PDF-optimized layouts for board presentations

S6: BI Platform & Governance

BI platform decision matrix:

Criterion	Looker	Power BI	Tableau	Superset	Metabase
Best for	Governed metrics-as-code	Microsoft ecosystem	Visual exploration	OSS, engineer-friendly	Simple self-service
Semantic layer	LookML (native, strong)	Composite model / DAX	Limited (extract-based)	SQL Lab (basic)	Questions (basic)
Embedded analytics	Good (Looker Embed SDK)	Power BI Embedded (strong)	Tableau Embedded	Superb (API-first)	Good (iframe + SDK)
Cost model	Per-user ($$$)	Per-user or capacity ($$)	Per-user ($$$)	Free (infra cost only)	Free tier + Pro ($)
Choose when	Strong governance needed	Already on Microsoft stack	Heavy visual exploration	Budget-constrained, technical team	Small team, quick start

Cost control patterns:
- Query quotas: set per-user/group daily query limits; alert at 80% consumption
- Materialized views: precompute heavy aggregations that serve 80%+ of dashboard queries
- Extract schedules: move expensive queries to off-peak compute windows
- License optimization: audit monthly active users vs licensed seats; downgrade inactive users after 60 days

Includes:
- Platform evaluation criteria (features, cost, scalability, ecosystem, team skills)
- Access control design (RBAC, row-level security, data classification enforcement)
- Usage monitoring (dashboard views, query patterns, adoption metrics)
- Change management (dashboard versioning, promotion workflows, testing)
- Disaster recovery (backup, export, platform migration contingency)

Key decisions:
- Single platform vs best-of-breed: simplicity vs specialization — single platform covers 80% of needs for most organizations
- License model: per-user vs per-capacity vs consumption-based — model total cost at 2x current user count
- Migration strategy: phased with parallel running preferred; big bang only for small teams

Trade-off Matrix

Decision	Enables	Constrains	Threshold
Centralized Semantic Layer	Single source of truth	Bottleneck on central team	50+ report consumers
Self-Service Analytics	Business agility, reduced data team load	Metric misinterpretation risk	Mature data culture with governance
Real-Time Dashboards	Immediate visibility	Infrastructure cost, complexity	Operations centers, trading floors
Embedded Analytics	In-context decisions, product differentiation	Maintenance complexity	SaaS products, customer-facing analytics
Push Reporting (Alerts)	Proactive awareness	Alert fatigue if poorly tuned	KPI monitoring, anomaly detection
Governed Sandbox	Safe exploration, innovation	Compute cost, data duplication	Teams transitioning to self-service

Assumptions

Data warehouse or lakehouse exists with curated data models
Business stakeholders have defined strategic objectives and KPIs
Data team exists to maintain semantic layer and governance
Analytics users have basic data literacy (or training is planned)
Budget covers BI platform licensing and infrastructure

Limits

Focuses on analytics consumption, not data pipeline engineering
Does not design transformation logic (dbt models, SQL)
Does not address data quality upstream of consumption
Dashboard design requires domain expertise; architecture enables but does not replace business context

Edge Cases

Startup with No Existing BI:
Start simple: one dashboard tool, shared spreadsheet for metric definitions, basic access control. Semantic layer becomes valuable at 10+ reports. Avoid over-engineering.

Multi-Tool Environment:
Organizations running Tableau, Power BI, and Looker simultaneously. Enforce single semantic layer regardless of consumption tool, or accept duplication with clear ownership boundaries.

Executive Resistance to Self-Service:
Design tiered approach: L1 curated for executives, L3-L4 self-service for analysts. Earn trust incrementally with certified content labels.

High-Frequency Data (Sub-Second):
Real-time dashboards require streaming architecture, materialized views, and careful cost management. Most BI tools cap at 1-minute refresh. Use Grafana or custom dashboards for true real-time.

Regulated Reporting (SOX, HIPAA):
Requires audit trails, version control for reports, access logging, and certified reports with change management workflows. No ad-hoc modifications to compliance dashboards.

Validation Gate

Before finalizing delivery, verify:

[ ] KPI hierarchy traces from north star to operational metrics
[ ] Every metric has a single, documented definition with owner
[ ] Semantic layer enforces consistent calculations across tools
[ ] Dashboard hierarchy covers executive, department, and operational levels
[ ] Performance budget met (render < 2s, query < 5s, freshness SLA defined)
[ ] Self-service access has guardrails (query timeouts, row limits, cost quotas)
[ ] Visualization standards address WCAG 2.1 AA accessibility
[ ] Platform selection documented with cost model at 2x current scale
[ ] Usage monitoring planned to track adoption and optimize licenses
[ ] Dashboard governance prevents sprawl (quarterly audit, archive policy)

Output Format Protocol

Format	Default	Description
`markdown`	✅	Rich Markdown + Mermaid diagrams. Token-efficient.
`html`	On demand	Branded HTML (Design System). Visual impact.
`dual`	On demand	Both formats.

Default output is Markdown with embedded Mermaid diagrams. HTML generation requires explicit {FORMATO}=html parameter.

Output Artifact

Primary: A-01_BI_Architecture.html — KPI framework, semantic layer design, dashboard hierarchy, self-service strategy, visualization standards, platform governance.

Secondary: Metric catalog (.md), chart selection guide, dashboard template library, access control matrix.

Autor: Javier Montaño | Última actualización: 12 de marzo de 2026

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents:

⚡ Amp 🚀 Antigravity 🤖 Claude Code 🦀 Clawdbot 📝 Codex ▶️ Cursor 🤖 Droid 💎 Gemini CLI 🐙 GitHub Copilot 🪿 Goose 📊 Kilo Code 🔧 Kiro CLI 💻 OpenCode 🦘 Roo Code 🌲 Trae 🏄 Windsurf

Learn more about the SKILL.md standard and how to use these skills with your preferred AI coding agent.