eddiebe147

Data Analyzer

# Install this skill:
npx skills add eddiebe147/claude-settings --skill "Data Analyzer"

This installs a single skill from a multi-skill repository.

# Description

Advanced data analysis, pattern detection, and insight generation from structured and unstructured datasets

# SKILL.md


---
name: Data Analyzer
slug: data-analyzer
description: Advanced data analysis, pattern detection, and insight generation from structured and unstructured datasets
category: research
complexity: complex
version: "1.0.0"
author: "ID8Labs"
triggers:
- "analyze data"
- "data analysis"
- "find insights"
- "analyze dataset"
- "statistical analysis"
tags:
- data-analysis
- statistics
- insights
- visualization
- pattern-detection
---


Data Analyzer

Expert data analysis agent that processes structured and unstructured datasets to extract meaningful insights, identify patterns, detect anomalies, and generate data-driven recommendations. Specializes in exploratory data analysis, statistical testing, correlation analysis, and insight storytelling.

This skill applies statistical methods, structured analytical frameworks, and data visualization best practices to turn raw data into actionable findings. It is suited to business analytics, research validation, performance analysis, and decision support.

Core Workflows

Workflow 1: Exploratory Data Analysis (EDA)

Objective: Understand dataset structure, quality, and preliminary patterns

Steps:
1. Data Profiling
   - Dataset dimensions (rows, columns)
   - Column types and formats
   - Data completeness (missing values, nulls)
   - Unique values and cardinality
   - Data ranges and distributions
   - Generate summary statistics (mean, median, mode, std dev)

2. Data Quality Assessment
   - Missing data patterns (MCAR, MAR, MNAR)
   - Duplicate records
   - Outliers and anomalies
   - Data consistency issues
   - Format and type mismatches
   - Document data quality issues with severity ratings

3. Univariate Analysis
   - Distribution analysis for each variable
   - Identify skewness and kurtosis
   - Detect outliers (IQR, Z-score methods)
   - Visualize distributions (histograms, box plots, density plots)

4. Bivariate Analysis
   - Correlation analysis (Pearson, Spearman)
   - Scatter plots for continuous variables
   - Cross-tabulations for categorical variables
   - Identify strong relationships and dependencies

5. Multivariate Analysis
   - Correlation matrices
   - Dimensionality assessment
   - Feature importance preliminary analysis
   - Cluster tendency analysis

6. Initial Insights
   - Key patterns and trends
   - Surprising findings
   - Hypotheses for further investigation
   - Data limitations and caveats

Deliverable: EDA report with summary statistics, visualizations, and preliminary insights
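
The profiling and outlier steps above can be sketched with the Python standard library alone. This is a minimal illustration, not the skill's actual implementation: the helper names `profile_column` and `iqr_outliers`, the use of `None` to mark missing values, and the sample data are all assumptions made for the example.

```python
import statistics
from collections import Counter

def profile_column(values):
    """Summarize one column; None marks a missing value (an assumption of this sketch)."""
    present = [v for v in values if v is not None]
    profile = {
        "count": len(values),
        "missing": len(values) - len(present),
        "unique": len(set(present)),
    }
    # Mean, median, and std dev only make sense for numeric columns.
    if present and all(isinstance(v, (int, float)) for v in present):
        profile["mean"] = statistics.mean(present)
        profile["median"] = statistics.median(present)
        profile["stdev"] = statistics.stdev(present) if len(present) > 1 else 0.0
    else:
        profile["mode"] = Counter(present).most_common(1)[0][0] if present else None
    return profile

def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical order values with two missing entries and one extreme value.
order_values = [12.0, 15.5, None, 14.2, 13.8, 95.0, 14.9, None, 13.1]
summary = profile_column(order_values)
outliers = iqr_outliers([v for v in order_values if v is not None])
print(summary)
print(outliers)
```

Whether a flagged value is an error or a genuine finding still needs a judgment call; the rule only identifies candidates.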

Workflow 2: Pattern Detection & Trend Analysis

Objective: Identify meaningful patterns, trends, and relationships in data

Steps:
1. Time Series Analysis (if temporal data)
   - Trend identification (upward, downward, flat)
   - Seasonality detection
   - Cyclical patterns
   - Anomaly detection in time series
   - Forecast preliminary trends
   - Decompose into trend, seasonal, residual components

2. Segmentation Analysis
   - Identify natural groupings in data
   - Clustering analysis (conceptual approach)
   - Segment profiling and characterization
   - Compare segments across key metrics

3. Correlation & Causation
   - Identify correlated variables
   - Test correlation strength and significance
   - Investigate potential causal relationships
   - Control for confounding variables
   - Document correlation vs. causation carefully

4. Anomaly Detection
   - Statistical outlier detection
   - Contextual anomalies (unusual in specific context)
   - Point anomalies vs. collective anomalies
   - Determine if anomalies are errors or insights

5. Pattern Validation
   - Test pattern stability across subsets
   - Cross-validation approaches
   - Sensitivity analysis
   - Confidence intervals and significance testing

Deliverable: Pattern analysis report with visualizations and validated findings
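
As one possible reading of the trend and anomaly steps, the sketch below fits a least-squares line to a series and flags points whose detrended residual is unusually large. The function names, the 2.5 standard-deviation threshold, and the sample series are illustrative assumptions; seasonality and cyclical patterns are not handled here.

```python
import statistics

def linear_trend(series):
    """Least-squares slope and intercept of the series against its index 0..n-1."""
    n = len(series)
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(series)
    sxy = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(series))
    sxx = sum((i - x_mean) ** 2 for i in range(n))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean

def residual_anomalies(series, threshold=2.5):
    """Indices whose detrended residual exceeds `threshold` standard deviations."""
    slope, intercept = linear_trend(series)
    residuals = [y - (intercept + slope * i) for i, y in enumerate(series)]
    spread = statistics.pstdev(residuals)
    if spread == 0:
        return []
    return [i for i, r in enumerate(residuals) if abs(r) / spread > threshold]

# Hypothetical weekly signups with one spike at index 5.
weekly_signups = [100, 104, 109, 113, 118, 190, 127, 131, 136, 140, 145, 149]
slope, intercept = linear_trend(weekly_signups)
direction = "upward" if slope > 0 else "downward" if slope < 0 else "flat"
anomalies = residual_anomalies(weekly_signups)
print(direction, anomalies)
```

A large spike inflates the standard deviation it is measured against, so on short series a robust spread estimate (for example the median absolute deviation) may be a better choice than the one used here.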

Workflow 3: Statistical Hypothesis Testing

Objective: Rigorously test hypotheses using statistical methods

Steps:
1. Hypothesis Formulation
   - Define null hypothesis (H0)
   - Define alternative hypothesis (H1)
   - Specify significance level (typically α = 0.05)
   - Determine appropriate statistical test

2. Test Selection
   - Comparing Means: t-test, ANOVA
   - Comparing Proportions: Chi-square, Fisher's exact
   - Correlation: Pearson, Spearman correlation tests
   - Distribution: Kolmogorov-Smirnov, Shapiro-Wilk
   - Choose based on data type and assumptions

3. Assumptions Checking
   - Normality (for parametric tests)
   - Homogeneity of variance
   - Independence of observations
   - Sample size adequacy
   - Use non-parametric alternatives if assumptions violated

4. Test Execution
   - Calculate test statistic
   - Determine p-value
   - Compare to significance level
   - Calculate effect size (Cohen's d, eta-squared, etc.)
   - Compute confidence intervals

5. Result Interpretation
   - Statistical significance (p-value interpretation)
   - Practical significance (effect size)
   - Confidence in findings
   - Limitations and caveats
   - Translate to business/research implications

Deliverable: Statistical test report with methodology, results, and interpretation
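
The execution step can be illustrated for a two-group comparison of means. Because the Python standard library has no t-distribution, this sketch swaps in an exact permutation test (a non-parametric alternative) for the t-test named above, and pairs it with Cohen's d as the effect size. The function names and the sample data are assumptions made for the example.

```python
import itertools
import statistics

def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = (
        (na - 1) * statistics.variance(a) + (nb - 1) * statistics.variance(b)
    ) / (na + nb - 2)
    return (statistics.fmean(a) - statistics.fmean(b)) / pooled_var ** 0.5

def permutation_p_value(a, b):
    """Exact two-sided p-value for a difference in means.

    Enumerates every split of the pooled data into groups of the original
    sizes, so it is only practical for small samples.
    """
    pooled = list(a) + list(b)
    total = sum(pooled)
    observed = abs(statistics.fmean(a) - statistics.fmean(b))
    extreme = splits = 0
    for idx in itertools.combinations(range(len(pooled)), len(a)):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / len(a) - (total - sum_a) / len(b))
        splits += 1
        # Small tolerance so splits that tie with the observed one are counted.
        if diff >= observed - 1e-12:
            extreme += 1
    return extreme / splits

alpha = 0.05  # significance level fixed before looking at the result
variant = [5.1, 4.9, 5.6, 5.8, 6.0, 5.5]   # hypothetical measurements
control = [4.2, 4.8, 4.4, 4.9, 4.1, 4.6]
p_value = permutation_p_value(variant, control)
effect = cohens_d(variant, control)
reject_null = p_value < alpha
print(f"p={p_value:.4f}, d={effect:.2f}, reject H0: {reject_null}")
```

Reporting the effect size next to the p-value keeps statistical and practical significance separate, as the interpretation step asks.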

Workflow 4: Comparative Analysis

Objective: Compare groups, segments, or time periods to identify differences and drivers

Steps:
1. Define Comparison
   - Groups to compare (A/B, multiple segments, time periods)
   - Metrics for comparison
   - Baseline and target groups
   - Success criteria

2. Segment Performance
   - Calculate key metrics for each segment
   - Identify top performers and laggards
   - Calculate performance gaps
   - Rank by performance

3. Driver Analysis
   - Identify factors that explain differences
   - Quantify contribution of each driver
   - Control for confounding variables
   - Build explanatory narrative

4. Benchmarking
   - Compare to industry standards
   - Compare to historical performance
   - Identify best-in-class examples
   - Calculate gaps to benchmarks

5. Recommendations
   - Actions to close performance gaps
   - Quick wins vs. strategic initiatives
   - Resource requirements
   - Expected impact quantification

Deliverable: Comparative analysis report with driver identification and action plan
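
The segment performance step (metric per segment, ranking, and gap to the leader) might look like the following. The `segment_gaps` helper, the field names, and the records are hypothetical; driver analysis and benchmarking would need additional data that this sketch does not model.

```python
import statistics
from collections import defaultdict

def segment_gaps(records, segment_key, metric_key):
    """Rank segments by mean metric and report each one's gap to the leader."""
    by_segment = defaultdict(list)
    for row in records:
        by_segment[row[segment_key]].append(row[metric_key])
    means = {seg: statistics.fmean(vals) for seg, vals in by_segment.items()}
    ranked = sorted(means.items(), key=lambda item: item[1], reverse=True)
    best = ranked[0][1]
    return [
        {"segment": seg, "mean": mean, "gap_to_best": best - mean, "rank": rank}
        for rank, (seg, mean) in enumerate(ranked, start=1)
    ]

# Hypothetical conversion rates (%) per campaign, by region.
records = [
    {"region": "north", "conversion": 4.0},
    {"region": "north", "conversion": 5.0},
    {"region": "south", "conversion": 2.0},
    {"region": "south", "conversion": 3.0},
    {"region": "west", "conversion": 3.5},
    {"region": "west", "conversion": 3.5},
]
report = segment_gaps(records, "region", "conversion")
for line in report:
    print(line)
```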

Workflow 5: Insight Synthesis & Storytelling

Objective: Transform analytical findings into clear, actionable business insights

Steps:
1. Insight Identification
   - Review all analytical findings
   - Identify the "so what" for each finding
   - Prioritize by business impact
   - Group related insights into themes

2. Insight Structuring
   - Observation: What the data shows
   - Insight: Why it matters
   - Implication: What it means for the business
   - Recommendation: What to do about it
   - Use pyramid principle (answer first, then supporting details)

3. Evidence Assembly
   - Key statistics and metrics
   - Visualizations that tell the story
   - Comparative benchmarks
   - Confidence levels and caveats

4. Narrative Development
   - Create compelling storyline
   - Use clear, jargon-free language
   - Build logical flow from problem to recommendation
   - Anticipate and address counterarguments

5. Visualization Design
   - Choose appropriate chart types
   - Simplify and focus visualizations
   - Use consistent formatting
   - Annotate key insights directly on charts
   - Follow data visualization best practices

6. Actionability
   - Translate insights to specific actions
   - Assign ownership and timelines
   - Quantify expected impact
   - Define success metrics

Deliverable: Executive-ready insight report with visualizations and recommendations
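
One way to keep the Observation, Insight, Implication, Recommendation structure consistent across findings is a small record type. The `Insight` class, its 1-5 `impact` score, and the example findings below are assumptions for illustration, not part of the skill.

```python
from dataclasses import dataclass

@dataclass
class Insight:
    observation: str      # what the data shows
    insight: str          # why it matters
    implication: str      # what it means for the business
    recommendation: str   # what to do about it
    impact: int           # hypothetical 1-5 business-impact score

    def to_text(self):
        """Answer first (pyramid principle), then the supporting chain."""
        return (
            f"Recommendation: {self.recommendation}\n"
            f"Observation: {self.observation}\n"
            f"Insight: {self.insight}\n"
            f"Implication: {self.implication}"
        )

def prioritize(insights):
    """Order insights from highest to lowest business impact."""
    return sorted(insights, key=lambda item: item.impact, reverse=True)

# Hypothetical findings used only to show the structure.
findings = [
    Insight("Checkout page loads slowly on mobile", "Slow pages lose buyers",
            "Mobile revenue is at risk", "Optimize checkout assets", impact=3),
    Insight("Churn doubles after a failed payment", "Failed payments drive churn",
            "Retention depends on billing recovery", "Add payment retry flow", impact=5),
]
ordered = prioritize(findings)
print(ordered[0].to_text())
```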

Quick Reference

| Action | Command/Trigger |
|---|---|
| Full EDA | "Analyze this dataset comprehensively" |
| Quick summary | "Summarize key statistics from this data" |
| Pattern detection | "Find patterns in this dataset" |
| Hypothesis test | "Test if [variable A] affects [variable B]" |
| Comparative analysis | "Compare [group A] vs [group B]" |
| Correlation analysis | "What correlates with [variable]?" |
| Anomaly detection | "Find anomalies in this data" |
| Trend analysis | "Analyze trends over time" |

Statistical Methods Reference

Descriptive Statistics

  • Central Tendency: Mean, median, mode
  • Dispersion: Range, variance, standard deviation, IQR
  • Distribution Shape: Skewness, kurtosis
  • Percentiles: Quartiles, deciles, custom percentiles
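
As a rough illustration of the shape and percentile measures listed above, the sketch below computes quartiles and deciles with `statistics.quantiles` and derives skewness and excess kurtosis from central moments. The `shape` helper uses the population (biased) formulas, which is one of several conventions, and the sample data is hypothetical.

```python
import statistics

def shape(values):
    """Population skewness and excess kurtosis from central moments."""
    mean = statistics.fmean(values)
    n = len(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3  # 0 for a normal distribution
    return skewness, excess_kurtosis

# Hypothetical response times (ms) with a long right tail.
response_ms = [120, 125, 130, 135, 140, 150, 160, 400]
q1, q2, q3 = statistics.quantiles(response_ms, n=4)   # quartile cut points
deciles = statistics.quantiles(response_ms, n=10)     # nine decile cut points
skewness, excess_kurtosis = shape(response_ms)
print(q1, q2, q3, q3 - q1)
print(round(skewness, 2), round(excess_kurtosis, 2))
```

A positive skewness, as in this sample, signals a right tail, which is also why the mean sits above the median here.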

Inferential Statistics

  • T-tests: One-sample, independent, paired
  • ANOVA: One-way, two-way, repeated measures
  • Chi-Square: Goodness of fit, test of independence
  • Correlation: Pearson (linear), Spearman (rank), Kendall
  • Regression: Linear, logistic, multiple regression
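
The difference between Pearson (linear) and Spearman (rank) correlation can be seen on a relationship that is monotonic but not linear. This is a minimal standard-library sketch; the helper names and the data are assumptions, and it computes only the coefficients, not their significance.

```python
import statistics

def pearson(x, y):
    """Pearson correlation: co-variation scaled by both spreads."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def ranks(values):
    """Average ranks (1-based), so tied values share the same rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        avg_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            result[order[k]] = avg_rank
        pos = end + 1
    return result

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

ad_spend = [1, 2, 3, 4, 5, 6]
revenue = [1, 8, 27, 64, 125, 216]  # hypothetical: monotonic but cubic, not linear
r_linear = pearson(ad_spend, revenue)
r_rank = spearman(ad_spend, revenue)
print(round(r_linear, 3), round(r_rank, 3))
```

Spearman reaches 1 for any strictly increasing relationship, while Pearson stays below 1 unless the relationship is exactly linear.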

Effect Size Measures

  • Cohen's d: Standardized mean difference
  • Eta-squared (η²): Proportion of variance explained
  • Odds Ratio: Strength of association (categorical)
  • R-squared: Variance explained by model
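
Two of the effect sizes above reduce to short formulas, sketched here under assumed inputs: the odds ratio from a 2x2 table of counts, and eta-squared as the between-group share of the total sum of squares in a one-way layout. The function names and the numbers are hypothetical.

```python
import statistics

def odds_ratio(exposed_yes, exposed_no, unexposed_yes, unexposed_no):
    """Odds ratio for a 2x2 table: (a / b) / (c / d)."""
    return (exposed_yes * unexposed_no) / (exposed_no * unexposed_yes)

def eta_squared(groups):
    """Between-group sum of squares divided by total sum of squares."""
    all_values = [v for group in groups for v in group]
    grand_mean = statistics.fmean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    ss_between = sum(
        len(group) * (statistics.fmean(group) - grand_mean) ** 2 for group in groups
    )
    return ss_between / ss_total

# Hypothetical counts: 30 of 100 users who saw a prompt upgraded, 10 of 100 who did not.
ratio = odds_ratio(30, 70, 10, 90)
# Hypothetical satisfaction scores for three plans.
eta2 = eta_squared([[1, 2, 3], [2, 3, 4], [6, 7, 8]])
print(round(ratio, 2), eta2)
```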

Data Visualization Best Practices

Chart Selection Guide

| Data Type | Use Case | Chart Type |
|---|---|---|
| Single continuous variable | Distribution | Histogram, density plot, box plot |
| Continuous over time | Trend | Line chart, area chart |
| Part-to-whole | Composition | Pie chart (if <6 categories), stacked bar |
| Comparing categories | Comparison | Bar chart, column chart |
| Two continuous variables | Relationship | Scatter plot |
| Three+ variables | Multivariate | Bubble chart, small multiples |
| Geographic data | Spatial patterns | Map, choropleth |
| Hierarchical data | Structure | Tree map, sunburst |

Design Principles

  • Clarity: Remove chart junk; focus on data
  • Accuracy: Don't distort scales or proportions
  • Efficiency: Maximize data-ink ratio
  • Aesthetics: Use consistent colors and fonts
  • Accessibility: Consider color-blind friendly palettes

Best Practices

  • Start with questions: Define what you're trying to learn before diving into data
  • Document assumptions: Be explicit about data limitations and analytical choices
  • Check your work: Verify calculations and logic; look for errors
  • Visualize early and often: Charts reveal patterns that tables hide
  • Consider context: Data doesn't exist in a vacuum; understand the business context
  • Beware of spurious correlations: Correlation ≠ causation; think critically
  • Communicate uncertainty: Use confidence intervals, p-values, and error bars
  • Tell a story: Numbers alone don't drive action; insights do
  • Iterate: Analysis is rarely linear; be prepared to loop back
  • Validate with stakeholders: Ensure insights align with domain expertise

Common Pitfalls to Avoid

  • P-hacking: Testing multiple hypotheses and only reporting significant ones
  • Cherry-picking data: Selecting data that supports a predetermined conclusion
  • Ignoring assumptions: Using statistical tests without checking prerequisites
  • Confusing correlation and causation: Assuming A causes B because they correlate
  • Overfitting: Building overly complex models that don't generalize
  • Ignoring missing data: Assuming data is missing at random when it's not
  • Misinterpreting p-values: A p-value is not the probability that the null hypothesis is true
  • Conflating statistical and practical significance: Tiny effects can be "significant" with large samples
  • Data snooping: Choosing hypotheses or methods after exploring the data, then testing them on the same data
  • Extrapolating beyond data range: Making predictions outside observed ranges
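
The p-hacking pitfall is easier to see with numbers. The sketch below shows how the chance of at least one false positive grows with the number of independent tests, and applies Holm's step-down correction as one common safeguard. The list does not prescribe a specific correction, so the choice of Holm here, along with the function names and p-values, is an assumption.

```python
def familywise_error_rate(m, alpha=0.05):
    """Chance of at least one false positive across m independent tests."""
    return 1 - (1 - alpha) ** m

def holm_bonferroni(p_values, alpha=0.05):
    """Return a reject flag per hypothesis using Holm's step-down rule."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for step, i in enumerate(order):
        # The k-th smallest p-value (k = step) is compared to alpha / (m - k).
        if p_values[i] > alpha / (m - step):
            break  # once one test fails, all larger p-values fail too
        reject[i] = True
    return reject

p_values = [0.001, 0.04, 0.03, 0.20]          # hypothetical results of four tests
naive = [p < 0.05 for p in p_values]          # uncorrected decisions
corrected = holm_bonferroni(p_values)         # decisions after correction
fwer_20 = familywise_error_rate(20)
print(naive, corrected, round(fwer_20, 3))
```

Reporting every test that was run, together with corrected decisions, addresses both halves of the pitfall: the multiple testing and the selective reporting.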

Analysis Report Template

# Data Analysis Report: [Title]

**Date:** [Analysis Date]
**Analyst:** Claude Data Analyzer
**Dataset:** [Description, date range, sample size]

## Executive Summary
[2-3 sentences with key findings and recommendations]

## Objectives
- Research question 1
- Research question 2

## Data Overview
- **Source:** [Where data came from]
- **Time Period:** [Date range]
- **Sample Size:** [N observations]
- **Key Variables:** [List main variables]

## Data Quality Assessment
- **Completeness:** X% complete
- **Issues Identified:** [List any data quality problems]
- **Data Cleaning Steps:** [What was done to prepare data]

## Analysis & Findings

### Finding 1: [Insight Title]
**Observation:** [What the data shows]
**Evidence:** [Statistics, visualizations]
**Significance:** [Statistical test results if applicable]
**Implication:** [What this means for the business]

### Finding 2: [Insight Title]
[Repeat structure]

## Methodology
- **Statistical Tests Used:** [List tests and rationale]
- **Assumptions:** [Key assumptions made]
- **Limitations:** [What this analysis cannot tell us]
- **Confidence Levels:** [How certain are we of findings]

## Recommendations
1. [Action] - Expected Impact: [quantified if possible]
2. [Action] - Expected Impact: [quantified if possible]

## Next Steps
- [ ] Further analysis needed: [specify]
- [ ] Data to collect: [specify]
- [ ] Follow-up questions: [list]

## Appendix
[Detailed tables, additional visualizations, technical details]

Integration with Other Skills

  • Use with survey-analyzer: Apply rigorous analysis to survey data
  • Use with financial-analyst: Analyze financial datasets and metrics
  • Use with user-research: Quantify qualitative research findings
  • Use with seo-analyst: Analyze website traffic and performance data
  • Use with market-research-analyst: Validate market hypotheses with data
  • Use with trend-spotter: Detect emerging patterns in data over time

Quality Checklist

Before finalizing any data analysis:

  • [ ] Data quality assessed and documented
  • [ ] Summary statistics calculated and reviewed
  • [ ] Appropriate statistical tests selected and executed
  • [ ] Assumptions of tests verified
  • [ ] Results interpreted correctly (statistical + practical significance)
  • [ ] Visualizations are clear and accurate
  • [ ] Insights are actionable and relevant
  • [ ] Limitations and caveats explicitly stated
  • [ ] Sources and methodology documented
  • [ ] Findings validated with domain knowledge

# Supported AI Coding Agents

This skill is compatible with the SKILL.md standard and works with all major AI coding agents.
