Analyzing Data
by jesseotremblay
Performs comprehensive data analysis including statistical analysis, visualization, pattern detection, and report generation. Use when the user asks to analyze data, find patterns, generate insights, create visualizations, or mentions data analysis, statistics, or data science tasks.
name: analyzing-data
Data Analyzer
This skill performs comprehensive data analysis with statistical methods, visualizations, and automated reporting.
When to Use This Skill
Invoke this skill when the user:
- Asks to analyze a dataset
- Wants statistical insights
- Needs data visualization
- Requests pattern detection
- Mentions data analysis, statistics, or data science
- Wants to generate analysis reports
Analysis Workflow
Step 1: Data Understanding
Initial Assessment:
- Identify data format (CSV, JSON, Excel, etc.)
- Determine data size and structure
- Understand business context
- Clarify analysis objectives
Use the analysis script:
python scripts/analyze.py data.csv --explore
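The contents of scripts/analyze.py are not shown here, so the following is only a hypothetical stand-in for the "size and structure" part of the initial assessment, using the Python standard library. The function names `infer_type` and `profile_csv` are illustrative assumptions. The sketch counts rows, lists columns, and guesses whether each column is numeric.

```python
import csv
import io


def infer_type(values):
    """Guess 'numeric', 'text', or 'empty' from a column's raw string values."""
    non_empty = [v for v in values if v is not None and v.strip() != ""]
    if not non_empty:
        return "empty"
    try:
        for v in non_empty:
            float(v)
    except ValueError:
        return "text"
    return "numeric"


def profile_csv(text):
    """Row count, column names, and a guessed type for each column."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []
    types = {c: infer_type([r[c] for r in rows]) for c in columns}
    return {"rows": len(rows), "columns": columns, "types": types}


# Hypothetical sample shaped like part of the sales_data.csv example.
sample = "date,product,revenue\n2024-01-01,A,120.5\n2024-01-02,B,\n"
print(profile_csv(sample))
```

Dates come out as "text" here; a fuller profiler would try date parsing as well.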
Step 2: Data Quality Check
Validation:
- Data loads successfully
- Required columns present
- Data types appropriate
- Missing values identified
- Outliers detected
- Duplicates checked
Quality Report:
python scripts/analyze.py data.csv --quality-report
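What `--quality-report` actually prints is not documented here. As an assumption-labelled sketch, two of the checks listed above (missing values and duplicates) can be computed with the standard library alone; `quality_report` is a hypothetical name.

```python
import csv
import io
from collections import Counter


def quality_report(text):
    """Missing values per column and the number of fully duplicated rows."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    columns = reader.fieldnames or []
    # Short rows get None for absent fields, so treat None like an empty string.
    missing = {c: sum(1 for r in rows if (r[c] or "").strip() == "") for c in columns}
    # Every copy of a row beyond the first counts as one duplicate.
    seen = Counter(tuple(r[c] for c in columns) for r in rows)
    duplicates = sum(n - 1 for n in seen.values())
    return {"rows": len(rows), "missing": missing, "duplicate_rows": duplicates}


sample = "id,region,revenue\n1,West,100\n2,,250\n1,West,100\n"
print(quality_report(sample))
```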
Step 3: Statistical Analysis
Perform analysis based on data type and objectives.
For Descriptive Statistics:
- Mean, median, mode
- Standard deviation, variance
- Quartiles and ranges
- Distribution shape (skewness, kurtosis)
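The descriptive statistics above map directly onto Python's built-in `statistics` module. This is a minimal sketch, not the skill's own code; `describe` is a hypothetical helper, and skewness uses the simple moment-based formula.

```python
import statistics


def describe(values):
    """Descriptive statistics for a list of at least two numbers."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # default 'exclusive' method
    mean = statistics.fmean(values)
    pop_sd = statistics.pstdev(values)
    # Moment-based skewness: positive means a longer right tail.
    skew = 0.0
    if pop_sd:
        skew = sum((x - mean) ** 3 for x in values) / len(values) / pop_sd ** 3
    return {
        "mean": mean,
        "median": statistics.median(values),
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),        # sample (n - 1) standard deviation
        "variance": statistics.variance(values),  # sample (n - 1) variance
        "q1": q1,
        "q3": q3,
        "iqr": q3 - q1,
        "range": max(values) - min(values),
        "skewness": skew,
    }


print(describe([2, 4, 4, 4, 5, 5, 7, 9]))
```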
For Correlation Analysis:
- Pearson correlation
- Spearman rank correlation
- Covariance matrix
For Advanced Analysis: See REFERENCE.md for:
- Hypothesis testing procedures
- Regression analysis methods
- Time series analysis
- Clustering algorithms
Step 4: Visualization
Create appropriate visualizations:
Univariate Analysis:
- Histograms for distributions
- Box plots for outliers
- Bar charts for categories
Bivariate Analysis:
- Scatter plots for relationships
- Line charts for trends
- Heatmaps for correlations
Multivariate Analysis:
- Pair plots
- 3D visualizations
- Dimensionality reduction plots
Generate visualizations:
python scripts/analyze.py data.csv --visualize --output-dir ./charts
Step 5: Report Generation
Create an analysis report using templates from FORMS.md:
cat FORMS.md # View available report templates
python scripts/analyze.py data.csv --report executive-summary
Analysis Types
Pattern 1: Exploratory Data Analysis (EDA)
Objective: Understand data characteristics and relationships
Steps:
- Load and preview data
- Generate summary statistics
- Check distributions
- Identify correlations
- Detect outliers
- Document insights
Quick EDA:
python scripts/analyze.py data.csv --eda
Pattern 2: Comparative Analysis
Objective: Compare groups or time periods
Steps:
- Define groups/periods
- Calculate group statistics
- Test for significant differences
- Visualize comparisons
- Interpret results
See REFERENCE.md section "Statistical Testing" for test selection.
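REFERENCE.md's recommended tests are not reproduced here. As one distribution-free option for "Test for significant differences", a two-sided permutation test on the difference in group means can be sketched with the standard library. `permutation_test` is a hypothetical helper, and the group values below are made up for illustration.

```python
import random
import statistics


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns (observed difference, approximate p-value).
    """
    rng = random.Random(seed)
    observed = statistics.fmean(a) - statistics.fmean(b)
    pooled = list(a) + list(b)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # randomly reassign group labels
        diff = statistics.fmean(pooled[:len(a)]) - statistics.fmean(pooled[len(a):])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Adding 1 to both counts includes the observed labelling itself.
    return observed, (extreme + 1) / (n_resamples + 1)


west = [31, 29, 35, 33, 30, 34]
east = [20, 22, 19, 21, 23, 18]
print(permutation_test(west, east))
```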
Pattern 3: Trend Analysis
Objective: Identify patterns over time
Steps:
- Prepare time series data
- Check for seasonality
- Calculate moving averages
- Fit trend lines
- Forecast future values
See REFERENCE.md section "Time Series Methods" for details.
Pattern 4: Predictive Modeling
Objective: Build models to predict outcomes
Steps:
- Feature engineering
- Train/test split
- Model selection
- Training and validation
- Performance evaluation
See REFERENCE.md section "Machine Learning" for model details.
Data Type Handling
Numerical Data:
- Summary statistics
- Distribution analysis
- Correlation analysis
- Regression modeling
Categorical Data:
- Frequency tables
- Cross-tabulations
- Chi-square tests
- Category encoding
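Cross-tabulation and the chi-square test of independence fit together: the cross-tab supplies observed counts, and the statistic compares them with the counts expected if the two variables were independent. The standard-library sketch below returns only the statistic and degrees of freedom. Converting that to a p-value needs the chi-square distribution, for example via SciPy's `scipy.stats.chi2_contingency`, which also applies Yates' continuity correction to 2x2 tables by default. Helper names are hypothetical.

```python
from collections import Counter


def crosstab(a, b):
    """Counts for each (a, b) pair, with sorted row and column labels."""
    counts = Counter(zip(a, b))
    rows, cols = sorted(set(a)), sorted(set(b))
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom (no correction)."""
    row_totals = [sum(r) for r in table]
    col_totals = [sum(c) for c in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


rows, cols, table = crosstab(["W", "W", "E"], ["A", "B", "A"])
print(rows, cols, table)
print(chi_square([[20, 0], [0, 20]]))
```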
Time Series Data:
- Trend decomposition
- Seasonality detection
- Autocorrelation
- Forecasting
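Autocorrelation is also how seasonality is usually detected: a series with period p shows a strong positive autocorrelation at lag p. Below is a minimal sketch of the standard sample ACF estimator; `autocorrelation` is a hypothetical name.

```python
import statistics


def autocorrelation(values, lag):
    """Sample autocorrelation at a given lag (standard ACF estimator)."""
    n = len(values)
    if not 0 < lag < n:
        raise ValueError("lag must be between 1 and len(values) - 1")
    mean = statistics.fmean(values)
    denom = sum((v - mean) ** 2 for v in values)
    num = sum((values[t] - mean) * (values[t + lag] - mean) for t in range(n - lag))
    return num / denom


series = [1, 3] * 4  # alternates, so period 2
print(autocorrelation(series, 1), autocorrelation(series, 2))
```

The alternating series is negatively correlated at lag 1 and positively at lag 2, which is the signature of a period-2 cycle.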
Text Data:
- Frequency analysis
- Sentiment analysis
- Topic modeling
- See REFERENCE.md section "Text Analytics"
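Frequency analysis is the simplest of the text steps above. This sketch tokenises with a basic regular expression and removes a small stopword list. Both the tokeniser and the stopword set are illustrative assumptions, not what REFERENCE.md prescribes, and `top_terms` is a hypothetical name.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "and", "is", "to", "of"}  # illustrative, not exhaustive


def top_terms(texts, k=5):
    """Most frequent lowercase word tokens across documents, minus stopwords."""
    counts = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z']+", text.lower())
        counts.update(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(k)


reviews = ["The product is great", "great price and great support"]
print(top_terms(reviews, k=3))
```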
Common Issues and Solutions
Issue: Missing Values
- Strategy 1: Remove affected rows (if fewer than 5% of rows have missing values)
- Strategy 2: Impute with mean/median/mode
- Strategy 3: Use advanced imputation (KNN, MICE)
- See REFERENCE.md section "Missing Data Handling"
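Strategies 1 and 2 can be combined into a simple rule for a single numeric column: drop missing entries when they are rare, otherwise impute the median. This sketch assumes `None` marks a missing value (real loaders may use empty strings or NaN) and uses hypothetical helper names. In a multi-column table, "drop" means dropping the whole row, and KNN or MICE imputation (Strategy 3) needs a dedicated library.

```python
import statistics


def missing_fraction(values):
    if not values:
        return 0.0
    return sum(v is None for v in values) / len(values)


def impute_median(values):
    """Replace None with the median of the observed values (Strategy 2)."""
    observed = [v for v in values if v is not None]
    fill = statistics.median(observed)
    return [fill if v is None else v for v in values]


def handle_missing(values, drop_threshold=0.05):
    """Drop missing entries when rare (Strategy 1), otherwise impute (Strategy 2)."""
    if missing_fraction(values) < drop_threshold:
        return [v for v in values if v is not None]
    return impute_median(values)


print(handle_missing([1, None, 3, 10]))  # 25% missing, so impute
```

The median is used instead of the mean because it is not pulled around by outliers, which matters when the outlier checks below have not yet been run.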
Issue: Outliers
- Detection: IQR method, Z-score, isolation forest
- Action: Remove, cap, or transform
- Context: Business rules may mark some extreme values as valid data rather than errors
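The IQR and Z-score detection rules, plus capping as one of the listed actions, are sketched below with the standard library. Isolation forest needs a machine-learning library and is left out. Function names and the sample values are hypothetical.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Values more than `threshold` sample standard deviations from the mean."""
    mean, sd = statistics.fmean(values), statistics.stdev(values)
    if sd == 0:
        return []
    return [v for v in values if abs(v - mean) / sd > threshold]


def cap(values, low, high):
    """Clip values into [low, high] instead of removing them."""
    return [min(max(v, low), high) for v in values]


data = [10, 12, 11, 13, 12, 11, 95]
print(iqr_outliers(data), zscore_outliers(data), zscore_outliers(data, threshold=2))
print(cap(data, 8, 16))
```

The two rules disagree on this small sample. The single extreme value inflates the standard deviation so much that its z-score stays near 2.27, below the usual threshold of 3. The quartile-based IQR rule is not distorted this way, which is why it is often preferred for small samples.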
Issue: Imbalanced Data
- Resampling techniques
- Class weights
- Synthetic data generation (SMOTE)
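The first two remedies above can be sketched with the standard library. The class-weight formula n_samples / (n_classes * count) matches the one scikit-learn documents for `class_weight="balanced"`. The resampling shown is plain random oversampling, which duplicates existing rows. SMOTE is a different technique that synthesises new points by interpolating between neighbours, available in the imbalanced-learn package. Helper names are hypothetical.

```python
import random
from collections import Counter


def balanced_class_weights(labels):
    """weight_c = n_samples / (n_classes * count_c); rarer classes weigh more."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * cnt) for c, cnt in counts.items()}


def random_oversample(rows, labels, seed=0):
    """Duplicate random minority-class rows until every class matches the largest."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, cnt in counts.items():
        members = [r for r, lab in zip(rows, labels) if lab == cls]
        extra = rng.choices(members, k=target - cnt)
        out_rows.extend(extra)
        out_labels.extend([cls] * len(extra))
    return out_rows, out_labels


labels = ["no"] * 8 + ["yes"] * 2
print(balanced_class_weights(labels))
```

Oversample only the training split. Otherwise duplicated rows can land in both training and test sets and inflate the evaluation scores.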
Issue: High Dimensionality
- Feature selection
- PCA, or t-SNE for visualization only (it does not learn a projection that can be reused on new data)
- Domain knowledge filtering
Output Formats
The skill can generate reports in multiple formats:
Executive Summary:
- Key findings (3-5 bullets)
- Critical metrics
- Recommendations
- See FORMS.md template "Executive Summary"
Technical Report:
- Methodology
- Detailed results
- Statistical tests
- Visualizations
- See FORMS.md template "Technical Report"
Dashboard Format:
- Interactive visualizations
- Key metrics at a glance
- Drill-down capability
Generate specific format:
python scripts/analyze.py data.csv --format executive
python scripts/analyze.py data.csv --format technical
python scripts/analyze.py data.csv --format dashboard
Validation Checklist
Before finalizing analysis:
- Data quality verified
- Appropriate methods selected
- Assumptions validated
- Results interpreted correctly
- Visualizations clear and labeled
- Report matches requested format
- Recommendations actionable
Analysis Scope
Quick Analysis (5-10 min):
- Basic statistics
- Simple visualizations
- Key findings only
Standard Analysis (20-40 min):
- Comprehensive statistics
- Multiple visualizations
- Correlation analysis
- Formatted report
Deep Analysis (1-2 hours):
- Advanced modeling
- Hypothesis testing
- Multiple methodologies
- Executive + technical reports
Ask the user for their preferred scope if it is unclear.
Example Analysis
Input: sales_data.csv with columns: date, product, region, quantity, revenue
Output:
Key Findings
- Revenue increased 23% year-over-year
- Product A accounts for 45% of total revenue
- Western region shows strongest growth (31%)
- Seasonal peak in Q4 (38% of annual sales)
Statistical Summary
- Mean daily revenue: $12,450
- Median daily revenue: $11,200
- Standard deviation: $3,890
- 95% of days: $5,000 - $20,000
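The 95% range can be checked against the mean and standard deviation, but only under the assumption that daily revenue is roughly normal. In that case about 95% of values fall within 1.96 standard deviations of the mean:

```latex
\bar{x} \pm 1.96\,s = 12{,}450 \pm 1.96 \times 3{,}890 = 12{,}450 \pm 7{,}624.4 \approx [\,4{,}826,\ 20{,}074\,]
```

This rounds to the stated $5,000 - $20,000. The mean is higher than the median ($12,450 vs $11,200), which hints at right skew. On real data, the empirical 2.5th to 97.5th percentile range is the safer way to report this interval.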
Visualizations Generated
- Revenue trend line (2023-2024)
- Product revenue pie chart
- Regional comparison bar chart
- Seasonal pattern heatmap
Recommendations
- Increase inventory for Product A in Q4
- Investigate Western region success factors
- Plan marketing campaigns for Q2-Q3 (slower periods)
Advanced Features
For complex scenarios, this skill integrates with:
REFERENCE.md sections:
- Statistical Methods Library
- Machine Learning Algorithms
- Time Series Techniques
- Text Analytics Methods
FORMS.md templates:
- Executive Summary Template
- Technical Report Template
- Dashboard Layout Template
Scripts:
- scripts/analyze.py - Main analysis engine
- scripts/visualize.py - Visualization generator
- scripts/report.py - Report formatter
Getting Started
Simple analysis:
python scripts/analyze.py your_data.csv
With options:
python scripts/analyze.py your_data.csv \
--explore \
--visualize \
--report executive \
--output-dir ./results
Help:
python scripts/analyze.py --help
For detailed methodology and advanced techniques, see REFERENCE.md. For report templates and output examples, see FORMS.md.
