AI Data Analyst

by NicktheQuickFTW


Performs comprehensive data analysis, visualization, and statistical modeling using Python. Use when analyzing datasets, performing statistical tests, creating visualizations, doing exploratory data analysis, or generating publication-quality analytical reports.



name: ai-data-analyst
description: Performs comprehensive data analysis, visualization, and statistical modeling using Python. Use when analyzing datasets, performing statistical tests, creating visualizations, doing exploratory data analysis, or generating publication-quality analytical reports.

<when_to_use> Use this skill when you need to:

  • Analyze datasets to understand patterns, trends, or relationships
  • Perform statistical tests or build predictive models
  • Create data visualizations (charts, graphs, dashboards) to communicate findings
  • Do exploratory data analysis (EDA) to understand data structure and quality
  • Clean, transform, or merge datasets for analysis
  • Generate reproducible analysis with documented methodology and code </when_to_use>

<key_capabilities> Unlike point-solution data analysis tools, this skill provides:

  • Full Python ecosystem: Access to pandas, numpy, scikit-learn, statsmodels, matplotlib, seaborn, plotly, and more
  • Runs locally: Your data stays on your machine; no uploads to third-party services
  • Reproducible: All analysis is code-based and version controllable
  • Customizable: Extend with any Python library or custom analysis logic
  • Publication-quality output: Generate professional charts and reports
  • Statistical rigor: Access to comprehensive statistical and ML libraries </key_capabilities>

<required_inputs>

  • Data sources: CSV files, Excel files, JSON, Parquet, or database connections
  • Analysis goals: Questions to answer or hypotheses to test
  • Variables of interest: Specific columns, metrics, or dimensions to focus on
  • Output preferences: Chart types, report format, statistical tests needed
  • Context: Business domain, data dictionary, or known data quality issues </required_inputs>

<out_of_scope>

  • Real-time streaming data analysis (use appropriate streaming tools)
  • Extremely large datasets requiring distributed computing (use Spark/Dask instead)
  • Production ML model deployment (use ML ops tools and infrastructure)
  • Live dashboarding (use BI tools like Tableau/Looker for operational dashboards) </out_of_scope>

1. Data Exploration

  • Load data and inspect structure (shape, data types, sample rows)
  • Review summary statistics and distributions
  • Assess data quality (nulls, duplicates, obvious anomalies)

2. Data Cleaning and Transformation

  • Handle missing values (impute, drop, or flag)
  • Address outliers if needed (cap, transform, or document)
  • Create derived variables if needed
  • Normalize or scale variables for modeling
  • Split data if doing train/test analysis
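The cleaning steps above can be sketched with pandas and scikit-learn. This is a minimal illustration on a hypothetical dataset (the column names and values are invented for the example), not a prescription for any particular dataset:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical dataset with one missing value and one extreme outlier
df = pd.DataFrame({
    "age": [25, 32, np.nan, 41, 29, 38],
    "income": [48_000, 52_000, 61_000, 1_000_000, 45_000, 58_000],
    "churned": [0, 0, 1, 0, 1, 0],
})

# Handle missing values: impute age with the median, and flag the imputation
df["age_imputed"] = df["age"].isna()
df["age"] = df["age"].fillna(df["age"].median())

# Address outliers: cap income at the 95th percentile and document it
cap = df["income"].quantile(0.95)
df["income"] = df["income"].clip(upper=cap)

# Create a derived variable
df["income_per_year_of_age"] = df["income"] / df["age"]

# Split before scaling so the scaler is fit on training data only
X = df[["age", "income"]]
y = df["churned"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)
scaler = StandardScaler().fit(X_train)
X_train_scaled = scaler.transform(X_train)
```

Fitting the scaler on the training split only avoids leaking test-set statistics into the model.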

3. Analysis Execution

  • Choose appropriate analytical methods
  • Check statistical assumptions
  • Execute analysis with proper parameters
  • Calculate confidence intervals and effect sizes
  • Perform sensitivity analyses if appropriate
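As one concrete instance of the execution steps above, here is a hedged sketch of a two-group comparison with SciPy: check the normality assumption, run Welch's t-test, and report an effect size and a confidence interval alongside the p-value. The two groups are simulated, not real data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Hypothetical: task-completion times (seconds) for two page variants
group_a = rng.normal(loc=30.0, scale=5.0, size=50)
group_b = rng.normal(loc=27.5, scale=5.0, size=50)

# Check the normality assumption before choosing a parametric test
_, p_norm_a = stats.shapiro(group_a)
_, p_norm_b = stats.shapiro(group_b)

# Welch's t-test: does not assume equal variances
t_stat, p_value = stats.ttest_ind(group_a, group_b, equal_var=False)

# Effect size: Cohen's d with a pooled standard deviation
pooled_sd = np.sqrt((group_a.var(ddof=1) + group_b.var(ddof=1)) / 2)
cohens_d = (group_a.mean() - group_b.mean()) / pooled_sd

# Approximate 95% confidence interval for the difference in means
diff = group_a.mean() - group_b.mean()
se = np.sqrt(group_a.var(ddof=1) / len(group_a)
             + group_b.var(ddof=1) / len(group_b))
ci = (diff - 1.96 * se, diff + 1.96 * se)
```

If the normality checks fail, a non-parametric alternative such as `scipy.stats.mannwhitneyu` would be the appropriate substitute.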

4. Visualization

  • Create exploratory visualizations
  • Generate publication-quality final charts
  • Ensure all charts have clear labels and titles
  • Use appropriate color schemes and styling
  • Save in high-resolution formats

5. Reporting

  • Write clear summary of methods used
  • Present key findings with supporting evidence
  • Explain practical significance of results
  • Document limitations and assumptions
  • Provide actionable recommendations

6. Reproducibility

  • Test that script runs from clean environment
  • Document all dependencies
  • Add comments explaining non-obvious code
  • Include instructions for running analysis
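For the dependency-documentation step, a pinned requirements file makes the environment reproducible. The package versions below are illustrative; pin whatever versions the analysis was actually run with:

```text
# requirements.txt -- pin exact versions so the analysis reproduces
pandas==2.2.2
numpy==1.26.4
scipy==1.13.1
matplotlib==3.9.0
scikit-learn==1.5.0
```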

Code Structure

  • Write self-contained scripts that can be re-run by others
  • Use clear variable names and add comments for complex logic
  • Separate concerns: data loading, cleaning, analysis, visualization
  • Save intermediate results to files when analysis is multi-stage
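One way to realize the structure above is a small script skeleton with one function per concern. The inline dataset and output path are placeholders so the sketch stays self-contained:

```python
"""Self-contained analysis script: load -> clean -> analyze -> save."""
from pathlib import Path

import pandas as pd

OUTPUT_DIR = Path("outputs")


def load_data() -> pd.DataFrame:
    # A real script would read a CSV; inline data keeps this sketch runnable
    return pd.DataFrame({"group": ["a", "a", "b", "b"],
                         "value": [1.0, 2.0, 3.0, 4.0]})


def clean_data(df: pd.DataFrame) -> pd.DataFrame:
    # Drop missing rows and exact duplicates; document anything else done here
    return df.dropna().drop_duplicates()


def analyze(df: pd.DataFrame) -> pd.DataFrame:
    # Per-group summary; the analysis stage never re-reads raw files
    return df.groupby("group")["value"].agg(["mean", "count"])


def main() -> None:
    OUTPUT_DIR.mkdir(exist_ok=True)
    summary = analyze(clean_data(load_data()))
    summary.to_csv(OUTPUT_DIR / "summary.csv")  # save intermediate result


if __name__ == "__main__":
    main()
```

Because each stage is a pure function, stages can be re-run or tested in isolation.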

Data Handling

  • Never modify source data files – work on copies or in-memory dataframes
  • Document data transformations clearly in code comments
  • Handle missing values explicitly and document approach
  • Validate data quality before analysis (check for nulls, outliers, duplicates)
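A pre-analysis quality check along the lines above might look like this; the dataframe is a made-up example, and all work happens on an in-memory copy so the source data is never modified:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data: a duplicated row and two missing scores
df = pd.DataFrame({
    "user_id": [1, 2, 2, 3],
    "score": [0.5, np.nan, np.nan, 9.9],
})

# Quality report computed in memory; the source file is untouched
report = {
    "n_rows": len(df),
    "null_counts": df.isna().sum().to_dict(),
    "duplicate_rows": int(df.duplicated().sum()),
}

# Flag potential outliers with a simple z-score rule
# (assumption: roughly symmetric, unimodal data)
scores = df["score"].dropna()
z = (scores - scores.mean()) / scores.std()
report["outlier_rows"] = int((z.abs() > 3).sum())
```

Recording the report alongside the analysis documents the data-quality state the conclusions rest on.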

Visualization Best Practices

  • Choose appropriate chart types for the data and question
  • Use clear labels, titles, and legends on all charts
  • Apply appropriate color schemes (colorblind-friendly when possible)
  • Include sample sizes and confidence intervals where relevant
  • Save visualizations in high-resolution formats (PNG 300 DPI, SVG for vector graphics)

Statistical Analysis

  • State assumptions for statistical tests clearly
  • Check assumptions before applying tests (normality, homoscedasticity, etc.)
  • Report effect sizes not just p-values
  • Use appropriate corrections for multiple comparisons
  • Explain practical significance in addition to statistical significance
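The multiple-comparisons point above can be illustrated with statsmodels' Benjamini-Hochberg correction. The p-values here are invented for the example:

```python
from statsmodels.stats.multitest import multipletests

# Hypothetical raw p-values from testing five metrics at once
p_values = [0.001, 0.012, 0.030, 0.046, 0.210]

# Benjamini-Hochberg false-discovery-rate correction
reject, p_adjusted, _, _ = multipletests(p_values, alpha=0.05, method="fdr_bh")
```

Note that the fourth test (raw p = 0.046) would look significant on its own but no longer does after correction; reporting only raw p-values would overstate the findings.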

<required_artifacts>

  • Analysis script(s): Well-documented Python code performing the analysis
  • Visualizations: Charts saved as high-quality image files (PNG/SVG)
  • Analysis report: Markdown or text document summarizing:
    • Research question and methodology
    • Data description and quality assessment
    • Key findings with supporting statistics
    • Visualizations with interpretations
    • Limitations and caveats
    • Recommendations or next steps
  • Requirements file: requirements.txt with all dependencies
  • Sample data (if appropriate and non-sensitive): Small sample for reproducibility </required_artifacts>
To run the analysis from a clean environment:

```bash
# Create virtual environment
python3 -m venv venv
source venv/bin/activate  # or `venv\Scripts\activate` on Windows

# Install dependencies
pip install -r requirements.txt

# Run analysis script
python analysis.py

# Check outputs generated
ls -lh outputs/
```

<success_criteria> The skill is complete when:

  • Analysis script runs without errors from clean environment
  • All required visualizations are generated in high quality
  • Report clearly explains methodology, findings, and limitations
  • Results are interpretable and actionable
  • Code is well-documented and reproducible </success_criteria>

<safety_and_escalation>

  • Data privacy: Never analyze or share data containing PII without proper authorization
  • Statistical validity: If sample sizes are too small for reliable inference, call this out explicitly
  • Causal claims: Avoid implying causation from correlational analysis; be explicit about limitations
  • Model limitations: Document when models may not generalize or when predictions should not be trusted
  • Data quality: If data quality issues could materially affect conclusions, flag this prominently </safety_and_escalation>

Related Skills

Xlsx

Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. Use when Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc.) for: (1) creating new spreadsheets with formulas and formatting, (2) reading or analyzing data, (3) modifying existing spreadsheets while preserving formulas, (4) data analysis and visualization in spreadsheets, or (5) recalculating formulas.


Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.


Analyzing Financial Statements

This skill calculates key financial ratios and metrics from financial statement data for investment analysis.


Data Storytelling

Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.


Kpi Dashboard Design

Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.


Dbt Transformation Patterns

Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.


Sql Optimization Patterns

Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.


Anndata

This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.


Skill Information

Category: Data
Last Updated: 1/27/2026