CSV Data Summary
by NeverSight
Analyzes CSV files, generates comprehensive summary statistics, identifies data patterns, and creates visualizations using Python and pandas. Automatically adapts analysis based on data type (sales, customer, financial, survey, operational).
Skill Details
name: csv-data-summary
description: Analyzes CSV files, generates comprehensive summary statistics, identifies data patterns, and creates visualizations using Python and pandas. Automatically adapts analysis based on data type (sales, customer, financial, survey, operational).
CSV Data Summary Skill
This skill helps you analyze CSV files and generate comprehensive summaries with statistical insights and visualizations. It automatically detects the type of data you're working with and adapts the analysis accordingly.
Use Cases
- Quick data exploration and understanding
- Identifying data quality issues (missing values, outliers)
- Discovering patterns and correlations in datasets
- Creating visual summaries for reports and presentations
- Time-series analysis when date columns are present
- Categorical data distribution analysis
Prerequisites
You'll need Python with the following libraries:
pip install "pandas>=2.0.0" "matplotlib>=3.7.0" "seaborn>=0.12.0"
When to Use This Skill
Use this skill whenever you need to:
- Understand the structure and content of a CSV file
- Get summary statistics for numeric columns
- Identify missing data and data quality issues
- Visualize distributions and correlations
- Analyze time-series trends
- Get a comprehensive overview of categorical variables
How It Works
The skill automatically:
- Loads and inspects the CSV file
- Identifies data structure: column types, date columns, numeric columns, categories
- Adapts analysis based on data type:
  - Sales/E-commerce data: Time-series trends, revenue analysis, product performance
  - Customer data: Distribution analysis, segmentation, geographic patterns
  - Financial data: Trend analysis, statistical summaries, correlations
  - Operational data: Time-series, performance metrics, distributions
  - Survey data: Frequency analysis, cross-tabulations, distributions
- Generates visualizations relevant to the specific dataset:
  - Time-series plots (if date/timestamp columns exist)
  - Correlation heatmaps (if multiple numeric columns exist)
  - Category distributions (if categorical columns exist)
  - Histograms for numeric distributions
- Provides comprehensive output including:
  - Data overview (rows, columns, types)
  - Key statistics and metrics
  - Missing data analysis
  - Multiple relevant visualizations
  - Actionable insights
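The structure-identification step can be sketched roughly as follows. This is a minimal illustration, not the skill's actual code: the helper name `classify_columns` is hypothetical, and it assumes date columns are recognised by name (containing 'date' or 'time'), as the tips later in this document suggest.

```python
import io

import pandas as pd


def classify_columns(df: pd.DataFrame) -> dict:
    """Group column names into date, numeric, and categorical buckets."""
    # Assumption: date columns are identified by name, not by content.
    date_cols = [
        c for c in df.columns
        if "date" in str(c).lower() or "time" in str(c).lower()
    ]
    numeric_cols = [
        c for c in df.select_dtypes(include="number").columns
        if c not in date_cols
    ]
    # Everything that is neither a date nor numeric is treated as categorical.
    categorical_cols = [
        c for c in df.columns
        if c not in date_cols and c not in numeric_cols
    ]
    return {
        "date": date_cols,
        "numeric": numeric_cols,
        "categorical": categorical_cols,
    }


# Tiny in-memory CSV using the column names from the example output.
sample = io.StringIO(
    "order_date,total_revenue,customer_segment\n"
    "2024-01-05,120.50,retail\n"
    "2024-01-06,80.00,wholesale\n"
)
df = pd.read_csv(sample)
print(classify_columns(df))
```

Which plots to generate can then be decided from these buckets, for example a time-series plot only when the date bucket is non-empty.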
Python Implementation
Basic Usage
from analyze import summarize_csv
# Analyze any CSV file
summary = summarize_csv('your_data.csv')
print(summary)
The script will automatically generate:
- A comprehensive text summary
- Multiple visualization files (PNG format)
Example Output
============================================================
DATA OVERVIEW
============================================================
Rows: 5,000 | Columns: 8
DATA TYPES:
• order_date: object
• total_revenue: float64
• customer_segment: object
...
DATA QUALITY:
✓ No missing values - dataset is complete!
NUMERICAL ANALYSIS:
[Summary statistics for all numeric columns]
CORRELATIONS:
[Correlation matrix showing relationships]
TIME SERIES ANALYSIS:
Date range: 2024-01-05 to 2024-04-11
Span: 97 days
VISUALIZATIONS CREATED:
✓ correlation_heatmap.png
✓ time_series_analysis.png
✓ distributions.png
✓ categorical_distributions.png
Command Line Usage
You can run the analysis from the command line:
# Analyze a specific CSV file
python scripts/analyze.py path/to/your/data.csv
# Use the sample data
python scripts/analyze.py resources/sample.csv
Understanding the Output
Data Overview
- Shows the dimensions of your dataset (rows × columns)
- Lists all column names
- Shows data type for each column
Data Quality
- Reports missing values by column
- Shows percentage of missing data
- Helps identify data cleaning needs
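A missing-value report of this kind can be produced with a few pandas calls. The sketch below is illustrative only; `missing_report` is a hypothetical helper name, and the real script may format its output differently.

```python
import io

import pandas as pd


def missing_report(df: pd.DataFrame) -> pd.DataFrame:
    """Return count and percentage of missing values for columns that have any."""
    counts = df.isna().sum()
    report = pd.DataFrame({
        "missing": counts,
        "percent": (counts / len(df) * 100).round(1),
    })
    # Keep only columns with at least one missing value, worst first.
    return report[report["missing"] > 0].sort_values("missing", ascending=False)


# Empty CSV fields are read as NaN by pandas.
df = pd.read_csv(io.StringIO("a,b,c\n1,,x\n2,5,\n3,,z\n4,8,w\n"))
print(missing_report(df))
```

An empty report corresponds to the "No missing values" message shown in the example output.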
Numerical Analysis
- Provides descriptive statistics (mean, std, min, max, quartiles)
- Shows correlations between numeric columns
- Creates correlation heatmap visualization
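The statistics listed above map onto two standard pandas calls, `DataFrame.describe()` and `DataFrame.corr()`. A minimal sketch with made-up data (the column names are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "units": [1, 2, 3, 4],
    "revenue": [10.0, 20.0, 30.0, 40.0],
    "segment": ["a", "b", "a", "b"],
})

# Restrict to numeric columns before computing statistics.
numeric = df.select_dtypes(include="number")

# count, mean, std, min, quartiles, and max for each numeric column.
stats = numeric.describe()

# Pairwise Pearson correlations; only meaningful with 2+ numeric columns.
corr = numeric.corr() if numeric.shape[1] >= 2 else None

print(stats)
print(corr)
```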
Categorical Analysis
- Shows frequency distribution for each categorical variable
- Displays top 10 values per category
- Creates bar charts for categorical distributions
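A top-10 frequency table per categorical column can be sketched with `Series.value_counts()`. The helper name `top_categories` is hypothetical, and treating every non-numeric, non-datetime column as categorical is an assumption of this sketch.

```python
import pandas as pd


def top_categories(df: pd.DataFrame, max_values: int = 10) -> dict:
    """Map each non-numeric column to its most frequent values."""
    # Assumption: anything that is not numeric or datetime counts as categorical.
    categorical = df.select_dtypes(exclude=["number", "datetime"])
    return {
        col: categorical[col].value_counts().head(max_values)
        for col in categorical.columns
    }


df = pd.DataFrame({
    "customer_segment": ["retail", "retail", "wholesale"],
    "units": [3, 1, 7],
})
for col, counts in top_categories(df).items():
    print(col)
    print(counts)
```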
Time Series Analysis
- Automatically detected when date/time columns are present
- Shows date range and span
- Creates trend plots for numeric metrics over time
- Calculates daily/periodic aggregations
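The date range, span, and daily aggregation can be reproduced along these lines. This is a sketch with made-up rows; the column names are borrowed from the example output, and summing per day is only one possible aggregation.

```python
import pandas as pd

df = pd.DataFrame({
    "order_date": ["2024-01-05", "2024-01-05", "2024-01-07"],
    "total_revenue": [100.0, 50.0, 25.0],
})

# Parse the text column into timestamps; unparseable values become NaT.
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")

start, end = df["order_date"].min(), df["order_date"].max()
span_days = (end - start).days

# Daily totals; days with no rows appear with a sum of 0.
daily = df.set_index("order_date")["total_revenue"].resample("D").sum()

print(f"Date range: {start.date()} to {end.date()}")
print(f"Span: {span_days} days")
print(daily)

# The span in the example output follows the same arithmetic.
example_span = (pd.Timestamp("2024-04-11") - pd.Timestamp("2024-01-05")).days
```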
Visualizations Generated
The skill automatically creates relevant visualizations:
- Correlation Heatmap (correlation_heatmap.png)
  - Shows relationships between numeric variables
  - Color-coded for easy interpretation
  - Only generated when 2+ numeric columns exist
- Time Series Analysis (time_series_analysis.png)
  - Trend lines for numeric metrics over time
  - Only generated when date/time columns exist
  - Shows up to 3 key metrics
- Distributions (distributions.png)
  - Histograms for numeric columns
  - Shows up to 4 numeric variables
  - Helps identify outliers and data shape
- Categorical Distributions (categorical_distributions.png)
  - Bar charts for categorical variables
  - Shows top 10 values per category
  - Up to 4 categorical variables
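To illustrate the conditional generation described above, here is a rough sketch of a correlation heatmap that is only written when at least two numeric columns exist. It uses plain matplotlib rather than seaborn to keep the example small; `save_correlation_heatmap` is a hypothetical name, and the styling is not taken from the actual script.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # render to files without needing a display
import matplotlib.pyplot as plt
import pandas as pd


def save_correlation_heatmap(df: pd.DataFrame, path: str) -> bool:
    """Save a heatmap to `path`; return False if fewer than 2 numeric columns."""
    numeric = df.select_dtypes(include="number")
    if numeric.shape[1] < 2:
        return False
    corr = numeric.corr()
    fig, ax = plt.subplots(figsize=(6, 5))
    # Fix the colour scale to [-1, 1] so colours are comparable across datasets.
    image = ax.imshow(corr.to_numpy(), cmap="coolwarm", vmin=-1, vmax=1)
    ax.set_xticks(range(len(corr.columns)))
    ax.set_xticklabels(corr.columns, rotation=45, ha="right")
    ax.set_yticks(range(len(corr.index)))
    ax.set_yticklabels(corr.index)
    fig.colorbar(image, ax=ax, label="Pearson correlation")
    fig.tight_layout()
    fig.savefig(path, dpi=150)
    plt.close(fig)
    return True


df = pd.DataFrame({"units": [1, 2, 3, 4], "revenue": [10.0, 21.0, 29.0, 41.0]})
out_path = os.path.join(tempfile.mkdtemp(), "correlation_heatmap.png")
created = save_correlation_heatmap(df, out_path)
print(created, out_path)
```

The same guard pattern (check the precondition, return early) would apply to the time-series and categorical plots.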
Tips and Best Practices
- Clean column names: Use lowercase and underscores for better readability
- Date columns: Ensure the names of date columns contain 'date' or 'time' so they are detected
- Numeric data: Ensure numeric columns are properly typed (not strings)
- Large files: pandas loads a CSV into memory by default, so files that approach available RAM may need chunked reading or sampling first
- Missing data: Review the data quality section carefully before analysis
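Two of these tips can be applied before running the analysis. The sketch below normalises column names to lowercase with underscores and shows how pandas can stream a CSV in chunks. Both helper names are hypothetical, and neither is part of the skill's script.

```python
import io
import re

import pandas as pd


def clean_column_name(name: str) -> str:
    """Lowercase a column name and replace runs of other characters with '_'."""
    name = re.sub(r"[^0-9a-zA-Z]+", "_", str(name).strip())
    return name.strip("_").lower()


def count_rows_in_chunks(source, chunksize: int = 100_000) -> int:
    """Count rows without holding the whole file in memory at once."""
    total = 0
    for chunk in pd.read_csv(source, chunksize=chunksize):
        total += len(chunk)
    return total


df = pd.read_csv(io.StringIO("Order Date, Total Revenue ($) \n2024-01-05,10\n"))
df.columns = [clean_column_name(c) for c in df.columns]
print(list(df.columns))

print(count_rows_in_chunks(io.StringIO("a\n1\n2\n3\n"), chunksize=2))
```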
Troubleshooting
Issue: Date columns not detected
- Ensure column names contain 'date' or 'time'
- Check date format is recognizable (YYYY-MM-DD, MM/DD/YYYY, etc.)
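Both checks can be done in pandas before re-running the analysis. In this sketch the original column name `ordered_on` is made up; it is renamed so the name contains 'date', and `pd.to_datetime` with `errors="coerce"` reveals how many values fail to parse.

```python
import pandas as pd

df = pd.DataFrame({"ordered_on": ["2024-01-05", "2024-02-30", "2024-03-01"]})

# Rename so the column name contains 'date'.
df = df.rename(columns={"ordered_on": "order_date"})

# Parse with an explicit format; invalid dates (here 30 February) become NaT.
parsed = pd.to_datetime(df["order_date"], format="%Y-%m-%d", errors="coerce")
unparseable = int(parsed.isna().sum())

print(f"{unparseable} value(s) could not be parsed as dates")
```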
Issue: Numeric columns treated as text
- Check for non-numeric characters in the data
- Clean data or use pandas type conversion
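One way to do the type conversion is to strip everything except digits, the decimal point, and the minus sign, then call `pd.to_numeric`. The sample values are made up, and the regular expression assumes a dot as decimal separator and a comma as thousands separator.

```python
import pandas as pd

raw = pd.Series(["$1,200.50", "980", "n/a", " 45 "])

# Remove currency symbols, thousands separators, whitespace, and letters.
cleaned = raw.str.replace(r"[^0-9.\-]", "", regex=True)

# Values that still cannot be parsed (such as the emptied 'n/a') become NaN.
numbers = pd.to_numeric(cleaned, errors="coerce")

print(numbers)
```

Counting the resulting NaN values shows how many entries were lost in conversion, which is worth checking before trusting the statistics.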
Issue: Too many visualizations
- The script automatically limits visualizations to the most relevant ones
- Focus on the first few metrics of each type
Issue: Import errors
- Ensure all dependencies are installed: pip install -r requirements.txt
- Check Python version (3.8+ recommended)
Advanced Usage
Customizing the Analysis
You can modify analyze.py to:
- Add custom metrics specific to your domain
- Change visualization styles and colors
- Adjust the number of categories shown
- Add domain-specific insights
Integration with Other Tools
The script outputs:
- Plain text summary (easy to parse)
- PNG images (ready for reports)
- Can be extended to output JSON, HTML, or PDF reports
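As a rough idea of what a JSON extension might look like, the sketch below builds a small summary dictionary and serialises it with the standard `json` module. The function name `summary_as_json` and the choice of fields are assumptions for illustration, not part of the existing script.

```python
import json

import pandas as pd


def summary_as_json(df: pd.DataFrame) -> str:
    """Serialise a few basic summary facts about `df` as a JSON string."""
    numeric = df.select_dtypes(include="number")
    payload = {
        "rows": int(len(df)),
        "columns": int(df.shape[1]),
        "dtypes": {col: str(dtype) for col, dtype in df.dtypes.items()},
        "missing": {col: int(n) for col, n in df.isna().sum().items()},
        # Drop NaN means (all-missing columns) because NaN is not valid JSON.
        "numeric_means": {
            col: float(mean) for col, mean in numeric.mean().dropna().items()
        },
    }
    return json.dumps(payload, indent=2)


df = pd.DataFrame({
    "total_revenue": [10.0, 20.0, None],
    "customer_segment": ["retail", "wholesale", "retail"],
})
print(summary_as_json(df))
```

Values are converted to built-in `int` and `float` because the `json` module does not serialise NumPy scalar types directly.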
Additional Resources
Differences from Excel Sheet Reference Skill
This skill focuses on:
- Data analysis and visualization (not Excel formula creation)
- CSV file format (not Excel workbooks)
- Statistical insights (not cross-sheet references)
- Python pandas (not openpyxl)
Use the excel-sheet-reference skill when you need to:
- Create Excel files with multiple sheets
- Use cross-sheet formulas (VLOOKUP, COUNTIFS, etc.)
- Maintain data in Excel format with formulas
