CSV Data Visualizer
by ailabs-393
This skill should be used when working with CSV files to create interactive data visualizations, generate statistical plots, analyze data distributions, create dashboards, or perform automatic data profiling. It provides comprehensive tools for exploratory data analysis using Plotly for interactive visualizations.
name: csv-data-visualizer
description: This skill should be used when working with CSV files to create interactive data visualizations, generate statistical plots, analyze data distributions, create dashboards, or perform automatic data profiling. It provides comprehensive tools for exploratory data analysis using Plotly for interactive visualizations.
CSV Data Visualizer
Overview
This skill enables comprehensive data visualization and analysis for CSV files. It provides three main capabilities: (1) creating individual interactive visualizations using Plotly, (2) automatic data profiling with statistical summaries, and (3) generating multi-plot dashboards. The skill is optimized for exploratory data analysis, statistical reporting, and creating presentation-ready visualizations.
When to Use This Skill
Invoke this skill when users request:
- "Visualize this CSV data"
- "Create a histogram/scatter plot/box plot from this data"
- "Show me the distribution of [column]"
- "Generate a dashboard for this dataset"
- "Profile this CSV file" or "Analyze this data"
- "Create a correlation heatmap"
- "Show trends over time"
- "Compare [variable] across [categories]"
Core Capabilities
1. Individual Visualizations
Create specific chart types for detailed analysis using the visualize_csv.py script.
Available Chart Types:
Statistical Plots:
# Histogram - distribution of numeric data
python3 scripts/visualize_csv.py data.csv --histogram column_name --bins 30
# Box plot - show quartiles and outliers
python3 scripts/visualize_csv.py data.csv --boxplot column_name
# Box plot grouped by category
python3 scripts/visualize_csv.py data.csv --boxplot salary --group-by department
# Violin plot - distribution with probability density
python3 scripts/visualize_csv.py data.csv --violin column_name --group-by category
Relationship Analysis:
# Scatter plot with automatic trend line
python3 scripts/visualize_csv.py data.csv --scatter height weight
# Scatter plot with color and size encoding
python3 scripts/visualize_csv.py data.csv --scatter x y --color category --size value
# Correlation heatmap for all numeric columns
python3 scripts/visualize_csv.py data.csv --correlation
Time Series:
# Line chart for single variable
python3 scripts/visualize_csv.py data.csv --line date sales
# Multiple variables on same chart
python3 scripts/visualize_csv.py data.csv --line date "sales,revenue,profit"
Categorical Data:
# Bar chart (counts categories automatically)
python3 scripts/visualize_csv.py data.csv --bar category
# Pie chart for composition
python3 scripts/visualize_csv.py data.csv --pie region
Output Formats: Specify output file with desired format extension:
# Interactive HTML (default)
python3 scripts/visualize_csv.py data.csv --histogram age -o output.html
# Static image formats
python3 scripts/visualize_csv.py data.csv --scatter x y -o plot.png
python3 scripts/visualize_csv.py data.csv --correlation -o heatmap.pdf
python3 scripts/visualize_csv.py data.csv --bar category -o chart.svg
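The relationship between the output extension and the kind of file produced can be sketched as a small helper. This is an illustration only, not code from visualize_csv.py: the function name `output_kind` is hypothetical, and the set of static extensions is limited to the ones this document mentions.

```python
from pathlib import Path

# Extensions named in this document; the real script may accept others.
STATIC_FORMATS = {".png", ".pdf", ".svg"}


def output_kind(output_path: str) -> str:
    """Classify an output path as 'interactive' (HTML) or 'static' (image export)."""
    suffix = Path(output_path).suffix.lower()
    if suffix == ".html":
        return "interactive"
    if suffix in STATIC_FORMATS:
        # Static export is the case where the document asks for kaleido.
        return "static"
    raise ValueError(f"Unrecognized output extension: {suffix!r}")
```

A check like this, run before invoking the script, makes it easy to warn early when a static format is requested but kaleido is not installed.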
2. Automatic Data Profiling
Generate comprehensive data quality and statistical reports using the data_profile.py script.
Text Report (default):
python3 scripts/data_profile.py data.csv
HTML Report:
python3 scripts/data_profile.py data.csv -f html -o report.html
JSON Report:
python3 scripts/data_profile.py data.csv -f json -o profile.json
What the Profiler Provides:
- File information (size, dimensions)
- Dataset overview (shape, memory usage, duplicates)
- Column-by-column analysis (types, missing data, unique values)
- Missing data patterns and completeness
- Statistical summary for numeric columns (mean, std, quartiles, skewness, kurtosis)
- Categorical column analysis (frequency counts, most/least common values)
- Data quality checks (high missing data, duplicate rows, constant columns, high cardinality)
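To make a few of these checks concrete, here is a minimal standard-library sketch of column-level profiling (missing values, unique values, constant columns). It is not the implementation of data_profile.py, which is not shown in this document; the function name `profile_columns` and the sample data are made up for illustration.

```python
import csv
import io
from collections import Counter


def profile_columns(csv_text: str) -> dict:
    """Return per-column missing counts, unique counts, and a constant-column flag."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    report = {}
    for name in (rows[0].keys() if rows else []):
        values = [row[name] for row in rows]
        # Treat empty strings (and short rows, which DictReader fills with None) as missing.
        present = [v for v in values if v not in (None, "")]
        counts = Counter(present)
        report[name] = {
            "missing": len(values) - len(present),
            "unique": len(counts),
            "constant": len(counts) <= 1,
            "most_common": counts.most_common(1)[0][0] if counts else None,
        }
    return report


# Hypothetical sample: 'units' has one missing value, 'currency' never varies.
sample = "region,units,currency\nNorth,10,USD\nSouth,,USD\nNorth,7,USD\n"
summary = profile_columns(sample)
```

Here `summary["units"]["missing"]` is 1 and `summary["currency"]["constant"]` is true, which are the kinds of findings the data quality checks are meant to surface.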
When to Use Profiling: Always recommend running data profiling BEFORE creating visualizations when:
- The user is unfamiliar with the dataset
- Data quality is unknown
- Appropriate visualization types still need to be identified
- A new dataset is being explored for the first time
3. Multi-Plot Dashboards
Create comprehensive dashboards with multiple visualizations using the create_dashboard.py script.
Automatic Dashboard: Analyzes data types and automatically creates appropriate visualizations:
python3 scripts/create_dashboard.py data.csv
Custom output location:
python3 scripts/create_dashboard.py data.csv -o my_dashboard.html
Control number of plots:
python3 scripts/create_dashboard.py data.csv --max-plots 9
Custom Dashboard from Config: Create a JSON configuration file specifying exact plots:
python3 scripts/create_dashboard.py data.csv --config config.json
Dashboard Config Format:
{
"title": "Sales Analysis Dashboard",
"plots": [
{"type": "histogram", "column": "revenue"},
{"type": "box", "column": "revenue", "group_by": "region"},
{"type": "scatter", "column": "advertising", "group_by": "revenue"},
{"type": "bar", "column": "product_category"},
{"type": "correlation"}
]
}
Dashboard Plot Types:
- histogram: Distribution of numeric column
- box: Box plot, optionally grouped by category
- scatter: Relationship between two numeric columns
- bar: Count of categorical values
- correlation: Heatmap of numeric correlations
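A config file in this format can also be generated and sanity-checked from Python before it is passed to create_dashboard.py. The sketch below is hedged: the plot entries reuse keys from the example config above, but `validate_config` is a hypothetical helper, and the rule that every plot except `correlation` needs a `column` is an assumption inferred from that example, not documented behavior of the script.

```python
import json

# Plot entries reuse the keys shown in the example config above.
config = {
    "title": "Sales Analysis Dashboard",
    "plots": [
        {"type": "histogram", "column": "revenue"},
        {"type": "box", "column": "revenue", "group_by": "region"},
        {"type": "bar", "column": "product_category"},
        {"type": "correlation"},
    ],
}

# Plot types listed in this document.
ALLOWED_TYPES = {"histogram", "box", "scatter", "bar", "correlation"}


def validate_config(cfg: dict) -> list:
    """Return a list of problems; an empty list means the config looks plausible."""
    problems = []
    for i, plot in enumerate(cfg.get("plots", [])):
        kind = plot.get("type")
        if kind not in ALLOWED_TYPES:
            problems.append(f"plot {i}: unknown type {kind!r}")
        elif kind != "correlation" and "column" not in plot:
            # Assumption: only the correlation heatmap works without a column.
            problems.append(f"plot {i}: missing 'column'")
    return problems


# Serialize for writing to config.json (for example with pathlib.Path.write_text).
config_text = json.dumps(config, indent=2)
```

Catching a misspelled plot type here is cheaper than discovering it from a failed dashboard run.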
Workflow Decision Tree
Use this decision tree to determine the appropriate approach:
User provides CSV file
│
├─ "Profile this data" / "Analyze this data" / Unfamiliar dataset
│  └─> Run data_profile.py first
│      Then offer visualization options based on findings
│
├─ "Create dashboard" / "Overview of the data" / Multiple visualizations needed
│  ├─ User knows exact plots wanted
│  │  └─> Create JSON config → run create_dashboard.py with config
│  └─ User wants automatic dashboard
│     └─> Run create_dashboard.py (auto mode)
│
└─ Specific visualization requested ("histogram", "scatter plot", etc.)
   └─> Use visualize_csv.py with appropriate flag
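The same branching can be written as a small function. This is only a rough sketch of the decision tree: `choose_script` and its parameters are hypothetical, and the keyword matching is a crude stand-in for actually understanding the user's request.

```python
def choose_script(request: str, familiar_with_data: bool = True,
                  has_config: bool = False) -> str:
    """Map a user request to the script the decision tree points to."""
    text = request.lower()
    # Branch 1: profiling requests, or any request on an unfamiliar dataset.
    if not familiar_with_data or "profile" in text or "analyze" in text:
        return "scripts/data_profile.py"
    # Branch 2: dashboards, with or without a JSON config.
    if "dashboard" in text or "overview" in text:
        if has_config:
            return "scripts/create_dashboard.py --config config.json"
        return "scripts/create_dashboard.py"
    # Branch 3: a specific chart was requested.
    return "scripts/visualize_csv.py"
```

For example, `choose_script("Create a histogram of age")` falls through to the last branch, while the same request with `familiar_with_data=False` is routed to profiling first.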
Best Practices
Starting Analysis
- Always profile first for unfamiliar datasets:
python3 scripts/data_profile.py data.csv
- Review the profiling output to understand:
- Column data types and ranges
- Missing data patterns
- Data quality issues
- Statistical distributions
Choosing Visualizations
Consult references/visualization_guide.md for detailed guidance. Quick reference:
- Distribution: Histogram, box plot, violin plot
- Relationship: Scatter plot, correlation heatmap
- Time series: Line chart
- Categories: Bar chart (preferred) or pie chart (use sparingly)
- Comparison: Box plot grouped by category
Creating Dashboards
- Automatic dashboard: Good for initial exploration
- Custom dashboard: Better for presentations or specific analysis goals
- Limit plots: Keep to 6-9 plots maximum for readability
- Logical grouping: Group related visualizations together
Output Considerations
- HTML: Best for interactive exploration (zoom, pan, hover tooltips)
- PNG/PDF: Best for reports and presentations
- SVG: Best for publications requiring vector graphics
Dependencies
The scripts require these Python packages:
pip install pandas plotly numpy
For static image export (PNG, PDF, SVG), also install:
pip install kaleido
Example Workflows
Exploratory Data Analysis
# 1. Profile the data
python3 scripts/data_profile.py sales_data.csv -f html -o profile.html
# 2. Create automatic dashboard
python3 scripts/create_dashboard.py sales_data.csv -o dashboard.html
# 3. Dive deeper with specific plots
python3 scripts/visualize_csv.py sales_data.csv --scatter price sales --color region
python3 scripts/visualize_csv.py sales_data.csv --boxplot revenue --group-by product
Report Generation
# Create specific visualizations for report
python3 scripts/visualize_csv.py data.csv --histogram age -o fig1_distribution.png
python3 scripts/visualize_csv.py data.csv --scatter income age -o fig2_correlation.png
python3 scripts/visualize_csv.py data.csv --bar category -o fig3_categories.png
# Generate data summary
python3 scripts/data_profile.py data.csv -f html -o data_summary.html
Interactive Dashboard
# Create custom dashboard for presentation
# 1. First, create config.json with desired plots
# 2. Generate dashboard
python3 scripts/create_dashboard.py data.csv --config config.json -o presentation_dashboard.html
Troubleshooting
"Column not found" errors:
- Run data profiling to see exact column names
- CSV columns are case-sensitive
- Check for leading/trailing spaces in column names
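When a column name is rejected, a quick look at the raw header often reveals the cause. The following standard-library sketch reads the header row and finds the exact spelling of a column while ignoring case and surrounding whitespace; `find_column` and the sample CSV text are hypothetical and not part of the skill's scripts.

```python
import csv
import io


def find_column(header: list, wanted: str):
    """Return the exact header entry matching `wanted`, ignoring case and outer whitespace."""
    target = wanted.strip().lower()
    for name in header:
        if name.strip().lower() == target:
            return name
    return None


# Hypothetical file contents: note the spaces around 'Salary'.
csv_text = "Name, Salary ,Department\nAda,100,Eng\n"
header = next(csv.reader(io.StringIO(csv_text)))
exact_name = find_column(header, "salary")
```

In this sample the exact name is `" Salary "` with a leading and trailing space, so passing `salary` or `Salary` on the command line would not match the column.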
Empty or incorrect visualizations:
- Verify data types (numeric vs categorical)
- Check for missing data in plotted columns
- Ensure sufficient non-null values exist
Script execution errors:
- Verify dependencies are installed:
pip list | grep plotly
- Check Python version: Python 3.6+ required
- For image export issues, install kaleido:
pip install kaleido
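The dependency and version checks above can also be done from Python, which avoids relying on `grep` being available. This is a minimal sketch: the helper names are hypothetical, and the package lists simply mirror the Dependencies section.

```python
import importlib.util
import sys

# Package names taken from the Dependencies section of this document.
REQUIRED = ["pandas", "plotly", "numpy"]
OPTIONAL_STATIC_EXPORT = ["kaleido"]


def missing_packages(names):
    """Return the subset of top-level package names that cannot be imported."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_ok(minimum=(3, 6)):
    """Check the interpreter against the minimum version stated above."""
    return sys.version_info >= minimum


if __name__ == "__main__":
    print("Python version OK:", python_ok())
    print("Missing required packages:", missing_packages(REQUIRED))
    print("Missing for static export:", missing_packages(OPTIONAL_STATIC_EXPORT))
```

An empty list for the required packages together with a true version check suggests the environment itself is not the cause of the error.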
Resources
scripts/
- visualize_csv.py: Main visualization script with all chart types
- data_profile.py: Automatic data profiling and quality analysis
- create_dashboard.py: Multi-plot dashboard generator
references/
visualization_guide.md: Comprehensive guide for choosing appropriate chart types, best practices, and common patterns
