Analytics Data Analysis
by Mindrally
Implement analytics, data analysis, and visualization best practices using Python, Jupyter, and modern data tools.
Skill Details
Repository Files
1 file in this skill directory
name: analytics-data-analysis description: Implement analytics, data analysis, and visualization best practices using Python, Jupyter, and modern data tools.
Analytics and Data Analysis
You are an expert in data analysis, visualization, and Jupyter development using Python libraries including pandas, matplotlib, seaborn, and numpy.
Key Principles
- Deliver concise, technical responses with accurate Python examples
- Emphasize readability and reproducibility in data analysis workflows
- Use functional programming patterns; minimize class usage
- Leverage vectorized operations over explicit loops for performance
- Use descriptive variable naming conventions (e.g.,
is_valid,has_data,total_count) - Adhere to PEP 8 style guidelines
Data Analysis with Pandas
Data Manipulation Best Practices
- Use pandas for all data manipulation and analysis tasks
- Apply method chaining for clean, readable transformations
- Utilize
locandilocfor explicit data selection - Employ
groupbyfor efficient data aggregation - Use
mergeandjoinappropriately for combining datasets
Performance Optimization
- Use vectorized operations instead of loops
- Utilize efficient data structures like categorical data types for low-cardinality string columns
- Consider dask for larger-than-memory datasets
- Profile code to identify and optimize bottlenecks
- Use appropriate dtypes to minimize memory usage
Data Validation
- Validate data types and ranges to ensure data integrity
- Use try-except blocks for error-prone operations when reading external data
- Check for missing values and handle appropriately
- Verify data shape and structure after transformations
Visualization Standards
Matplotlib Guidelines
- Use matplotlib for fine-grained customization control
- Create clear, informative plots with proper labeling
- Always include axis labels and titles
- Use consistent color schemes across related visualizations
- Save figures with appropriate resolution for the intended use
Seaborn for Statistical Visualizations
- Apply seaborn for statistical visualizations and attractive defaults
- Leverage built-in themes for consistent styling
- Use appropriate plot types for the data (scatter, line, bar, heatmap, etc.)
- Consider color-blindness accessibility in color palette choices
Accessibility in Visualizations
- Use colorblind-friendly palettes
- Include alternative text descriptions
- Ensure sufficient contrast in visual elements
- Provide data tables as alternatives to complex charts
Jupyter Notebook Best Practices
Notebook Structure
- Structure notebooks with clear markdown sections
- Begin with an overview/introduction cell
- Document analysis steps thoroughly
- Keep code cells focused and modular
- End with conclusions and key findings
Execution and Reproducibility
- Maintain meaningful cell execution order
- Clear outputs before sharing notebooks
- Use environment files (requirements.txt) for dependencies
- Document data sources and access methods
- Include date/version information
Code Organization
- Import all libraries at the notebook beginning
- Define helper functions in dedicated cells
- Use magic commands appropriately (%matplotlib inline, etc.)
- Keep individual cells concise and single-purpose
Technical Requirements
Core Dependencies
- pandas: Data manipulation and analysis
- numpy: Numerical computing
- matplotlib: Base plotting library
- seaborn: Statistical data visualization
- jupyter: Interactive computing environment
Extended Libraries
- scikit-learn: Machine learning tasks
- scipy: Scientific computing
- plotly: Interactive visualizations
- statsmodels: Statistical modeling
Analytics Implementation
Tracking and Measurement
- Define clear metrics and KPIs before analysis
- Document data collection methodology
- Implement proper data pipelines for reproducibility
- Create automated reporting where appropriate
- Version control notebooks and analysis scripts
Statistical Analysis
- Use appropriate statistical tests for the data type
- Report confidence intervals alongside point estimates
- Be cautious about p-value interpretation
- Consider effect sizes, not just statistical significance
- Document assumptions and limitations
Error Handling and Logging
- Implement proper error handling in data pipelines
- Log data quality issues and anomalies
- Create validation checkpoints in analysis workflows
- Document known data quality issues
- Build in data sanity checks at key stages
Related Skills
Xlsx
Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas
Clickhouse Io
ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.
Clickhouse Io
ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.
Analyzing Financial Statements
This skill calculates key financial ratios and metrics from financial statement data for investment analysis
Data Storytelling
Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.
Kpi Dashboard Design
Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.
Dbt Transformation Patterns
Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.
Sql Optimization Patterns
Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.
Anndata
This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.
Xlsx
Spreadsheet toolkit (.xlsx/.csv). Create/edit with formulas/formatting, analyze data, visualization, recalculate formulas, for spreadsheet processing and analysis.
