Discover and use data skills to extend Claude's capabilities
665 Data Skills Available
Generates architecture, database, and system diagrams using Mermaid syntax. Creates visual representations of system architecture, database schemas, component relationships, and data flows.
Convert natural language queries to SQL. Use for database queries, data analysis, and reporting.
Explores data in a Bauplan lakehouse safely using the Bauplan Python SDK. Use to inspect namespaces, tables, schemas, samples, and profiling queries; and to export larger result sets to files. Read-only exploration only; no writes or pipeline runs.
Use when interpreting Culture Index surveys, CI profiles, behavioral assessments, or personality data. Supports individual interpretation, team composition (gas/brake/glue), burnout detection, profile comparison, hiring profiles, manager coaching, interview transcript analysis for trait prediction, candidate debrief, onboarding planning, and conflict mediation. Handles PDF vision or JSON input.
Attributed C-Sets as algebraic databases. Category-theoretic data structures generalizing graphs and dataframes with Gay.jl color integration.
Best practices for Matplotlib data visualization, plotting, and creating publication-quality figures in Python
Best practices for Pandas data manipulation, analysis, and DataFrame operations in Python
Data analysis best practices with pandas, numpy, matplotlib, seaborn, and Jupyter notebooks.
Expert patterns for the Segment Customer Data Platform, including Analytics.js, server-side tracking, tracking plans with Protocols, identity resolution, destinations configuration, and data governance best practices. Use when "segment, analytics.js, customer data platform, cdp, tracking plan, event tracking, identify track page, data routing, analytics, tracking, data-pipeline, customer-data" is mentioned.
Expert in measuring what matters in communities. Covers health metrics, engagement analytics, sentiment analysis, cohort tracking, and reporting. Knows that good data drives good decisions, and bad metrics drive bad behavior. Use when "community metrics, community analytics, measure community, community health, engagement metrics, community reporting" is mentioned.
Generate and compare ydata-profiling EDA reports with sampling, consistent random seeds, and HTML outputs; often follows duckdb-parquet-lab-workflow when data is queried from Parquet.
Use DuckDB to query Parquet files, inspect metadata, join tables, and convert results to pandas for analysis; commonly precedes ydata-eda-profiling for EDA on extracted tables.
Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. Use when the assistant needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc.) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modifying existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas.
Use this skill for processing and analyzing large tabular datasets (billions of rows) that exceed available RAM. Vaex excels at out-of-core DataFrame operations, lazy evaluation, fast aggregations, efficient visualization of big data, and machine learning on large datasets. Apply when users need to work with large CSV/HDF5/Arrow/Parquet files, perform fast statistics on massive datasets, create visualizations of big data, or build ML pipelines that do not fit in memory.
UMAP dimensionality reduction. Fast nonlinear manifold learning for 2D/3D visualization, clustering preprocessing (HDBSCAN), supervised/parametric UMAP, for high-dimensional data.
Analyze policy impacts for congressional districts and representatives' constituents. Use when the user mentions a specific district (NY-17, CA-52), a representative's name, or asks about geographic policy impacts at district level. Provides HuggingFace district datasets.
Perform comprehensive exploratory data analysis on scientific data files across 200+ file formats. This skill should be used when analyzing any scientific data file to understand its structure, content, quality, and characteristics. Automatically detects file type and generates detailed markdown reports with format-specific analysis, quality metrics, and downstream analysis recommendations. Covers chemistry, bioinformatics, microscopy, spectroscopy, proteomics, metabolomics, and general scientif
Analyze messy and unstructured Excel files to identify data quality issues, detect format inconsistencies, find missing values, and generate comprehensive analysis reports. Use when Claude needs to work with Excel files (.xlsx, .xls) for data quality assessment, structure analysis, or when users request data auditing, cleaning recommendations, or statistical summaries of spreadsheet data.
This skill should be used when running Phase 4 of the /ds workflow to review methodology, data quality, and statistical validity. Provides structured review checklists, confidence scoring, and issue identification for data analysis validation.
Workflow for multi-step financial research requiring multiple data sources. Use for company comparisons, due diligence, comprehensive analysis, or complex financial questions.
Plot timestamped logs as graphs. Use when user wants to visualize log data, plot numeric values over time, count events, track time deltas between events, compare multiple log files, or get statistics from logs.
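Fetches China A-share stock data (baostock) and caches it to local CSV files, so that MCP does not return large volumes of data that fill the context. Trigger scenarios: (1) fetching more than 100 K-line (candlestick) records, (2) querying the same stock's data multiple times, (3) analyzing the data with grep/awk, (4) the user mentions "save data" (保存数据) or "cache data" (缓存数据).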
Aggregate and centralize performance metrics from applications, systems, databases, caches, and services. Use when consolidating monitoring data from multiple sources. Trigger with phrases like "aggregate metrics", "centralize monitoring", or "collect performance data".
Automate data cleaning, transformation, and validation for ML tasks.
Performs ChIP-specific biological validation. It calculates metrics unique to protein-binding assays, such as Cross-correlation (NSC/RSC) and FRiP. Use this when you have filtered the BAM file and called peaks for ChIP-seq data. Do NOT use this skill for ATAC-seq data or general alignment statistics.
Analyzes and optimizes SQL/NoSQL queries for performance. Use when reviewing query performance, optimizing slow queries, analyzing EXPLAIN output, suggesting indexes, identifying N+1 problems, recommending query rewrites, or improving database access patterns. Supports PostgreSQL, MySQL, SQLite, MongoDB, Redis, DynamoDB, and Elasticsearch.
Weighted pandas DataFrames for survey microdata analysis: inequality, poverty, and distributional calculations
Create evidence synthesis matrices for systematic reviews. Use when: (1) Organizing extracted data, (2) Comparing study characteristics, (3) Identifying patterns across studies, (4) Preparing synthesis for manuscripts.
Guide selection and interpretation of statistical hypothesis tests. Use when: (1) Choosing appropriate test for research data, (2) Checking assumptions before analysis, (3) Interpreting test results correctly, (4) Reporting statistical findings, (5) Troubleshooting assumption violations.
Create publication-quality data visualizations. Use when: (1) Presenting results, (2) Exploratory data analysis, (3) Manuscript preparation, (4) Grant proposals, (5) Presentations.
Deep EDN template analyzer for Logseq database graphs. Analyzes template structure, counts classes/properties, finds orphaned items, checks quality, and compares variants. Use when analyzing template files, finding issues, or comparing different template versions.
Generates data cleaning pipelines for pandas/polars with handling for missing values, duplicates, outliers, type conversions, and data validation. Use when user asks to "clean data", "generate data pipeline", "handle missing values", or "remove duplicates from dataset".
Analyzes and optimizes database queries for PostgreSQL, MySQL, and MongoDB using EXPLAIN plans, index suggestions, and N+1 query detection. Use when user asks to "optimize query", "analyze EXPLAIN plan", "fix slow queries", or "suggest database indexes".