Sql Query Optimizer

by a5c-ai

data

Analyzes and optimizes SQL queries across different data warehouse platforms (Snowflake, BigQuery, Redshift, Databricks) with platform-specific recommendations.

Skill Details

Repository Files

2 files in this skill directory


name: sql-query-optimizer description: Analyzes and optimizes SQL queries across different data warehouse platforms (Snowflake, BigQuery, Redshift, Databricks) with platform-specific recommendations. version: 1.0.0 category: SQL Optimization skill-id: SK-DEA-004 allowed-tools: Read, Grep, Glob, Bash, WebFetch

SQL Query Optimizer

Analyzes and optimizes SQL queries across different data warehouse platforms with platform-specific recommendations.

Overview

This skill examines SQL queries to identify performance bottlenecks, suggest optimizations, and provide platform-specific recommendations for Snowflake, BigQuery, Redshift, and Databricks. It analyzes query execution plans, recommends indexes/clustering keys, and identifies anti-patterns.

Capabilities

  • Query execution plan analysis - Parse and analyze EXPLAIN outputs
  • Index recommendations - Suggest clustering keys, sort keys, partition keys
  • Join optimization - Identify inefficient join patterns and suggest improvements
  • Subquery elimination - Convert correlated subqueries to CTEs or joins
  • CTE optimization - Materialize vs reference optimization
  • Window function optimization - Frame and partition optimization
  • Predicate pushdown validation - Verify filter pushdown effectiveness
  • Clustering key recommendations - Platform-specific clustering strategies
  • Materialized view suggestions - Identify candidates for materialized views
  • Platform-specific optimizations - Snowflake, BigQuery, Redshift, Databricks

Input Schema

{
  "query": {
    "type": "string",
    "description": "The SQL query to analyze",
    "required": true
  },
  "platform": {
    "type": "string",
    "enum": ["snowflake", "bigquery", "redshift", "databricks", "postgres"],
    "required": true,
    "description": "Target data warehouse platform"
  },
  "tableStatistics": {
    "type": "object",
    "description": "Table statistics including row counts, column cardinality",
    "properties": {
      "tables": {
        "type": "array",
        "items": {
          "name": "string",
          "rowCount": "number",
          "sizeGB": "number",
          "columns": "array"
        }
      }
    }
  },
  "executionPlan": {
    "type": "object",
    "description": "Query execution plan (EXPLAIN output)"
  },
  "queryHistory": {
    "type": "object",
    "description": "Historical query performance metrics"
  },
  "optimizationGoals": {
    "type": "array",
    "items": {
      "type": "string",
      "enum": ["latency", "cost", "throughput", "scan_reduction"]
    },
    "default": ["latency", "cost"]
  }
}

Output Schema

{
  "optimizedQuery": {
    "type": "string",
    "description": "The optimized SQL query"
  },
  "improvements": {
    "type": "array",
    "items": {
      "type": {
        "type": "string",
        "enum": ["join", "predicate", "aggregation", "cte", "window", "scan", "index"]
      },
      "description": "string",
      "impact": "high|medium|low",
      "lineNumber": "number",
      "originalCode": "string",
      "optimizedCode": "string"
    }
  },
  "indexRecommendations": {
    "type": "array",
    "items": {
      "table": "string",
      "type": "clustering|sort|partition|index",
      "columns": "array",
      "rationale": "string",
      "ddl": "string"
    }
  },
  "estimatedImprovement": {
    "scanReduction": {
      "type": "number",
      "description": "Percentage reduction in data scanned"
    },
    "timeReduction": {
      "type": "number",
      "description": "Percentage reduction in execution time"
    },
    "costReduction": {
      "type": "number",
      "description": "Percentage reduction in query cost"
    }
  },
  "antiPatterns": {
    "type": "array",
    "items": {
      "pattern": "string",
      "severity": "high|medium|low",
      "location": "string",
      "suggestion": "string"
    }
  },
  "platformSpecificNotes": {
    "type": "array",
    "items": "string"
  }
}

Usage Examples

Basic Query Optimization

{
  "query": "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id WHERE o.created_at > '2024-01-01'",
  "platform": "snowflake"
}

With Execution Plan Analysis

{
  "query": "SELECT customer_id, SUM(amount) FROM orders GROUP BY customer_id",
  "platform": "bigquery",
  "executionPlan": {
    "stages": [...],
    "totalBytesProcessed": 1073741824
  },
  "optimizationGoals": ["cost", "scan_reduction"]
}

With Table Statistics

{
  "query": "SELECT ... complex query ...",
  "platform": "redshift",
  "tableStatistics": {
    "tables": [
      {
        "name": "orders",
        "rowCount": 10000000,
        "sizeGB": 50,
        "columns": [
          {"name": "order_id", "cardinality": 10000000},
          {"name": "customer_id", "cardinality": 500000}
        ]
      }
    ]
  }
}

Platform-Specific Optimizations

Snowflake

Optimization Description
Clustering keys Recommend micro-partition clustering
Result cache Identify queries benefiting from caching
Query acceleration Suggest QUERY_ACCELERATION_MAX_SCALE_FACTOR
Warehouse sizing Right-size warehouse recommendations

BigQuery

Optimization Description
Partitioning DATE/TIMESTAMP partitioning recommendations
Clustering Up to 4 clustering columns
BI Engine Identify BI Engine-eligible queries
Slots Estimate slot usage optimization

Redshift

Optimization Description
Sort keys COMPOUND vs INTERLEAVED recommendations
Distribution KEY, EVEN, ALL distribution strategies
Compression Column encoding recommendations
Vacuum VACUUM and ANALYZE recommendations

Databricks

Optimization Description
Z-ordering Multi-column Z-order recommendations
Delta cache Caching strategy recommendations
Photon Photon-eligible query patterns
Adaptive execution AQE configuration suggestions

Common Anti-Patterns Detected

Query Structure

Anti-Pattern Impact Fix
SELECT * High Specify columns explicitly
Correlated subqueries High Convert to JOIN or CTE
DISTINCT on large datasets Medium Use GROUP BY or window functions
Non-SARGable predicates High Rewrite for index usage

Join Issues

Anti-Pattern Impact Fix
Cartesian products Critical Add join conditions
Implicit joins Medium Use explicit JOIN syntax
Wrong join order High Reorder by selectivity
Missing indexes on join keys High Add clustering/sort keys

Aggregation Issues

Anti-Pattern Impact Fix
GROUP BY ordinal Low Use column names
Aggregating before filter High Filter first, then aggregate
Over-grouping Medium Reduce GROUP BY columns

Integration Points

MCP Server Integration

  • Snowflake MCP - Real-time execution plan analysis
  • BigQuery MCP - Cost estimation and slot analysis
  • Redshift MCP - Query execution and statistics

Related Skills

  • Data Quality Profiler (SK-DEA-005) - Table statistics gathering
  • dbt Project Analyzer (SK-DEA-003) - Model query optimization

Applicable Processes

  • Query Optimization (query-optimization.js)
  • Data Warehouse Setup (data-warehouse-setup.js)
  • BI Dashboard Development (bi-dashboard.js)
  • OBT Creation (obt-creation.js)

References

Version History

  • 1.0.0 - Initial release with multi-platform support

Related Skills

Xlsx

Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas

data

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

datacli

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

datacli

Analyzing Financial Statements

This skill calculates key financial ratios and metrics from financial statement data for investment analysis

data

Data Storytelling

Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.

data

Kpi Dashboard Design

Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.

designdata

Dbt Transformation Patterns

Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.

testingdocumenttool

Sql Optimization Patterns

Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.

designdata

Anndata

This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.

arttooldata

Xlsx

Spreadsheet toolkit (.xlsx/.csv). Create/edit with formulas/formatting, analyze data, visualization, recalculate formulas, for spreadsheet processing and analysis.

tooldata

Skill Information

Category:Data
Version:1.0.0
Allowed Tools:Read, Grep, Glob, Bash, WebFetch
Last Updated:1/24/2026