Query Optimize
by letta-ai
This skill provides guidance for SQL query optimization tasks, including rewriting slow queries for better performance while preserving semantic equivalence. Use this skill when asked to optimize, improve performance of, or rewrite SQL queries, particularly when dealing with correlated subqueries, complex joins, or queries that need CTEs/window functions.
Overview
This skill guides the optimization of SQL queries to improve performance while maintaining semantic equivalence with the original query. It covers common optimization patterns, verification strategies, and pitfalls to avoid.
Workflow
Step 1: Fully Read and Understand Input Files
Before starting any optimization work:
- Read the complete original query - If tool responses truncate the query, request the full content explicitly. Never proceed with partial understanding.
- Read the complete database schema - Understand all relevant tables, columns, and relationships. Request continuation if truncated.
- Document existing indexes - Note which indexes exist (if any) as this affects optimization strategy.
- Understand the query requirements - Identify LIMIT clauses, ORDER BY specifications, and HAVING conditions.
Common Pitfall: Starting optimization without seeing the full original query leads to semantic differences in the output.
Step 2: Analyze Performance Bottlenecks
Identify common performance issues:
- Correlated subqueries - Subqueries that reference outer query columns execute once per row
- Repeated computations - Same subquery appearing multiple times
- Missing indexes - Full table scans where indexes could help
- Inefficient joins - Cartesian products or unnecessary joins
- Redundant DISTINCT operations - DISTINCT on already-unique results
Document which specific parts of the query are problematic before rewriting.
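In SQLite, the query plan makes correlated subqueries visible directly. A minimal sketch, reusing the words/senses tables from the examples later in this document:

```sql
-- If the plan reports a CORRELATED SCALAR SUBQUERY containing a SCAN,
-- that subquery re-executes once per outer row.
EXPLAIN QUERY PLAN
SELECT w.word,
       (SELECT COUNT(*) FROM senses s WHERE s.wordid = w.wordid) AS sense_count
FROM words w;
```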
Step 3: Choose Optimization Strategy
Common optimization patterns:
- CTE (Common Table Expressions) - Extract repeated subqueries into WITH clauses
- Window functions - Replace correlated subqueries with ROW_NUMBER(), RANK(), etc.
- Materialized subqueries - Pre-compute expensive operations once
- Join optimization - Reorder joins, use appropriate join types
- Index suggestions - Recommend indexes if schema modifications are permitted
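If schema modifications are permitted, index suggestions are clearest stated as explicit DDL. A sketch, assuming the senses table from the examples below; confirm column names against the real schema before applying:

```sql
-- Hypothetical supporting indexes for the join/filter columns used below.
CREATE INDEX IF NOT EXISTS idx_senses_wordid   ON senses(wordid);
CREATE INDEX IF NOT EXISTS idx_senses_synsetid ON senses(synsetid);
```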
Document semantic equivalence: Explicitly map each part of the original query to the optimized version. For example:
- Original: `ORDER BY COUNT(*) DESC, s.synsetid ASC` in a correlated subquery
- Optimized: `ROW_NUMBER() OVER (PARTITION BY wordid ORDER BY sense_count DESC, synsetid ASC)`
Step 4: Implement the Optimized Query
When writing the optimized query:
- Preserve all semantic requirements:
  - Same columns in output
  - Same filtering conditions (WHERE, HAVING)
  - Same ordering (ORDER BY)
  - Same row limits (LIMIT)
- Use readable formatting:
  - Indent CTEs and subqueries consistently
  - Add comments for complex transformations
  - Use meaningful CTE names
- Verify the write operation - After writing the solution file, always read it back to confirm the complete content was written correctly.
Common Pitfall: File writes may appear truncated in tool responses. Always verify by reading the file back.
Step 5: Verification Strategy
Verification must be comprehensive and systematic:
Use Database-Native Timing
Prefer database-native timing over shell commands:
- SQLite: Use `.timer on` before running queries
- PostgreSQL: Use `EXPLAIN ANALYZE`
- MySQL: Use `SET profiling = 1`
Shell time commands often fail or give unreliable results for query timing.
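For example, in the sqlite3 shell (the query here is a stand-in):

```sql
.timer on
SELECT COUNT(*) FROM senses;   -- shell prints a Run Time line after each statement
.timer off
```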
Output Comparison Methods
- Full output comparison (preferred for small results):

```sql
-- Save original output
.output original_output.txt
SELECT * FROM (<original_query>);

-- Save optimized output
.output optimized_output.txt
SELECT * FROM (<optimized_query>);

-- Compare the two files with diff
```

- Checksum comparison (for large results):

```sql
SELECT COUNT(*), SUM(hash_column) FROM (<query>);
```

- Spot-check verification (supplement, not replacement):
  - Verify rows at beginning, middle, and end
  - Check edge cases (boundary values in HAVING conditions)
  - Test with different LIMIT/OFFSET combinations
Common Pitfall: Only checking a few rows is insufficient. Use checksums or full comparison for complete verification.
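A compound-query comparison covers every row in a single pass. A minimal sketch using the same placeholders as above:

```sql
-- Rows present in one result set but not the other; both counts should be 0.
SELECT COUNT(*) AS only_in_original
FROM (<original_query> EXCEPT <optimized_query>);

SELECT COUNT(*) AS only_in_optimized
FROM (<optimized_query> EXCEPT <original_query>);
```

Note that EXCEPT deduplicates and ignores ORDER BY, so also compare each query's raw COUNT(*) and verify ordering separately.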
Edge Cases to Verify
- Rows at exact threshold of filtering conditions (e.g., exactly 2 synsets when HAVING requires >= 2)
- Tie-breaking behavior in ORDER BY
- NULL handling in joins and aggregations
- Empty result sets for boundary conditions
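NULL handling, for instance, can be probed directly. A sketch assuming the top_synset column from the window-function example below:

```sql
-- The two counts should match if NULL behavior is preserved.
SELECT COUNT(*) FROM (<original_query>) WHERE top_synset IS NULL;
SELECT COUNT(*) FROM (<optimized_query>) WHERE top_synset IS NULL;
```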
Step 6: Final Validation
Before concluding:
- Read and display the final solution file in full
- Confirm performance improvement with timing measurements
- Confirm output equivalence with comprehensive comparison
- Document any assumptions made during optimization
Common Optimization Patterns
Correlated Subquery to Window Function
Before (slow - executes subquery per row):
```sql
SELECT w.word,
       (SELECT syn.synsetid
        FROM senses s JOIN synsets syn ON s.synsetid = syn.synsetid
        WHERE s.wordid = w.wordid
        GROUP BY syn.synsetid
        ORDER BY COUNT(*) DESC, syn.synsetid ASC
        LIMIT 1) AS top_synset
FROM words w
```
After (fast - single pass with window function):
```sql
WITH ranked AS (
  SELECT w.wordid, syn.synsetid,
         ROW_NUMBER() OVER (PARTITION BY w.wordid
                            ORDER BY COUNT(*) DESC, syn.synsetid ASC) AS rn
  FROM words w
  JOIN senses s ON w.wordid = s.wordid
  JOIN synsets syn ON s.synsetid = syn.synsetid
  GROUP BY w.wordid, syn.synsetid
)
SELECT w.word, r.synsetid AS top_synset
FROM words w
LEFT JOIN ranked r ON w.wordid = r.wordid AND r.rn = 1
```

Note the LEFT JOIN in the final select: the original scalar subquery yields a NULL top_synset for words with no senses, so an inner join would silently drop those rows and break equivalence.
Repeated Subquery to CTE
Before (computes same thing twice):
```sql
SELECT (SELECT COUNT(*) FROM orders WHERE user_id = u.id) AS order_count,
       (SELECT COUNT(*) FROM orders WHERE user_id = u.id) * u.price AS total
FROM users u
```
After (computes once):
```sql
WITH user_orders AS (
  SELECT user_id, COUNT(*) AS order_count
  FROM orders
  GROUP BY user_id
)
SELECT COALESCE(uo.order_count, 0) AS order_count,
       COALESCE(uo.order_count, 0) * u.price AS total
FROM users u
LEFT JOIN user_orders uo ON u.id = uo.user_id
```

Here the LEFT JOIN with COALESCE keeps users who have no orders; the original correlated subqueries return 0 for them, so an inner join would drop those rows.
Efficiency Guidelines
- Avoid repetitive SQL file creation - Create a parameterized test harness early
- Define verification approach upfront - Plan how equivalence will be proven before starting
- Use a single comprehensive test script rather than multiple ad-hoc queries (see the sketch after this list)
- If a verification attempt times out, try a smaller sample or use checksum-based comparison
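A minimal sketch of such a script for SQLite, using the placeholders from the verification section. File names here are hypothetical; run with `sqlite3 mydb.sqlite < verify.sql`:

```sql
-- verify.sql: capture both result sets once, then diff the files.
.output results_original.txt
SELECT * FROM (<original_query>);
.output results_optimized.txt
SELECT * FROM (<optimized_query>);
.output stdout
-- Time each version separately with .timer on so timing lines
-- do not end up inside the captured files.
```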
Checklist Before Completion
- Original query fully read and understood
- Schema fully understood with all relevant tables
- Optimized query preserves semantic equivalence
- Solution file read back and verified complete
- Performance improvement confirmed with timing
- Output equivalence verified comprehensively (not just spot checks)
- Edge cases tested