Database Query Optimizer
by Dexploarer
---
name: database-query-optimizer
description: Analyzes and optimizes database queries for PostgreSQL, MySQL, MongoDB with EXPLAIN plans, index suggestions, and N+1 query detection. Use when user asks to "optimize query", "analyze EXPLAIN plan", "fix slow queries", or "suggest database indexes".
allowed-tools: [Read, Write, Bash]
---
Database Query Optimizer
Analyzes database queries, interprets EXPLAIN plans, suggests indexes, and detects common performance issues like N+1 queries.
When to Use
- "Optimize my database query"
- "Analyze EXPLAIN plan"
- "Why is my query slow?"
- "Suggest indexes"
- "Fix N+1 queries"
- "Improve database performance"
Instructions
1. PostgreSQL Query Analysis
Run EXPLAIN:
EXPLAIN ANALYZE
SELECT u.name, COUNT(p.id) as post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name
ORDER BY post_count DESC
LIMIT 10;
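For scripted analysis, the same plan can be captured in machine-readable form with EXPLAIN (ANALYZE, FORMAT JSON). A minimal psycopg2 sketch — the connection string is an illustrative assumption:

import json
import psycopg2

QUERY = """
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name
ORDER BY post_count DESC
LIMIT 10
"""

conn = psycopg2.connect("dbname=mydb user=postgres")  # hypothetical DSN
with conn.cursor() as cur:
    cur.execute("EXPLAIN (ANALYZE, FORMAT JSON) " + QUERY)
    plan = cur.fetchone()[0]
    if isinstance(plan, str):  # some driver versions return raw JSON text
        plan = json.loads(plan)
print(plan[0]["Plan"]["Node Type"])  # root plan node, e.g. 'Limit'
conn.close()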
Interpret EXPLAIN output:
QUERY PLAN
-----------------------------------------------------------
Limit  (cost=1234.56..1234.58 rows=10 width=40) (actual time=45.123..45.125 rows=10 loops=1)
  ->  Sort  (cost=1234.56..1345.67 rows=44444 width=40) (actual time=45.122..45.123 rows=10 loops=1)
        Sort Key: (count(p.id)) DESC
        Sort Method: top-N heapsort  Memory: 25kB
        ->  HashAggregate  (cost=1000.00..1200.00 rows=44444 width=40) (actual time=40.456..42.789 rows=45000 loops=1)
              Group Key: u.id
              ->  Hash Left Join  (cost=100.00..900.00 rows=50000 width=32) (actual time=1.234..35.678 rows=100000 loops=1)
                    Hash Cond: (p.user_id = u.id)
                    ->  Seq Scan on posts p  (cost=0.00..500.00 rows=50000 width=4) (actual time=0.010..10.234 rows=50000 loops=1)
                    ->  Hash  (cost=75.00..75.00 rows=2000 width=32) (actual time=1.200..1.200 rows=2000 loops=1)
                          Buckets: 2048  Batches: 1  Memory Usage: 125kB
                          ->  Seq Scan on users u  (cost=0.00..75.00 rows=2000 width=32) (actual time=0.005..0.678 rows=2000 loops=1)
                                Filter: (created_at > '2024-01-01'::date)
                                Rows Removed by Filter: 500
Planning Time: 0.234 ms
Execution Time: 45.234 ms
Key metrics to analyze:
- cost: Estimated cost (first number = startup, second = total)
- rows: Estimated rows returned
- width: Average row size in bytes
- actual time: Real execution time (ms)
- loops: Number of times node executed
Red flags:
- Sequential scans on large tables
- High cost values relative to comparable queries
- Row estimates far from actual counts (stale statistics; run ANALYZE)
- High loop counts (a node re-executed many times, e.g. inside a nested loop)
- Slow actual execution time
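These red flags can also be scanned for mechanically. A minimal sketch over the text output of EXPLAIN ANALYZE — the row threshold and 10x estimate-mismatch ratio are illustrative assumptions:

import re

def find_red_flags(plan_text, large_table_rows=10000):
    """Scan EXPLAIN ANALYZE text output for common warning signs."""
    flags = []
    for line in plan_text.splitlines():
        # Sequential scan over many rows
        if 'Seq Scan' in line:
            m = re.search(r'actual time=[\d.]+\.\.[\d.]+ rows=(\d+)', line)
            if m and int(m.group(1)) >= large_table_rows:
                flags.append(f"Large seq scan: {line.strip()}")
        # Estimated vs actual row counts far apart (stale statistics)
        est = re.search(r'\brows=(\d+) width', line)
        act = re.search(r'actual time=[\d.]+\.\.[\d.]+ rows=(\d+)', line)
        if est and act:
            e, a = int(est.group(1)), int(act.group(1))
            if max(e, a) / max(min(e, a), 1) > 10:
                flags.append(f"Estimate {e} vs actual {a}: {line.strip()}")
    return flags

Run it on a captured plan; anything it prints is a candidate for a new index or a fresh ANALYZE.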
2. Optimization Strategies
Add Index:
-- Create index on filtered column
CREATE INDEX idx_users_created_at ON users(created_at);
-- Create index on join column
CREATE INDEX idx_posts_user_id ON posts(user_id);
-- Composite index for specific query pattern
CREATE INDEX idx_users_created_name ON users(created_at, name);
-- Partial index for common filter
CREATE INDEX idx_users_recent ON users(created_at) WHERE created_at > '2024-01-01';
-- Covering index (PostgreSQL 11+: INCLUDE adds payload columns for index-only scans)
CREATE INDEX idx_users_covering ON users(created_at) INCLUDE (id, name);
Rewrite Query:
-- ❌ BAD: Subquery in SELECT
SELECT
u.name,
(SELECT COUNT(*) FROM posts WHERE user_id = u.id) as post_count
FROM users u;
-- ✅ GOOD: Use JOIN
SELECT
u.name,
COUNT(p.id) as post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
GROUP BY u.id, u.name;
-- ❌ BAD: OR conditions
SELECT * FROM users WHERE email = 'test@example.com' OR username = 'test';
-- ✅ GOOD: Use UNION (can use separate indexes)
SELECT * FROM users WHERE email = 'test@example.com'
UNION
SELECT * FROM users WHERE username = 'test';
-- ❌ BAD: Function on indexed column
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';
-- ✅ GOOD: Functional index or avoid function
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Or, if emails are stored lowercased, skip the function entirely:
SELECT * FROM users WHERE email = 'test@example.com';
3. N+1 Query Detection
Problem:
# Python/SQLAlchemy example
# ❌ N+1 Query Problem
users = User.query.all()  # 1 query
for user in users:
    posts = user.posts  # N queries (one per user)
    print(f"{user.name}: {len(posts)} posts")
# Total: 1 + N queries
Solution:
# ✅ Eager Loading
from sqlalchemy.orm import joinedload

users = User.query.options(joinedload(User.posts)).all()  # 1 query
for user in users:
    posts = user.posts  # No additional query
    print(f"{user.name}: {len(posts)} posts")
# Total: 1 query
Node.js/Sequelize:
// ❌ N+1 Problem
const users = await User.findAll();
for (const user of users) {
  const posts = await user.getPosts(); // N queries
}

// ✅ Solution: Include associations
const users = await User.findAll({
  include: [{ model: Post }] // 1 query with JOIN
});
Rails/ActiveRecord:
# ❌ N+1 Problem
users = User.all
users.each do |user|
  puts user.posts.count # N queries
end

# ✅ Solution: includes, plus .size to count the loaded records
users = User.includes(:posts)
users.each do |user|
  puts user.posts.size # No additional queries (.count would still issue SQL)
end
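Whichever ORM you use, N+1 problems are easiest to catch by counting the statements a code path actually issues. A minimal SQLAlchemy sketch — engine stands in for your application's Engine:

from collections import Counter

from sqlalchemy import event

statement_counts = Counter()

@event.listens_for(engine, "before_cursor_execute")  # engine: your app's Engine
def count_statement(conn, cursor, statement, parameters, context, executemany):
    statement_counts[statement] += 1

# Exercise the code path under test, then look for one statement repeated
# once per parent row — the signature of an N+1:
for statement, n in statement_counts.most_common(5):
    print(f"{n}x  {statement[:80]}")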
4. Index Suggestions
Automated analysis:
-- PostgreSQL: index candidates (high-cardinality, poorly correlated columns)
SELECT schemaname, tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public'
AND n_distinct > 100
AND correlation < 0.5
ORDER BY n_distinct DESC;
-- Find tables with sequential scans
SELECT schemaname, relname, seq_scan, seq_tup_read,
       idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > 0
  AND seq_tup_read / seq_scan > 10000
ORDER BY seq_tup_read DESC;
-- Unused indexes
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
MySQL:
-- Unused indexes
SELECT * FROM sys.schema_unused_indexes;
-- Duplicate indexes
SELECT * FROM sys.schema_redundant_indexes;
-- Table scan queries
SELECT * FROM sys.statements_with_full_table_scans
LIMIT 10;
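The same checks can be run from Python. A minimal sketch using PyMySQL — host and credentials are illustrative assumptions:

import pymysql

conn = pymysql.connect(host="localhost", user="root", password="", database="sys")
checks = {
    "Unused indexes": "SELECT * FROM sys.schema_unused_indexes",
    "Redundant indexes": "SELECT * FROM sys.schema_redundant_indexes",
    "Full table scans": "SELECT * FROM sys.statements_with_full_table_scans LIMIT 10",
}
with conn.cursor() as cur:
    for label, sql in checks.items():
        cur.execute(sql)
        print(f"{label}: {cur.rowcount} rows")
conn.close()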
5. Query Optimization Checklist
Python Script:
#!/usr/bin/env python3
import psycopg2
import re

class QueryOptimizer:
    def __init__(self, conn):
        self.conn = conn

    def analyze_query(self, query):
        """Analyze query and provide optimization suggestions."""
        suggestions = []

        # Check for SELECT *
        if re.search(r'SELECT\s+\*', query, re.IGNORECASE):
            suggestions.append("❌ Avoid SELECT *. Specify only needed columns.")

        # Check for missing WHERE clause
        if re.search(r'FROM\s+\w+', query, re.IGNORECASE) and \
                not re.search(r'WHERE', query, re.IGNORECASE):
            suggestions.append("⚠️ No WHERE clause. Consider adding filters.")

        # Check for OR in WHERE
        if re.search(r'WHERE.*\sOR\s', query, re.IGNORECASE):
            suggestions.append("⚠️ OR conditions may prevent index usage. Consider UNION.")

        # Check for functions on indexed columns
        if re.search(r'WHERE\s+\w+\([^\)]+\)\s*=', query, re.IGNORECASE):
            suggestions.append("❌ Functions on columns prevent index usage.")

        # Check for LIKE with leading wildcard
        if re.search(r'LIKE\s+[\'"]%', query, re.IGNORECASE):
            suggestions.append("❌ LIKE with leading % cannot use index.")

        # Run EXPLAIN ANALYZE (note: this executes the query — trusted input only)
        cursor = self.conn.cursor()
        try:
            cursor.execute(f"EXPLAIN ANALYZE {query}")
            plan = cursor.fetchall()

            # Check for sequential scans
            plan_str = str(plan)
            if 'Seq Scan' in plan_str:
                suggestions.append("❌ Sequential scan detected. Consider adding index.")

            # Check for high cost (capture the total cost of the top plan node)
            cost_match = re.search(r'cost=\d+\.\d+\.\.(\d+\.\d+)', plan_str)
            if cost_match:
                cost = float(cost_match.group(1))
                if cost > 10000:
                    suggestions.append(f"⚠️ High query cost: {cost:.2f}")

            return {
                'suggestions': suggestions,
                'explain_plan': plan
            }
        finally:
            cursor.close()

    def suggest_indexes(self, query):
        """Suggest indexes based on query pattern (table_name is a placeholder)."""
        indexes = []

        # Find WHERE conditions (allows a table qualifier, e.g. u.created_at)
        where_matches = re.findall(r'WHERE\s+(?:\w+\.)?(\w+)\s*[=<>]', query, re.IGNORECASE)
        for col in where_matches:
            indexes.append(f"CREATE INDEX idx_{col} ON table_name({col});")

        # Find JOIN conditions
        join_matches = re.findall(r'ON\s+\w+\.(\w+)\s*=\s*\w+\.(\w+)', query, re.IGNORECASE)
        for col1, col2 in join_matches:
            indexes.append(f"CREATE INDEX idx_{col1} ON table_name({col1});")
            indexes.append(f"CREATE INDEX idx_{col2} ON table_name({col2});")

        # Find ORDER BY
        order_matches = re.findall(r'ORDER BY\s+(\w+)', query, re.IGNORECASE)
        for col in order_matches:
            indexes.append(f"CREATE INDEX idx_{col} ON table_name({col});")

        return list(set(indexes))

# Usage
conn = psycopg2.connect("dbname=mydb user=postgres")
optimizer = QueryOptimizer(conn)

query = """
SELECT u.name, u.email, COUNT(p.id)
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id
ORDER BY COUNT(p.id) DESC
LIMIT 10
"""

result = optimizer.analyze_query(query)
for suggestion in result['suggestions']:
    print(suggestion)

print("\nSuggested indexes:")
for index in optimizer.suggest_indexes(query):
    print(index)
6. MongoDB Optimization
Analyze Query:
db.users.find({
  created_at: { $gt: ISODate("2024-01-01") },
  status: "active"
}).sort({ created_at: -1 }).explain("executionStats")
Check for issues:
// Check execution stats
const stats = db.users.find({ status: "active" }).explain("executionStats");
// Red flags:
// - totalDocsExamined >> nReturned (scanning many docs)
// - COLLSCAN stage (no index used)
// - High executionTimeMillis
// Create index
db.users.createIndex({ status: 1, created_at: -1 });
// Wider compound index tailored to one query shape
db.users.createIndex({ status: 1, created_at: -1, name: 1 });
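The docsExamined-vs-returned check can be automated from Python. A minimal PyMongo sketch — database/collection names and the 10x threshold are assumptions:

from pymongo import MongoClient

client = MongoClient()
users = client["mydb"]["users"]

plan = users.find({"status": "active"}).explain()
stats = plan.get("executionStats", {})
examined = stats.get("totalDocsExamined", 0)
returned = stats.get("nReturned", 0)

if examined > 10 * max(returned, 1):
    print(f"⚠️ Examined {examined} docs to return {returned} — likely missing index")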
7. ORM Query Optimization
Django:
# ❌ N+1 Problem
users = User.objects.all()
for user in users:
    print(user.profile.bio)  # N queries

# ✅ select_related (for ForeignKey/OneToOne)
users = User.objects.select_related('profile').all()

# ✅ prefetch_related (for ManyToMany/reverse ForeignKey)
users = User.objects.prefetch_related('posts').all()

# ❌ Loading all records
users = User.objects.all()  # Loads everything into memory

# ✅ Use iterator for large datasets
for user in User.objects.iterator(chunk_size=1000):
    process(user)

# ❌ Multiple queries
active_users = User.objects.filter(is_active=True).count()
inactive_users = User.objects.filter(is_active=False).count()

# ✅ Single aggregation
from django.db.models import Count, Q

stats = User.objects.aggregate(
    active=Count('id', filter=Q(is_active=True)),
    inactive=Count('id', filter=Q(is_active=False))
)
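To verify a fix, count the queries a code path issues via Django's query log. A minimal sketch; it requires DEBUG=True, since the query log is disabled otherwise:

from django.db import connection, reset_queries

reset_queries()
users = User.objects.select_related('profile').all()
for user in users:
    _ = user.profile.bio
print(f"{len(connection.queries)} queries executed")  # expect 1, not 1 + N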
TypeORM:
// ❌ N+1 Problem
const users = await userRepository.find();
for (const user of users) {
  const posts = await postRepository.find({ where: { userId: user.id } });
}

// ✅ Use relations
const users = await userRepository.find({
  relations: ['posts', 'profile']
});

// ✅ Query Builder for complex queries
const users = await userRepository
  .createQueryBuilder('user')
  .leftJoinAndSelect('user.posts', 'post')
  .where('user.created_at > :date', { date: '2024-01-01' })
  .andWhere('post.status = :status', { status: 'published' })
  .getMany();

// ✅ Use select to limit columns
const users = await userRepository
  .createQueryBuilder('user')
  .select(['user.id', 'user.name', 'user.email'])
  .getMany();
8. Performance Monitoring
PostgreSQL:
-- Top slow queries (requires the pg_stat_statements extension;
-- on PostgreSQL 12 and older use total_time/mean_time/max_time)
SELECT
  query,
  calls,
  total_exec_time,
  mean_exec_time,
  max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Largest tables, with size held outside the main heap (indexes + TOAST)
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename) - pg_relation_size(schemaname||'.'||tablename)) AS external_size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;
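A minimal alerting sketch built on the slow-query view above — the DSN and the 100 ms budget are assumptions, and the column names assume PostgreSQL 13+:

import psycopg2

SLOW_MS = 100  # hypothetical per-query budget

conn = psycopg2.connect("dbname=mydb user=postgres")
with conn.cursor() as cur:
    cur.execute("""
        SELECT query, calls, mean_exec_time
        FROM pg_stat_statements
        ORDER BY mean_exec_time DESC
        LIMIT 10
    """)
    for query, calls, mean_ms in cur.fetchall():
        if mean_ms > SLOW_MS:
            print(f"{mean_ms:.1f} ms avg over {calls} calls: {query[:80]}")
conn.close()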
MySQL:
-- Slow queries (requires slow_query_log=ON and log_output='TABLE')
SELECT * FROM mysql.slow_log
ORDER BY query_time DESC
LIMIT 10;
-- Table statistics
SELECT
TABLE_NAME,
TABLE_ROWS,
DATA_LENGTH,
INDEX_LENGTH,
DATA_FREE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database'
ORDER BY DATA_LENGTH DESC;
Best Practices
DO:
- Add indexes on foreign keys
- Run EXPLAIN regularly
- Monitor the slow query log
- Use connection pooling
- Implement pagination (see the keyset sketch below)
- Cache frequent queries
- Use appropriate data types
- Run VACUUM/ANALYZE regularly
DON'T:
- Use SELECT *
- Over-index (extra indexes slow writes)
- Use LIKE with a leading %
- Use functions on indexed columns
- Ignore N+1 queries
- Load entire tables into memory
- Skip query analysis
- Use OR excessively
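On pagination specifically: OFFSET scans and discards every skipped row, so deep pages get slower the further you go. A minimal keyset (seek) pagination sketch, assuming an indexed id column — the table, columns, and DSN are illustrative:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")

def fetch_page(last_id=0, page_size=100):
    """Seek past the last-seen key instead of counting rows with OFFSET."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, name FROM users WHERE id > %s ORDER BY id LIMIT %s",
            (last_id, page_size),
        )
        return cur.fetchall()

rows = fetch_page()
while rows:
    # process(rows) — hypothetical handler for each page
    rows = fetch_page(last_id=rows[-1][0])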
Checklist
- Slow queries identified
- EXPLAIN plans analyzed
- Indexes added where needed
- N+1 queries fixed
- Query rewrites implemented
- Monitoring set up
- Connection pool configured
- Caching implemented