Database Query Optimizer
by Dexploarer
---
name: database-query-optimizer
description: Analyzes and optimizes database queries for PostgreSQL, MySQL, MongoDB with EXPLAIN plans, index suggestions, and N+1 query detection. Use when user asks to "optimize query", "analyze EXPLAIN plan", "fix slow queries", or "suggest database indexes".
allowed-tools: [Read, Write, Bash]
---
Database Query Optimizer
Analyzes database queries, interprets EXPLAIN plans, suggests indexes, and detects common performance issues like N+1 queries.
When to Use
- "Optimize my database query"
- "Analyze EXPLAIN plan"
- "Why is my query slow?"
- "Suggest indexes"
- "Fix N+1 queries"
- "Improve database performance"
Instructions
1. PostgreSQL Query Analysis
Run EXPLAIN:
EXPLAIN ANALYZE
SELECT u.name, COUNT(p.id) as post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name
ORDER BY post_count DESC
LIMIT 10;
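For scripted analysis, the same plan can be captured in machine-readable form with EXPLAIN (ANALYZE, FORMAT JSON). A minimal psycopg2 sketch — the connection string is an illustrative assumption:

import json
import psycopg2

QUERY = """
SELECT u.name, COUNT(p.id) AS post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id, u.name
ORDER BY post_count DESC
LIMIT 10
"""

conn = psycopg2.connect("dbname=mydb user=postgres")  # hypothetical DSN
with conn.cursor() as cur:
    cur.execute("EXPLAIN (ANALYZE, FORMAT JSON) " + QUERY)
    plan = cur.fetchone()[0]
    if isinstance(plan, str):  # some driver versions return raw JSON text
        plan = json.loads(plan)
print(plan[0]["Plan"]["Node Type"])  # root plan node, e.g. 'Limit'
conn.close()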
Interpret EXPLAIN output:
QUERY PLAN
-----------------------------------------------------------
Limit  (cost=1234.56..1234.58 rows=10 width=40) (actual time=45.123..45.125 rows=10 loops=1)
  ->  Sort  (cost=1234.56..1345.67 rows=44444 width=40) (actual time=45.122..45.123 rows=10 loops=1)
        Sort Key: (count(p.id)) DESC
        Sort Method: top-N heapsort  Memory: 25kB
        ->  HashAggregate  (cost=1000.00..1200.00 rows=44444 width=40) (actual time=40.456..42.789 rows=45000 loops=1)
              Group Key: u.id
              ->  Hash Left Join  (cost=100.00..900.00 rows=50000 width=32) (actual time=1.234..35.678 rows=100000 loops=1)
                    Hash Cond: (p.user_id = u.id)
                    ->  Seq Scan on posts p  (cost=0.00..500.00 rows=50000 width=4) (actual time=0.010..10.234 rows=50000 loops=1)
                    ->  Hash  (cost=75.00..75.00 rows=2000 width=32) (actual time=1.200..1.200 rows=2000 loops=1)
                          Buckets: 2048  Batches: 1  Memory Usage: 125kB
                          ->  Seq Scan on users u  (cost=0.00..75.00 rows=2000 width=32) (actual time=0.005..0.678 rows=2000 loops=1)
                                Filter: (created_at > '2024-01-01'::date)
                                Rows Removed by Filter: 500
Planning Time: 0.234 ms
Execution Time: 45.234 ms
Key metrics to analyze:
- cost: Estimated cost (first number = startup, second = total)
- rows: Estimated rows returned
- width: Average row size in bytes
- actual time: Real execution time (ms)
- loops: Number of times node executed
Red flags:
- Sequential scans on large tables
- High cost values relative to comparable queries
- Row estimates far from actual counts (stale statistics; run ANALYZE)
- High loop counts (a node re-executed many times, e.g. inside a nested loop)
- Slow actual execution time
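These red flags can also be scanned for mechanically. A minimal sketch over the text output of EXPLAIN ANALYZE — the row threshold and 10x estimate-mismatch ratio are illustrative assumptions:

import re

def find_red_flags(plan_text, large_table_rows=10000):
    """Scan EXPLAIN ANALYZE text output for common warning signs."""
    flags = []
    for line in plan_text.splitlines():
        # Sequential scan over many rows
        if 'Seq Scan' in line:
            m = re.search(r'actual time=[\d.]+\.\.[\d.]+ rows=(\d+)', line)
            if m and int(m.group(1)) >= large_table_rows:
                flags.append(f"Large seq scan: {line.strip()}")
        # Estimated vs actual row counts far apart (stale statistics)
        est = re.search(r'\brows=(\d+) width', line)
        act = re.search(r'actual time=[\d.]+\.\.[\d.]+ rows=(\d+)', line)
        if est and act:
            e, a = int(est.group(1)), int(act.group(1))
            if max(e, a) / max(min(e, a), 1) > 10:
                flags.append(f"Estimate {e} vs actual {a}: {line.strip()}")
    return flags

Run it on a captured plan; anything it prints is a candidate for a new index or a fresh ANALYZE.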
2. Optimization Strategies
Add Index:
-- Create index on filtered column
CREATE INDEX idx_users_created_at ON users(created_at);
-- Create index on join column
CREATE INDEX idx_posts_user_id ON posts(user_id);
-- Composite index for specific query pattern
CREATE INDEX idx_users_created_name ON users(created_at, name);
-- Partial index for common filter
CREATE INDEX idx_users_recent ON users(created_at) WHERE created_at > '2024-01-01';
-- Covering index (PostgreSQL 11+: INCLUDE adds payload columns for index-only scans)
CREATE INDEX idx_users_covering ON users(created_at) INCLUDE (id, name);
Rewrite Query:
-- ❌ BAD: Subquery in SELECT
SELECT
u.name,
(SELECT COUNT(*) FROM posts WHERE user_id = u.id) as post_count
FROM users u;
-- ✅ GOOD: Use JOIN
SELECT
u.name,
COUNT(p.id) as post_count
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
GROUP BY u.id, u.name;
-- ❌ BAD: OR conditions
SELECT * FROM users WHERE email = 'test@example.com' OR username = 'test';
-- ✅ GOOD: Use UNION (can use separate indexes)
SELECT * FROM users WHERE email = 'test@example.com'
UNION
SELECT * FROM users WHERE username = 'test';
-- ❌ BAD: Function on indexed column
SELECT * FROM users WHERE LOWER(email) = 'test@example.com';
-- ✅ GOOD: Functional index or avoid function
CREATE INDEX idx_users_email_lower ON users(LOWER(email));
-- Or, if emails are stored lowercased, skip the function entirely:
SELECT * FROM users WHERE email = 'test@example.com';
3. N+1 Query Detection
Problem:
# Python/SQLAlchemy example
# ❌ N+1 Query Problem
users = User.query.all()  # 1 query
for user in users:
    posts = user.posts  # N queries (one per user)
    print(f"{user.name}: {len(posts)} posts")
# Total: 1 + N queries
Solution:
# ✅ Eager Loading
from sqlalchemy.orm import joinedload

users = User.query.options(joinedload(User.posts)).all()  # 1 query
for user in users:
    posts = user.posts  # No additional query
    print(f"{user.name}: {len(posts)} posts")
# Total: 1 query
Node.js/Sequelize:
// ❌ N+1 Problem
const users = await User.findAll();
for (const user of users) {
  const posts = await user.getPosts(); // N queries
}

// ✅ Solution: Include associations
const users = await User.findAll({
  include: [{ model: Post }] // 1 query with JOIN
});
Rails/ActiveRecord:
# ❌ N+1 Problem
users = User.all
users.each do |user|
  puts user.posts.count # N queries
end

# ✅ Solution: includes, plus .size to count the loaded records
users = User.includes(:posts)
users.each do |user|
  puts user.posts.size # No additional queries (.count would still issue SQL)
end
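Whichever ORM you use, N+1 problems are easiest to catch by counting the statements a code path actually issues. A minimal SQLAlchemy sketch — engine stands in for your application's Engine:

from collections import Counter

from sqlalchemy import event

statement_counts = Counter()

@event.listens_for(engine, "before_cursor_execute")  # engine: your app's Engine
def count_statement(conn, cursor, statement, parameters, context, executemany):
    statement_counts[statement] += 1

# Exercise the code path under test, then look for one statement repeated
# once per parent row — the signature of an N+1:
for statement, n in statement_counts.most_common(5):
    print(f"{n}x  {statement[:80]}")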
4. Index Suggestions
Automated analysis:
-- PostgreSQL: index candidates (high-cardinality, poorly correlated columns)
SELECT schemaname, tablename, attname, n_distinct, correlation
FROM pg_stats
WHERE schemaname = 'public'
AND n_distinct > 100
AND correlation < 0.5
ORDER BY n_distinct DESC;
-- Find tables with sequential scans
SELECT schemaname, relname, seq_scan, seq_tup_read,
       idx_scan, idx_tup_fetch
FROM pg_stat_user_tables
WHERE seq_scan > 0
  AND seq_tup_read / seq_scan > 10000
ORDER BY seq_tup_read DESC;
-- Unused indexes
SELECT schemaname, relname, indexrelname, idx_scan
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
MySQL:
-- Unused indexes
SELECT * FROM sys.schema_unused_indexes;
-- Duplicate indexes
SELECT * FROM sys.schema_redundant_indexes;
-- Table scan queries
SELECT * FROM sys.statements_with_full_table_scans
LIMIT 10;
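The same checks can be run from Python. A minimal sketch using PyMySQL — host and credentials are illustrative assumptions:

import pymysql

conn = pymysql.connect(host="localhost", user="root", password="", database="sys")
checks = {
    "Unused indexes": "SELECT * FROM sys.schema_unused_indexes",
    "Redundant indexes": "SELECT * FROM sys.schema_redundant_indexes",
    "Full table scans": "SELECT * FROM sys.statements_with_full_table_scans LIMIT 10",
}
with conn.cursor() as cur:
    for label, sql in checks.items():
        cur.execute(sql)
        print(f"{label}: {cur.rowcount} rows")
conn.close()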
5. Query Optimization Checklist
Python Script:
#!/usr/bin/env python3
import psycopg2
import re

class QueryOptimizer:
    def __init__(self, conn):
        self.conn = conn

    def analyze_query(self, query):
        """Analyze query and provide optimization suggestions."""
        suggestions = []

        # Check for SELECT *
        if re.search(r'SELECT\s+\*', query, re.IGNORECASE):
            suggestions.append("❌ Avoid SELECT *. Specify only needed columns.")

        # Check for missing WHERE clause
        if re.search(r'FROM\s+\w+', query, re.IGNORECASE) and \
                not re.search(r'WHERE', query, re.IGNORECASE):
            suggestions.append("⚠️ No WHERE clause. Consider adding filters.")

        # Check for OR in WHERE
        if re.search(r'WHERE.*\sOR\s', query, re.IGNORECASE):
            suggestions.append("⚠️ OR conditions may prevent index usage. Consider UNION.")

        # Check for functions on indexed columns
        if re.search(r'WHERE\s+\w+\([^\)]+\)\s*=', query, re.IGNORECASE):
            suggestions.append("❌ Functions on columns prevent index usage.")

        # Check for LIKE with leading wildcard
        if re.search(r'LIKE\s+[\'"]%', query, re.IGNORECASE):
            suggestions.append("❌ LIKE with leading % cannot use index.")

        # Run EXPLAIN ANALYZE (note: this executes the query — trusted input only)
        cursor = self.conn.cursor()
        try:
            cursor.execute(f"EXPLAIN ANALYZE {query}")
            plan = cursor.fetchall()

            # Check for sequential scans
            plan_str = str(plan)
            if 'Seq Scan' in plan_str:
                suggestions.append("❌ Sequential scan detected. Consider adding index.")

            # Check for high cost (capture the total cost of the top plan node)
            cost_match = re.search(r'cost=\d+\.\d+\.\.(\d+\.\d+)', plan_str)
            if cost_match:
                cost = float(cost_match.group(1))
                if cost > 10000:
                    suggestions.append(f"⚠️ High query cost: {cost:.2f}")

            return {
                'suggestions': suggestions,
                'explain_plan': plan
            }
        finally:
            cursor.close()

    def suggest_indexes(self, query):
        """Suggest indexes based on query pattern (table_name is a placeholder)."""
        indexes = []

        # Find WHERE conditions (allows a table qualifier, e.g. u.created_at)
        where_matches = re.findall(r'WHERE\s+(?:\w+\.)?(\w+)\s*[=<>]', query, re.IGNORECASE)
        for col in where_matches:
            indexes.append(f"CREATE INDEX idx_{col} ON table_name({col});")

        # Find JOIN conditions
        join_matches = re.findall(r'ON\s+\w+\.(\w+)\s*=\s*\w+\.(\w+)', query, re.IGNORECASE)
        for col1, col2 in join_matches:
            indexes.append(f"CREATE INDEX idx_{col1} ON table_name({col1});")
            indexes.append(f"CREATE INDEX idx_{col2} ON table_name({col2});")

        # Find ORDER BY
        order_matches = re.findall(r'ORDER BY\s+(\w+)', query, re.IGNORECASE)
        for col in order_matches:
            indexes.append(f"CREATE INDEX idx_{col} ON table_name({col});")

        return list(set(indexes))

# Usage
conn = psycopg2.connect("dbname=mydb user=postgres")
optimizer = QueryOptimizer(conn)

query = """
SELECT u.name, u.email, COUNT(p.id)
FROM users u
LEFT JOIN posts p ON u.id = p.user_id
WHERE u.created_at > '2024-01-01'
GROUP BY u.id
ORDER BY COUNT(p.id) DESC
LIMIT 10
"""

result = optimizer.analyze_query(query)
for suggestion in result['suggestions']:
    print(suggestion)

print("\nSuggested indexes:")
for index in optimizer.suggest_indexes(query):
    print(index)
6. MongoDB Optimization
Analyze Query:
db.users.find({
  created_at: { $gt: ISODate("2024-01-01") },
  status: "active"
}).sort({ created_at: -1 }).explain("executionStats")
Check for issues:
// Check execution stats
const stats = db.users.find({ status: "active" }).explain("executionStats");
// Red flags:
// - totalDocsExamined >> nReturned (scanning many docs)
// - COLLSCAN stage (no index used)
// - High executionTimeMillis
// Create index
db.users.createIndex({ status: 1, created_at: -1 });
// Wider compound index tailored to one query shape
db.users.createIndex({ status: 1, created_at: -1, name: 1 });
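The docsExamined-vs-returned check can be automated from Python. A minimal PyMongo sketch — database/collection names and the 10x threshold are assumptions:

from pymongo import MongoClient

client = MongoClient()
users = client["mydb"]["users"]

plan = users.find({"status": "active"}).explain()
stats = plan.get("executionStats", {})
examined = stats.get("totalDocsExamined", 0)
returned = stats.get("nReturned", 0)

if examined > 10 * max(returned, 1):
    print(f"⚠️ Examined {examined} docs to return {returned} — likely missing index")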
7. ORM Query Optimization
Django:
# ❌ N+1 Problem
users = User.objects.all()
for user in users:
    print(user.profile.bio)  # N queries

# ✅ select_related (for ForeignKey/OneToOne)
users = User.objects.select_related('profile').all()

# ✅ prefetch_related (for ManyToMany/reverse ForeignKey)
users = User.objects.prefetch_related('posts').all()

# ❌ Loading all records
users = User.objects.all()  # Loads everything into memory

# ✅ Use iterator for large datasets
for user in User.objects.iterator(chunk_size=1000):
    process(user)

# ❌ Multiple queries
active_users = User.objects.filter(is_active=True).count()
inactive_users = User.objects.filter(is_active=False).count()

# ✅ Single aggregation
from django.db.models import Count, Q

stats = User.objects.aggregate(
    active=Count('id', filter=Q(is_active=True)),
    inactive=Count('id', filter=Q(is_active=False))
)
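To verify a fix, count the queries a code path issues via Django's query log. A minimal sketch; it requires DEBUG=True, since the query log is disabled otherwise:

from django.db import connection, reset_queries

reset_queries()
users = User.objects.select_related('profile').all()
for user in users:
    _ = user.profile.bio
print(f"{len(connection.queries)} queries executed")  # expect 1, not 1 + N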
TypeORM:
// ❌ N+1 Problem
const users = await userRepository.find();
for (const user of users) {
  const posts = await postRepository.find({ where: { userId: user.id } });
}

// ✅ Use relations
const users = await userRepository.find({
  relations: ['posts', 'profile']
});

// ✅ Query Builder for complex queries
const users = await userRepository
  .createQueryBuilder('user')
  .leftJoinAndSelect('user.posts', 'post')
  .where('user.created_at > :date', { date: '2024-01-01' })
  .andWhere('post.status = :status', { status: 'published' })
  .getMany();

// ✅ Use select to limit columns
const users = await userRepository
  .createQueryBuilder('user')
  .select(['user.id', 'user.name', 'user.email'])
  .getMany();
8. Performance Monitoring
PostgreSQL:
-- Top slow queries (requires the pg_stat_statements extension;
-- on PostgreSQL 12 and older use total_time/mean_time/max_time)
SELECT
  query,
  calls,
  total_exec_time,
  mean_exec_time,
  max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
-- Largest tables, with size held outside the main heap (indexes + TOAST)
SELECT
  schemaname,
  tablename,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename)) AS total_size,
  pg_size_pretty(pg_total_relation_size(schemaname||'.'||tablename) - pg_relation_size(schemaname||'.'||tablename)) AS external_size
FROM pg_tables
ORDER BY pg_total_relation_size(schemaname||'.'||tablename) DESC
LIMIT 10;
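A minimal alerting sketch built on the slow-query view above — the DSN and the 100 ms budget are assumptions, and the column names assume PostgreSQL 13+:

import psycopg2

SLOW_MS = 100  # hypothetical per-query budget

conn = psycopg2.connect("dbname=mydb user=postgres")
with conn.cursor() as cur:
    cur.execute("""
        SELECT query, calls, mean_exec_time
        FROM pg_stat_statements
        ORDER BY mean_exec_time DESC
        LIMIT 10
    """)
    for query, calls, mean_ms in cur.fetchall():
        if mean_ms > SLOW_MS:
            print(f"{mean_ms:.1f} ms avg over {calls} calls: {query[:80]}")
conn.close()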
MySQL:
-- Slow queries (requires slow_query_log=ON and log_output='TABLE')
SELECT * FROM mysql.slow_log
ORDER BY query_time DESC
LIMIT 10;
-- Table statistics
SELECT
TABLE_NAME,
TABLE_ROWS,
DATA_LENGTH,
INDEX_LENGTH,
DATA_FREE
FROM information_schema.TABLES
WHERE TABLE_SCHEMA = 'your_database'
ORDER BY DATA_LENGTH DESC;
Best Practices
DO:
- Add indexes on foreign keys
- Run EXPLAIN regularly
- Monitor the slow query log
- Use connection pooling
- Implement pagination (see the keyset sketch below)
- Cache frequent queries
- Use appropriate data types
- Run VACUUM/ANALYZE regularly
DON'T:
- Use SELECT *
- Over-index (extra indexes slow writes)
- Use LIKE with a leading %
- Use functions on indexed columns
- Ignore N+1 queries
- Load entire tables into memory
- Skip query analysis
- Use OR excessively
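On pagination specifically: OFFSET scans and discards every skipped row, so deep pages get slower the further you go. A minimal keyset (seek) pagination sketch, assuming an indexed id column — the table, columns, and DSN are illustrative:

import psycopg2

conn = psycopg2.connect("dbname=mydb user=postgres")

def fetch_page(last_id=0, page_size=100):
    """Seek past the last-seen key instead of counting rows with OFFSET."""
    with conn.cursor() as cur:
        cur.execute(
            "SELECT id, name FROM users WHERE id > %s ORDER BY id LIMIT %s",
            (last_id, page_size),
        )
        return cur.fetchall()

rows = fetch_page()
while rows:
    # process(rows) — hypothetical handler for each page
    rows = fetch_page(last_id=rows[-1][0])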
Checklist
- Slow queries identified
- EXPLAIN plans analyzed
- Indexes added where needed
- N+1 queries fixed
- Query rewrites implemented
- Monitoring set up
- Connection pool configured
- Caching implemented