Reporting Optimization
by NeverSight
>
Skill Details
name: reporting-optimization description: > High Performance Reporting and Query Optimization Trigger: When optimizing database queries for reporting. license: Apache-2.0 metadata: author: poletron version: "1.0" scope: [root] auto_invoke: "Working with reporting optimization"
When to Use
Use this skill when:
- Optimizing slow reporting queries
- Building dashboards with heavy aggregations
- Working with large datasets
- Implementing caching strategies
Decision Tree
Need query optimization? → Use EXPLAIN ANALYZE
Need cached aggregations? → Use Materialized View
Need complex breakdown? → Use CTEs
Need row ranking? → Use Window Functions
Need partial coverage? → Use Partial Index
Reporting & Optimization Guidelines
Reporting queries often scan large datasets. Inefficient queries can degrade the performance of the entire system. Follow these guidelines to ensure speed and stability.
1. Query Optimization
1.1 SELECT Efficiency
- No
SELECT *: Fetching unnecessary columns increases I/O and network payload. Select only required fields. - SARGable Queries: Ensure predicates (WHERE clauses) are "Search ARGument ABLE" to perform Index Seeks instead of Scans.
- Bad:
WHERE YEAR(created_at) = 2023(Function on column prevents index use). - Good:
WHERE created_at >= '2023-01-01' AND created_at < '2024-01-01'.
- Bad:
1.2 CTEs vs Temporary Tables
- Common Table Expressions (CTEs): Use
WITHclauses for readability and to break down complex logic. In modern PostgreSQL (12+), CTEs are materialized when beneficial, making them performant. - Temporary Tables: For extremely complex multi-step processing involving heavy intermediate indexing, use
CREATE TEMP TABLE.
1.3 Subqueries
- Avoid Correlated Subqueries in the
SELECTlist that run once per row. Rewrite them asJOINs orLATERALjoins.- Bad:
SELECT id, (SELECT count(*) FROM orders WHERE user_id = u.id) FROM users u - Good:
SELECT u.id, count(o.id) FROM users u LEFT JOIN orders o ON u.id = o.user_id GROUP BY u.id
- Bad:
2. Aggregation Strategies
2.1 Materialized Views
- For dashboards requiring heavy aggregation (Count, Sum, Avg) over millions of rows, do not query the live transactional table every time.
- Use Materialized Views to cache the result:
CREATE MATERIALIZED VIEW mv_daily_sales AS SELECT day, SUM(total) FROM sales GROUP BY day; - Refresh explicitly:
REFRESH MATERIALIZED VIEW CONCURRENTLY mv_daily_sales;.
2.2 Window Functions
- Use Window Functions (
ROW_NUMBER(),RANK(),LEAD(),LAG()) for complex reporting intra-row logic instead of self-joins.
3. Safety Guardrails
3.1 Timeouts
- Set
statement_timeoutfor reporting roles to prevent a runaway query from locking resources indefinitely.
3.2 Result Limits
- Always verify the estimated cardinality. If a report could return 1M+ rows, enforce
LIMITor pagination logic.
4. Query Profiling
4.1 EXPLAIN ANALYZE
Before deploying complex queries, use EXPLAIN ANALYZE to understand the execution plan.
- Look for Seq Scans on large tables (potential missing index).
- Check Actual vs Estimated Rows for significant discrepancies (stale statistics).
- Identify Nested Loops with high row counts that could be rewritten as Hash Joins.
EXPLAIN ANALYZE
SELECT u.id, COUNT(o.id)
FROM USER u
LEFT JOIN ORDER o ON u.id = o.user_id
WHERE u.is_active = TRUE
GROUP BY u.id;
4.2 Partial Indexes
Create indexes that only cover a subset of rows to save space and speed up specific queries.
- Example: Index only active users.
CREATE INDEX idx_user_active ON USER(email) WHERE is_active = TRUE; - Use for columns with low cardinality where only specific values are frequently queried.
Related Skills
Attack Tree Construction
Build comprehensive attack trees to visualize threat paths. Use when mapping attack scenarios, identifying defense gaps, or communicating security risks to stakeholders.
Grafana Dashboards
Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.
Matplotlib
Foundational plotting library. Create line plots, scatter, bar, histograms, heatmaps, 3D, subplots, export PNG/PDF/SVG, for scientific visualization and publication figures.
Scientific Visualization
Create publication figures with matplotlib/seaborn/plotly. Multi-panel layouts, error bars, significance markers, colorblind-safe, export PDF/EPS/TIFF, for journal-ready scientific plots.
Seaborn
Statistical visualization. Scatter, box, violin, heatmaps, pair plots, regression, correlation matrices, KDE, faceted plots, for exploratory analysis and publication figures.
Shap
Model interpretability and explainability using SHAP (SHapley Additive exPlanations). Use this skill when explaining machine learning model predictions, computing feature importance, generating SHAP plots (waterfall, beeswarm, bar, scatter, force, heatmap), debugging models, analyzing model bias or fairness, comparing models, or implementing explainable AI. Works with tree-based models (XGBoost, LightGBM, Random Forest), deep learning (TensorFlow, PyTorch), linear models, and any black-box model
Pydeseq2
Differential gene expression analysis (Python DESeq2). Identify DE genes from bulk RNA-seq counts, Wald tests, FDR correction, volcano/MA plots, for RNA-seq analysis.
Query Writing
For writing and executing SQL queries - from simple single-table queries to complex multi-table JOINs and aggregations
Pydeseq2
Differential gene expression analysis (Python DESeq2). Identify DE genes from bulk RNA-seq counts, Wald tests, FDR correction, volcano/MA plots, for RNA-seq analysis.
Scientific Visualization
Meta-skill for publication-ready figures. Use when creating journal submission figures requiring multi-panel layouts, significance annotations, error bars, colorblind-safe palettes, and specific journal formatting (Nature, Science, Cell). Orchestrates matplotlib/seaborn/plotly with publication styles. For quick exploration use seaborn or plotly directly.
