Optimizing Performance
by CloudAI-X
Guides performance optimization, profiling techniques, and bottleneck identification. Use when improving application speed, reducing resource usage, or diagnosing performance issues.
```yaml
name: optimizing-performance
description: Guides performance optimization, profiling techniques, and bottleneck identification. Use when improving application speed, reducing resource usage, or diagnosing performance issues.
license: MIT
compatibility: opencode
metadata:
  category: quality
  audience: developers
```
Optimizing Performance
Strategies for identifying, analyzing, and resolving performance bottlenecks.
When to Use This Skill
- Application is running slowly
- High resource consumption (CPU, memory)
- Database queries are slow
- API response times are high
- Need to scale for more users
- Preparing for load testing
Performance Optimization Philosophy
The Golden Rules
- Measure first - Never optimize without data
- Optimize the right thing - Find the actual bottleneck
- Keep it simple - Complexity often hurts performance
- Test after - Verify the optimization worked
- Document trade-offs - Performance often costs readability
The 80/20 Rule
80% of performance problems come from 20% of the code.
Focus on:
├── Hot paths (frequently executed code)
├── I/O operations (database, network, disk)
├── Memory allocation patterns
└── Algorithm complexity
Profiling Techniques
Types of Profiling
| Type | What It Measures | Tools |
|---|---|---|
| CPU Profiling | Time spent in functions | pprof, py-spy, Chrome DevTools |
| Memory Profiling | Allocation patterns, leaks | Valgrind, memory_profiler, Chrome DevTools |
| I/O Profiling | Disk/network operations | strace, perf, Wireshark |
| Database Profiling | Query performance | EXPLAIN, slow query log, APM |
Profiling Workflow
1. Establish baseline
└─ Measure current performance with realistic load
2. Identify hotspots
└─ Profile to find where time/resources are spent
3. Form hypothesis
└─ Why is this slow? What would make it faster?
4. Implement fix
└─ Make ONE change at a time
5. Measure again
└─ Did it help? By how much?
6. Repeat
└─ Until performance goals are met
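The "measure" steps can be scripted with the standard library alone. A minimal baseline harness (the `baseline` helper and the dummy workload are illustrative, not part of any particular tool):

```python
import statistics
import time

def baseline(func, runs=20):
    """Measure wall-clock latency over several runs and report percentiles."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        func()
        samples.append(time.perf_counter() - start)
    samples.sort()
    return {
        "p50": statistics.median(samples),
        "p95": samples[int(len(samples) * 0.95) - 1],
        "max": samples[-1],
    }

# Example: measure a dummy CPU-bound workload
stats = baseline(lambda: sum(range(10_000)))
```

Record the numbers before changing anything, so every later optimization can be compared against them.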
Common Profiling Commands
```sh
# Node.js
node --prof app.js
node --prof-process isolate-*.log > profile.txt

# Python
python -m cProfile -s cumtime app.py
py-spy record -o profile.svg -- python app.py

# Go
go test -cpuprofile cpu.prof -memprofile mem.prof -bench .
go tool pprof cpu.prof
```

```sql
-- Database (PostgreSQL)
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';
```
Common Bottleneck Patterns
N+1 Query Problem
```sql
-- BAD (N+1 queries):
SELECT * FROM posts;               -- 1 query
SELECT * FROM users WHERE id = 1;  -- N queries, one per post
SELECT * FROM users WHERE id = 2;
-- ...

-- GOOD (2 queries):
SELECT * FROM posts;
SELECT * FROM users WHERE id IN (1, 2, 3, ...);
```

Detection: high query count relative to data returned.
Fix: eager loading, batch fetching, JOINs.
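The batch-fetching fix can be sketched as runnable Python (sqlite3 in memory; the table and column names are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, title TEXT);
    INSERT INTO users VALUES (1, 'ada'), (2, 'bob');
    INSERT INTO posts VALUES (1, 1, 'a'), (2, 2, 'b'), (3, 1, 'c');
""")

# One query for posts, one batched query for every referenced user
posts = conn.execute("SELECT id, user_id, title FROM posts").fetchall()
user_ids = {user_id for _, user_id, _ in posts}
placeholders = ",".join("?" * len(user_ids))
users = dict(conn.execute(
    f"SELECT id, name FROM users WHERE id IN ({placeholders})",
    tuple(user_ids),
).fetchall())

# Join in memory instead of issuing one query per post
enriched = [(title, users[user_id]) for _, user_id, title in posts]
```

Two queries total, regardless of how many posts there are; ORMs expose the same idea as eager loading (e.g. `select_related`/`joinedload`).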
Unbounded Operations
```sql
-- BAD: returns millions of rows
SELECT * FROM logs;

-- GOOD: bounded result set
SELECT * FROM logs
WHERE created_at > NOW() - INTERVAL '1 day'
LIMIT 100;
```

Detection: memory spikes, timeouts.
Fix: pagination, limits, streaming.
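In application code the same fix becomes pagination. A keyset-style sketch (sqlite3 in memory; the `logs` table and `iter_logs` helper are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, msg TEXT)")
conn.executemany("INSERT INTO logs (msg) VALUES (?)",
                 [(f"event {i}",) for i in range(250)])

def iter_logs(conn, page_size=100):
    """Stream rows in bounded pages using keyset pagination (WHERE id > last)."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT id, msg FROM logs WHERE id > ? ORDER BY id LIMIT ?",
            (last_id, page_size),
        ).fetchall()
        if not rows:
            return
        yield from rows
        last_id = rows[-1][0]

count = sum(1 for _ in iter_logs(conn))
```

Keyset pagination (`WHERE id > last`) stays fast on large tables where `OFFSET` degrades, because each page is an index seek rather than a scan-and-skip.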
Synchronous Blocking
```js
// BAD (sequential): each call blocks the next
const result1 = await fetchApi1();   // wait 200 ms
const result2 = await fetchApi2();   // wait another 200 ms
return combine(result1, result2);    // total: ~400 ms

// GOOD (parallel): both requests in flight at once
const [result1, result2] = await Promise.all([
  fetchApi1(),
  fetchApi2(),
]);                                  // total: ~200 ms
```

Detection: sequential I/O in traces.
Fix: parallel execution, async/await.
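The same pattern in Python, as a runnable asyncio sketch (the `fetch_api_*` coroutines stand in for real network calls):

```python
import asyncio
import time

async def fetch_api_1():
    await asyncio.sleep(0.2)   # stand-in for a 200 ms network call
    return "a"

async def fetch_api_2():
    await asyncio.sleep(0.2)
    return "b"

async def combined():
    # Both coroutines are in flight concurrently: total ~200 ms, not 400 ms
    result1, result2 = await asyncio.gather(fetch_api_1(), fetch_api_2())
    return result1 + result2

start = time.perf_counter()
result = asyncio.run(combined())
elapsed = time.perf_counter() - start
```

Note this helps only for independent I/O-bound calls; CPU-bound work needs processes or threads instead.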
Excessive Allocation
```python
# BAD: allocates a fresh list on every iteration (and discards prior results)
for item in large_list:
    result = []
    result.append(transform(item))

# GOOD: allocate once, append inside the loop
result = []
for item in large_list:
    result.append(transform(item))

# BEST: generator, no intermediate list at all
def transform_all(items):
    for item in items:
        yield transform(item)
```

Detection: GC pressure, memory profiling.
Fix: object pooling, pre-allocation, generators.
Optimization Techniques
Database Optimization
| Technique | When to Use | Impact |
|---|---|---|
| Indexing | Slow WHERE/JOIN queries | High |
| Query optimization | Complex queries | High |
| Connection pooling | Many short connections | Medium |
| Read replicas | Read-heavy workloads | High |
| Caching | Repeated queries | Very High |
| Denormalization | Complex JOINs | Medium |
Index Guidelines
```sql
-- Index frequently queried columns
CREATE INDEX idx_users_email ON users(email);

-- Composite index for multi-column queries (leftmost column first)
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at);

-- Verify the planner actually uses the index
EXPLAIN ANALYZE SELECT * FROM users WHERE email = 'test@example.com';
```
Caching Strategies
| Strategy | Use Case | Invalidation |
|---|---|---|
| Cache-aside | General purpose | Manual or TTL |
| Write-through | Strong consistency | On write |
| Write-behind | Write-heavy | Async batched |
| Read-through | Read-heavy | On miss |
Cache-aside pattern:
1. Check cache
2. If miss, query database
3. Store in cache
4. Return result
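The four steps above can be sketched as a small wrapper (the `CacheAside` class and TTL-based invalidation are one possible shape, not a prescribed API):

```python
import time

class CacheAside:
    """Minimal cache-aside wrapper with TTL-based invalidation."""

    def __init__(self, load, ttl=60.0):
        self._load = load    # fallback loader, e.g. a database query
        self._ttl = ttl
        self._store = {}     # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry and entry[1] > time.monotonic():
            return entry[0]                      # 1. cache hit
        value = self._load(key)                  # 2. miss: query the source
        self._store[key] = (value, time.monotonic() + self._ttl)  # 3. store
        return value                             # 4. return

# Track loader calls to show that repeated gets hit the cache
calls = []
cache = CacheAside(load=lambda k: calls.append(k) or k.upper(), ttl=60)
```

Calling `cache.get("user:1")` twice invokes the loader only once; in production the dict would typically be Redis or Memcached, with the same get/load/store flow.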
Memory Optimization
| Technique | When to Use |
|---|---|
| Object pooling | Frequent allocation of same type |
| Lazy loading | Large objects not always needed |
| Streaming | Processing large datasets |
| Weak references | Cache that can be evicted |
| Data structure choice | Right structure for access pattern |
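Object pooling, the first technique in the table, reduces to a free list of reusable instances. A minimal sketch (the `Pool` class and buffer factory are illustrative):

```python
class Pool:
    """Minimal object pool: reuse instances instead of reallocating."""

    def __init__(self, factory):
        self._factory = factory
        self._free = []

    def acquire(self):
        # Reuse a released object if one is available, else allocate
        return self._free.pop() if self._free else self._factory()

    def release(self, obj):
        self._free.append(obj)

pool = Pool(factory=lambda: bytearray(1024))  # e.g. reusable 1 KiB buffers
buf1 = pool.acquire()
pool.release(buf1)
buf2 = pool.acquire()   # same object handed back, no new allocation
```

A real pool would also reset object state on release and bound the free list; this sketch shows only the reuse mechanism.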
Frontend Performance
Core Web Vitals
| Metric | Target | What It Measures |
|---|---|---|
| LCP (Largest Contentful Paint) | < 2.5s | Load performance |
| INP (Interaction to Next Paint) | < 200ms | Interactivity |
| CLS (Cumulative Layout Shift) | < 0.1 | Visual stability |
Frontend Optimization Checklist
Loading Performance:
☐ Code splitting (lazy load routes/components)
☐ Tree shaking (remove unused code)
☐ Minification (JS, CSS)
☐ Compression (gzip, brotli)
☐ Image optimization (WebP, srcset, lazy loading)
☐ CDN for static assets
Runtime Performance:
☐ Virtualized lists for large data
☐ Debounce/throttle event handlers
☐ Memoization of expensive computations
☐ Avoid layout thrashing (batch DOM reads/writes)
☐ Use CSS transforms for animations
☐ Web Workers for heavy computation
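The memoization item in the checklist applies beyond the frontend; in Python, `functools.lru_cache` gives it for free (the `expensive` function is a stand-in for a costly computation):

```python
from functools import lru_cache

calls = 0

@lru_cache(maxsize=256)
def expensive(n):
    """Stand-in for a costly pure computation; cached per argument."""
    global calls
    calls += 1
    return n * n

# 100 calls with the same argument compute the result only once
results = [expensive(7) for _ in range(100)]
```

Memoize only pure functions; caching anything that depends on mutable state reintroduces the invalidation problem.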
Bundle Optimization
```sh
# Analyze bundle size
npx webpack-bundle-analyzer stats.json
npx source-map-explorer bundle.js

# Find unused dependencies
npx depcheck
```
API Performance
Response Time Targets
| Percentile | Target | User Experience |
|---|---|---|
| p50 | < 100ms | Fast |
| p95 | < 500ms | Acceptable |
| p99 | < 1s | Tolerable |
API Optimization Techniques
| Technique | Benefit |
|---|---|
| Response compression | Reduce transfer size |
| Pagination | Limit response size |
| Field selection | Return only needed data |
| ETags/Caching headers | Reduce redundant requests |
| Connection keep-alive | Reduce handshake overhead |
| HTTP/2 | Multiplexing, header compression |
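The ETag row in the table works by hashing the response body and letting clients revalidate with `If-None-Match`. A framework-agnostic sketch (the helper names and truncated hash length are illustrative):

```python
import hashlib

def make_etag(body):
    """Derive a strong ETag from the response body."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'

def respond(body, if_none_match):
    """Return (status, payload): 304 with no body when the ETag still matches."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, b""          # client cache is fresh; skip the transfer
    return 200, body

body = b'{"users": []}'
status1, payload1 = respond(body, None)                 # first request
status2, payload2 = respond(body, make_etag(body))      # revalidation
```

The 304 response saves bandwidth, not server work: the body still has to be produced (or its hash cached) to compare ETags.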
Batch Endpoints
```
BAD (three round trips):
GET /users/1
GET /users/2
GET /users/3

GOOD (one batched request):
POST /users/batch
{ "ids": [1, 2, 3] }
```
Monitoring and Alerting
Key Metrics to Track
| Category | Metrics |
|---|---|
| Latency | p50, p95, p99 response times |
| Throughput | Requests per second |
| Errors | Error rate, error types |
| Saturation | CPU, memory, connections |
Alerting Thresholds
Critical (page immediately):
- Error rate > 5%
- p99 latency > 5s
- Service down
Warning (notify during hours):
- Error rate > 1%
- p95 latency > 2s
- Resource utilization > 80%
Logging for Performance
```python
import functools
import logging
import time

def timed_operation(func):
    """Log a warning whenever the wrapped function takes longer than 1 second."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        duration = time.perf_counter() - start
        if duration > 1.0:
            logging.warning("%s took %.2fs", func.__name__, duration)
        return result
    return wrapper
```
Performance Testing
Load Testing Tools
| Tool | Use Case |
|---|---|
| k6 | Modern, scriptable load testing |
| JMeter | Complex scenarios, GUI |
| Locust | Python-based, distributed |
| Artillery | YAML config, easy to start |
| wrk | Simple HTTP benchmarking |
Load Test Example (k6)
```js
import http from 'k6/http';
import { check, sleep } from 'k6';

export const options = {
  stages: [
    { duration: '1m', target: 50 },  // ramp up
    { duration: '5m', target: 50 },  // hold at 50 virtual users
    { duration: '1m', target: 0 },   // ramp down
  ],
  thresholds: {
    http_req_duration: ['p(95)<500'],  // 95% of requests under 500 ms
    http_req_failed: ['rate<0.01'],    // error rate below 1%
  },
};

export default function () {
  const res = http.get('https://api.example.com/users');
  check(res, { 'status is 200': (r) => r.status === 200 });
  sleep(1);
}
```
Anti-Patterns to Avoid
- Premature optimization - Optimize only proven bottlenecks
- Optimizing without measuring - Guessing wastes time
- Over-caching - Cache invalidation is hard
- Ignoring database - Often the real bottleneck
- Complex micro-optimizations - Usually not worth it
- Not testing under load - Production behavior differs
- Ignoring cold starts - First request matters too
- Over-engineering - Simpler is often faster
Quick Reference
PROFILING FLOW:
Measure → Identify → Hypothesize → Fix → Measure → Repeat
COMMON BOTTLENECKS:
N+1 queries → Eager loading
Unbounded data → Pagination
Blocking I/O → Parallelization
Excessive allocation → Object pooling
DATABASE:
Index frequently queried columns
Use EXPLAIN ANALYZE
Add caching layer
CACHING:
Cache-aside for general use
TTL for time-based invalidation
Invalidate on write for consistency
TARGETS:
p50 < 100ms
p95 < 500ms
p99 < 1s
TOOLS:
CPU: pprof, py-spy
Memory: valgrind, memory_profiler
Load: k6, locust
DB: EXPLAIN, slow query log
