Analyze Performance
by r3bl-org
Establish performance baselines and detect regressions using flamegraph analysis. Use when optimizing performance-critical code, investigating performance issues, or before creating commits with performance-sensitive changes.
Performance Regression Analysis with Flamegraphs
When to Use
- Optimizing performance-critical code
- Detecting performance regressions after changes
- Establishing performance baselines for reference
- Investigating performance issues or slow code paths
- Before creating commits with performance-sensitive changes
- When user says "check performance", "analyze flamegraph", "detect regressions", etc.
Instructions
Follow these steps to analyze performance and detect regressions:
Step 1: Generate Current Flamegraph
Run the automated benchmark script to collect current performance data:
./run.fish run-examples-flamegraph-fold --benchmark
What this does:
- Runs an 8-second continuous workload stress test
- Samples at 999Hz for high precision
- Tests the rendering pipeline with realistic load
- Generates flamegraph data in `tui/flamegraph-benchmark.perf-folded`
Implementation details:
- The benchmark script is in `script-lib.fish`
- Uses an automated testing script that stress tests the rendering pipeline
- Simulates real-world usage patterns
Step 2: Compare with Baseline
Compare the newly generated flamegraph with the baseline:
Baseline file: `tui/flamegraph-benchmark-baseline.perf-folded`
Current file: `tui/flamegraph-benchmark.perf-folded`
The baseline file contains:
- Performance snapshot of the "current best" performance state
- Typically saved when performance is optimal
- Committed to git for historical reference
Step 3: Analyze Differences
Compare the two flamegraph files to identify regressions or improvements:
Key metrics to analyze:
1. Hot path changes
   - Which functions appear more or less frequently?
   - Are there new hot paths that weren't in the baseline?
2. Sample count changes
   - Increased samples = function taking more time
   - Decreased samples = optimization working!
3. Call stack depth changes
   - Deeper stacks might indicate unnecessary abstraction
   - Shallower stacks might indicate inlining is working
4. New allocations or I/O
   - Look for memory allocation hot paths
   - Unexpected I/O operations
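The comparison in this step can be scripted. Below is a minimal, hypothetical Rust sketch that aggregates per-function (leaf-frame) sample counts from `.perf-folded` text and flags large deltas; the function names, the inline sample data, and the 10% threshold are illustrative assumptions, not part of this skill:

```rust
use std::collections::HashMap;

/// Sum samples per leaf function from perf-folded text
/// (each line: "main;render_loop;paint_buffer 45").
fn leaf_samples(folded: &str) -> HashMap<String, u64> {
    let mut totals = HashMap::new();
    for line in folded.lines() {
        // The sample count is the last whitespace-separated token.
        let Some((stack, count)) = line.rsplit_once(' ') else { continue };
        let Ok(count) = count.trim().parse::<u64>() else { continue };
        // The leaf function is the last frame of the semicolon-separated stack.
        let leaf = stack.rsplit(';').next().unwrap_or(stack).to_string();
        *totals.entry(leaf).or_insert(0) += count;
    }
    totals
}

/// Percentage change from baseline to current for one function.
fn pct_change(baseline: u64, current: u64) -> f64 {
    (current as f64 - baseline as f64) / baseline as f64 * 100.0
}

fn main() {
    // Toy data shaped like the real baseline/current folded files.
    let baseline = leaf_samples("main;render_loop;paint_buffer 1500\nmain;render_loop;diff 800");
    let current = leaf_samples("main;render_loop;paint_buffer 2200\nmain;render_loop;diff 600");
    for (func, &cur) in &current {
        match baseline.get(func) {
            Some(&base) => {
                let delta = pct_change(base, cur);
                // 10% is an arbitrary example threshold for "significant".
                let flag = if delta > 10.0 {
                    "⚠️ REGRESSION"
                } else if delta < -10.0 {
                    "✅ IMPROVEMENT"
                } else {
                    "neutral"
                };
                println!("{func}: {base} → {cur} samples ({delta:+.0}%) {flag}");
            }
            None => println!("{func}: NEW in current ({cur} samples) 🔍 INVESTIGATE"),
        }
    }
}
```

In practice the two strings would come from reading the baseline and current `tui/*.perf-folded` files from disk.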
Step 4: Prepare Regression Report
Create a comprehensive report analyzing the performance changes:
Report structure:
# Performance Regression Analysis
## Summary
[Overall performance verdict: regression, improvement, or neutral]
## Hot Path Changes
- Function X: 1500 → 2200 samples (+47%) ⚠️ REGRESSION
- Function Y: 800 → 600 samples (-25%) ✅ IMPROVEMENT
- Function Z: NEW in current (300 samples) 🔍 INVESTIGATE
## Top 5 Most Expensive Functions
### Baseline
1. render_loop: 3500 samples
2. paint_buffer: 2100 samples
3. diff_algorithm: 1800 samples
...
### Current
1. render_loop: 3600 samples (+3%)
2. paint_buffer: 2500 samples (+19%) ⚠️
3. diff_algorithm: 1700 samples (-6%) ✅
...
## Regressions Detected
[List of functions with significant increases]
## Improvements Detected
[List of functions with significant decreases]
## Recommendations
[What should be investigated or optimized]
Step 5: Present to User
Present the regression report to the user with:
- ✅ Clear summary (regression, improvement, or neutral)
- 📊 Key metrics with percentage changes
- ⚠️ Highlighted regressions that need attention
- 🎯 Specific recommendations for optimization
- 📈 Overall performance trend
Optional: Update Baseline
When to update the baseline:
Only update when you've achieved a new "best" performance state:
- After successful optimization work
- All tests pass
- Behavior is correct
- Ready to lock in this performance as the new reference
How to update:
# Replace baseline with current
cp tui/flamegraph-benchmark.perf-folded tui/flamegraph-benchmark-baseline.perf-folded
# Commit the new baseline
git add tui/flamegraph-benchmark-baseline.perf-folded
git commit -m "perf: Update performance baseline after optimization"
See baseline-management.md for detailed guidance on when and how to update baselines.
Understanding Flamegraph Format
The .perf-folded files contain stack traces with sample counts:
main;render_loop;paint_buffer;draw_cell 45
main;render_loop;diff_algorithm;compare 30
Format:
- Semicolon-separated call stack (deepest function last)
- Space + sample count at end
- More samples = more time spent in that stack
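Given that format, the inclusive time of a function is the sum of every stack that contains it anywhere, not just as the leaf. A small illustrative Rust sketch (the helper name is hypothetical):

```rust
/// Inclusive sample count for one function: the sum over every
/// folded line whose stack contains that frame anywhere.
fn inclusive_samples(folded: &str, func: &str) -> u64 {
    let mut total = 0;
    for line in folded.lines() {
        // Split "stack count" at the last space.
        if let Some((stack, count)) = line.rsplit_once(' ') {
            if stack.split(';').any(|frame| frame == func) {
                total += count.trim().parse::<u64>().unwrap_or(0);
            }
        }
    }
    total
}

fn main() {
    let folded = "main;render_loop;paint_buffer;draw_cell 45\nmain;render_loop;diff_algorithm;compare 30";
    // render_loop appears on both stacks: 45 + 30 = 75 inclusive samples.
    assert_eq!(inclusive_samples(folded, "render_loop"), 75);
    // draw_cell appears only on the first stack.
    assert_eq!(inclusive_samples(folded, "draw_cell"), 45);
    println!("ok");
}
```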
Performance Optimization Workflow
1. Make code change
↓
2. Run: ./run.fish run-examples-flamegraph-fold --benchmark
↓
3. Analyze flamegraph vs baseline
↓
4. ┌─ Performance improved?
│ ├─ YES → Update baseline, commit
│ └─ NO → Investigate regressions, optimize
└→ Repeat
Additional Performance Tools
For more granular performance analysis, consider:
cargo bench
Run benchmarks for specific functions:
cargo bench
When to use:
- Micro-benchmarks for specific functions
- Tests marked with `#[bench]`
- Precise timing measurements
cargo flamegraph
Generate visual flamegraph SVG:
cargo flamegraph
When to use:
- Visual analysis of call stacks
- Identifying hot paths visually
- Sharing performance analysis
Requirements:
- `flamegraph` crate installed
- Profiling symbols enabled
Manual Profiling
For deep investigation:
# Profile with perf
perf record -F 999 --call-graph dwarf ./target/release/app
# Generate flamegraph
perf script | stackcollapse-perf.pl | flamegraph.pl > flame.svg
Common Performance Issues to Look For
When analyzing flamegraphs, watch for:
1. Allocations in Hot Paths
render_loop;Vec::push;alloc::grow 500 samples ⚠️
Problem: Allocating in tight loops
Fix: Pre-allocate or use capacity hints
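A minimal Rust illustration of that fix, with hypothetical function names:

```rust
// Repeatedly growing a Vec in a hot loop shows up in flamegraphs
// as alloc/grow frames underneath Vec::push.
fn collect_cells_slow(n: usize) -> Vec<u32> {
    let mut cells = Vec::new(); // capacity 0; reallocates as it grows
    for i in 0..n {
        cells.push(i as u32);
    }
    cells
}

// Pre-allocating once removes the reallocation hot path.
fn collect_cells_fast(n: usize) -> Vec<u32> {
    let mut cells = Vec::with_capacity(n); // one allocation up front
    for i in 0..n {
        cells.push(i as u32);
    }
    cells
}

fn main() {
    let fast = collect_cells_fast(1000);
    assert_eq!(fast.len(), 1000);
    assert!(fast.capacity() >= 1000);
    // Same result, fewer allocations.
    assert_eq!(collect_cells_slow(1000), fast);
}
```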
2. Excessive Cloning
process_data;String::clone 300 samples ⚠️
Problem: Unnecessary data copies
Fix: Use references or Cow<str>
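A sketch of the `Cow<str>` fix (the function and its tab-expansion behavior are made up for illustration):

```rust
use std::borrow::Cow;

// Returning Cow<str> avoids cloning when no change is needed:
// the borrowed branch is allocation-free; only the rewrite allocates.
fn normalize(input: &str) -> Cow<'_, str> {
    if input.contains('\t') {
        Cow::Owned(input.replace('\t', "    ")) // allocate only when rewriting
    } else {
        Cow::Borrowed(input) // no allocation, no clone
    }
}

fn main() {
    // The common case borrows instead of cloning.
    assert!(matches!(normalize("already clean"), Cow::Borrowed(_)));
    // The rare case pays for one owned String.
    assert_eq!(normalize("a\tb"), "a    b");
    println!("ok");
}
```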
3. Deep Call Stacks
a;b;c;d;e;f;g;h;i;j;k;l;m 50 samples ⚠️
Problem: Too much abstraction or recursion
Fix: Flatten, inline, or optimize
4. I/O in Critical Paths
render_loop;write;syscall 200 samples ⚠️
Problem: Blocking I/O in rendering
Fix: Buffer or defer I/O
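A Rust sketch of buffering the writes (the frame-painting function is hypothetical):

```rust
use std::io::{BufWriter, Write};

// Writing a frame row-by-row straight to a terminal issues one write
// syscall per row; wrapping the sink in BufWriter batches the rows
// in memory and emits them in a single flush at frame end.
fn paint_frame(sink: impl Write) -> std::io::Result<()> {
    let mut out = BufWriter::new(sink);
    for row in 0..24 {
        // Each write! lands in the in-memory buffer, not the kernel.
        write!(out, "row {row}\r\n")?;
    }
    out.flush() // one buffered write instead of a write per row
}

fn main() -> std::io::Result<()> {
    // Demonstrate against an in-memory sink.
    let mut buf = Vec::new();
    paint_frame(&mut buf)?;
    assert!(String::from_utf8(buf).unwrap().starts_with("row 0\r\n"));
    Ok(())
}
```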
Reporting Results
After performance analysis:
- ✅ No regressions → "Performance analysis complete: no regressions detected!"
- ⚠️ Regressions found → Provide detailed report with function names and percentages
- 🎯 Improvements found → Celebrate and document what worked!
- 📊 Mixed results → Explain trade-offs and recommendations
Supporting Files in This Skill
This skill includes additional reference material:
baseline-management.md - Comprehensive guide on when and how to update performance baselines: when to update (after optimization, architectural changes, dependency updates, accepting trade-offs), when NOT to update (regressions, still debugging, experimental code, flaky results), the step-by-step update process, a baseline update checklist, reading flamegraph differences, example workflows, and common mistakes.
Read this when:
- Deciding whether to update the baseline → "When to Update" section
- Performance improved and you want to lock it in → Update workflow
- Unsure if a baseline update is appropriate → Checklist
- You need to understand flamegraph diff signals → "Reading Flamegraph Differences"
- Avoiding common mistakes → "Common Mistakes" section
Related Skills
check-code-quality - Run before performance analysis to ensure correctness
write-documentation - Document performance characteristics
Related Commands
/check-regression - Explicitly invokes this skill
Related Agents
perf-checker - Agent that delegates to this skill
Additional Resources
- Flamegraph format: `tui/*.perf-folded` files
- Benchmark script: `script-lib.fish`
- Visual flamegraphs: Use `flamegraph.pl` to generate SVGs
