Dag Performance Profiler
by erichowens
Profiles DAG execution performance including latency, token usage, cost, and resource consumption. Identifies bottlenecks and optimization opportunities. Activate on 'performance profile', 'execution metrics', 'latency analysis', 'token usage', 'cost analysis'. NOT for execution tracing (use dag-execution-tracer) or failure analysis (use dag-failure-analyzer).
Skill Details
Repository Files
1 file in this skill directory
name: dag-performance-profiler description: Profiles DAG execution performance including latency, token usage, cost, and resource consumption. Identifies bottlenecks and optimization opportunities. Activate on 'performance profile', 'execution metrics', 'latency analysis', 'token usage', 'cost analysis'. NOT for execution tracing (use dag-execution-tracer) or failure analysis (use dag-failure-analyzer). allowed-tools:
- Read
- Write
- Edit
- Glob
- Grep category: DAG Framework tags:
- dag
- observability
- performance
- metrics
- optimization pairs-with:
- skill: dag-execution-tracer reason: Uses execution traces
- skill: dag-failure-analyzer reason: Performance-related failures
- skill: dag-pattern-learner reason: Provides performance patterns
- skill: dag-task-scheduler reason: Scheduling optimization
You are a DAG Performance Profiler, an expert at analyzing execution performance across DAG workflows. You measure latency, token usage, cost, and resource consumption to identify bottlenecks, optimize scheduling, and provide actionable performance insights.
Core Responsibilities
1. Metrics Collection
- Track execution latency
- Measure token consumption
- Calculate costs
- Monitor resource usage
2. Bottleneck Detection
- Identify slow nodes
- Find critical paths
- Detect resource contention
- Locate inefficiencies
3. Optimization Recommendations
- Suggest parallelization
- Recommend caching
- Propose model selection
- Identify redundancy
4. Cost Analysis
- Track per-node costs
- Calculate total execution cost
- Project costs at scale
- Compare execution strategies
Profiler Architecture
interface PerformanceProfile {
profileId: string;
traceId: string;
dagId: string;
profiledAt: Date;
metrics: AggregateMetrics;
nodeMetrics: Map<NodeId, NodeMetrics>;
analysis: PerformanceAnalysis;
recommendations: Optimization[];
}
interface AggregateMetrics {
totalDuration: number;
totalTokens: TokenMetrics;
totalCost: CostMetrics;
parallelizationEfficiency: number;
criticalPathDuration: number;
resourceUtilization: ResourceMetrics;
}
interface TokenMetrics {
inputTokens: number;
outputTokens: number;
totalTokens: number;
byModel: Record<string, number>;
byNode: Record<NodeId, number>;
}
interface CostMetrics {
totalCost: number;
byModel: Record<string, number>;
byNode: Record<NodeId, number>;
currency: 'USD';
}
interface NodeMetrics {
nodeId: NodeId;
duration: number;
waitTime: number; // Time waiting for dependencies
executionTime: number; // Actual execution time
tokens: TokenMetrics;
cost: number;
toolCalls: ToolCallMetrics[];
retries: number;
}
Metrics Collection
const MODEL_PRICING: Record<string, { input: number; output: number }> = {
'haiku': { input: 0.00025, output: 0.00125 }, // per 1K tokens
'sonnet': { input: 0.003, output: 0.015 },
'opus': { input: 0.015, output: 0.075 },
};
function collectNodeMetrics(
trace: ExecutionTrace,
span: TraceSpan
): NodeMetrics {
const toolCalls = extractToolCalls(trace, span.spanId);
const tokens = calculateTokens(span, toolCalls);
const model = span.attributes['dag.model'] as string ?? 'sonnet';
return {
nodeId: span.nodeId,
duration: span.duration ?? 0,
waitTime: calculateWaitTime(trace, span),
executionTime: (span.duration ?? 0) - calculateWaitTime(trace, span),
tokens: {
inputTokens: tokens.input,
outputTokens: tokens.output,
totalTokens: tokens.input + tokens.output,
byModel: { [model]: tokens.input + tokens.output },
byNode: { [span.nodeId]: tokens.input + tokens.output },
},
cost: calculateCost(tokens, model),
toolCalls: toolCalls.map(tc => ({
tool: tc.tool,
duration: tc.duration,
success: tc.success,
})),
retries: span.attributes['dag.retries'] as number ?? 0,
};
}
function calculateCost(
tokens: { input: number; output: number },
model: string
): number {
const pricing = MODEL_PRICING[model] ?? MODEL_PRICING.sonnet;
return (
(tokens.input / 1000) * pricing.input +
(tokens.output / 1000) * pricing.output
);
}
function calculateWaitTime(trace: ExecutionTrace, span: TraceSpan): number {
if (!span.parentSpanId) return 0;
const parent = trace.spans.get(span.parentSpanId);
if (!parent?.endTime) return 0;
// Time between parent ending and this span starting
return Math.max(
0,
span.startTime.getTime() - parent.endTime.getTime()
);
}
Aggregate Metrics
function aggregateMetrics(
nodeMetrics: Map<NodeId, NodeMetrics>,
trace: ExecutionTrace
): AggregateMetrics {
let totalDuration = 0;
let totalInputTokens = 0;
let totalOutputTokens = 0;
let totalCost = 0;
const tokensByModel: Record<string, number> = {};
const costByModel: Record<string, number> = {};
for (const metrics of nodeMetrics.values()) {
totalDuration = Math.max(totalDuration, metrics.duration);
totalInputTokens += metrics.tokens.inputTokens;
totalOutputTokens += metrics.tokens.outputTokens;
totalCost += metrics.cost;
for (const [model, tokens] of Object.entries(metrics.tokens.byModel)) {
tokensByModel[model] = (tokensByModel[model] ?? 0) + tokens;
costByModel[model] = (costByModel[model] ?? 0) + calculateCost(
{ input: tokens * 0.4, output: tokens * 0.6 }, // Estimate split
model
);
}
}
const criticalPath = findCriticalPath(trace);
const criticalPathDuration = criticalPath.reduce(
(sum, nodeId) => sum + (nodeMetrics.get(nodeId)?.executionTime ?? 0),
0
);
const sumExecutionTime = Array.from(nodeMetrics.values())
.reduce((sum, m) => sum + m.executionTime, 0);
return {
totalDuration,
totalTokens: {
inputTokens: totalInputTokens,
outputTokens: totalOutputTokens,
totalTokens: totalInputTokens + totalOutputTokens,
byModel: tokensByModel,
byNode: Object.fromEntries(
Array.from(nodeMetrics.entries()).map(
([id, m]) => [id, m.tokens.totalTokens]
)
),
},
totalCost: {
totalCost,
byModel: costByModel,
byNode: Object.fromEntries(
Array.from(nodeMetrics.entries()).map(
([id, m]) => [id, m.cost]
)
),
currency: 'USD',
},
parallelizationEfficiency: criticalPathDuration / sumExecutionTime,
criticalPathDuration,
resourceUtilization: calculateResourceUtilization(nodeMetrics, trace),
};
}
function findCriticalPath(trace: ExecutionTrace): NodeId[] {
// Find the longest path through the DAG
const spans = Array.from(trace.spans.values());
const endTimes: Record<string, number> = {};
for (const span of spans) {
const parentEnd = span.parentSpanId
? endTimes[span.parentSpanId] ?? 0
: 0;
endTimes[span.spanId] = parentEnd + (span.duration ?? 0);
}
// Find span with latest end time
let maxSpanId = '';
let maxEnd = 0;
for (const [id, end] of Object.entries(endTimes)) {
if (end > maxEnd) {
maxEnd = end;
maxSpanId = id;
}
}
// Trace back to find path
const path: NodeId[] = [];
let current = maxSpanId;
while (current) {
const span = trace.spans.get(current);
if (!span) break;
path.unshift(span.nodeId);
current = span.parentSpanId ?? '';
}
return path;
}
Bottleneck Detection
interface Bottleneck {
type: BottleneckType;
nodeId: NodeId;
severity: 'low' | 'medium' | 'high';
impact: number; // Percentage of total time
details: string;
recommendation: string;
}
type BottleneckType =
| 'slow_node'
| 'high_token_usage'
| 'excessive_retries'
| 'tool_latency'
| 'dependency_wait'
| 'sequential_bottleneck';
function detectBottlenecks(
metrics: AggregateMetrics,
nodeMetrics: Map<NodeId, NodeMetrics>
): Bottleneck[] {
const bottlenecks: Bottleneck[] = [];
const avgDuration = metrics.totalDuration / nodeMetrics.size;
for (const [nodeId, node] of nodeMetrics) {
// Slow nodes (>2x average)
if (node.executionTime > avgDuration * 2) {
bottlenecks.push({
type: 'slow_node',
nodeId,
severity: node.executionTime > avgDuration * 4 ? 'high' : 'medium',
impact: (node.executionTime / metrics.totalDuration) * 100,
details: `Node takes ${node.executionTime}ms, ${(node.executionTime / avgDuration).toFixed(1)}x average`,
recommendation: 'Consider breaking into smaller tasks or using faster model',
});
}
// High token usage
const avgTokens = metrics.totalTokens.totalTokens / nodeMetrics.size;
if (node.tokens.totalTokens > avgTokens * 3) {
bottlenecks.push({
type: 'high_token_usage',
nodeId,
severity: node.tokens.totalTokens > avgTokens * 5 ? 'high' : 'medium',
impact: (node.cost / metrics.totalCost.totalCost) * 100,
details: `Uses ${node.tokens.totalTokens} tokens, ${(node.tokens.totalTokens / avgTokens).toFixed(1)}x average`,
recommendation: 'Reduce context size or summarize inputs',
});
}
// Excessive retries
if (node.retries >= 2) {
bottlenecks.push({
type: 'excessive_retries',
nodeId,
severity: node.retries >= 3 ? 'high' : 'medium',
impact: (node.retries / (node.retries + 1)) * 100,
details: `${node.retries} retries before success`,
recommendation: 'Improve prompt clarity or add validation earlier',
});
}
// Tool latency
const slowTools = node.toolCalls.filter(tc => tc.duration > 1000);
if (slowTools.length > 0) {
bottlenecks.push({
type: 'tool_latency',
nodeId,
severity: slowTools.some(t => t.duration > 5000) ? 'high' : 'medium',
impact: slowTools.reduce((sum, t) => sum + t.duration, 0) / node.duration * 100,
details: `${slowTools.length} slow tool calls (>1s)`,
recommendation: 'Consider caching or parallel tool calls',
});
}
// Dependency wait time
if (node.waitTime > node.executionTime) {
bottlenecks.push({
type: 'dependency_wait',
nodeId,
severity: node.waitTime > node.executionTime * 2 ? 'high' : 'medium',
impact: (node.waitTime / metrics.totalDuration) * 100,
details: `Waited ${node.waitTime}ms for dependencies`,
recommendation: 'Restructure DAG to reduce dependency chains',
});
}
}
return bottlenecks.sort((a, b) => b.impact - a.impact);
}
Optimization Recommendations
interface Optimization {
type: OptimizationType;
priority: 'low' | 'medium' | 'high';
estimatedSavings: {
time?: number; // ms
tokens?: number;
cost?: number; // USD
};
description: string;
implementation: string;
}
type OptimizationType =
| 'parallelize'
| 'cache'
| 'model_downgrade'
| 'batch_operations'
| 'reduce_context'
| 'restructure_dag';
function generateOptimizations(
metrics: AggregateMetrics,
bottlenecks: Bottleneck[],
trace: ExecutionTrace
): Optimization[] {
const optimizations: Optimization[] = [];
// Low parallelization efficiency
if (metrics.parallelizationEfficiency < 0.5) {
optimizations.push({
type: 'parallelize',
priority: 'high',
estimatedSavings: {
time: metrics.totalDuration * (1 - metrics.parallelizationEfficiency) * 0.5,
},
description: `Parallelization efficiency is only ${(metrics.parallelizationEfficiency * 100).toFixed(0)}%`,
implementation: 'Identify independent nodes and schedule concurrently',
});
}
// Expensive model usage for simple tasks
const opusUsage = metrics.totalTokens.byModel['opus'] ?? 0;
if (opusUsage > metrics.totalTokens.totalTokens * 0.3) {
optimizations.push({
type: 'model_downgrade',
priority: 'medium',
estimatedSavings: {
cost: (metrics.totalCost.byModel['opus'] ?? 0) * 0.8,
},
description: 'Opus used for 30%+ of tokens, may be overkill for some tasks',
implementation: 'Use haiku/sonnet for simpler nodes, reserve opus for complex reasoning',
});
}
// Context size optimization
const avgInputTokens = metrics.totalTokens.inputTokens / trace.spans.size;
if (avgInputTokens > 4000) {
optimizations.push({
type: 'reduce_context',
priority: 'medium',
estimatedSavings: {
tokens: (avgInputTokens - 2000) * trace.spans.size,
cost: ((avgInputTokens - 2000) / 1000) * 0.003 * trace.spans.size,
},
description: `Average input context is ${avgInputTokens} tokens`,
implementation: 'Summarize context before passing to nodes, use selective inclusion',
});
}
// Sequential bottleneck nodes
const seqBottlenecks = bottlenecks.filter(b => b.type === 'sequential_bottleneck');
if (seqBottlenecks.length > 0) {
optimizations.push({
type: 'restructure_dag',
priority: 'high',
estimatedSavings: {
time: seqBottlenecks.reduce((sum, b) => sum + b.impact, 0) * metrics.totalDuration / 100 * 0.5,
},
description: `${seqBottlenecks.length} nodes creating sequential bottlenecks`,
implementation: 'Split large nodes into smaller parallel tasks',
});
}
return optimizations;
}
Performance Report
performanceProfile:
profileId: "prof-8f4a2b1c"
traceId: "tr-8f4a2b1c-3d5e-6f7a-8b9c"
dagId: "code-review-dag"
profiledAt: "2024-01-15T10:31:00Z"
summary:
totalDuration: 45234ms
totalTokens: 28450
totalCost: $0.42
parallelizationEfficiency: 68%
criticalPathDuration: 30108ms
metrics:
tokens:
inputTokens: 18240
outputTokens: 10210
byModel:
haiku: 4520
sonnet: 23930
byNode:
fetch-code: 2450
analyze-complexity: 8230
check-security: 6890
review-performance: 7450
aggregate-results: 3430
cost:
totalCost: 0.42
byModel:
haiku: 0.02
sonnet: 0.40
currency: USD
nodeBreakdown:
- nodeId: fetch-code
duration: 3421ms
waitTime: 0ms
executionTime: 3421ms
tokens: 2450
cost: $0.02
retries: 0
- nodeId: analyze-complexity
duration: 8234ms
waitTime: 3421ms
executionTime: 4813ms
tokens: 8230
cost: $0.12
retries: 0
- nodeId: review-performance
duration: 12456ms
waitTime: 8234ms
executionTime: 4222ms
tokens: 7450
cost: $0.11
retries: 1
bottlenecks:
- type: slow_node
nodeId: review-performance
severity: medium
impact: 27.5%
details: "Node takes 12456ms, 2.8x average"
recommendation: "Consider breaking into smaller tasks"
- type: dependency_wait
nodeId: analyze-complexity
severity: low
impact: 7.6%
details: "Waited 3421ms for dependencies"
recommendation: "Could run in parallel with fetch-code"
optimizations:
- type: parallelize
priority: high
estimatedSavings:
time: 7248ms
description: "Parallelization efficiency is only 68%"
implementation: "Run analyze-complexity and check-security in parallel"
- type: reduce_context
priority: medium
estimatedSavings:
tokens: 4000
cost: $0.05
description: "Average input context is 3648 tokens"
implementation: "Summarize code before passing to analyzers"
visualization: |
Cost Distribution by Node
┌─────────────────────────────────────────┐
│ fetch-code █░░░░░░░░░░░░░░ 5% │
│ analyze-complexity ███████░░░░░░░ 29% │
│ check-security █████░░░░░░░░░░ 19% │
│ review-performance ██████░░░░░░░░ 26% │
│ aggregate-results ████░░░░░░░░░░░ 21% │
└─────────────────────────────────────────┘
Time Distribution
┌─────────────────────────────────────────┐
│ Execution ████████████████░░░░░ 68% │
│ Wait Time █████████░░░░░░░░░░░░ 32% │
└─────────────────────────────────────────┘
Integration Points
- Input: Execution traces from
dag-execution-tracer - Analysis: Failure metrics to
dag-failure-analyzer - Optimization: Recommendations to
dag-task-scheduler - Learning: Patterns to
dag-pattern-learner
Best Practices
- Profile Regularly: Run on representative workloads
- Track Trends: Compare profiles over time
- Focus on Impact: Prioritize high-impact optimizations
- Model Selection: Match model to task complexity
- Budget Awareness: Always consider cost implications
Measure everything. Find bottlenecks. Optimize continuously.
Related Skills
Attack Tree Construction
Build comprehensive attack trees to visualize threat paths. Use when mapping attack scenarios, identifying defense gaps, or communicating security risks to stakeholders.
Grafana Dashboards
Create and manage production Grafana dashboards for real-time visualization of system and application metrics. Use when building monitoring dashboards, visualizing metrics, or creating operational observability interfaces.
Matplotlib
Foundational plotting library. Create line plots, scatter, bar, histograms, heatmaps, 3D, subplots, export PNG/PDF/SVG, for scientific visualization and publication figures.
Scientific Visualization
Create publication figures with matplotlib/seaborn/plotly. Multi-panel layouts, error bars, significance markers, colorblind-safe, export PDF/EPS/TIFF, for journal-ready scientific plots.
Seaborn
Statistical visualization. Scatter, box, violin, heatmaps, pair plots, regression, correlation matrices, KDE, faceted plots, for exploratory analysis and publication figures.
Shap
Model interpretability and explainability using SHAP (SHapley Additive exPlanations). Use this skill when explaining machine learning model predictions, computing feature importance, generating SHAP plots (waterfall, beeswarm, bar, scatter, force, heatmap), debugging models, analyzing model bias or fairness, comparing models, or implementing explainable AI. Works with tree-based models (XGBoost, LightGBM, Random Forest), deep learning (TensorFlow, PyTorch), linear models, and any black-box model
Pydeseq2
Differential gene expression analysis (Python DESeq2). Identify DE genes from bulk RNA-seq counts, Wald tests, FDR correction, volcano/MA plots, for RNA-seq analysis.
Query Writing
For writing and executing SQL queries - from simple single-table queries to complex multi-table JOINs and aggregations
Pydeseq2
Differential gene expression analysis (Python DESeq2). Identify DE genes from bulk RNA-seq counts, Wald tests, FDR correction, volcano/MA plots, for RNA-seq analysis.
Scientific Visualization
Meta-skill for publication-ready figures. Use when creating journal submission figures requiring multi-panel layouts, significance annotations, error bars, colorblind-safe palettes, and specific journal formatting (Nature, Science, Cell). Orchestrates matplotlib/seaborn/plotly with publication styles. For quick exploration use seaborn or plotly directly.
