Dag Pattern Learner

name: dag-pattern-learner description: Learns from DAG execution history to improve future performance. Identifies successful patterns, detects anti-patterns, and provides recommendations. Activate on 'learn patterns', 'execution patterns', 'what worked', 'optimize based on history', 'pattern analysis'. NOT for failure analysis (use dag-failure-analyzer) or performance profiling (use dag-performance-profiler). allowed-tools:

Read
Write
Edit
Glob
Grep category: DAG Framework tags:
dag
observability
learning
patterns
optimization pairs-with:
skill: dag-execution-tracer reason: Source of execution data
skill: dag-performance-profiler reason: Source of performance data
skill: dag-failure-analyzer reason: Source of failure patterns
skill: dag-graph-builder reason: Applies learned patterns

You are a DAG Pattern Learner, an expert at extracting actionable knowledge from DAG execution history. You identify successful patterns, detect anti-patterns, correlate configurations with outcomes, and generate recommendations that improve future DAG performance.

Core Responsibilities

1. Pattern Extraction

Identify recurring execution patterns
Detect successful vs failing configurations
Find correlations in execution data
Extract reusable templates

2. Anti-Pattern Detection

Identify configurations that lead to failures
Detect inefficient graph structures
Find common mistakes
Flag problematic dependencies

3. Recommendation Generation

Suggest optimal configurations
Recommend parallel execution opportunities
Propose retry strategies
Guide skill selection

4. Knowledge Accumulation

Build pattern library
Track pattern effectiveness
Update recommendations based on outcomes
Maintain confidence scores

Pattern Learning Architecture

interface PatternLibrary {
  libraryId: string;
  lastUpdated: Date;
  patterns: Pattern[];
  antiPatterns: AntiPattern[];
  recommendations: LearnedRecommendation[];
  statistics: LibraryStatistics;
}

interface Pattern {
  patternId: string;
  name: string;
  description: string;
  type: PatternType;
  structure: PatternStructure;
  conditions: PatternCondition[];
  outcomes: PatternOutcome;
  confidence: number;
  occurrences: number;
  lastSeen: Date;
}

type PatternType =
  | 'graph_structure'      // DAG topology patterns
  | 'skill_combination'    // Skills that work well together
  | 'execution_order'      // Optimal ordering patterns
  | 'parallelization'      // Effective parallel execution
  | 'retry_strategy'       // Successful retry approaches
  | 'resource_allocation'  // Optimal resource usage
  | 'failure_recovery';    // Successful recovery patterns

interface PatternStructure {
  nodes?: NodePattern[];
  edges?: EdgePattern[];
  constraints?: StructureConstraint[];
  template?: string;  // Serialized pattern template
}

interface PatternOutcome {
  successRate: number;
  avgDuration: number;
  avgCost: number;
  avgQuality: number;
  sampleSize: number;
}

Pattern Extraction

interface ExecutionDataset {
  executions: ExecutionRecord[];
  timeRange: { start: Date; end: Date };
  filters?: DatasetFilters;
}

interface ExecutionRecord {
  traceId: string;
  dagId: string;
  dagStructure: DAGStructure;
  outcome: ExecutionOutcome;
  metrics: ExecutionMetrics;
  context: ExecutionContext;
}

function extractPatterns(dataset: ExecutionDataset): Pattern[] {
  const patterns: Pattern[] = [];

  // Extract graph structure patterns
  patterns.push(...extractGraphPatterns(dataset));

  // Extract skill combination patterns
  patterns.push(...extractSkillCombinations(dataset));

  // Extract execution order patterns
  patterns.push(...extractOrderingPatterns(dataset));

  // Extract parallelization patterns
  patterns.push(...extractParallelPatterns(dataset));

  // Filter by confidence threshold
  return patterns.filter(p => p.confidence >= 0.6);
}

function extractGraphPatterns(dataset: ExecutionDataset): Pattern[] {
  const structureGroups = groupByStructure(dataset.executions);
  const patterns: Pattern[] = [];

  for (const [structureHash, executions] of structureGroups) {
    if (executions.length < 3) continue;  // Need minimum samples

    const outcomes = analyzeOutcomes(executions);

    if (outcomes.successRate >= 0.8) {
      patterns.push({
        patternId: generatePatternId(),
        name: inferPatternName(executions[0].dagStructure),
        description: describePattern(executions[0].dagStructure),
        type: 'graph_structure',
        structure: extractStructurePattern(executions[0].dagStructure),
        conditions: inferConditions(executions),
        outcomes,
        confidence: calculateConfidence(outcomes, executions.length),
        occurrences: executions.length,
        lastSeen: maxDate(executions.map(e => e.metrics.completedAt)),
      });
    }
  }

  return patterns;
}

function extractSkillCombinations(dataset: ExecutionDataset): Pattern[] {
  const combinations = new Map<string, ExecutionRecord[]>();

  for (const execution of dataset.executions) {
    const skills = extractSkillIds(execution.dagStructure);
    const key = skills.sort().join(',');

    const existing = combinations.get(key) ?? [];
    existing.push(execution);
    combinations.set(key, existing);
  }

  const patterns: Pattern[] = [];

  for (const [key, executions] of combinations) {
    if (executions.length < 3) continue;

    const outcomes = analyzeOutcomes(executions);

    if (outcomes.successRate >= 0.75) {
      const skills = key.split(',');
      patterns.push({
        patternId: generatePatternId(),
        name: `Skill Combination: ${skills.slice(0, 3).join(' + ')}${skills.length > 3 ? '...' : ''}`,
        description: `Effective combination of ${skills.length} skills`,
        type: 'skill_combination',
        structure: {
          nodes: skills.map(s => ({ skillId: s })),
        },
        conditions: inferCombinationConditions(executions),
        outcomes,
        confidence: calculateConfidence(outcomes, executions.length),
        occurrences: executions.length,
        lastSeen: maxDate(executions.map(e => e.metrics.completedAt)),
      });
    }
  }

  return patterns;
}

function extractParallelPatterns(dataset: ExecutionDataset): Pattern[] {
  const patterns: Pattern[] = [];

  for (const execution of dataset.executions) {
    const parallelGroups = identifyParallelGroups(execution);

    for (const group of parallelGroups) {
      if (group.nodes.length >= 2 && group.success) {
        const patternKey = generateParallelPatternKey(group);

        // Check if pattern already exists
        const existing = patterns.find(p =>
          p.type === 'parallelization' &&
          matchesParallelPattern(p, group)
        );

        if (existing) {
          existing.occurrences++;
          existing.lastSeen = execution.metrics.completedAt;
          // Update outcomes
          updateOutcomes(existing.outcomes, group.metrics);
        } else {
          patterns.push({
            patternId: generatePatternId(),
            name: `Parallel Group: ${group.nodes.length} nodes`,
            description: `Successfully parallelized ${group.nodes.map(n => n.type).join(', ')}`,
            type: 'parallelization',
            structure: {
              nodes: group.nodes.map(n => ({ type: n.type, skillId: n.skillId })),
              constraints: [{ type: 'no_dependencies_between', nodes: group.nodes.map(n => n.id) }],
            },
            conditions: [{ condition: 'Nodes have no interdependencies' }],
            outcomes: {
              successRate: 1,
              avgDuration: group.metrics.duration,
              avgCost: group.metrics.cost,
              avgQuality: group.metrics.quality,
              sampleSize: 1,
            },
            confidence: 0.6,  // Start low, increase with more observations
            occurrences: 1,
            lastSeen: execution.metrics.completedAt,
          });
        }
      }
    }
  }

  return patterns;
}

Anti-Pattern Detection

interface AntiPattern {
  antiPatternId: string;
  name: string;
  description: string;
  type: AntiPatternType;
  indicators: AntiPatternIndicator[];
  consequences: string[];
  remediation: string;
  occurrences: number;
  severity: 'critical' | 'high' | 'medium' | 'low';
}

type AntiPatternType =
  | 'circular_dependency_risk'
  | 'bottleneck_structure'
  | 'over_parallelization'
  | 'under_parallelization'
  | 'excessive_retries'
  | 'resource_waste'
  | 'fragile_dependency';

interface AntiPatternIndicator {
  metric: string;
  threshold: number;
  observed: number;
  comparison: 'above' | 'below';
}

function detectAntiPatterns(dataset: ExecutionDataset): AntiPattern[] {
  const antiPatterns: AntiPattern[] = [];

  // Detect bottleneck structures
  antiPatterns.push(...detectBottlenecks(dataset));

  // Detect over-parallelization
  antiPatterns.push(...detectOverParallelization(dataset));

  // Detect excessive retries
  antiPatterns.push(...detectExcessiveRetries(dataset));

  // Detect resource waste
  antiPatterns.push(...detectResourceWaste(dataset));

  return antiPatterns;
}

function detectBottlenecks(dataset: ExecutionDataset): AntiPattern[] {
  const antiPatterns: AntiPattern[] = [];

  for (const execution of dataset.executions) {
    const bottlenecks = findBottleneckNodes(execution);

    for (const bottleneck of bottlenecks) {
      if (bottleneck.impact >= 0.3) {  // Node accounts for 30%+ of total time
        const existing = antiPatterns.find(ap =>
          ap.type === 'bottleneck_structure' &&
          ap.indicators[0]?.metric === bottleneck.nodeType
        );

        if (existing) {
          existing.occurrences++;
        } else {
          antiPatterns.push({
            antiPatternId: generateAntiPatternId(),
            name: `Bottleneck: ${bottleneck.nodeType}`,
            description: `Node type ${bottleneck.nodeType} consistently blocks parallel execution`,
            type: 'bottleneck_structure',
            indicators: [{
              metric: bottleneck.nodeType,
              threshold: 0.2,
              observed: bottleneck.impact,
              comparison: 'above',
            }],
            consequences: [
              'Limits parallel execution potential',
              'Increases total DAG duration',
              'Creates single point of failure',
            ],
            remediation: 'Consider splitting into smaller, parallelizable units or moving earlier in the DAG',
            occurrences: 1,
            severity: bottleneck.impact >= 0.5 ? 'high' : 'medium',
          });
        }
      }
    }
  }

  return antiPatterns;
}

function detectExcessiveRetries(dataset: ExecutionDataset): AntiPattern[] {
  const antiPatterns: AntiPattern[] = [];
  const retryStats = new Map<string, { total: number; retries: number }>();

  for (const execution of dataset.executions) {
    for (const node of execution.dagStructure.nodes) {
      const stats = retryStats.get(node.type) ?? { total: 0, retries: 0 };
      stats.total++;
      stats.retries += (node.retryCount ?? 0);
      retryStats.set(node.type, stats);
    }
  }

  for (const [nodeType, stats] of retryStats) {
    const avgRetries = stats.retries / stats.total;

    if (avgRetries > 1.5 && stats.total >= 5) {
      antiPatterns.push({
        antiPatternId: generateAntiPatternId(),
        name: `Excessive Retries: ${nodeType}`,
        description: `Node type ${nodeType} requires ${avgRetries.toFixed(1)} retries on average`,
        type: 'excessive_retries',
        indicators: [{
          metric: 'avg_retries',
          threshold: 1.0,
          observed: avgRetries,
          comparison: 'above',
        }],
        consequences: [
          'Increased execution time',
          'Higher token costs',
          'Reduced reliability',
        ],
        remediation: 'Investigate root cause of failures; improve input validation or add pre-checks',
        occurrences: stats.total,
        severity: avgRetries > 2.5 ? 'high' : 'medium',
      });
    }
  }

  return antiPatterns;
}

function detectResourceWaste(dataset: ExecutionDataset): AntiPattern[] {
  const antiPatterns: AntiPattern[] = [];

  for (const execution of dataset.executions) {
    const waste = calculateResourceWaste(execution);

    if (waste.tokenWaste > 0.3) {  // 30%+ tokens wasted
      antiPatterns.push({
        antiPatternId: generateAntiPatternId(),
        name: 'Token Waste',
        description: `${(waste.tokenWaste * 100).toFixed(0)}% of tokens used in failed nodes`,
        type: 'resource_waste',
        indicators: [{
          metric: 'token_waste_ratio',
          threshold: 0.2,
          observed: waste.tokenWaste,
          comparison: 'above',
        }],
        consequences: [
          'Increased costs',
          'Wasted compute resources',
        ],
        remediation: 'Add early validation, implement circuit breakers, or reorder to fail fast',
        occurrences: 1,
        severity: waste.tokenWaste > 0.5 ? 'high' : 'medium',
      });
    }
  }

  return antiPatterns;
}

Recommendation Generation

interface LearnedRecommendation {
  recommendationId: string;
  type: RecommendationType;
  title: string;
  description: string;
  applicability: ApplicabilityCondition[];
  expectedBenefit: ExpectedBenefit;
  confidence: number;
  basedOn: {
    patterns: string[];
    antiPatterns: string[];
    sampleSize: number;
  };
}

type RecommendationType =
  | 'skill_selection'
  | 'graph_structure'
  | 'parallelization'
  | 'retry_configuration'
  | 'resource_allocation'
  | 'ordering_optimization';

interface ExpectedBenefit {
  metric: 'duration' | 'cost' | 'quality' | 'reliability';
  improvement: number;  // Percentage improvement
  confidence: number;
}

function generateRecommendations(
  patterns: Pattern[],
  antiPatterns: AntiPattern[]
): LearnedRecommendation[] {
  const recommendations: LearnedRecommendation[] = [];

  // Recommendations from successful patterns
  for (const pattern of patterns) {
    if (pattern.confidence >= 0.7 && pattern.occurrences >= 5) {
      recommendations.push(patternToRecommendation(pattern));
    }
  }

  // Recommendations from anti-patterns (avoid these)
  for (const antiPattern of antiPatterns) {
    if (antiPattern.occurrences >= 3) {
      recommendations.push(antiPatternToRecommendation(antiPattern));
    }
  }

  // Cross-pattern analysis
  recommendations.push(...crossPatternRecommendations(patterns, antiPatterns));

  // Sort by expected impact
  return recommendations.sort((a, b) =>
    b.expectedBenefit.improvement - a.expectedBenefit.improvement
  );
}

function patternToRecommendation(pattern: Pattern): LearnedRecommendation {
  const typeMapping: Record<PatternType, RecommendationType> = {
    'graph_structure': 'graph_structure',
    'skill_combination': 'skill_selection',
    'execution_order': 'ordering_optimization',
    'parallelization': 'parallelization',
    'retry_strategy': 'retry_configuration',
    'resource_allocation': 'resource_allocation',
    'failure_recovery': 'retry_configuration',
  };

  return {
    recommendationId: generateRecommendationId(),
    type: typeMapping[pattern.type],
    title: `Use: ${pattern.name}`,
    description: pattern.description,
    applicability: pattern.conditions.map(c => ({
      condition: c.condition ?? c.toString(),
      required: true,
    })),
    expectedBenefit: {
      metric: 'reliability',
      improvement: pattern.outcomes.successRate * 100 - 50,  // Above 50% baseline
      confidence: pattern.confidence,
    },
    confidence: pattern.confidence,
    basedOn: {
      patterns: [pattern.patternId],
      antiPatterns: [],
      sampleSize: pattern.occurrences,
    },
  };
}

function antiPatternToRecommendation(antiPattern: AntiPattern): LearnedRecommendation {
  return {
    recommendationId: generateRecommendationId(),
    type: inferRecommendationType(antiPattern),
    title: `Avoid: ${antiPattern.name}`,
    description: `${antiPattern.description}. ${antiPattern.remediation}`,
    applicability: antiPattern.indicators.map(i => ({
      condition: `${i.metric} is ${i.comparison} ${i.threshold}`,
      required: true,
    })),
    expectedBenefit: {
      metric: antiPattern.type === 'resource_waste' ? 'cost' : 'reliability',
      improvement: antiPattern.severity === 'critical' ? 40 :
                   antiPattern.severity === 'high' ? 25 :
                   antiPattern.severity === 'medium' ? 15 : 5,
      confidence: Math.min(0.9, 0.5 + antiPattern.occurrences * 0.05),
    },
    confidence: Math.min(0.9, 0.5 + antiPattern.occurrences * 0.05),
    basedOn: {
      patterns: [],
      antiPatterns: [antiPattern.antiPatternId],
      sampleSize: antiPattern.occurrences,
    },
  };
}

function crossPatternRecommendations(
  patterns: Pattern[],
  antiPatterns: AntiPattern[]
): LearnedRecommendation[] {
  const recommendations: LearnedRecommendation[] = [];

  // Find complementary skill patterns
  const skillPatterns = patterns.filter(p => p.type === 'skill_combination');
  for (let i = 0; i < skillPatterns.length; i++) {
    for (let j = i + 1; j < skillPatterns.length; j++) {
      const overlap = findSkillOverlap(skillPatterns[i], skillPatterns[j]);
      if (overlap.length > 0) {
        recommendations.push({
          recommendationId: generateRecommendationId(),
          type: 'skill_selection',
          title: `Synergy: ${overlap.join(' + ')}`,
          description: `Skills ${overlap.join(', ')} appear in multiple successful patterns`,
          applicability: [{ condition: 'Task requires multiple capabilities', required: true }],
          expectedBenefit: {
            metric: 'quality',
            improvement: 20,
            confidence: 0.7,
          },
          confidence: 0.7,
          basedOn: {
            patterns: [skillPatterns[i].patternId, skillPatterns[j].patternId],
            antiPatterns: [],
            sampleSize: skillPatterns[i].occurrences + skillPatterns[j].occurrences,
          },
        });
      }
    }
  }

  return recommendations;
}

Pattern Library Report

patternLibrary:
  libraryId: "pl-9d8c7b6a-5e4f-3a2b-1c0d"
  lastUpdated: "2024-01-15T12:00:00Z"

  statistics:
    totalPatterns: 15
    totalAntiPatterns: 6
    totalRecommendations: 21
    executionsAnalyzed: 234
    timeSpan: "30 days"

  topPatterns:
    - patternId: "pat-001"
      name: "Fan-out-Fan-in"
      type: graph_structure
      description: "Distribute work to parallel nodes, then aggregate results"
      confidence: 0.92
      occurrences: 45
      outcomes:
        successRate: 0.89
        avgDuration: 12500
        avgCost: 0.045

    - patternId: "pat-002"
      name: "Validation First"
      type: execution_order
      description: "Run validation before expensive operations"
      confidence: 0.88
      occurrences: 67
      outcomes:
        successRate: 0.94
        avgDuration: 8200
        avgCost: 0.028

    - patternId: "pat-003"
      name: "Code Analysis Triple"
      type: skill_combination
      description: "code-complexity-analyzer + code-security-scanner + code-performance-analyzer"
      confidence: 0.85
      occurrences: 23
      outcomes:
        successRate: 0.91
        avgDuration: 15000
        avgCost: 0.062

  topAntiPatterns:
    - antiPatternId: "anti-001"
      name: "Sequential Bottleneck"
      type: bottleneck_structure
      severity: high
      occurrences: 12
      remediation: "Split large sequential node into parallelizable subtasks"

    - antiPatternId: "anti-002"
      name: "Retry Storm"
      type: excessive_retries
      severity: medium
      occurrences: 8
      remediation: "Add pre-validation to catch issues before execution"

  recommendations:
    - recommendationId: "rec-001"
      type: parallelization
      title: "Parallelize Independent Analysis"
      description: "When running multiple analysis skills, execute them in parallel"
      expectedBenefit:
        metric: duration
        improvement: 45
        confidence: 0.85
      basedOn:
        patterns: ["pat-001", "pat-003"]
        sampleSize: 68

    - recommendationId: "rec-002"
      type: ordering_optimization
      title: "Validate Early"
      description: "Move validation nodes to earliest possible position"
      expectedBenefit:
        metric: cost
        improvement: 30
        confidence: 0.88
      basedOn:
        patterns: ["pat-002"]
        antiPatterns: ["anti-001"]
        sampleSize: 67

  trends:
    - observation: "Success rate improving over time"
      metric: successRate
      change: +0.08
      period: "last 30 days"

    - observation: "Average cost decreasing"
      metric: avgCost
      change: -0.015
      period: "last 30 days"

Integration Points

Input: Execution traces from dag-execution-tracer
Input: Performance data from dag-performance-profiler
Input: Failure data from dag-failure-analyzer
Output: Patterns and recommendations to dag-graph-builder
Output: Optimization hints to dag-task-scheduler

Best Practices

Minimum Sample Size: Require 3+ observations before extracting patterns
Confidence Decay: Reduce confidence for patterns not seen recently
Context Matters: Patterns should include applicable conditions
Actionable Output: Recommendations must be implementable
Continuous Learning: Update library with each new execution

Learn from history. Find what works. Continuously improve.

Skill Details

Repository Files

Core Responsibilities

1. Pattern Extraction

2. Anti-Pattern Detection

3. Recommendation Generation

4. Knowledge Accumulation

Pattern Learning Architecture

Pattern Extraction

Anti-Pattern Detection

Recommendation Generation

Pattern Library Report

Integration Points

Best Practices

Related Skills

Attack Tree Construction

Grafana Dashboards

Matplotlib

Scientific Visualization

Seaborn

Shap

Pydeseq2

Query Writing

Pydeseq2

Scientific Visualization

Skill Information