Workflow Management

by treasure-data


TD workflow debugging and operations. Covers tdx wf commands for monitoring (sessions, attempt, logs), retry/backfill patterns, alerting (_error with Slack/email), and data quality checks.


name: workflow-management

TD Workflow Management

Setup & Context

tdx wf use my_project                # Set default project for session
tdx wf pull my_project               # Pull project locally for editing
tdx wf push                          # Push changes with diff preview

Monitoring Commands

tdx wf sessions                      # List runs (uses session context)
tdx wf sessions --status error       # Filter by status
tdx wf attempt <id> tasks            # Show task status
tdx wf attempt <id> logs +task_name  # View logs

Debugging Steps

  1. Check error in tdx wf attempt <id> logs +failed_task
  2. Verify query syntax if td> failed
  3. Check time ranges - does data exist for session_date?
  4. Validate parameter values
  5. Check resource limits (memory, timeout)
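The first debugging step can be wrapped in a small helper that prints the task status and then the logs of the failing task. This is a minimal sketch that only uses the `tdx wf attempt` subcommands shown above; the function name `inspect_attempt` and the `TDX_BIN` override are hypothetical, added so the helper can be dry-run with `echo`.

```shell
# inspect_attempt ATTEMPT_ID +TASK_NAME
# Shows task status, then the logs of one task, for a single attempt.
# TDX_BIN is a hypothetical override (defaults to the real tdx binary).
inspect_attempt() {
  ia_attempt_id="$1"
  ia_task="$2"
  ia_tdx="${TDX_BIN:-tdx}"
  if [ -z "$ia_attempt_id" ] || [ -z "$ia_task" ]; then
    echo "usage: inspect_attempt <attempt_id> <+task_name>" >&2
    return 2
  fi
  "$ia_tdx" wf attempt "$ia_attempt_id" tasks
  "$ia_tdx" wf attempt "$ia_attempt_id" logs "$ia_task"
}
```

Setting `TDX_BIN=echo` before calling the function prints the commands it would run instead of executing them.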

Retry Operations

tdx wf attempt <id> retry                          # Retry from start
tdx wf attempt <id> retry --resume-from +step     # Retry from task
tdx wf attempt <id> retry --params '{"key":"val"}' # Override params
tdx wf attempt <id> kill                           # Stop running

Alerting

+critical_task:
  td>: queries/important.sql

  _error:
    +slack_alert:
      sh>: |
        curl -X POST ${secret:slack.webhook_url} \
        -H 'Content-Type: application/json' \
        -d '{"text": "Workflow failed: ${session_id}"}'
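The alert above inlines the message into a JSON string, which is safe for a numeric session ID but can break if the text contains double quotes or backslashes. The following sketch builds the payload with basic escaping. The function name `slack_payload` and the `SLACK_WEBHOOK_URL` variable are assumptions for illustration, and newlines in the message are not handled.

```shell
# slack_payload MESSAGE
# Prints a Slack webhook JSON body, escaping backslashes and double quotes.
slack_payload() {
  sp_escaped=$(printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g')
  printf '{"text": "%s"}\n' "$sp_escaped"
}

# Usage (SLACK_WEBHOOK_URL is a hypothetical environment variable):
#   curl -X POST "$SLACK_WEBHOOK_URL" \
#     -H 'Content-Type: application/json' \
#     -d "$(slack_payload 'Workflow failed: 12345')"
```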

Data Quality Checks

+process:
  td>: queries/process.sql
  create_table: results

+validate:
  td>:
    query: |
      SELECT COUNT(*) as cnt,
             SUM(CASE WHEN id IS NULL THEN 1 ELSE 0 END) as nulls
      FROM results
  store_last_results: true

+check:
  if>: ${td.last_results.cnt == 0}
  _do:
    +fail:
      sh>: exit 1
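The validation query returns both a row count and a null count, but the check only fails on an empty table. As a sketch of how both values could be used, the helper below fails when there are no rows or when the share of null IDs exceeds a threshold. The name `check_quality` and the percentage threshold are assumptions, not part of the workflow above.

```shell
# check_quality ROW_COUNT NULL_COUNT MAX_NULL_PERCENT
# Returns 0 and prints OK when the table passes, otherwise returns 1.
check_quality() {
  cq_cnt="$1"
  cq_nulls="$2"
  cq_max_pct="$3"
  if [ "$cq_cnt" -eq 0 ]; then
    echo "FAIL: no rows"
    return 1
  fi
  # Integer form of nulls/cnt > max_pct/100, avoiding floating point.
  if [ $((cq_nulls * 100)) -gt $((cq_cnt * cq_max_pct)) ]; then
    echo "FAIL: too many null ids"
    return 1
  fi
  echo "OK"
}
```

For example, `check_quality 100 6 5` fails because 6% of the rows have a null ID, while `check_quality 100 5 5` passes.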

Wait for Data

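+wait_for_data:
  sh>: |
    # Poll once a minute, up to 30 times, until rows exist for the session date.
    # seq is used instead of {1..30}, which only expands in bash.
    for i in $(seq 1 30); do
      COUNT=$(tdx query -d analytics "SELECT COUNT(*) FROM src WHERE date='${session_date}'" --format csv | tail -1)
      # Guard against an empty result before the numeric comparison.
      if [ -n "$COUNT" ] && [ "$COUNT" -gt 0 ]; then exit 0; fi
      sleep 60
    done
    exit 1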
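The polling loop above hard-codes the attempt count and sleep interval. A reusable version might look like the sketch below, which retries any command until it succeeds or the attempts run out. The function name `wait_until` is hypothetical, and it is meant for a standalone script file rather than inline `sh>` text.

```shell
# wait_until MAX_TRIES SLEEP_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it exits 0, at most MAX_TRIES times.
# Returns 0 on the first success, 1 if every attempt fails.
wait_until() {
  wu_max_tries="$1"
  wu_sleep="$2"
  shift 2
  wu_try=1
  while [ "$wu_try" -le "$wu_max_tries" ]; do
    if "$@"; then
      return 0
    fi
    # Skip the sleep after the final failed attempt.
    if [ "$wu_try" -lt "$wu_max_tries" ]; then
      sleep "$wu_sleep"
    fi
    wu_try=$((wu_try + 1))
  done
  return 1
}
```

With a hypothetical `has_rows` function that wraps the count query, `wait_until 30 60 has_rows` reproduces the 30-attempt, 60-second schedule.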

Idempotent Operations

+safe_insert:
  td>:
    query: |
      DELETE FROM target WHERE date = '${session_date}';
      INSERT INTO target SELECT * FROM source WHERE date = '${session_date}'

Backfill Pattern

+backfill:
  # for_each> iterates over a list of values; loop> only accepts a repeat count.
  for_each>:
    dates: ["2024-01-01", "2024-01-02", "2024-01-03"]
  _do:
    +process:
      call>: main_workflow.dig
      params:
        session_date: ${dates}
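Typing the date list by hand gets error-prone for longer backfills. The sketch below prints an inclusive range of dates in the same bracketed list form, ready to paste into the task. It assumes GNU `date` (the `-d` option is not available in BSD/macOS `date`), and the function name `date_list` is hypothetical.

```shell
# date_list START END
# Prints every date from START to END (YYYY-MM-DD, inclusive) as a quoted list.
# Requires GNU date for the -d option.
date_list() {
  dl_current="$1"
  dl_end="$2"
  dl_out=""
  while [ "$(date -u -d "$dl_current" +%s)" -le "$(date -u -d "$dl_end" +%s)" ]; do
    # Append with a ", " separator after the first element.
    dl_out="${dl_out:+$dl_out, }\"$dl_current\""
    dl_current="$(date -u -d "$dl_current +1 day" +%Y-%m-%d)"
  done
  printf '[%s]\n' "$dl_out"
}
```

Running `date_list 2024-01-01 2024-01-03` prints `["2024-01-01", "2024-01-02", "2024-01-03"]`; a start date later than the end date yields an empty list.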

Secrets Management

tdx wf secrets list                  # List secret keys (values hidden)
tdx wf secrets set API_KEY=xxx       # Set a secret
tdx wf secrets delete API_KEY        # Delete a secret

Usage in .dig files:

+task:
  sh>: curl -H "Authorization: ${secret:API_KEY}" https://api.example.com

Common Issues

| Issue | Solution |
|---|---|
| Timeout | Add timeout: 3600s and _retry: 2 to the task |
| Intermittent failures | Add _retry with limit: 5 and interval_type: exponential |
| Out of memory | Reduce data volume; use approximate functions such as approx_distinct |
| Duplicate runs | Use the idempotent DELETE + INSERT pattern |

Skill Information

Category: Data
Last Updated: 12/27/2025