# Digdag

by treasure-data

Treasure Workflow (digdag) for TD. Covers `.dig` syntax, session variables (`session_date`, `session_date_compact`), the `td>` operator, `_parallel`/`_retry`/`_error` directives, and `tdx wf` commands.
## Treasure Workflow (Digdag)
### Basic Structure

```yaml
timezone: Asia/Tokyo

schedule:
  daily>: 02:00:00

_export:
  td:
    database: my_database
    engine: presto

+extract:
  td>: queries/extract.sql
  create_table: raw_data

+transform:
  td>: queries/transform.sql
  create_table: results
```
Key points:

- The `.dig` extension is required; the filename becomes the workflow name
- Tasks run sequentially and are named with a `+task_name:` prefix
- `foo>: bar` is sugar for `_type: foo` and `_command: bar`
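To illustrate the operator sugar, the two task definitions below should be equivalent (task names are illustrative):

```yaml
# Operator shorthand
+query_short:
  td>: queries/extract.sql

# Expanded form the shorthand desugars to
+query_long:
  _type: td
  _command: queries/extract.sql
```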
### Session Variables

| Variable | Example |
|---|---|
| `${session_time}` | `2024-01-30T00:00:00+09:00` |
| `${session_date}` | `2024-01-30` |
| `${session_date_compact}` | `20240130` |
| `${session_unixtime}` | `1706540400` |
| `${last_session_date}` | Previous scheduled date |
| `${next_session_date}` | Next scheduled date |
Moment.js is available for date math:

```yaml
+tomorrow:
  echo>: ${moment(session_time).add(1, 'days').format("YYYY-MM-DD")}
```
### TD Operator

```yaml
+query:
  td>: queries/analysis.sql
  database: analytics
  engine: presto
  create_table: results  # or insert_into: existing_table
```

Inline SQL:

```yaml
+inline:
  td>:
    query: |
      SELECT * FROM events
      WHERE TD_TIME_RANGE(time, '${session_date}', TD_TIME_ADD('${session_date}', '1d'))
```
### Parallel Execution

```yaml
+parallel_tasks:
  _parallel: true
  +task_a:
    td>: queries/a.sql
  +task_b:
    td>: queries/b.sql

+after_parallel:
  echo>: "Runs after all parallel tasks"
```

Limited concurrency:

```yaml
+limited:
  _parallel:
    limit: 2
```
### Error Handling

```yaml
+task:
  td>: queries/important.sql
  _retry: 3
  _error:
    +alert:
      sh>: python scripts/alert.py "Task failed"
```

Retry with backoff:

```yaml
+task:
  _retry:
    limit: 3
    interval: 10
    interval_type: exponential  # or constant
```
### Variables

```yaml
_export:
  td:
    database: production
  my_param: value
  api_key: ${secret:api_credentials.key}  # TD parameter store

+task:
  py>: scripts.process.main
  param: ${my_param}
```
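The `py>` target above resolves to a function in a local script. A minimal sketch of what `scripts/process.py` might look like, assuming digdag's convention of passing task parameters as keyword arguments (the function body and return value are illustrative, not part of the digdag API):

```python
# scripts/process.py -- hypothetical target of py>: scripts.process.main
def main(param):
    # `param` arrives as a keyword argument, already interpolated
    # from ${my_param} in the workflow definition.
    message = f"processing with param={param!r}"
    print(message)
    return message
```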
### Conditional & Loops

```yaml
+check:
  td>: queries/count.sql
  store_last_results: true

+if_data:
  if>: ${td.last_results.cnt > 0}
  _do:
    +process:
      td>: queries/process.sql

+loop:
  for_each>:
    region: [US, EU, ASIA]
  _do:
    +process:
      td>: queries/by_region.sql
```
### Event Triggers

```yaml
# Runs after another workflow succeeds
trigger:
  attempt>:
    dependent_workflow_name: segment_refresh
    dependent_project_name: customer_segments

+activate:
  td>: queries/activate.sql
```
### tdx wf Commands

```shell
# Project sync (recommended workflow)
tdx wf pull my_project               # Pull project to local folder
tdx wf push                          # Push local changes with diff preview
tdx wf clone --name my_project_prod  # Clone to new project name

# Context & discovery
tdx wf use my_project                # Set default project for session
tdx wf projects                      # List all projects
tdx wf workflows                     # List workflows in project

# Running & monitoring
tdx wf run                           # Interactive workflow selector
tdx wf run my_project.my_workflow    # Run specific workflow
tdx wf sessions                      # List runs
tdx wf attempt <id> tasks            # Show task status
tdx wf attempt <id> logs +task_name  # View logs
tdx wf attempt <id> retry            # Retry failed attempt
tdx wf attempt <id> kill             # Stop a running attempt

# Secrets
tdx wf secrets list                  # List secret keys
tdx wf secrets set KEY=value         # Set a secret
tdx wf secrets delete KEY            # Delete a secret

# Legacy (digdag-style)
tdx wf upload my_workflow            # Push without sync tracking
```
### Project Structure

```
workflows/
└── my_project/          # Created by tdx wf pull
    ├── tdx.json         # Sync tracking (auto-generated)
    ├── main.dig         # Workflow definition
    ├── queries/
    │   └── analysis.sql
    └── scripts/
        └── process.py
```
### Schedule Options

```yaml
schedule:
  daily>: 02:00:00
  # hourly>: 00:00
  # cron>: "0 */4 * * *"
  # weekly>: "Mon,00:00:00"
```