---
name: dbt
description: dbt with TD Trino. Covers profiles.yml setup (method:none, user:TD_API_KEY), required override macros (no CREATE VIEW), TD_INTERVAL in models, and TD Workflow deployment.
---

dbt with Treasure Data Trino

Installation

uv venv && source .venv/bin/activate
uv pip install dbt-core==1.10.9 dbt-trino==1.9.3   # Same versions as the TD Workflow deployment below

profiles.yml

td:
  target: dev
  outputs:
    dev:
      type: trino
      method: none                          # Not 'ldap'
      user: "{{ env_var('TD_API_KEY') }}"
      password: dummy                       # Not used
      host: api-presto.treasuredata.com
      port: 443
      database: td                          # Always 'td'
      schema: your_dev_database             # Your TD database name
      threads: 4
      http_scheme: https
      session_properties:
        query_max_run_time: 1h
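
profiles.yml reads the API key from the environment via env_var, so export it in your shell before running any dbt command (the value below is a placeholder):

export TD_API_KEY="your_td_api_key"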

Key TD settings:

  • method: none enables API-key authentication (the key is passed as user)
  • database: always the literal catalog name td
  • schema: your TD database name, as shown in the TD Console
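
The TD Workflow deployment below runs dbt with --target prod, but the profile above only defines dev. A sketch of an additional prod output to add under outputs: in the same profile, assuming the connection settings stay the same (the schema value is a placeholder):

    prod:
      type: trino
      method: none
      user: "{{ env_var('TD_API_KEY') }}"
      password: dummy
      host: api-presto.treasuredata.com
      port: 443
      database: td
      schema: your_prod_database            # Production TD database
      threads: 4
      http_scheme: https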

Required Override Macros

TD doesn't support CREATE VIEW, so dbt's default view handling fails and must be overridden. Create macros/override_dbt_trino.sql:

{% macro trino__create_view_as(relation, sql) -%}
  create or replace table {{ relation }} as (
    {{ sql }}
  );
{%- endmacro %}

{% macro trino__list_relations_without_caching(schema_relation) %}
  {% call statement('list_relations_without_caching', fetch_result=True) %}
    select
      table_catalog as "database",
      table_schema as "schema",
      table_name as "name",
      table_type as "type"
    from {{ schema_relation.database }}.information_schema.tables
    where table_schema = '{{ schema_relation.schema }}'
  {% endcall %}
  {{ return(load_result('list_relations_without_caching').table) }}
{% endmacro %}
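
To confirm the overrides take effect, run a model and inspect the SQL dbt actually executed, which dbt writes under target/run/; it should begin with create or replace table rather than create view. Assuming the basic model below is saved as models/daily_events.sql:

dbt run --select daily_events
cat target/run/my_td_project/models/daily_events.sql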

dbt_project.yml

name: 'my_td_project'
version: '1.0.0'
config-version: 2
profile: 'td'

flags:
  require_certificate_validation: true

vars:
  target_range: '-3M/now'

models:
  my_td_project:
    +materialized: table
    +on_schema_change: "append_new_columns"
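
The model examples below reference source('raw', 'events'), which must be declared before dbt can compile them. A minimal models/sources.yml sketch, assuming the raw events live in a TD database named raw (both names are placeholders):

version: 2

sources:
  - name: raw
    schema: raw                 # TD database that holds the source tables
    tables:
      - name: events            # Needs a 'time' column for TD_INTERVAL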

Model Patterns

Basic model:

{{
  config(materialized='table')
}}

SELECT
  TD_TIME_STRING(time, 'd!', 'JST') as date,
  COUNT(*) as event_count
FROM {{ source('raw', 'events') }}
WHERE TD_INTERVAL(time, '{{ var("target_range", "-7d") }}', 'JST')
GROUP BY 1
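
dbt test (see Commands) only runs tests you declare. A minimal models/schema.yml sketch for this model, assuming it is saved as daily_events.sql:

version: 2

models:
  - name: daily_events
    columns:
      - name: date
        tests:
          - not_null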

Incremental model:

{{
  config(
    materialized='incremental',
    unique_key='event_id'
  )
}}

SELECT *
FROM {{ source('raw', 'events') }}
WHERE TD_INTERVAL(time, '{{ var("target_range", "-1d") }}', 'JST')
{% if is_incremental() %}
  AND time > (SELECT MAX(time) FROM {{ this }})
{% endif %}

Commands

dbt debug                                    # Test connection
dbt run                                      # Run all
dbt run --select daily_events                # Run specific
dbt run --vars '{"target_range": "-1d"}'     # Override variable
dbt run --full-refresh                       # Rebuild incremental
dbt test                                     # Run tests

TD Workflow Deployment

# dbt_workflow.dig
timezone: Asia/Tokyo

schedule:
  daily>: 03:00:00

_export:
  docker:
    image: "treasuredata/customscript-python:3.12.11-td1"
  _env:
    TD_API_KEY: ${secret:td.apikey}

+setup:
  py>: tasks.InstallPackages

+dbt_run:
  py>: dbt_wrapper.run_dbt
  command_args: ['run', '--target', 'prod']
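
The workflow resolves ${secret:td.apikey} at runtime, so the project must be pushed and the secret registered first; assuming the td CLI, with a placeholder project name:

td wf push my_dbt_project
td wf secrets --project my_dbt_project --set td.apikey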

tasks.py:

def InstallPackages():
    # Imports stay inside the function: digdag's py> operator loads this
    # module before packages are installed, so top-level imports must be
    # stdlib only.
    import subprocess, sys
    subprocess.check_call([sys.executable, '-m', 'pip', 'install',
                           'dbt-core==1.10.9', 'dbt-trino==1.9.3'])
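
The +dbt_run task above calls dbt_wrapper.run_dbt, which isn't one of the files shown here. A minimal sketch of what dbt_wrapper.py could look like, assuming digdag passes command_args through as a keyword argument (the module and its contents are assumptions, not part of the skill):

# dbt_wrapper.py (hypothetical)
import subprocess

def run_dbt(command_args):
    # command_args arrives from the .dig task, e.g. ['run', '--target', 'prod'];
    # dbt reads TD_API_KEY from the environment set via _export._env.
    subprocess.check_call(['dbt'] + list(command_args))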

Common Errors

| Error | Fix |
| --- | --- |
| connector does not support creating views | Add the trino__create_view_as override macro above |
| Table ownership information not available | Add the trino__list_relations_without_caching override macro |
| Var 'target_range' is undefined | Add a default: {{ var('target_range', '-1d') }} |
