Tracing Downstream Lineage

by astronomer

data

Trace downstream data lineage and impact analysis. Use when the user asks what depends on this data, what breaks if something changes, downstream dependencies, or needs to assess change risk before modifying a table or DAG.

Skill Details

Repository Files

1 file in this skill directory


name: tracing-downstream-lineage description: Trace downstream data lineage and impact analysis. Use when the user asks what depends on this data, what breaks if something changes, downstream dependencies, or needs to assess change risk before modifying a table or DAG.

Downstream Lineage: Impacts

Answer the critical question: "What breaks if I change this?"

Use this BEFORE making changes to understand the blast radius.

Impact Analysis

Step 1: Identify Direct Consumers

Find everything that reads from this target:

For Tables:

  1. Search DAG source code: Look for DAGs that SELECT from this table

    • Use list_dags to get all DAGs
    • Use get_dag_source to search for table references
    • Look for: FROM target_table, JOIN target_table
  2. Check for dependent views:

    -- Snowflake
    SELECT * FROM information_schema.view_table_usage
    WHERE table_name = '<target_table>'
    
    -- Or check SHOW VIEWS and search definitions
    
  3. Look for BI tool connections:

    • Dashboards often query tables directly
    • Check for common BI patterns in table naming (rpt_, dashboard_)

For DAGs:

  1. Check what the DAG produces: Use get_dag_source to find output tables
  2. Then trace those tables' consumers (recursive)

Step 2: Build Dependency Tree

Map the full downstream impact:

SOURCE: fct.orders
    |
    +-- TABLE: agg.daily_sales --> Dashboard: Executive KPIs
    |       |
    |       +-- TABLE: rpt.monthly_summary --> Email: Monthly Report
    |
    +-- TABLE: ml.order_features --> Model: Demand Forecasting
    |
    +-- DIRECT: Looker Dashboard "Sales Overview"

Step 3: Categorize by Criticality

Critical (breaks production):

  • Production dashboards
  • Customer-facing applications
  • Automated reports to executives
  • ML models in production
  • Regulatory/compliance reports

High (causes significant issues):

  • Internal operational dashboards
  • Analyst workflows
  • Data science experiments
  • Downstream ETL jobs

Medium (inconvenient):

  • Ad-hoc analysis tables
  • Development/staging copies
  • Historical archives

Low (minimal impact):

  • Deprecated tables
  • Unused datasets
  • Test data

Step 4: Assess Change Risk

For the proposed change, evaluate:

Schema Changes (adding/removing/renaming columns):

  • Which downstream queries will break?
  • Are there SELECT * patterns that will pick up new columns?
  • Which transformations reference the changing columns?

Data Changes (values, volumes, timing):

  • Will downstream aggregations still be valid?
  • Are there NULL handling assumptions that will break?
  • Will timing changes affect SLAs?

Deletion/Deprecation:

  • Full dependency tree must be migrated first
  • Communication needed for all stakeholders

Step 5: Find Stakeholders

Identify who owns downstream assets:

  1. DAG owners: Check owners field in DAG definitions
  2. Dashboard owners: Usually in BI tool metadata
  3. Team ownership: Look for team naming patterns or documentation

Output: Impact Report

Summary

"Changing fct.orders will impact X tables, Y DAGs, and Z dashboards"

Impact Diagram

                    +--> [agg.daily_sales] --> [Executive Dashboard]
                    |
[fct.orders] -------+--> [rpt.order_details] --> [Ops Team Email]
                    |
                    +--> [ml.features] --> [Demand Model]

Detailed Impacts

Downstream Type Criticality Owner Notes
agg.daily_sales Table Critical data-eng Updated hourly
Executive Dashboard Dashboard Critical analytics CEO views daily
ml.order_features Table High ml-team Retraining weekly

Risk Assessment

Change Type Risk Level Mitigation
Add column Low No action needed
Rename column High Update 3 DAGs, 2 dashboards
Delete column Critical Full migration plan required
Change data type Medium Test downstream aggregations

Recommended Actions

Before making changes:

  1. Notify owners: @data-eng, @analytics, @ml-team
  2. Update downstream DAG: transform_daily_sales
  3. Test dashboard: Executive KPIs
  4. Schedule change during low-impact window

Related Skills

  • Trace where data comes from: tracing-upstream-lineage skill
  • Check downstream freshness: checking-freshness skill
  • Debug any broken DAGs: debugging-dags skill

Related Skills

Xlsx

Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas

data

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

datacli

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

datacli

Analyzing Financial Statements

This skill calculates key financial ratios and metrics from financial statement data for investment analysis

data

Data Storytelling

Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.

data

Kpi Dashboard Design

Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.

designdata

Dbt Transformation Patterns

Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.

testingdocumenttool

Sql Optimization Patterns

Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.

designdata

Anndata

This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.

arttooldata

Xlsx

Spreadsheet toolkit (.xlsx/.csv). Create/edit with formulas/formatting, analyze data, visualization, recalculate formulas, for spreadsheet processing and analysis.

tooldata

Skill Information

Category:Data
Last Updated:1/23/2026