Working With Reference Tables

by rustomax

data

Work with Reference Tables (static CSV lookup data) using OPAL to enrich datasets with descriptive information. Use when you need to map IDs to human-readable names, add static metadata from CSV uploads, or perform lookups without temporal considerations. Covers both explicit and implicit lookup patterns, column name matching, and when to choose Reference Tables vs Resources vs Correlation Tags.

Skill Details

Repository Files

1 file in this skill directory


name: working-with-reference-tables description: Work with Reference Tables (static CSV lookup data) using OPAL to enrich datasets with descriptive information. Use when you need to map IDs to human-readable names, add static metadata from CSV uploads, or perform lookups without temporal considerations. Covers both explicit and implicit lookup patterns, column name matching, and when to choose Reference Tables vs Resources vs Correlation Tags.

Working with Reference Tables

Work with Reference Tables (static lookup data) using OPAL to enrich datasets with descriptive information. Reference Tables store static mappings (max 10MB CSV) that don't track changes over time, providing an alternative to Resources when no temporal aspect is needed.

Use when you need to:

  • Map IDs to human-readable names (product IDs → product names, error codes → descriptions)
  • Enrich logs/spans with static metadata
  • Add dimension data from CSV uploads
  • Lookup values without temporal considerations

Covers reference table fundamentals, both explicit and implicit lookup patterns, and when to choose Reference Tables vs Resources vs Correlation Tags.

Key Concepts

What Are Reference Tables?

Reference Tables are static lookup datasets created from CSV uploads:

  • No timestamps - Static data
  • No change tracking - Subsequent uploads of the same table overwrite previous state
  • Max 10MB CSV file size
  • Uploaded manually via Observe UI or API
  • Primary key column for joining
  • Value columns with descriptive data

Example: Product reference table

app_product_id,app_product_name
"OLJCESPC7Z","National Park Foundation Explorascope"
"L9ECAV7KIM","Lens Cleaning Kit"
"6E92ZMYYFZ","Solar Filter"

When to Use Reference Tables

  • Data is static (doesn't change over time, or if it does change, the state tracking is not required)
  • Need simple ID-to-name mappings
  • No temporal aspect required
  • Dataset size under 10MB

Lookup Patterns

Reference Tables support two lookup approaches: explicit join (recommended) and implicit join (requires column name matching).

Pattern A: Explicit Lookup (Recommended)

Most flexible - specify join condition directly without column name matching:

# Join Product Logs with Product reference table using alias
lookup @product.app_product_id=product_id, pid:product_id, product_name:@product.app_product_name
| statsby count(), group_by(product_name)
| topk 10, max(product_name)

How it works:

  1. Use alias for reference table: @product
  2. Specify join condition: @product.app_product_id=product_id
  3. Select columns to retrieve: pname:@product.app_product_name
  4. No need to match column names!

MCP Parameters:

{
  "primary_dataset_id": "42782295",
  "secondary_dataset_ids": ["42782294"],
  "dataset_aliases": {"product": "42782294"}
}

Key advantages:

  • Column names don't need to match
  • Clear and explicit join condition
  • Full control over retrieved columns
  • No extra make_col needed

Pattern B: Implicit Lookup (Column Name Matching)

Requires exact column name matching - simpler syntax but less flexible:

make_col app_product_id:product_id
| lookup @product_ref
| make_col pid:product_id, pname:app_product_name
| limit 10

How it works:

  1. Reference table has primary key app_product_id
  2. Source dataset creates matching column: make_col app_product_id:product_id
  3. lookup automatically joins on matching column name
  4. All reference table columns added to result

MCP Parameters:

{
  "primary_dataset_id": "42782295",
  "secondary_dataset_ids": ["42782294"],
  "dataset_aliases": {"product_ref": "42782294"}
}

Why implicit requires matching:

❌ WRONG - Mismatched names fail:

lookup @product_ref  # Source has 'product_id', reference has 'app_product_id'

Error: "implicit lookup requires all primary key columns in the other dataset to match columns in the source dataset; missing columns from the source dataset: app_product_id"

✅ CORRECT - Create matching column first:

make_col app_product_id:product_id
| lookup @product_ref

When to use implicit:

  • Simple joins where column names already match
  • When you want automatic inclusion of all reference columns
  • Legacy queries or established patterns

When to use explicit (Pattern A):

  • Column names don't match (most common!)
  • Want control over which columns to retrieve
  • Clearer, more maintainable queries

Pattern C: Using on() with Column Bindings

Full control over join conditions and column selection:

# Join with explicit on() syntax
lookup on(product_id=@product.app_product_id), product_name:@product.app_product_name
| statsby count(), group_by(product_name)

Use case: Complex join conditions or when you need precise control over the join and column bindings

Lookup Behavior

  • Join type: Left outer join (keeps all rows)
  • No match: Reference table columns are NULL
  • Multiple matches: Returns all matching rows (Cartesian product)
  • Performance: Fast for small reference tables (<10MB)

Common Patterns

Pattern: Enrich with Descriptive Names (Explicit Join)

# Using explicit lookup - no column name matching needed!
lookup @product.app_product_id=product_id, pname:@product.app_product_name
| filter not is_null(pname)
| make_col pid:product_id, name:pname, service:container
| limit 20

Use case: Add human-readable product names to logs

Result: Logs with "National Park Foundation Explorascope" instead of "OLJCESPC7Z"

Alternative (implicit join):

make_col app_product_id:product_id
| lookup @product_ref
| filter not is_null(app_product_name)
| make_col pid:product_id, name:app_product_name, service:container
| limit 20

Pattern: Aggregate with Reference Data (Explicit Join)

# Using explicit lookup - cleaner and more maintainable
lookup @product.app_product_id=product_id, pname:@product.app_product_name
| statsby log_count:count(), group_by(pname)
| sort desc(log_count)

Use case: Count events by descriptive name

Result:

pname,log_count
"National Park Foundation Explorascope",864

Alternative (implicit join):

make_col app_product_id:product_id
| lookup @product_ref
| statsby log_count:count(), group_by(app_product_name)
| sort desc(log_count)

Pattern: Browse Reference Table Contents

make_col id:app_product_id, name:app_product_name
| limit 50

Dataset: Query reference table directly (use reference table as primary_dataset_id)

Use case: See available lookup values

Result: Complete list of products in reference table

Pattern: Handle Missing Lookups (Explicit Join)

# Using explicit lookup
lookup @product.app_product_id=product_id, pname:@product.app_product_name
| make_col pid:product_id,
          name:if(is_null(pname), "Unknown Product", pname)
| limit 10

Use case: Provide default value when reference lookup fails

Behavior: Shows "Unknown Product" when pname is NULL

Pattern: Filter to Matched Rows Only (Explicit Join)

# Using explicit lookup
lookup @product.app_product_id=product_id, pname:@product.app_product_name
| filter not is_null(pname)
| make_col pid:product_id, name:pname
| limit 20

Use case: Exclude rows without reference table matches

Behavior: Only returns rows with successful lookups

Troubleshooting

Issue: "Missing columns from source dataset"

Error: "implicit lookup requires all primary key columns in the other dataset to match columns in the source dataset; missing columns from the source dataset: app_product_id"

Cause: Source dataset doesn't have column matching reference table's primary key

Solution: Create matching column with make_col:

make_col app_product_id:product_id
| lookup @product_ref

Key insight: Column names must match exactly (case-sensitive!)

Issue: All reference columns are NULL

Cause: No matches found (lookup is left outer join)

Diagnosis: Check if join values actually exist in reference table:

filter app_product_id = "OLJCESPC7Z"
| limit 1

(Query reference table directly to verify value exists)

Solutions:

  1. Verify product_id values in source match app_product_id in reference
  2. Check for typos or case sensitivity issues
  3. Ensure reference table uploaded correctly

Issue: "Implicit lookup does not support additional arguments"

Error: "implicit lookup does not support additional arguments like explicit join predicates or column bindings"

Cause: Mixing implicit and explicit syntax:

lookup @product_ref on product_id = @product_ref.app_product_id  ❌

Solution Option 1: Use explicit lookup (recommended):

lookup @product.app_product_id=product_id, pname:@product.app_product_name  ✅

Solution Option 2: Use implicit lookup with column name matching:

make_col app_product_id:product_id
| lookup @product_ref  ✅

Key Takeaways

  1. Reference Tables are for static CSV data (no timestamps, no change tracking)
  2. Three lookup patterns available:
    • Explicit (recommended): lookup @product.app_product_id=product_id, pname:@product.app_product_name
    • Implicit: Requires exact column name matching via make_col
    • on() syntax: Full control with lookup on(condition), bindings
  3. Explicit lookup advantages: No column name matching needed, clearer syntax, full control over retrieved columns
  4. Left outer join behavior - keeps all rows, NULL when no match
  5. Use for static enrichment - alternative to Resources when no temporal aspect needed
  6. 10MB size limit per reference table
  7. Fast and efficient for small lookup datasets
  8. Column matching is case-sensitive when using implicit lookup
  9. Use is_null() to check for failed lookups - provides default values or filters unmatched rows
  10. Query reference tables directly to browse available lookup values

When to Choose Reference Tables vs Resources

Scenario Use Reference Tables Use Resources
Static data that doesn't change
Data changes over time and you need state history
Simple ID-to-name mappings
Need temporal joins with Events/Intervals
CSV upload (max 10MB)
Track mutable state evolution
Fast lookups without timestamps

References

  • Reference Tables created via CSV upload in Observe UI
  • Use lookup verb with @ alias for joining
  • Explicit lookup: @alias.ref_column=source_column, result:@alias.value_column
  • Implicit lookup: Requires matching column names (automatic join)
  • on() syntax: lookup on(condition), column_bindings
  • Maximum 10MB CSV size per table
  • Interface type typically shows as "unknown" in discovery

Related Skills

Xlsx

Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas

data

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

datacli

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

datacli

Analyzing Financial Statements

This skill calculates key financial ratios and metrics from financial statement data for investment analysis

data

Data Storytelling

Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.

data

Kpi Dashboard Design

Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.

designdata

Dbt Transformation Patterns

Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.

testingdocumenttool

Sql Optimization Patterns

Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.

designdata

Anndata

This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.

arttooldata

Xlsx

Spreadsheet toolkit (.xlsx/.csv). Create/edit with formulas/formatting, analyze data, visualization, recalculate formulas, for spreadsheet processing and analysis.

tooldata

Skill Information

Category:Data
Last Updated:12/23/2025