name: working-with-reference-tables description: Work with Reference Tables (static CSV lookup data) using OPAL to enrich datasets with descriptive information. Use when you need to map IDs to human-readable names, add static metadata from CSV uploads, or perform lookups without temporal considerations. Covers both explicit and implicit lookup patterns, column name matching, and when to choose Reference Tables vs Resources vs Correlation Tags.

Working with Reference Tables

Work with Reference Tables (static lookup data) using OPAL to enrich datasets with descriptive information. Reference Tables store static mappings (max 10MB CSV) that don't track changes over time, providing an alternative to Resources when no temporal aspect is needed.

Use when you need to:

Map IDs to human-readable names (product IDs → product names, error codes → descriptions)
Enrich logs/spans with static metadata
Add dimension data from CSV uploads
Lookup values without temporal considerations

Covers reference table fundamentals, both explicit and implicit lookup patterns, and when to choose Reference Tables vs Resources vs Correlation Tags.

Key Concepts

What Are Reference Tables?

Reference Tables are static lookup datasets created from CSV uploads:

No timestamps - Static data
No change tracking - Subsequent uploads of the same table overwrite previous state
Max 10MB CSV file size
Uploaded manually via Observe UI or API
Primary key column for joining
Value columns with descriptive data

Example: Product reference table

app_product_id,app_product_name
"OLJCESPC7Z","National Park Foundation Explorascope"
"L9ECAV7KIM","Lens Cleaning Kit"
"6E92ZMYYFZ","Solar Filter"

When to Use Reference Tables

Data is static (doesn't change over time, or if it does change, the state tracking is not required)
Need simple ID-to-name mappings
No temporal aspect required
Dataset size under 10MB

Lookup Patterns

Reference Tables support two lookup approaches: explicit join (recommended) and implicit join (requires column name matching).

Pattern A: Explicit Lookup (Recommended)

Most flexible - specify join condition directly without column name matching:

# Join Product Logs with Product reference table using alias
lookup @product.app_product_id=product_id, pid:product_id, product_name:@product.app_product_name
| statsby count(), group_by(product_name)
| topk 10, max(product_name)

How it works:

Use alias for reference table: @product
Specify join condition: @product.app_product_id=product_id
Select columns to retrieve: pname:@product.app_product_name
No need to match column names!

MCP Parameters:

{
  "primary_dataset_id": "42782295",
  "secondary_dataset_ids": ["42782294"],
  "dataset_aliases": {"product": "42782294"}
}

Key advantages:

Column names don't need to match
Clear and explicit join condition
Full control over retrieved columns
No extra make_col needed

Pattern B: Implicit Lookup (Column Name Matching)

Requires exact column name matching - simpler syntax but less flexible:

make_col app_product_id:product_id
| lookup @product_ref
| make_col pid:product_id, pname:app_product_name
| limit 10

How it works:

Reference table has primary key app_product_id
Source dataset creates matching column: make_col app_product_id:product_id
lookup automatically joins on matching column name
All reference table columns added to result

MCP Parameters:

{
  "primary_dataset_id": "42782295",
  "secondary_dataset_ids": ["42782294"],
  "dataset_aliases": {"product_ref": "42782294"}
}

Why implicit requires matching:

❌ WRONG - Mismatched names fail:

lookup @product_ref  # Source has 'product_id', reference has 'app_product_id'

Error: "implicit lookup requires all primary key columns in the other dataset to match columns in the source dataset; missing columns from the source dataset: app_product_id"

✅ CORRECT - Create matching column first:

make_col app_product_id:product_id
| lookup @product_ref

When to use implicit:

Simple joins where column names already match
When you want automatic inclusion of all reference columns
Legacy queries or established patterns

When to use explicit (Pattern A):

Column names don't match (most common!)
Want control over which columns to retrieve
Clearer, more maintainable queries

Pattern C: Using on() with Column Bindings

Full control over join conditions and column selection:

# Join with explicit on() syntax
lookup on(product_id=@product.app_product_id), product_name:@product.app_product_name
| statsby count(), group_by(product_name)

Use case: Complex join conditions or when you need precise control over the join and column bindings

Lookup Behavior

Join type: Left outer join (keeps all rows)
No match: Reference table columns are NULL
Multiple matches: Returns all matching rows (Cartesian product)
Performance: Fast for small reference tables (<10MB)

Common Patterns

Pattern: Enrich with Descriptive Names (Explicit Join)

# Using explicit lookup - no column name matching needed!
lookup @product.app_product_id=product_id, pname:@product.app_product_name
| filter not is_null(pname)
| make_col pid:product_id, name:pname, service:container
| limit 20

Use case: Add human-readable product names to logs

Result: Logs with "National Park Foundation Explorascope" instead of "OLJCESPC7Z"

Alternative (implicit join):

make_col app_product_id:product_id
| lookup @product_ref
| filter not is_null(app_product_name)
| make_col pid:product_id, name:app_product_name, service:container
| limit 20

Pattern: Aggregate with Reference Data (Explicit Join)

# Using explicit lookup - cleaner and more maintainable
lookup @product.app_product_id=product_id, pname:@product.app_product_name
| statsby log_count:count(), group_by(pname)
| sort desc(log_count)

Use case: Count events by descriptive name

Result:

pname,log_count
"National Park Foundation Explorascope",864

Alternative (implicit join):

make_col app_product_id:product_id
| lookup @product_ref
| statsby log_count:count(), group_by(app_product_name)
| sort desc(log_count)

Pattern: Browse Reference Table Contents

make_col id:app_product_id, name:app_product_name
| limit 50

Dataset: Query reference table directly (use reference table as primary_dataset_id)

Use case: See available lookup values

Result: Complete list of products in reference table

Pattern: Handle Missing Lookups (Explicit Join)

# Using explicit lookup
lookup @product.app_product_id=product_id, pname:@product.app_product_name
| make_col pid:product_id,
          name:if(is_null(pname), "Unknown Product", pname)
| limit 10

Use case: Provide default value when reference lookup fails

Behavior: Shows "Unknown Product" when pname is NULL

Pattern: Filter to Matched Rows Only (Explicit Join)

# Using explicit lookup
lookup @product.app_product_id=product_id, pname:@product.app_product_name
| filter not is_null(pname)
| make_col pid:product_id, name:pname
| limit 20

Use case: Exclude rows without reference table matches

Behavior: Only returns rows with successful lookups

Troubleshooting

Issue: "Missing columns from source dataset"

Error: "implicit lookup requires all primary key columns in the other dataset to match columns in the source dataset; missing columns from the source dataset: app_product_id"

Cause: Source dataset doesn't have column matching reference table's primary key

Solution: Create matching column with make_col:

make_col app_product_id:product_id
| lookup @product_ref

Key insight: Column names must match exactly (case-sensitive!)

Issue: All reference columns are NULL

Cause: No matches found (lookup is left outer join)

Diagnosis: Check if join values actually exist in reference table:

filter app_product_id = "OLJCESPC7Z"
| limit 1

(Query reference table directly to verify value exists)

Solutions:

Verify product_id values in source match app_product_id in reference
Check for typos or case sensitivity issues
Ensure reference table uploaded correctly

Issue: "Implicit lookup does not support additional arguments"

Error: "implicit lookup does not support additional arguments like explicit join predicates or column bindings"

Cause: Mixing implicit and explicit syntax:

lookup @product_ref on product_id = @product_ref.app_product_id  ❌

Solution Option 1: Use explicit lookup (recommended):

lookup @product.app_product_id=product_id, pname:@product.app_product_name  ✅

Solution Option 2: Use implicit lookup with column name matching:

make_col app_product_id:product_id
| lookup @product_ref  ✅

Key Takeaways

Reference Tables are for static CSV data (no timestamps, no change tracking)
Three lookup patterns available:
- Explicit (recommended): lookup @product.app_product_id=product_id, pname:@product.app_product_name
- Implicit: Requires exact column name matching via make_col
- on() syntax: Full control with lookup on(condition), bindings
Explicit lookup advantages: No column name matching needed, clearer syntax, full control over retrieved columns
Left outer join behavior - keeps all rows, NULL when no match
Use for static enrichment - alternative to Resources when no temporal aspect needed
10MB size limit per reference table
Fast and efficient for small lookup datasets
Column matching is case-sensitive when using implicit lookup
Use is_null() to check for failed lookups - provides default values or filters unmatched rows
Query reference tables directly to browse available lookup values

When to Choose Reference Tables vs Resources

Scenario	Use Reference Tables	Use Resources
Static data that doesn't change	✅	❌
Data changes over time and you need state history	❌	✅
Simple ID-to-name mappings	✅	❌
Need temporal joins with Events/Intervals	❌	✅
CSV upload (max 10MB)	✅	❌
Track mutable state evolution	❌	✅
Fast lookups without timestamps	✅	❌

References

Reference Tables created via CSV upload in Observe UI
Use lookup verb with @ alias for joining
Explicit lookup: @alias.ref_column=source_column, result:@alias.value_column
Implicit lookup: Requires matching column names (automatic join)
on() syntax: lookup on(condition), column_bindings
Maximum 10MB CSV size per table
Interface type typically shows as "unknown" in discovery

Working With Reference Tables

Skill Details

Repository Files