Bsl Query Expert
by boringdata
Query BSL semantic models with group_by, aggregate, filter, and visualizations. Use for data analysis from existing semantic tables.
name: bsl-query-expert
description: Query BSL semantic models with group_by, aggregate, filter, and visualizations. Use for data analysis from existing semantic tables.
BSL Query Expert
Query semantic models using BSL. Be concise.
Workflow
1. `list_models()` → discover available models
2. `get_model(name)` → get schema (REQUIRED before querying)
3. `get_documentation("query-methods")` → call before first query to learn syntax
4. `query_model(query)` → execute, auto-displays results
5. Brief summary (1-2 sentences max)
Behavior
- Execute queries immediately - don't show code to user
- Never stop after listing models - proceed to query
- Charts/tables auto-display - don't print data inline
- Reuse context: Don't re-call tools if info already in context
- IMPORTANT: If query fails → call `get_documentation("query-methods")` to learn correct syntax before retrying
CRITICAL: Field Names
- Use EXACT names from `get_model()` output
- Joined columns: `t.customers.country` (not `t.customer_id.country()`)
- Direct columns: `t.region` (not `t.model.region`)
- NEVER invent methods on columns - they don't exist!
CRITICAL: Never Guess Filter Values
- WRONG: `.filter(lambda t: t.region.isin(["US", "EU"]))` without checking actual values first
- Data uses codes/IDs that differ from what you expect (e.g., "California" might be "CA" or "US-CA")
- Always discover values first, then filter with real data
Multi-Hop Query Pattern
When filtering by names/locations/categories you haven't seen:
Step 1 (discover): query_model(query="model.group_by('region').aggregate('count')", records_limit=50, get_chart=false)
Step 2 (filter): query_model(query="model.filter(lambda t: t.region.isin(['CA','NY'])).group_by('region').aggregate('count')", get_records=false)
- Step 1: Get data to LLM (`records_limit=50`), hide chart (`get_chart=false`)
- Step 2: Display to user (`get_records=false`), show chart (default)
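The same discover-then-filter logic can be sketched in plain Python (the rows and region codes below are invented for illustration; in practice BSL runs both steps server-side via `query_model`):

```python
from collections import Counter

# Hypothetical in-memory rows standing in for a semantic model's table.
rows = [
    {"region": "CA", "amount": 10},
    {"region": "NY", "amount": 20},
    {"region": "CA", "amount": 5},
    {"region": "TX", "amount": 7},
]

# Step 1 (discover): enumerate the codes actually present before filtering on them.
discovered = Counter(r["region"] for r in rows)

# Step 2 (filter): only use values observed in step 1.
wanted = {"CA", "NY"}
assert wanted <= discovered.keys()  # never filter on unseen values
counts = Counter(r["region"] for r in rows if r["region"] in wanted)
print(counts)  # Counter({'CA': 2, 'NY': 1})
```

The assertion makes the rule explicit: a filter value that never appeared in the discovery step is a bug, not a query.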
query_model Parameters
- `get_records=true` (default): Return data to LLM, table auto-displays
- `get_records=false`: Display-only, no data returned to LLM
- `records_limit=N`: Max records to LLM (increase for discovery queries)
- `get_chart=true` (default): Show chart; `false` for table-only
CRITICAL: Exploration vs Final Query
- Discovery/exploration queries: Use `get_chart=false` - no chart when exploring data values
- Final answer query: Use `get_chart=true` (default) - show chart for user's answer
- Example: Looking up airport codes? → `get_chart=false`. Final flight count? → chart enabled
Charts
- Default: Omit `chart_spec` - auto-detect handles most cases
- Override only if needed: `chart_spec={"chart_type": "line"}` or `"bar"`
- CRITICAL: Charting only works on BSL SemanticQuery results (after group_by + aggregate)
- If you use filter-only queries (returns Ibis Table), set `get_chart=false` - charts will fail on raw tables
Time Dimensions
- Use `.truncate()` for time columns: `with_dimensions(year=lambda t: t.date.truncate("Y"))`
- Units: `"Y"`, `"Q"`, `"M"`, `"W"`, `"D"`, `"h"`, `"m"`, `"s"`
CRITICAL: Case Expressions
- Use `ibis.cases()` (PLURAL) - NOT `ibis.case()`
- Syntax: `ibis.cases((condition1, value1), (condition2, value2), else_=default)`
- Example: `ibis.cases((t.value > 100, "high"), (t.value > 50, "medium"), else_="low")`
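The key semantic to remember is that the first matching branch wins. A pure-Python analogue (the `cases` helper below is hypothetical, not the Ibis API - real `ibis.cases()` takes boolean column expressions, not callables) shows the evaluation order:

```python
def cases(*branches, else_=None):
    """Pure-Python analogue of ibis.cases(): first matching branch wins."""
    def evaluate(value):
        for condition, result in branches:
            if condition(value):
                return result
        return else_
    return evaluate

# Mirror of the doc's bucketing example: > 100 is "high", > 50 is "medium".
bucket = cases(
    (lambda v: v > 100, "high"),
    (lambda v: v > 50, "medium"),
    else_="low",
)
print([bucket(v) for v in (150, 75, 10)])  # ['high', 'medium', 'low']
```

Note that 150 matches both conditions but returns "high" because branches are checked in order.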
Help
get_documentation(topic) for:
- Core: getting-started, semantic-table, yaml-config, profile, compose, query-methods
- Advanced: windowing, bucketing, nested-subtotals, percentage-total, indexing, sessionized, comparison
- Charts: charting, charting-altair, charting-plotly, charting-plotext
Additional Information
Available documentation:
- Getting Started: Introduction to BSL, installation, and basic usage with semantic tables
- Semantic Tables: Building semantic models with dimensions, measures, and expressions
- YAML Configuration: Defining semantic models in YAML files for better organization
- Profiles: Database connection profiles for connecting to data sources
- Composing Models: Joining multiple semantic tables together
- Query Methods: Complete API reference for group_by, aggregate, filter, order_by, limit, mutate
- Window Functions: Running totals, moving averages, rankings, lag/lead, and cumulative calculations
- Bucketing with Other: Create categorical buckets and consolidate long-tail into 'Other' category
- Nested Subtotals: Rollup calculations with subtotals at each grouping level
- Percent of Total: Calculate percentages using t.all() for market share and distribution analysis
- Dimensional Indexing: Compare values to baselines and calculate indexed metrics
- Charting Overview: Data visualization basics with automatic chart type detection
- Altair Charts: Interactive web charts with Vega-Lite via Altair backend
- Plotly Charts: Interactive charts with Plotly backend for dashboards
- Terminal Charts: ASCII charts for terminal/CLI with Plotext backend
- Sessionized Data: Working with session-based data and user journey analysis
- Comparison Queries: Period-over-period comparisons and trend analysis
Query Syntax Reference
Execute BSL queries and visualize results. Returns query results with optional charts.
Core Pattern
model.group_by(<dimensions>).aggregate(<measures>) # Both take STRING names only
CRITICAL: aggregate() takes measure names as strings, NOT expressions or lambdas!
Method Order
model -> with_dimensions -> filter -> with_measures -> group_by -> aggregate -> order_by -> mutate -> limit
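As a rough pure-Python analogue of that ordering (sample rows invented for illustration; a real BSL query expresses the same chain through `query_model`), the core stages compose like this:

```python
rows = [
    {"category": "a", "revenue": 120},
    {"category": "b", "revenue": 80},
    {"category": "a", "revenue": 40},
]

# filter -> group_by -> aggregate -> order_by -> limit, in that order
filtered = [r for r in rows if r["revenue"] > 50]   # filter first
totals = {}                                          # group_by + aggregate
for r in filtered:
    totals[r["category"]] = totals.get(r["category"], 0) + r["revenue"]
ordered = sorted(totals.items(), key=lambda kv: kv[1], reverse=True)  # order_by
top = ordered[:10]                                   # limit last
print(top)  # [('a', 120), ('b', 80)]
```

Because the filter runs before aggregation, the 40-revenue row never reaches the totals - the same reason a post-aggregate filter (SQL HAVING) must be written after `.aggregate()` in BSL.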
Lambda Column Access
CRITICAL: In with_dimensions and with_measures lambdas, access columns directly - NO model prefix!
# ✅ CORRECT - access columns directly via t
flights.with_dimensions(x=lambda t: ibis.cases((t.carrier == "WN", "Southwest"), else_="Other"))
flights.with_measures(pct=lambda t: t.flight_count / t.all(t.flight_count) * 100)
# ❌ WRONG - model prefix fails in with_dimensions/with_measures
flights.with_dimensions(x=lambda t: t.flights.carrier) # ERROR: 'Table' has no attribute 'flights'
flights.with_measures(x=lambda t: t.flights.flight_count) # ERROR!
Note: Model prefix (e.g., t.flights.carrier) works in .filter() but NOT in with_dimensions/with_measures.
Filtering
# Simple filter
model.filter(lambda t: t.status == "active").group_by("category").aggregate("count")
# Multiple conditions - use ibis.and_() / ibis.or_()
model.filter(lambda t: ibis.and_(t.amount > 1000, t.year >= 2023))
# IN operator - MUST use .isin() (Python "in" does NOT work!)
model.filter(lambda t: t.region.isin(["US", "EU"])) # ✅
model.filter(lambda t: t.region in ["US", "EU"]) # ❌ ERROR!
# Post-aggregate filter (SQL HAVING) - filter AFTER aggregate
model.group_by("carrier").aggregate("count").filter(lambda t: t.count > 1000)
Joined Columns
Models with joins expose prefixed columns (e.g., customers.country). Use EXACT names from get_model():
# ✅ CORRECT - use prefixed column name
model.filter(lambda t: t.customers.country.isin(["US", "CA"])).group_by("customers.country").aggregate("count")
# ❌ WRONG - columns don't have lookup methods!
model.filter(lambda t: t.customer_id.country()) # ERROR: no 'country' attribute
Key: Look for prefixed columns in get_model() output - don't call methods on ID columns.
Time Transformations
group_by() only accepts strings. Use .with_dimensions() first:
model.with_dimensions(year=lambda t: t.created_at.truncate("Y")).group_by("year").aggregate("count")
Truncate units: "Y", "Q", "M", "W", "D", "h", "m", "s"
Filtering Timestamps - Match Types!
# .year() returns int -> compare with int
model.filter(lambda t: t.created_at.year() >= 2023)
# .truncate() returns timestamp -> compare with ISO string
model.with_dimensions(yr=lambda t: t.created_at.truncate("Y")).filter(lambda t: t.yr >= '2023-01-01')
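The type-matching rule is the same one the standard library enforces. A stdlib `datetime` sketch (sample timestamp made up; in BSL the ISO string is compared directly against the `.truncate()` result):

```python
from datetime import datetime

created_at = datetime(2023, 6, 15, 9, 30)

# .year()-style extraction yields an int -> compare with an int
assert created_at.year >= 2023

# truncating to year start yields a timestamp -> compare with a parsed ISO date
year_start = created_at.replace(
    month=1, day=1, hour=0, minute=0, second=0, microsecond=0
)
assert year_start >= datetime.fromisoformat("2023-01-01")
print(created_at.year, year_start.isoformat())
```

Mixing the two (comparing `created_at.year` with `'2023-01-01'`, say) fails for the same reason it would here: int vs. timestamp.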
Percentage of Total
Use t.all(t.measure) in .with_measures() for grand total:
# Simple percentage by category
sales.with_measures(pct=lambda t: t.revenue / t.all(t.revenue) * 100).group_by("category").aggregate("revenue", "pct")
# Complex: filter + joined column + time dimension + percentage
orders.filter(lambda t: t.customers.country.isin(["US", "CA"])).with_dimensions(
order_date=lambda t: t.created_at.date()
).with_measures(
pct=lambda t: t.order_count / t.all(t.order_count) * 100
).group_by("order_date").aggregate("order_count", "pct").order_by("order_date")
More: get_documentation(topic="percentage-total")
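The arithmetic behind `t.all()` is just measure-over-grand-total. A minimal sketch with made-up revenue figures:

```python
# Per-category revenue after group_by + aggregate (values invented).
revenue = {"a": 60.0, "b": 30.0, "c": 10.0}

# t.all(t.revenue) analogue: the grand total across every group
grand_total = sum(revenue.values())

# pct measure: each group's share of the grand total
pct = {k: v / grand_total * 100 for k, v in revenue.items()}
print(pct)  # {'a': 60.0, 'b': 30.0, 'c': 10.0}
```

The shares always sum to 100, which is a quick sanity check on any percent-of-total query.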
Sorting & Limiting
model.group_by("category").aggregate("revenue").order_by(ibis.desc("revenue")).limit(10)
CRITICAL: .limit() in query limits data before calculations. Use limit parameter for display-only limiting.
Window Functions
.mutate() for post-aggregation transforms - MUST come after .order_by():
model.group_by("week").aggregate("count").order_by("week").mutate(
rolling_avg=lambda t: t.count.mean().over(ibis.window(rows=(-9, 0), order_by="week"))
)
More: get_documentation(topic="windowing")
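The window `rows=(-9, 0)` means each row averages itself plus up to 9 preceding rows. A pure-Python equivalent over invented weekly counts (BSL computes this via `.mutate()` after `.order_by()`):

```python
counts = [10, 20, 30, 40, 50]  # already ordered by week

# rows=(-9, 0): average of the current row and up to 9 preceding rows
rolling_avg = [
    sum(counts[max(0, i - 9):i + 1]) / len(counts[max(0, i - 9):i + 1])
    for i in range(len(counts))
]
print(rolling_avg)  # [10.0, 15.0, 20.0, 25.0, 30.0]
```

Early rows average over fewer values because the window is clipped at the start - the same partial-window behavior the Ibis window produces.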
Chart
chart_spec={"chart_type": "bar"} # or "line", "scatter" - omit for auto-detect
Related Skills
Xlsx
Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modifying existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas
Clickhouse Io
ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.
Analyzing Financial Statements
This skill calculates key financial ratios and metrics from financial statement data for investment analysis
Data Storytelling
Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.
Kpi Dashboard Design
Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.
Dbt Transformation Patterns
Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.
Sql Optimization Patterns
Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.
Anndata
This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.
