Altinity Expert Clickhouse Reporting

by ntk148v

cli

Diagnose ClickHouse SELECT query performance, analyze query patterns, identify slow queries, and find optimization opportunities. Use for query latency and timeout issues.

Skill Details

Repository Files

1 file in this skill directory


name: altinity-expert-clickhouse-reporting description: Diagnose ClickHouse SELECT query performance, analyze query patterns, identify slow queries, and find optimization opportunities. Use for query latency and timeout issues.

Query Performance Analysis

Diagnose SELECT query performance issues, analyze query patterns, and identify optimization opportunities.


Quick Diagnostics

1. Current Running Queries

select
    query_id,
    user,
    round(elapsed, 1) as elapsed_sec,
    formatReadableSize(read_bytes) as read_bytes,
    formatReadableSize(memory_usage) as memory,
    read_rows,
    substring(query, 1, 80) as query_preview
from system.processes
where is_cancelled = 0
order by elapsed desc
limit 20

2. Recent Query Performance Summary

select
    toStartOfFiveMinutes(event_time) as ts,
    count() as queries,
    countIf(type like 'Exception%') as failed,
    round(avg(query_duration_ms)) as avg_ms,
    round(quantile(0.95)(query_duration_ms)) as p95_ms,
    round(max(query_duration_ms)) as max_ms,
    formatReadableSize(sum(read_bytes)) as read_bytes,
    formatReadableSize(sum(memory_usage)) as memory
from system.query_log
where event_time > now() - interval 1 hour
  and type in ('QueryFinish', 'ExceptionWhileProcessing')
group by ts
order by ts desc

3. Slowest Queries (Last 24h)

select
    query_id,
    user,
    query_duration_ms,
    formatReadableSize(read_bytes) as read_bytes,
    formatReadableSize(memory_usage) as memory,
    read_rows,
    result_rows,
    substring(query, 1, 100) as query_preview
from system.query_log
where type = 'QueryFinish'
  and event_date >= today() - 1
  and query_kind = 'Select'
order by query_duration_ms desc
limit 20

4. Most Frequent Queries

select
    normalized_query_hash,
    count() as executions,
    round(avg(query_duration_ms)) as avg_ms,
    round(quantile(0.95)(query_duration_ms)) as p95_ms,
    formatReadableSize(avg(read_bytes)) as avg_read,
    formatReadableSize(avg(memory_usage)) as avg_memory,
    any(substring(query, 1, 100)) as query_sample
from system.query_log
where type = 'QueryFinish'
  and event_date = today()
  and query_kind = 'Select'
group by normalized_query_hash
having count() > 5
order by count() desc
limit 30

Query Analysis

Queries by CPU Time

select
    normalized_query_hash,
    sum(ProfileEvents['UserTimeMicroseconds']) as user_cpu_us,
    sum(ProfileEvents['SystemTimeMicroseconds']) as system_cpu_us,
    count() as executions,
    round(avg(query_duration_ms)) as avg_duration_ms,
    any(substring(query, 1, 80)) as query_sample
from system.query_log
where type = 'QueryFinish'
  and event_date >= today() - 1
group by normalized_query_hash
order by user_cpu_us desc
limit 20

Queries Reading Too Much Data

select
    query_id,
    user,
    formatReadableSize(read_bytes) as read_bytes,
    read_rows,
    result_rows,
    round(read_rows / nullIf(result_rows, 0)) as read_amplification,
    round(query_duration_ms / 1000, 1) as duration_sec,
    substring(query, 1, 100) as query_preview
from system.query_log
where type = 'QueryFinish'
  and event_date = today()
  and query_kind = 'Select'
  and read_rows > 0
order by read_bytes desc
limit 20

High read_amplification indicates:

  • Missing or ineffective indexes
  • Poor ORDER BY alignment with query patterns
  • Full table scans

Queries by Tables Accessed

select
    arrayStringConcat(tables, ', ') as tables,
    count() as queries,
    round(avg(query_duration_ms)) as avg_ms,
    formatReadableSize(avg(read_bytes)) as avg_read
from system.query_log
where type = 'QueryFinish'
  and event_date = today()
  and length(tables) > 0
group by tables
order by count() desc
limit 30

Failed Queries

Recent Failures

select
    event_time,
    user,
    exception_code,
    substring(exception, 1, 150) as exception,
    substring(query, 1, 100) as query_preview
from system.query_log
where type like 'Exception%'
  and event_date = today()
order by event_time desc
limit 30

Failure Summary by Error Code

select
    exception_code,
    count() as failures,
    any(substring(exception, 1, 100)) as example_exception
from system.query_log
where type like 'Exception%'
  and event_date = today()
group by exception_code
order by failures desc
limit 20

Common exception codes:

  • 60 - Table doesn't exist
  • 62 - Syntax error
  • 241 - Memory limit exceeded
  • 159 - Timeout
  • 252 - Too many parts

Query Pattern Analysis

Query Types Distribution

select
    query_kind,
    count() as queries,
    round(avg(query_duration_ms)) as avg_ms,
    formatReadableSize(sum(read_bytes)) as total_read,
    formatReadableSize(sum(written_bytes)) as total_written
from system.query_log
where type = 'QueryFinish'
  and event_date = today()
group by query_kind
order by queries desc

Peak Query Hours

select
    toHour(event_time) as hour,
    count() as queries,
    round(avg(query_duration_ms)) as avg_ms,
    max(query_duration_ms) as max_ms
from system.query_log
where type = 'QueryFinish'
  and event_date = today()
group by hour
order by hour

Queries by User

select
    user,
    count() as queries,
    countIf(type like 'Exception%') as failures,
    round(avg(query_duration_ms)) as avg_ms,
    formatReadableSize(sum(read_bytes)) as total_read
from system.query_log
where event_date = today()
group by user
order by queries desc

Materialized View Performance

MV Execution During Inserts

select
    view_name,
    count() as trigger_count,
    round(avg(view_duration_ms)) as avg_ms,
    round(max(view_duration_ms)) as max_ms,
    round(sum(view_duration_ms) / 1000) as total_sec,
    formatReadableSize(sum(written_bytes)) as written
from system.query_views_log
where event_time > now() - interval 1 hour
group by view_name
order by sum(view_duration_ms) desc
limit 20

Slow MV Breakdown by Query

select
    query_id,
    view_name,
    view_type,
    view_duration_ms,
    read_rows,
    written_rows,
    formatReadableSize(peak_memory_usage) as memory,
    status
from system.query_views_log
where event_time > now() - interval 1 hour
order by view_duration_ms desc
limit 30

Query Optimization Hints

Index Usage Check

-- Check if data skipping indices exist
select
    database,
    table,
    name as index_name,
    type,
    expr,
    granularity
from system.data_skipping_indices
where database = '{database}' and table = '{table}'

Mark Count for Query

For a specific slow query, check how many marks (granules) were read:

select
    query_id,
    read_rows,
    selected_marks,
    selected_parts,
    formatReadableSize(read_bytes) as read_bytes,
    round(read_rows / nullIf(selected_marks, 0)) as rows_per_mark
from system.query_log
where query_id = '{query_id}'
  and type = 'QueryFinish'

High selected_marks relative to result = index not selective enough.


Distributed Query Analysis

Distributed Query Performance

select
    query_id,
    is_initial_query,
    formatReadableSize(read_bytes) as read_bytes,
    read_rows,
    round(query_duration_ms / 1000, 1) as duration_sec,
    length(thread_ids) as threads,
    substring(query, 1, 80) as query_preview
from system.query_log
where type = 'QueryFinish'
  and event_date = today()
  and query like '%Distributed%' or is_initial_query = 0
order by query_duration_ms desc
limit 30

Ad-Hoc Query Guidelines

Required Safeguards

-- Always time-bound
where event_date >= today() - 1
-- or
where event_time > now() - interval 1 hour

-- Always limit
limit 100

-- Filter by type
where type = 'QueryFinish'  -- completed
where type like 'Exception%'  -- failed

Useful Filters

-- By user
where user = 'analytics_user'

-- By query pattern
where query ilike '%SELECT%FROM my_table%'

-- By duration threshold
where query_duration_ms > 10000  -- > 10 seconds

-- By normalized hash (for specific query pattern)
where normalized_query_hash = 1234567890

Cross-Module Triggers

Finding Load Module Reason
High memory queries altinity-expert-clickhouse-memory Memory limits/optimization
Reading too many parts altinity-expert-clickhouse-merges Part consolidation
Poor index selectivity altinity-expert-clickhouse-schema Index/ORDER BY design
Cache misses altinity-expert-clickhouse-caches Cache sizing
MV slow altinity-expert-clickhouse-ingestion MV optimization

Settings Reference

Setting Scope Notes
max_execution_time Query Query timeout
max_rows_to_read Query Limit rows scanned
max_bytes_to_read Query Limit bytes scanned
max_threads Query Parallelism
use_query_cache Query Enable query result caching
log_queries Server Enable query logging
log_queries_min_query_duration_ms Server Log threshold

Related Skills

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

datacli

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

datacli

Clinical Decision Support

Generate professional clinical decision support (CDS) documents for pharmaceutical and clinical research settings, including patient cohort analyses (biomarker-stratified with outcomes) and treatment recommendation reports (evidence-based guidelines with decision algorithms). Supports GRADE evidence grading, statistical analysis (hazard ratios, survival curves, waterfall plots), biomarker integration, and regulatory compliance. Outputs publication-ready LaTeX/PDF format optimized for drug develo

developmentdocumentcli

Geopandas

Python library for working with geospatial vector data including shapefiles, GeoJSON, and GeoPackage files. Use when working with geographic data for spatial analysis, geometric operations, coordinate transformations, spatial joins, overlay operations, choropleth mapping, or any task involving reading/writing/analyzing vector geographic data. Supports PostGIS databases, interactive maps, and integration with matplotlib/folium/cartopy. Use for tasks like buffer analysis, spatial joins between dat

artdatacli

Datacommons Client

Work with Data Commons, a platform providing programmatic access to public statistical data from global sources. Use this skill when working with demographic data, economic indicators, health statistics, environmental data, or any public datasets available through Data Commons. Applicable for querying population statistics, GDP figures, unemployment rates, disease prevalence, geographic entity resolution, and exploring relationships between statistical entities.

datacli

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

datacli

Geopandas

Python library for working with geospatial vector data including shapefiles, GeoJSON, and GeoPackage files. Use when working with geographic data for spatial analysis, geometric operations, coordinate transformations, spatial joins, overlay operations, choropleth mapping, or any task involving reading/writing/analyzing vector geographic data. Supports PostGIS databases, interactive maps, and integration with matplotlib/folium/cartopy. Use for tasks like buffer analysis, spatial joins between dat

artdatacli

Datacommons Client

Work with Data Commons, a platform providing programmatic access to public statistical data from global sources. Use this skill when working with demographic data, economic indicators, health statistics, environmental data, or any public datasets available through Data Commons. Applicable for querying population statistics, GDP figures, unemployment rates, disease prevalence, geographic entity resolution, and exploring relationships between statistical entities.

datacli

Clinical Decision Support

Generate professional clinical decision support (CDS) documents for pharmaceutical and clinical research settings, including patient cohort analyses (biomarker-stratified with outcomes) and treatment recommendation reports (evidence-based guidelines with decision algorithms). Supports GRADE evidence grading, statistical analysis (hazard ratios, survival curves, waterfall plots), biomarker integration, and regulatory compliance. Outputs publication-ready LaTeX/PDF format optimized for drug develo

developmentdocumentcli

Clickhouse Query

Run ClickHouse queries for analytics, metrics analysis, and event data exploration. Use when you need to query ClickHouse directly, analyze metrics, check event tracking data, or test query performance. Read-only by default.

datacli

Skill Information

Category:Technical
Last Updated:1/20/2026