Altinity Expert Clickhouse Memory
by ntk148v
Diagnose ClickHouse RAM usage, OOM errors, memory pressure, and allocation patterns. Use for memory-related issues and out-of-memory errors.
---
name: altinity-expert-clickhouse-memory
description: Diagnose ClickHouse RAM usage, OOM errors, memory pressure, and allocation patterns. Use for memory-related issues and out-of-memory errors.
---
# Memory Usage and OOM Diagnostics

Diagnose RAM usage, memory pressure, OOM risks, and memory allocation patterns.
## Quick Diagnostics

### 1. Current Memory Overview

```sql
with
    (select value from system.asynchronous_metrics where metric = 'OSMemoryTotal') as total,
    (select value from system.asynchronous_metrics where metric = 'MemoryResident') as resident,
    (select value from system.asynchronous_metrics where metric = 'OSMemoryFreeWithoutCached') as free_without_cached,
    (select value from system.asynchronous_metrics where metric = 'OSMemoryCached') as cached,
    (select value from system.asynchronous_metrics where metric = 'OSMemoryBuffers') as buffers
select
    formatReadableSize(total) as total_ram,
    formatReadableSize(resident) as clickhouse_resident,
    formatReadableSize(free_without_cached) as free,
    formatReadableSize(cached) as os_cached,
    formatReadableSize(buffers) as os_buffers,
    round(100.0 * resident / total, 1) as clickhouse_pct,
    multiIf(resident > total * 0.9, 'Critical', resident > total * 0.8, 'Major', 'OK') as severity
```
### 2. Memory Breakdown by Component

```sql
select
    'Dictionaries' as component,
    sum(bytes_allocated) as bytes,
    formatReadableSize(bytes) as size,
    count() as count
from system.dictionaries
union all
select
    'Memory Tables (Memory/Set/Join)' as component,
    assumeNotNull(sum(total_bytes)) as bytes,
    formatReadableSize(bytes) as size,
    count() as count
from system.tables
where engine in ('Memory', 'Set', 'Join')
union all
select
    'Primary Keys' as component,
    sum(primary_key_bytes_in_memory) as bytes,
    formatReadableSize(bytes) as size,
    sum(marks) as count
from system.parts
where active
union all
select
    'In-Memory Parts' as component,
    sumIf(data_uncompressed_bytes, part_type = 'InMemory') as bytes,
    formatReadableSize(bytes) as size,
    countIf(part_type = 'InMemory') as count
from system.parts
where active
union all
select
    'Active Merges' as component,
    sum(memory_usage) as bytes,
    formatReadableSize(bytes) as size,
    count() as count
from system.merges
union all
select
    'Running Queries' as component,
    sum(memory_usage) as bytes,
    formatReadableSize(bytes) as size,
    count() as count
from system.processes
union all
select
    'Mark Cache' as component,
    value as bytes,
    formatReadableSize(bytes) as size,
    0 as count
from system.asynchronous_metrics
where metric = 'MarkCacheBytes'
union all
select
    'Uncompressed Cache' as component,
    value as bytes,
    formatReadableSize(bytes) as size,
    0 as count
from system.asynchronous_metrics
where metric = 'UncompressedCacheBytes'
order by bytes desc -- sort on raw bytes; the formatted string would sort lexicographically
```
### 3. Memory Allocation Audit

```sql
with
    (select sum(bytes_allocated) from system.dictionaries) as dictionaries,
    (select assumeNotNull(sum(total_bytes)) from system.tables where engine in ('Memory','Set','Join')) as mem_tables,
    (select sum(primary_key_bytes_in_memory) from system.parts where active) as pk_memory,
    (select value from system.asynchronous_metrics where metric = 'OSMemoryTotal') as total_ram
select
    'Dictionaries + Memory Tables' as check_name,
    formatReadableSize(dictionaries + mem_tables) as used,
    round(100.0 * (dictionaries + mem_tables) / total_ram, 1) as pct,
    multiIf(pct > 30, 'Critical', pct > 25, 'Major', pct > 20, 'Moderate', 'OK') as severity
union all
select
    'Primary Keys' as check_name,
    formatReadableSize(pk_memory) as used,
    round(100.0 * pk_memory / total_ram, 1) as pct,
    multiIf(pct > 30, 'Critical', pct > 25, 'Major', pct > 20, 'Moderate', 'OK') as severity
```
## Current Memory Consumers

### Top Memory-Using Queries

```sql
select
    initial_query_id,
    user,
    round(elapsed, 1) as elapsed_sec,
    formatReadableSize(memory_usage) as memory,
    formatReadableSize(peak_memory_usage) as peak_memory,
    substring(query, 1, 80) as query_preview
from system.processes
order by peak_memory_usage desc
limit 15
```
### Top Memory-Using Dictionaries

```sql
select
    database,
    name,
    formatReadableSize(bytes_allocated) as memory,
    element_count as elements,
    source,
    loading_duration
from system.dictionaries
order by bytes_allocated desc
limit 20
```
### Top Memory-Using Tables (Memory Engine)

```sql
select
    database,
    name,
    engine,
    formatReadableSize(total_bytes) as size,
    total_rows as rows
from system.tables
where engine in ('Memory', 'Set', 'Join')
order by total_bytes desc
limit 20
```
### Top Primary Key Memory by Table

```sql
select
    database,
    table,
    formatReadableSize(sum(primary_key_bytes_in_memory)) as pk_memory,
    formatReadableSize(sum(primary_key_bytes_in_memory_allocated)) as pk_allocated,
    sum(marks) as marks
from system.parts
where active
group by database, table
order by sum(primary_key_bytes_in_memory) desc
limit 20
```
## Historical Analysis

### Memory Usage Over Time

```sql
select
    toStartOfFiveMinutes(event_time) as ts,
    formatReadableSize(max(value)) as peak_memory
from system.asynchronous_metric_log
where metric = 'MemoryResident'
    and event_time > now() - interval 4 hour
group by ts
order by ts
```
### Recent Memory-Heavy Queries

```sql
select
    event_time,
    initial_query_id,
    user,
    formatReadableSize(memory_usage) as memory,
    round(query_duration_ms / 1000, 1) as duration_sec,
    substring(query, 1, 100) as query_preview
from system.query_log
where event_date >= today()
    and type = 'QueryFinish'
order by memory_usage desc
limit 20
```
### Memory Exceptions

```sql
select
    event_time,
    user,
    exception_code,
    substring(exception, 1, 200) as exception,
    substring(query, 1, 100) as query_preview
from system.query_log
where type like 'Exception%'
    and exception_code = 241 -- MEMORY_LIMIT_EXCEEDED
    and event_date >= today() - 1
order by event_time desc
limit 30
```
## Advanced: Memory Timeline Reconstruction

Reconstructs memory usage peaks by operation type from `query_log` + `part_log`:

```sql
with
    now() - interval 6 hour as min_time,
    now() as max_time,
    interval 30 minute as time_frame_size
select
    toStartOfInterval(event_timestamp, time_frame_size) as timeframe,
    formatReadableSize(max(mem_overall)) as peak_ram,
    formatReadableSize(maxIf(mem_by_type, event_type = 'Insert')) as inserts_ram,
    formatReadableSize(maxIf(mem_by_type, event_type = 'Select')) as selects_ram,
    formatReadableSize(maxIf(mem_by_type, event_type = 'MergeParts')) as merge_ram,
    formatReadableSize(maxIf(mem_by_type, event_type = 'MutatePart')) as mutate_ram
from (
    select
        toDateTime(toUInt32(ts)) as event_timestamp,
        t as event_type,
        sum(mem) over (partition by t order by ts) as mem_by_type,
        sum(mem) over (order by ts) as mem_overall
    from (
        -- From part_log (merges, mutations)
        with arrayJoin([
            (toFloat64(event_time_microseconds) - (duration_ms / 1000), toInt64(peak_memory_usage)),
            (toFloat64(event_time_microseconds), -peak_memory_usage)
        ]) as data
        select
            cast(event_type, 'LowCardinality(String)') as t,
            data.1 as ts,
            data.2 as mem
        from system.part_log
        where event_time between min_time and max_time
            and peak_memory_usage != 0
        union all
        -- From query_log
        with arrayJoin([
            (toFloat64(query_start_time_microseconds), toInt64(memory_usage)),
            (toFloat64(event_time_microseconds), -memory_usage)
        ]) as data
        select
            query_kind as t,
            data.1 as ts,
            data.2 as mem
        from system.query_log
        where event_time between min_time and max_time
            and memory_usage != 0
    )
)
group by timeframe
order by timeframe
```
## Memory Settings Analysis

### Current Settings

Server-level limits live in `system.server_settings`, while per-query and per-user limits are session settings in `system.settings`:

```sql
select name, value, description
from system.server_settings
where name in (
    'max_server_memory_usage',
    'max_server_memory_usage_to_ram_ratio'
)
union all
select name, value, description
from system.settings
where name in (
    'max_memory_usage',
    'max_memory_usage_for_user',
    'memory_tracker_fault_probability'
)
```
### Memory Used by Other Processes

```sql
with
    (select toFloat64(value) from system.server_settings where name = 'max_server_memory_usage_to_ram_ratio') as max_ratio,
    (select value from system.asynchronous_metrics where metric = 'OSMemoryTotal') as total,
    (select value from system.asynchronous_metrics where metric = 'OSMemoryFreeWithoutCached') as free_without_cached,
    (select value from system.asynchronous_metrics where metric = 'MemoryResident') as clickhouse_resident,
    (select value from system.asynchronous_metrics where metric = 'OSMemoryCached') as cached,
    (select value from system.asynchronous_metrics where metric = 'OSMemoryBuffers') as buffers,
    total - free_without_cached as total_used,
    total_used - (buffers + cached + clickhouse_resident) as used_by_others
select
    formatReadableSize(used_by_others) as other_processes_memory,
    formatReadableSize(total * (1 - max_ratio)) as max_allowed_for_others,
    round(100.0 * used_by_others / total, 1) as pct_of_total,
    multiIf(used_by_others > total * (1 - max_ratio), 'Critical', 'OK') as severity,
    if(severity = 'Critical', 'Other processes consuming RAM reserved for ClickHouse', 'OK') as note
```
## Problem Investigation

### High Memory from Aggregations

```sql
-- Find queries with high-memory aggregations
select
    normalized_query_hash,
    count() as executions,
    formatReadableSize(max(memory_usage)) as max_memory,
    formatReadableSize(avg(memory_usage)) as avg_memory,
    any(substring(query, 1, 100)) as query_sample
from system.query_log
where type = 'QueryFinish'
    and event_date = today()
    and memory_usage > 1000000000 -- > 1GB
    and query ilike '%group by%'
group by normalized_query_hash
order by max(memory_usage) desc
limit 20
```
Solutions:

- Set `max_bytes_before_external_group_by` to spill aggregation state to disk
- Lower `max_threads` to limit parallelism
- Restructure the query to reduce GROUP BY cardinality
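A sketch of applying the first two settings, either per session or per query; the `events` table and the byte/thread values are hypothetical and should be tuned to the workload and available RAM:

```sql
-- Illustrative values only
set max_bytes_before_external_group_by = 10000000000; -- spill GROUP BY state to disk beyond ~10 GB
set max_threads = 8;                                  -- fewer threads means fewer concurrent aggregation states

-- Or scoped to a single query via the SETTINGS clause (hypothetical table)
select user_id, count()
from events
group by user_id
settings max_bytes_before_external_group_by = 10000000000, max_threads = 8;
```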
### High Memory from JOINs

```sql
select
    normalized_query_hash,
    count() as executions,
    formatReadableSize(max(memory_usage)) as max_memory,
    any(substring(query, 1, 100)) as query_sample
from system.query_log
where type = 'QueryFinish'
    and event_date = today()
    and memory_usage > 1000000000
    and query ilike '%join%'
group by normalized_query_hash
order by max(memory_usage) desc
limit 20
```
Solutions:

- Set `max_bytes_in_join` to bound the in-memory join state
- Consider `join_algorithm = 'partial_merge'` or `'auto'`
- Ensure the smaller table is on the right side of the JOIN
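A sketch combining these join settings; the table names and byte limit are hypothetical:

```sql
-- Hypothetical tables; the byte limit is illustrative
select l.id, r.name
from big_table as l
join small_table as r using (id)
settings join_algorithm = 'auto',        -- start with hash join, switch algorithms when limits are hit
         max_bytes_in_join = 5000000000; -- ~5 GB cap on in-memory join state
```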
## Ad-Hoc Query Guidelines

### Required Safeguards

```sql
-- Always time-bound log queries
where event_date >= today() - 1

-- Limit results
limit 100
```
## Memory-Related Metrics

- `MemoryTracking`: current tracked memory
- `MemoryResident`: RSS
- `OSMemoryTotal`, `OSMemoryFreeWithoutCached`: system memory
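These can be pulled together in one query; note that `MemoryTracking` is a current metric in `system.metrics`, while the OS-level gauges live in `system.asynchronous_metrics`:

```sql
select metric, formatReadableSize(value) as value
from system.metrics
where metric = 'MemoryTracking'
union all
select metric, formatReadableSize(value)
from system.asynchronous_metrics
where metric in ('MemoryResident', 'OSMemoryTotal', 'OSMemoryFreeWithoutCached')
```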
## Cross-Module Triggers

| Finding | Load Module | Reason |
|---|---|---|
| High merge memory | altinity-expert-clickhouse-merges | Analyze merge patterns |
| Large dictionaries | altinity-expert-clickhouse-dictionaries | Dictionary optimization |
| Cache too large | altinity-expert-clickhouse-caches | Cache sizing |
| PK memory high | altinity-expert-clickhouse-schema | ORDER BY optimization |
| Query OOMs | altinity-expert-clickhouse-reporting | Query optimization |
## Settings Reference

| Setting | Scope | Notes |
|---|---|---|
| `max_memory_usage` | Query | Per-query limit |
| `max_memory_usage_for_user` | User | Per-user aggregate |
| `max_server_memory_usage` | Server | Global limit |
| `max_server_memory_usage_to_ram_ratio` | Server | Auto-limit as % of RAM |
| `max_bytes_before_external_group_by` | Query | Spill aggregation to disk |
| `max_bytes_in_join` | Query | Spill join to disk |
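The per-query and per-user limits can be attached to a user account so they persist across sessions; a sketch with a hypothetical user name and illustrative values:

```sql
-- Hypothetical user; byte values are illustrative
alter user analyst
settings max_memory_usage = 10000000000,           -- 10 GB per query
         max_memory_usage_for_user = 20000000000;  -- 20 GB across all of the user's queries
```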
## Related Skills

- **Clickhouse Io**: ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.
- **Clickhouse Query**: Run ClickHouse queries for analytics, metrics analysis, and event data exploration. Use when you need to query ClickHouse directly, analyze metrics, check event tracking data, or test query performance. Read-only by default.
