M4 Api
by hannesill
Use the M4 Python API to query clinical datasets (MIMIC-IV, eICU) programmatically. Triggers on "M4 API", "query MIMIC with Python", "clinical data analysis", "EHR data", "execute SQL on MIMIC", or when writing code to access clinical databases.
Skill Details
Repository Files
1 file in this skill directory
name: m4-api description: Use the M4 Python API to query clinical datasets (MIMIC-IV, eICU) programmatically. Triggers on "M4 API", "query MIMIC with Python", "clinical data analysis", "EHR data", "execute SQL on MIMIC", or when writing code to access clinical databases.
M4 Python API
The M4 Python API provides programmatic access to clinical datasets for code execution environments. It mirrors the MCP tools but returns native Python types (DataFrames, dicts) instead of formatted strings.
When to Use the API vs MCP Tools
Use the Python API when:
- Complex clinical analysis - Multi-step analyses that require intermediate results, joins across queries, or statistical computations
- Large result sets - Query results with thousands of rows can be stored in DataFrames without dumping into context
- Mathematical operations - Aggregations, percentile calculations, statistical tests, and counting that benefit from pandas/numpy
- Iterative exploration - Building up analysis through multiple queries where each step informs the next
Use MCP tools when:
- Simple one-off queries where the result fits comfortably in context
- Interactive exploration where you want to see results immediately
Required Workflow
You must follow this sequence:
set_dataset()- Select which dataset to query (REQUIRED FIRST)get_schema()/get_table_info()- Explore available tablesexecute_query()- Run SQL queries
from m4 import set_dataset, get_schema, get_table_info, execute_query
# Step 1: Always set dataset first
set_dataset("mimic-iv") # or "mimic-iv-demo", "eicu", "mimic-iv-note"
# Step 2: Explore schema
schema = get_schema()
print(schema['tables']) # List of table names
# Step 3: Inspect specific tables before querying
info = get_table_info("mimiciv_hosp.patients")
print(info['schema']) # DataFrame with column names, types
print(info['sample']) # DataFrame with sample rows
# Step 4: Execute queries
df = execute_query("SELECT gender, COUNT(*) as n FROM mimiciv_hosp.patients GROUP BY gender")
# Returns pd.DataFrame - use pandas operations freely
API Reference
Dataset Management
| Function | Returns | Description |
|---|---|---|
list_datasets() |
list[str] |
Available dataset names |
set_dataset(name) |
str |
Set active dataset (confirmation message) |
get_active_dataset() |
str |
Get current dataset name |
Tabular Data (requires TABULAR modality)
| Function | Returns | Description |
|---|---|---|
get_schema() |
dict |
{'backend_info': str, 'tables': list[str]} |
get_table_info(table, show_sample=True) |
dict |
{'schema': DataFrame, 'sample': DataFrame} |
execute_query(sql) |
DataFrame |
Query results as pandas DataFrame |
Clinical Notes (requires NOTES modality)
| Function | Returns | Description |
|---|---|---|
search_notes(query, note_type, limit, snippet_length) |
dict |
{'results': dict[str, DataFrame]} |
get_note(note_id, max_length) |
dict |
{'text': str, 'subject_id': int, ...} |
list_patient_notes(subject_id, note_type, limit) |
dict |
{'notes': dict[str, DataFrame]} |
Error Handling
M4 uses a hierarchy of exceptions. Catch specific types to handle errors appropriately:
M4Error (base)
├── DatasetError # Dataset doesn't exist or not configured
├── QueryError # SQL syntax error, table not found, query failed
└── ModalityError # Tool incompatible with dataset (e.g., notes on tabular-only)
Recovery patterns:
from m4 import execute_query, set_dataset, DatasetError, QueryError, ModalityError
try:
df = execute_query("SELECT * FROM mimiciv_hosp.patients")
except DatasetError as e:
# No dataset selected, or dataset not found
# Recovery: call set_dataset() first, or check list_datasets()
set_dataset("mimic-iv")
df = execute_query("SELECT * FROM mimiciv_hosp.patients")
except QueryError as e:
# SQL error or table not found
# Recovery: check table name with get_schema(), fix SQL syntax
print(f"Query failed: {e}")
except ModalityError as e:
# Tried notes function on tabular-only dataset
# Recovery: switch to dataset with NOTES modality
set_dataset("mimic-iv-note")
Dataset State
Important: Dataset selection is module-level state that persists across function calls.
set_dataset("mimic-iv")
df1 = execute_query("SELECT COUNT(*) FROM mimiciv_hosp.patients") # Uses mimic-iv
set_dataset("eicu")
df2 = execute_query("SELECT COUNT(*) FROM patient") # Uses eicu
MCP Tool Equivalence
The Python API mirrors MCP tools but with better return types:
| MCP Tool | Python Function | MCP Returns | Python Returns |
|---|---|---|---|
execute_query |
execute_query() |
Formatted string | pd.DataFrame |
get_database_schema |
get_schema() |
Formatted string | dict with tables list |
get_table_info |
get_table_info() |
Formatted string | dict with schema/sample DataFrames |
Use the Python API when you need to:
- Chain queries in analysis pipelines
- Perform pandas operations on results
- Avoid parsing formatted output
NOTE: All queries use canonical schema.table names (e.g., mimiciv_hosp.patients, mimiciv_icu.icustays). These names work on both the local DuckDB backend and the BigQuery backend — no need to adjust table names per backend.
Related Skills
Xlsx
Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modify existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas
Clickhouse Io
ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.
Clickhouse Io
ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.
Analyzing Financial Statements
This skill calculates key financial ratios and metrics from financial statement data for investment analysis
Data Storytelling
Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.
Kpi Dashboard Design
Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.
Dbt Transformation Patterns
Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.
Sql Optimization Patterns
Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.
Clinical Decision Support
Generate professional clinical decision support (CDS) documents for pharmaceutical and clinical research settings, including patient cohort analyses (biomarker-stratified with outcomes) and treatment recommendation reports (evidence-based guidelines with decision algorithms). Supports GRADE evidence grading, statistical analysis (hazard ratios, survival curves, waterfall plots), biomarker integration, and regulatory compliance. Outputs publication-ready LaTeX/PDF format optimized for drug develo
Anndata
This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.
