Stata MCP
by tmonk
Run or debug Stata workflows through the local io.github.tmonk/mcp-stata server. Use when users mention Stata commands, .do files, r()/e() results, dataset inspection, Stata graph exports, or data browsing with sorting/filtering.
Stata MCP Skill
Instructions
- Ensure the Stata MCP server is registered (see the project README for config) and request it if not already active.
- When the user asks for Stata work:
  - Use `run_command` for ad-hoc syntax (`trace=True` for call stacks, `raw=True` for plain output).
  - Use `load_data` before analyses that require datasets.
  - Use `get_data`, `describe`, `codebook`, or `get_variable_list` to inspect data.
  - Use `run_do_file` for provided `.do` scripts.
  - Use `export_graph`/`export_graphs_all` for visualization requests.
  - Use `get_help` when the user wants Stata documentation.
  - Use `get_stored_results` to return `r()`/`e()` scalars/macros after commands for validation.
  - Use `read_log` to tail or retrieve output from long-running commands.
  - Use `get_ui_channel` to obtain a localhost HTTP endpoint for high-volume data browsing.
- Surface `rc`/`stderr` info back to the user, referencing `r()`/`e()` codes.
- If Stata isn't auto-discovered, remind the user to set `STATA_PATH` (examples in the README).
Tool quick reference
Command Execution
- `run_command(code, echo=True, as_json=True, trace=False, raw=False, max_output_lines=None)`: Run Stata syntax.
  - `code`: The Stata command(s) to execute.
  - `echo`: Include the command itself in output (default: True).
  - `as_json`: Return a JSON envelope with rc/stdout/stderr/error (default: True).
  - `trace`: Enable `set trace on` for deeper error diagnostics (default: False).
  - `raw`: Return plain stdout/error message instead of JSON (default: False).
  - `max_output_lines`: Truncate output to this many lines (default: None for no truncation).
  - Note: Always writes output to a temporary log file and emits a `notifications/logMessage` with `{"event":"log_path","path":"..."}` so the client can tail it locally.
- `run_do_file(path, echo=True, as_json=True, trace=False, raw=False, max_output_lines=None)`: Execute `.do` files.
  - `path`: Path to the `.do` file.
  - `echo`: Include commands in output (default: True).
  - `as_json`: Return a JSON envelope (default: True).
  - `trace`: Enable trace mode for debugging (default: False).
  - `raw`: Return plain output instead of JSON (default: False).
  - `max_output_lines`: Truncate output to this many lines (default: None).
  - Note: Always writes output to a temporary log file and emits incremental `notifications/progress` when the client provides a progress token/callback.
- `read_log(path, offset=0, max_bytes=65536)`: Read a slice of a previously provided log file.
  - `path`: Path to the log file (from `notifications/logMessage`).
  - `offset`: Byte offset to start reading from (default: 0).
  - `max_bytes`: Maximum bytes to read (default: 65536).
  - Returns JSON: `path`, `offset`, `next_offset`, `data`.
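The `offset`/`next_offset` contract can be exercised locally. A minimal sketch, using an ordinary file in place of the server-side log and a hypothetical helper `read_log_slice` that mimics the tool's return shape (this is not the server implementation):

```python
import os
import tempfile

def read_log_slice(path, offset=0, max_bytes=65536):
    """Local stand-in for read_log: return the same JSON-shaped slice."""
    with open(path, "rb") as f:
        f.seek(offset)
        data = f.read(max_bytes)
    return {"path": path, "offset": offset,
            "next_offset": offset + len(data), "data": data.decode()}

# Simulate a log file, then tail it in two slices
fd, path = tempfile.mkstemp(suffix=".log")
os.write(fd, b". regress price mpg\n")
os.close(fd)

first = read_log_slice(path, offset=0, max_bytes=8)   # first 8 bytes
rest = read_log_slice(path, offset=first["next_offset"])  # remainder
os.remove(path)
```

Passing each call's `next_offset` back as the next `offset` yields incremental, non-overlapping reads, which is how a client tails a long-running command's log.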
Data Loading & Inspection
- `load_data(source, clear=True, as_json=True, raw=False, max_output_lines=None)`: Load data using sysuse/webuse/use heuristics.
  - `source`: Dataset name, URL, or file path (e.g., "auto", "webuse nlsw88", "/path/to/file.dta").
  - `clear`: Append `, clear` to replace the existing data (default: True).
  - `as_json`: Return a JSON envelope (default: True).
  - `raw`: Return plain output (default: False).
  - `max_output_lines`: Truncate output to this many lines (default: None).
  - Note: After loading, use the UI channel for advanced filtering/sorting at scale.
- `get_data(start=0, count=50)`: Retrieve a slice of the active dataset as JSON.
  - `start`: Zero-based index of the first observation (default: 0).
  - `count`: Number of observations to retrieve (default: 50, max: 500).
  - Note: For advanced sorting/filtering at scale, use the UI channel endpoints (see `get_ui_channel()`).
- `describe()`: Return variable descriptions, storage types, and labels.
- `get_variable_list()`: Return a JSON list of all variables with names, labels, and types.
- `codebook(variable, as_json=True, trace=False, raw=False, max_output_lines=None)`: Return a codebook/summary for a specific variable.
  - `variable`: Variable name to describe.
  - `as_json`: Return a JSON envelope (default: True).
  - `trace`: Enable trace mode (default: False).
  - `raw`: Return plain output (default: False).
  - `max_output_lines`: Truncate output to this many lines (default: None).
Graph Management
- `list_graphs()`: List all graphs in Stata's memory, with the active graph marked.
  - Note: Graphs are automatically cached during command execution for instant exports.
- `export_graph(graph_name=None, format="pdf")`: Export a stored graph to a file.
  - `graph_name`: Name of the graph to export (from `list_graphs`); if None, exports the active graph.
  - `format`: Output format, "pdf" (default) or "png". Use "png" to view plots directly.
- `export_graphs_all()`: Export all graphs in memory. Returns file paths.
Help & Results
- `get_help(topic, plain_text=False)`: Return Stata help text.
  - `topic`: Command or help topic (e.g., "regress", "graph").
  - `plain_text`: Return plain text instead of Markdown (default: False).
- `get_stored_results()`: Return the current `r()` and `e()` results as JSON after a command.
Session Management
- `create_session(session_id)`: Manually create a new Stata session.
- `list_sessions()`: List all active sessions and their status (running, idle, etc.).
- `stop_session(session_id)`: Terminate and clean up a specific session.
- `break_session(session_id="default")`: Interrupt the currently executing command in a session.
  - Use this tool when a command is taking too long or you want to stop a long-running loop without losing data already in memory.
  - Follow up with `read_log` to see where execution stopped.
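For instance, to stop a runaway loop and check where it halted (the log path shown is a placeholder; the real one arrives via the `log_path` notification):

```
break_session(session_id="default")  # interrupt without losing data in memory
read_log("/tmp/stata_log_abc123.log", offset=0)  # see where execution stopped
```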
UI Data Browser
- `get_ui_channel()`: Return a short-lived localhost HTTP endpoint + bearer token for the UI-only data browser.
  - Returns JSON with `baseUrl`, `token`, `expiresAt`, and `capabilities`.
  - Intended for the VS Code extension UI to browse data at high volume (paging, filtering, sorting) without sending large payloads over MCP.
  - Loopback only (binds to `127.0.0.1`); requires bearer auth.
- Key endpoints (all require an `Authorization: Bearer <token>` header):
  - `GET /v1/dataset`: Dataset identity and state
  - `GET /v1/vars`: Variable metadata
  - `POST /v1/page`: Page data with optional sorting (`sortBy` parameter)
  - `POST /v1/arrow`: Binary Arrow IPC stream
  - `POST /v1/views`: Create a filtered view
  - `POST /v1/views/:viewId/page`: Page within a filtered view (supports sorting)
  - `POST /v1/views/:viewId/arrow`: Arrow stream from a filtered view
  - `DELETE /v1/views/:viewId`: Delete a view
  - `POST /v1/filters/validate`: Validate a filter expression
- Sorting: Use the `sortBy` array in page requests (e.g., `["price"]` for ascending, `["-price"]` for descending, `["foreign", "-price"]` for multi-level).
- Filtering: Filter expressions use Python boolean operators (`==`, `!=`, `<`, `>`, `and`, `or`); Stata-style `&`/`|` are also accepted.
- Server limits: `maxLimit=500`, `maxVars=32767`, `maxChars=500`, `maxRequestBytes=1000000`, `maxArrowLimit=1000000`.
- Dataset tracking: `datasetId` is used for cache invalidation; changing the dataset invalidates view handles.
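As a sketch, a client might assemble a page request like this. The `baseUrl`, `token`, and `dataset_id` values below are placeholders; real ones come from `get_ui_channel()` and `GET /v1/dataset`:

```python
import json

# Placeholder channel info; in practice, returned by get_ui_channel()
channel = {"baseUrl": "http://127.0.0.1:49152", "token": "example-token"}

def build_page_request(channel, dataset_id, offset=0, limit=50,
                       variables=None, sort_by=None):
    """Assemble the URL, headers, and JSON body for POST /v1/page."""
    url = channel["baseUrl"] + "/v1/page"
    headers = {
        "Authorization": "Bearer " + channel["token"],
        "Content-Type": "application/json",
    }
    body = {"datasetId": dataset_id, "offset": offset,
            "limit": min(limit, 500)}  # server caps pages at maxLimit=500
    if variables:
        body["vars"] = variables
    if sort_by:
        body["sortBy"] = sort_by  # "-" prefix sorts that variable descending
    return url, headers, json.dumps(body)

url, headers, body = build_page_request(
    channel, "dataset-1",
    variables=["price", "mpg"], sort_by=["foreign", "-price"])
```

The same header and `sortBy` conventions apply to `/v1/views/:viewId/page`; only the URL and body fields differ.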
Cancellation
- Clients may cancel an in-flight request by sending the MCP notification `notifications/cancelled` with `params.requestId` set to the original tool call ID.
- Pass a `_meta.progressToken` when invoking the tool if you want progress updates (optional).
- Cancellation is best-effort and depends on Stata surfacing `BreakError`.
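A minimal sketch of the cancellation message a client would send, assuming standard JSON-RPC framing (the request ID `"req-42"` is illustrative):

```python
import json

def cancel_notification(request_id):
    """Build a notifications/cancelled message for an in-flight tool call."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": "notifications/cancelled",
        # requestId must match the id of the original tool call being cancelled
        "params": {"requestId": request_id},
    })

msg = cancel_notification("req-42")
```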
Error Reporting
- All tools that execute Stata commands support JSON envelopes (`as_json=true`) containing:
  - `rc`: Return code from `r()`/`c(rc)`
  - `stdout`: Standard output
  - `stderr`: Standard error (captures "red text")
  - `message`: Error message
  - `line`: Line number (when Stata reports it)
  - `command`: The command that was executed
  - `log_path`: Path to the log file for streaming (when applicable)
  - `snippet`: Excerpt of the error output
- Stata-specific error codes (`r(XXX)`) are parsed and preserved.
- Use `trace=true` to enable `set trace on` for detailed program-defined error diagnostics.
- Set the `MCP_STATA_LOGLEVEL` environment variable (e.g., `DEBUG`, `INFO`) to control server logging.
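A sketch of client-side handling of such an envelope. The sample values are illustrative, not actual server output:

```python
import re

# Illustrative envelope shaped like the fields listed above
envelope = {
    "rc": 111,
    "stdout": "",
    "stderr": "variable prce not found\nr(111);",
    "message": "variable prce not found",
    "command": "regress price prce",
}

def summarize_error(env):
    """Return a short diagnostic string, preferring the parsed r(XXX) code."""
    if env.get("rc", 0) == 0:
        return "ok"
    match = re.search(r"r\((\d+)\)", env.get("stderr", ""))
    code = match.group(1) if match else str(env["rc"])
    return "r({}): {}".format(code, env.get("message", "unknown error"))

summary = summarize_error(envelope)  # "r(111): variable prce not found"
```

Surfacing the `r(XXX)` code alongside the message lets the user look the error up directly in Stata's documentation.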
MCP Resources
The server exposes these resources for MCP clients:
- `stata://data/summary` → `summarize`
- `stata://data/metadata` → `describe`
- `stata://graphs/list` → graph list
- `stata://variables/list` → variable list
- `stata://results/stored` → stored `r()`/`e()` results
Graph review workflow
- Call `list_graphs()` to see available plots and identify the active graph.
- Use `export_graphs_all()` to fetch file paths for every graph; view them directly in the client.
- For a single plot, call `export_graph(graph_name="GraphName", format="png")` to get a viewable file.
- Compare the rendered PNGs to the user spec (titles, axis labels, legends, colors, filters); state whether the graph matches and what to change.
Examples
Run a regression

```
# Load sample data and run regression
load_data("auto")
run_command("regress price mpg")
get_stored_results()  # Retrieve coefficients and statistics
```

Export a histogram

```
# Create and export a graph
run_command("histogram price")
list_graphs()  # Confirm graph exists
export_graph(graph_name="Graph", format="png")  # Export for viewing
```

Debug a do-file

```
run_do_file("/path/to/analysis.do", trace=True)
```

Inspect data structure

```
load_data("nlsw88", clear=True)
describe()
get_variable_list()
codebook("wage")
get_data(start=0, count=10)
```

Read log output from long-running command

```
# After run_command emits a log_path notification
read_log("/tmp/stata_log_abc123.log", offset=0)
# Continue reading with next_offset for incremental output
read_log("/tmp/stata_log_abc123.log", offset=4096)
```

Advanced data browsing with sorting and filtering

```
# Get UI channel for high-volume data operations
get_ui_channel()  # Returns baseUrl, token, expiresAt

# Example UI channel usage (requires an HTTP client):
# POST {baseUrl}/v1/page with Authorization: Bearer {token}
# Body: {"datasetId":"...","offset":0,"limit":50,"vars":["price","mpg"],"sortBy":["-price"]}

# Create a filtered view for price < 5000
# POST {baseUrl}/v1/views
# Body: {"datasetId":"...","frame":"default","filterExpr":"price < 5000"}

# Page through the filtered view with sorting
# POST {baseUrl}/v1/views/{viewId}/page
# Body: {"offset":0,"limit":50,"vars":["price","mpg"],"sortBy":["-price"]}
```
Related Skills
Xlsx
Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc.) for: (1) creating new spreadsheets with formulas and formatting, (2) reading or analyzing data, (3) modifying existing spreadsheets while preserving formulas, (4) data analysis and visualization in spreadsheets, or (5) recalculating formulas.
Clickhouse Io
ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.
Analyzing Financial Statements
This skill calculates key financial ratios and metrics from financial statement data for investment analysis
Data Storytelling
Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.
Kpi Dashboard Design
Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.
Dbt Transformation Patterns
Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.
Sql Optimization Patterns
Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.
Anndata
This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.
