Data Wrangler

by benchflow-ai


name: data-wrangler
description: >
  Transform and export data using DuckDB SQL. Read CSV/Parquet/JSON/Excel/databases,
  apply SQL transformations (joins, aggregations, PIVOT/UNPIVOT, sampling), and
  optionally write results to files. Use when the user wants to: (1) Clean, filter,
  or transform data, (2) Join multiple data sources, (3) Convert between formats
  (CSV→Parquet, etc.), (4) Create partitioned datasets, (5) Sample large datasets,
  (6) Export query results. Prefer this over in-context reasoning for datasets with
  thousands of rows or complex transformations.

Data Wrangler

Transform and export data using DuckDB SQL.


Usage

IMPORTANT - Windows Shell Escaping:

  1. Always cd to the skill directory first
  2. Use double quotes for echo with escaped inner quotes (\")
  3. Use forward slashes in file paths
cd "<skill_directory>" && echo "{\"query\": \"SELECT * FROM 'D:/path/to/file.csv'\"}" | uv run scripts/query_duckdb.py
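The escaping rules above can be sidestepped on any platform by generating the stdin payload with `json.dumps` instead of hand-writing it. A minimal sketch (the file path is hypothetical; it assumes `query_duckdb.py` reads one JSON object on stdin, as the command above shows):

```python
import json

# Build the request as a dict and let json.dumps handle quoting,
# instead of hand-escaping inner quotes for the shell.
request = {"query": "SELECT * FROM 'D:/path/to/file.csv'"}
payload = json.dumps(request)

# `payload` is the text that gets piped to: uv run scripts/query_duckdb.py
print(payload)
```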

Explore Mode

Get schema, statistics, and a sample in one call. Use this before writing queries to understand the data structure.

{"mode": "explore", "path": "D:/data/sales.csv"}

Response:

{
  "file": "D:/data/sales.csv",
  "format": "csv",
  "row_count": 15234,
  "columns": [
    {"name": "order_id", "type": "BIGINT", "null_count": 0, "null_percent": 0.0},
    {"name": "customer", "type": "VARCHAR", "null_count": 45, "null_percent": 0.3}
  ],
  "sample": "| order_id | customer | ... |\n|----------|----------|-----|\n| 1001     | Alice    | ... |"
}

Options:

  • sample_rows: Number of sample rows (default: 10, max: 100)
  • sources: For database tables (same as query mode)
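As a sketch, an explore request that raises the sample size with the `sample_rows` option (path hypothetical):

```python
import json

# Explore request asking for a larger sample than the default 10 rows.
request = {"mode": "explore", "path": "D:/data/sales.csv", "sample_rows": 25}
print(json.dumps(request))
```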

Query Mode

Return results directly to Claude for analysis.

Direct File Queries

{"query": "SELECT * FROM 'data.csv' LIMIT 10"}

Multi-Source Joins

{
  "query": "SELECT s.*, p.category FROM sales s JOIN products p ON s.product_id = p.id",
  "sources": [
    {"type": "file", "alias": "sales", "path": "/data/sales.parquet"},
    {"type": "file", "alias": "products", "path": "/data/products.csv"}
  ]
}

Write Mode

Export query results to files. Add an output object to write instead of returning data.

Basic Write

{
  "query": "SELECT * FROM 'raw.csv' WHERE status = 'active'",
  "output": {
    "path": "D:/output/filtered.parquet",
    "format": "parquet"
  }
}

Write with Options

{
  "query": "SELECT *, YEAR(date) as year, MONTH(date) as month FROM 'events.csv'",
  "output": {
    "path": "D:/output/events/",
    "format": "parquet",
    "options": {
      "compression": "zstd",
      "partition_by": ["year", "month"],
      "overwrite": true
    }
  }
}

Output Formats

| Format | Options |
|---|---|
| parquet | compression (zstd/snappy/gzip/lz4), partition_by, row_group_size |
| csv | header (default: true), delimiter, compression, partition_by |
| json | array (true = JSON array, false = newline-delimited) |
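For instance, a write request that exercises the json format's `array` option to produce newline-delimited JSON (paths hypothetical):

```python
import json

# Export active rows as newline-delimited JSON rather than a JSON array.
request = {
    "query": "SELECT * FROM 'raw.csv' WHERE status = 'active'",
    "output": {
        "path": "D:/output/filtered.jsonl",
        "format": "json",
        "options": {"array": False},  # False -> one JSON object per line
    },
}
print(json.dumps(request))
```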

Write Response

The response includes verification info, so no follow-up queries are needed:

{
  "success": true,
  "output_path": "D:/output/events/",
  "format": "parquet",
  "rows_written": 15234,
  "files_created": ["D:/output/events/year=2023/data_0.parquet", "..."],
  "total_size_bytes": 5678901,
  "duration_ms": 1234
}

Overwrite Protection

By default, existing files are not overwritten. Set options.overwrite: true to allow.

Request/Response Format

Request

{
  "query": "SQL statement",
  "sources": [...],
  "output": {"path": "...", "format": "..."},
  "options": {"max_rows": 200, "format": "markdown"},
  "secrets_file": "path/to/secrets.yaml"
}

Query Mode Options

  • max_rows: Maximum rows to return (default: 200)
  • max_bytes: Maximum response size (default: 200000)
  • format: markdown (default), json, records, or csv
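For example, a query-mode request that caps the result and switches from the default markdown to JSON output (file name and columns hypothetical):

```python
import json

# Cap returned rows and request JSON instead of a markdown table.
request = {
    "query": "SELECT customer, COUNT(*) AS orders FROM 'sales.csv' GROUP BY customer",
    "options": {"max_rows": 50, "format": "json"},
}
print(json.dumps(request))
```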

Query Mode Response (markdown)

| column1 | column2 |
|---|---|
| value1 | value2 |

Query Mode Response (json)

{
  "schema": [{"name": "col1", "type": "INTEGER"}],
  "rows": [[1, "value"]],
  "truncated": false,
  "warnings": [],
  "error": null
}

Source Types

File (auto-detects CSV, Parquet, JSON, Excel)

{"type": "file", "alias": "data", "path": "/path/to/file.csv"}

Glob patterns: {"path": "/logs/**/*.parquet"}

Custom delimiter: {"path": "/data/file.csv", "delimiter": "|"}

PostgreSQL

{
  "type": "postgres", "alias": "users",
  "host": "host", "port": 5432, "database": "db",
  "user": "user", "password": "pass",
  "schema": "public", "table": "users"
}

MySQL

{
  "type": "mysql", "alias": "orders",
  "host": "host", "port": 3306, "database": "db",
  "user": "user", "password": "pass", "table": "orders"
}

SQLite

{"type": "sqlite", "alias": "data", "path": "/path/to/db.sqlite", "table": "tablename"}

S3

{
  "type": "s3", "alias": "logs",
  "url": "s3://bucket/path/*.parquet",
  "aws_region": "us-east-1",
  "aws_access_key_id": "...", "aws_secret_access_key": "..."
}

Transformations

See TRANSFORMS.md for advanced patterns including:

  • PIVOT/UNPIVOT - Reshape data between wide and long formats
  • Sampling - Random subsets with USING SAMPLE n ROWS or SAMPLE 10%
  • Dynamic columns - EXCLUDE, REPLACE, COLUMNS('pattern')
  • Window functions - Running totals, rankings, moving averages
  • Date/time operations - Extraction, arithmetic, formatting

Quick Examples

-- PIVOT: Convert rows to columns
PIVOT sales ON quarter USING SUM(revenue) GROUP BY region

-- UNPIVOT: Convert columns to rows
UNPIVOT data ON q1, q2, q3, q4 INTO NAME quarter VALUE amount

-- Sampling: Random 10% with reproducible seed
SELECT * FROM large_table USING SAMPLE 10% REPEATABLE(42)

-- Dynamic columns: Exclude sensitive, transform email
SELECT * EXCLUDE (ssn) REPLACE (LOWER(email) AS email) FROM users

Workflow

  1. Inspect schema: DESCRIBE SELECT * FROM 'file.csv'
  2. Preview data: SELECT * FROM 'file.csv' LIMIT 5
  3. Transform: Apply filters, joins, aggregations
  4. Export (optional): Add output to write results
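The four steps above can be sketched as a sequence of requests (paths and column names hypothetical):

```python
import json

# 1. inspect, 2. preview, 3. transform, 4. export -- one request per step.
steps = [
    {"query": "DESCRIBE SELECT * FROM 'D:/data/file.csv'"},
    {"query": "SELECT * FROM 'D:/data/file.csv' LIMIT 5"},
    {"query": "SELECT status, COUNT(*) AS n FROM 'D:/data/file.csv' GROUP BY status"},
    {"query": "SELECT * FROM 'D:/data/file.csv' WHERE status = 'active'",
     "output": {"path": "D:/output/active.parquet", "format": "parquet"}},
]
for step in steps:
    print(json.dumps(step))
```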

Error Handling

  • If error is non-null: Check column names, verify paths
  • If truncated is true: Use more aggregation or filters
  • If write fails with "exists": Set options.overwrite: true

Secrets

Store credentials securely in YAML. See SECRETS.md for complete documentation.

{
  "query": "SELECT * FROM customers LIMIT 10",
  "secrets_file": "D:/path/to/secrets.yaml",
  "sources": [{
    "type": "postgres", "alias": "customers",
    "secret": "my_postgres", "table": "customers"
  }]
}

Supported: PostgreSQL, MySQL, S3, GCS, Azure, R2, HTTP, HuggingFace, Iceberg, DuckLake.

Related Skills

Xlsx

Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc.) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modifying existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.


Analyzing Financial Statements

This skill calculates key financial ratios and metrics from financial statement data for investment analysis

Data Storytelling

Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.

Kpi Dashboard Design

Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.

Dbt Transformation Patterns

Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.

Sql Optimization Patterns

Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.

Anndata

This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.

Skill Information

Category: Data
Last Updated: 1/22/2026