---
name: sql
description: "Database querying and analysis using SQLAlchemy 2.0+ with support for PostgreSQL, MySQL, SQLite, and SQL Server. Use when tasks require: (1) Querying databases via SQL, (2) Reading data into DataFrames for analysis, (3) Performing database operations with proper transaction handling. Environment variable with connection string must be set (check resources/RESOURCES.md for available databases and schemas)."
license: "© 2025 Daisyloop Technologies Inc. See LICENSE.txt"
---
# SQL Database Integration

## Before You Start

Read `resources/RESOURCES.md` first to understand available databases, table structures, column types, and relationships before writing any queries. Alternatively, run `scripts/inspect_schema.py` to discover the schema programmatically.
## Overview

Query databases using SQLAlchemy 2.0+ with pandas integration for analysis. Supports PostgreSQL, MySQL, SQLite, and SQL Server.
## Quick Start

```python
import os

import pandas as pd
from sqlalchemy import create_engine

# Connection string from environment (set via resources)
engine = create_engine(os.environ["DATABASE_URL"])

# Read query results into a DataFrame
df = pd.read_sql("SELECT * FROM users LIMIT 10", engine)
print(df.head())
```
## Connection Patterns

### Using Environment Variables

Connection strings are stored as secret resources and injected as environment variables. Check `resources/RESOURCES.md` for the variable name.

```python
import os

from sqlalchemy import create_engine

# Get the connection string from the environment
conn_str = os.environ["DATABASE_URL"]  # or your specific env var name
engine = create_engine(conn_str)
```
### Engine-Specific Patterns

See `references/engines.md` for connection string formats and driver-specific details for:

- PostgreSQL (psycopg)
- MySQL (pymysql)
- SQLite (built-in)
- SQL Server (pyodbc)
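For orientation before reading `references/engines.md`, these are the standard SQLAlchemy URL shapes for the four supported backends. All hosts, ports, database names, and credentials below are placeholders, not values from this skill's resources:

```python
# Typical SQLAlchemy connection-URL formats (every value here is a placeholder)
EXAMPLE_URLS = {
    "postgresql": "postgresql+psycopg://user:secret@localhost:5432/mydb",
    "mysql": "mysql+pymysql://user:secret@localhost:3306/mydb",
    "sqlite": "sqlite:///path/to/database.db",  # no host/credentials; just a file path
    "sqlserver": "mssql+pyodbc://user:secret@host/mydb?driver=ODBC+Driver+18+for+SQL+Server",
}

# The part before "://" is "dialect+driver"
for name, url in EXAMPLE_URLS.items():
    print(name, "->", url.split("://", 1)[0])
```

The real variable name and URL for your environment come from `resources/RESOURCES.md`, not from this sketch.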
## Read Operations

### Basic Queries with pandas

```python
import os

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(os.environ["DATABASE_URL"])

# Simple query
df = pd.read_sql("SELECT * FROM orders WHERE status = 'pending'", engine)

# With bound parameters (safe from SQL injection)
df = pd.read_sql(
    "SELECT * FROM orders WHERE customer_id = :cid",
    engine,
    params={"cid": 123},
)
```
### Using SQLAlchemy Core

```python
import os

from sqlalchemy import create_engine, text

engine = create_engine(os.environ["DATABASE_URL"])

with engine.connect() as conn:
    result = conn.execute(
        text("SELECT * FROM products WHERE price > :min_price"),
        {"min_price": 100},
    )
    for row in result:
        print(row)
```
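The same Core pattern can be exercised end-to-end against an in-memory SQLite database, which also shows `.mappings()` for dict-style row access. The table and data here are illustrative stand-ins:

```python
from sqlalchemy import create_engine, text

# In-memory SQLite stands in for a real database in this sketch
engine = create_engine("sqlite:///:memory:")

with engine.begin() as conn:
    conn.execute(text("CREATE TABLE products (name TEXT, price REAL)"))
    conn.execute(text(
        "INSERT INTO products VALUES ('widget', 150.0), ('gadget', 50.0)"
    ))

with engine.connect() as conn:
    rows = conn.execute(
        text("SELECT name FROM products WHERE price > :min_price"),
        {"min_price": 100},
    ).mappings().all()  # rows behave like dicts keyed by column name

names = [row["name"] for row in rows]
print(names)  # ['widget']
```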
## Write Operations

### ⚠️ EXTREME CAUTION REQUIRED

Write operations (INSERT, UPDATE, DELETE) cannot be undone. Always:

- **Use `--dry-run` first** - preview changes without committing
- **Verify the WHERE clause** - a missing or wrong WHERE affects all rows
- **Start with SELECT** - run the equivalent SELECT to see the affected rows
- **Back up data** before bulk updates on production

```bash
# ALWAYS dry-run first
uv run scripts/execute_sql.py "UPDATE users SET active = false WHERE id = 123" --dry-run
```
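The "start with SELECT" rule above can be sketched as a count-before-update check. This uses an in-memory SQLite database with a made-up `users` table purely for illustration:

```python
from sqlalchemy import create_engine, text

engine = create_engine("sqlite:///:memory:")  # stand-in database
with engine.begin() as conn:
    conn.execute(text("CREATE TABLE users (id INTEGER, active BOOLEAN)"))
    conn.execute(text("INSERT INTO users VALUES (123, 1), (456, 1)"))

# Count the rows the UPDATE would touch, using the same WHERE clause
with engine.connect() as conn:
    affected = conn.execute(
        text("SELECT COUNT(*) FROM users WHERE id = :id"), {"id": 123}
    ).scalar()

print(f"UPDATE would affect {affected} row(s)")
# Only run the UPDATE once this count matches what you expect
```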
### Transactions

```python
import os

from sqlalchemy import create_engine, text

engine = create_engine(os.environ["DATABASE_URL"])

# Transaction with automatic commit/rollback
with engine.begin() as conn:
    conn.execute(
        text("UPDATE orders SET status = :status WHERE id = :id"),
        {"status": "shipped", "id": 123},
    )
    conn.execute(
        text("INSERT INTO order_history (order_id, event) VALUES (:oid, :event)"),
        {"oid": 123, "event": "shipped"},
    )
    # Commits automatically if no exception is raised, rolls back on error
### Insert with RETURNING

```python
with engine.begin() as conn:
    result = conn.execute(
        text("INSERT INTO users (name, email) VALUES (:name, :email) RETURNING id"),
        {"name": "Alice", "email": "alice@example.com"},
    )
    new_id = result.scalar()
    print(f"Created user with ID: {new_id}")
```

Note: not every engine supports `RETURNING` (MySQL does not); see `references/engines.md` for dialect differences.
## Further Analysis with pandas

After loading data, use pandas for analysis:

```python
import os

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(os.environ["DATABASE_URL"])

# Load data
df = pd.read_sql("SELECT * FROM sales", engine)

# Analysis
summary = df.groupby("region").agg({
    "revenue": ["sum", "mean"],
    "quantity": "sum",
})

# Filter and transform (.copy() avoids SettingWithCopyWarning on the new column)
recent = df[df["date"] >= "2024-01-01"].copy()
recent["margin"] = recent["revenue"] - recent["cost"]

# Export results
summary.to_excel("output/sales_summary.xlsx")
```
### Large Datasets with Chunking

```python
# Process large tables in chunks to bound memory usage
chunks = pd.read_sql("SELECT * FROM large_table", engine, chunksize=10000)
for chunk in chunks:
    # Process each chunk
    processed = chunk[chunk["status"] == "active"]
    # ... further processing
```
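A common chunking pattern is to fold each chunk into a running aggregate so no chunk needs to coexist with the others in memory. This self-contained sketch uses an in-memory SQLite table with made-up data and a deliberately tiny `chunksize`:

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite:///:memory:")  # stand-in database
pd.DataFrame({
    "status": ["active", "inactive", "active"],
    "amount": [10, 20, 30],
}).to_sql("large_table", engine, index=False)

# Aggregate incrementally across chunks
total = 0
for chunk in pd.read_sql("SELECT * FROM large_table", engine, chunksize=2):
    total += chunk.loc[chunk["status"] == "active", "amount"].sum()

print(total)  # 40
```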
## Best Practices

- **Always parameterize queries** - never use f-strings or string concatenation for user inputs
- **Use transactions for writes** - ensures atomicity and enables rollback
- **Check schema first** - read `resources/RESOURCES.md` before writing queries
- **Close connections** - use context managers (`with`) to ensure proper cleanup
- **Limit result sets** - use `LIMIT` during development to avoid pulling large datasets
## Scripts

Ready-to-use scripts for common database operations:

- `scripts/inspect_schema.py` - discover tables, columns, keys, and indexes
- `scripts/query_to_dataframe.py` - execute queries and export as CSV/JSON
- `scripts/execute_sql.py` - run write operations with transaction safety and dry-run

```bash
# Examples
uv run scripts/inspect_schema.py --table users
uv run scripts/query_to_dataframe.py "SELECT * FROM orders" --format csv
uv run scripts/execute_sql.py "UPDATE users SET active = true" --dry-run
```
## References

- `references/engines.md` - connection strings, SQL dialect differences, and gotchas
- `references/patterns.md` - advanced SQL patterns:
  - Window functions (ROW_NUMBER, LAG/LEAD, running totals)
  - CTEs (recursive and non-recursive)
  - Pivoting data
  - Date/time operations by engine
  - JSON operations by engine
