Aps Doc Core
by treasure-data
Core documentation generation patterns and framework for Treasure Data pipeline layers. Provides shared templates, quality validation, testing framework, and Confluence integration used by all layer-specific documentation skills.
APS Documentation Core
Core documentation generation framework providing shared patterns, templates, and utilities used by all APS layer-specific documentation skills (ingestion, hist-union, staging, id-unification, golden).
When to Use This Skill
Use this skill when:
- Creating custom documentation that doesn't fit standard layer types
- Understanding core documentation generation principles
- Extending layer-specific skills with new patterns
- Implementing custom documentation workflows
Note: For layer-specific documentation, use the specialized skills:
- `aps-doc-skills:ingestion` for ingestion layers
- `aps-doc-skills:hist-union` for hist-union workflows
- `aps-doc-skills:staging` for staging transformations
- `aps-doc-skills:id-unification` for ID unification
- `aps-doc-skills:golden` for golden layers
🚨 MANDATORY: Codebase Access Required
WITHOUT codebase access = NO documentation. Period.
If no codebase access provided:
I cannot create technical documentation without codebase access.
Required:
- Directory path to code
- Access to relevant files (.dig, .sql, .yml)
Without access, I cannot extract real configurations, SQL, or workflow logic.
Provide path: "Code is in /path/to/layer/"
Before proceeding:
- Ask for codebase path if not provided
- Verify files exist using Glob/Read
- STOP if cannot read files
All documentation MUST contain real data from codebase:
- Actual table/database/column names
- Real file paths with line numbers
- Working code examples from actual files
- Extracted configurations, not placeholders
NO generic templates. Only production-ready, codebase-driven documentation.
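The "verify files exist" pre-flight above can be sketched as a small helper that refuses to proceed when no workflow, SQL, or config files are readable. This is a minimal sketch; the extensions and error messages are illustrative assumptions, not fixed requirements:

```python
from pathlib import Path

def verify_codebase_access(root: str) -> list[Path]:
    """Return readable pipeline files under root, or raise if none are found."""
    base = Path(root)
    if not base.is_dir():
        raise FileNotFoundError(f"Codebase path does not exist: {root}")
    # File types the documentation workflow expects to find.
    patterns = ("*.dig", "*.sql", "*.yml")
    files = [p for pat in patterns for p in base.rglob(pat)]
    if not files:
        raise RuntimeError(f"No .dig/.sql/.yml files under {root} - STOP.")
    return sorted(files)
```

If this raises, stop and ask the user for the correct codebase path before generating anything.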
Core Principles
1. Template-Based Documentation
All documentation follows a template-driven approach:
Process:
- User provides or references existing Confluence template
- Skill analyzes template structure (sections, formatting, patterns)
- Skill explores codebase to extract implementation details
- Skill generates documentation matching template with actual data
- Skill validates and publishes to Confluence
Benefits:
- Consistency across all documentation
- Reuses proven documentation structures
- Adapts to organization-specific templates
2. Three-Phase Documentation Workflow
Phase 1: Template Analysis
1. Fetch existing Confluence page (if provided)
2. Extract structure:
- Section headings hierarchy
- Content organization patterns
- Tables and formatting styles
- Code block conventions
3. Identify required sections
4. Map sections to codebase elements
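Step 2 above (extracting the heading hierarchy) can be sketched as a small parser over the template's Markdown source. The regex and the `(level, title)` return shape are illustrative assumptions:

```python
import re

def extract_headings(markdown: str) -> list[tuple[int, str]]:
    """Return (level, title) pairs for ATX-style headings, in document order."""
    headings = []
    for line in markdown.splitlines():
        m = re.match(r"^(#{1,6})\s+(.*\S)", line)
        if m:
            # Heading level is the number of leading '#' characters.
            headings.append((len(m.group(1)), m.group(2)))
    return headings
```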
Phase 2: Codebase Exploration
1. Locate relevant files:
- Workflow files (.dig)
- Configuration files (.yml)
- SQL/transformation files (.sql)
- README and other documentation files, if any
2. Extract metadata:
- Table schemas (columns, types, nullability)
- Data lineage (source → destination)
- Dependencies (what depends on what)
- Configuration parameters
3. Analyze patterns:
- Processing logic (incremental, full, batch)
- Error handling strategies
- Performance optimizations
- Security patterns (PII, auth)
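As one concrete example of step 2 (metadata extraction), table references can be pulled from SQL files with a simple pattern match. This is a rough heuristic sketch, not a full SQL parser:

```python
import re

def referenced_tables(sql: str) -> set[str]:
    """Collect database.table names appearing after FROM/JOIN/INSERT INTO."""
    pattern = re.compile(
        r"\b(?:FROM|JOIN|INTO)\s+([a-zA-Z_]\w*\.[a-zA-Z_]\w*)",
        re.IGNORECASE,
    )
    return set(pattern.findall(sql))
```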
Phase 3: Documentation Generation
1. Create outline matching template
2. Populate sections with codebase data:
- Use actual file names and paths
- Include real configuration examples
- Show actual SQL transformations
- Document real table/column names
3. Add visual elements:
- Mermaid diagrams (flow, ERD, dependencies)
- Tables (configuration, mappings, metrics)
- Code blocks (with syntax highlighting)
4. Validate quality (60+ checks)
5. Test code examples (execute SQL, validate YAML)
6. Publish to Confluence
Standard Documentation Template
Use this structure as the base template for all layer documentation:
# {Layer Name}
## Overview
Brief introduction explaining purpose and key characteristics.
### Key Characteristics
* **Engine**: Processing engine (Presto/Trino, Hive, etc.)
* **Architecture**: Processing approach (loop-based, parallel, etc.)
* **Processing Mode**: Incremental/Full/Batch
* **Location**: File system path
---
## Architecture Overview
### Directory Structure
layer_directory/
├── main_workflow.dig
├── config/
│   └── configuration.yml
├── sql/ or queries/
│   └── transformation.sql
└── README.md
### Core Components
Detailed description of each component.
---
## Processing Flow
### Initial Load (if applicable)
Step-by-step description of first-time processing.
### Incremental Load
Step-by-step description of ongoing processing.
---
## Configuration
Complete configuration reference with examples.
---
## Monitoring and Troubleshooting
### Monitoring Queries
Executable SQL queries for checking status.
### Common Issues
Issue descriptions with solutions.
---
## Best Practices
Numbered list of recommendations.
---
## Summary
Key takeaways and benefits.
Visual Diagram Generation
Generate Mermaid diagrams to visualize architecture:
Data Flow Diagram
graph LR
A[Source] -->|Process| B[Destination]
B -->|Transform| C[Output]
Workflow Execution Graph
graph TD
Start[Start] --> Task1[Task 1]
Task1 --> Parallel{Parallel?}
Parallel -->|Yes| Task2A[Task 2A]
Parallel -->|Yes| Task2B[Task 2B]
Task2A --> End[End]
Task2B --> End
Entity Relationship Diagram
erDiagram
TABLE_A ||--o{ TABLE_B : "has"
TABLE_B ||--|| TABLE_C : "references"
Dependency Tree
graph TB
A[Source A] --> D[Target D]
B[Source B] --> D
C[Source C] --> E[Target E]
D --> F[Final]
E --> F
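Diagrams like the data-flow graph above can be generated rather than hand-written. A minimal sketch that turns (source, label, target) edges into Mermaid syntax (the edge-tuple shape is an assumption):

```python
def mermaid_flow(edges: list[tuple[str, str, str]]) -> str:
    """Render (source, label, target) edges as a left-to-right Mermaid graph."""
    lines = ["graph LR"]
    for src, label, dst in edges:
        lines.append(f"    {src} -->|{label}| {dst}")
    return "\n".join(lines)
```

Feeding this extracted lineage edges keeps diagrams in sync with the codebase instead of drifting.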
Metadata Extraction Patterns
Table Schema Documentation
Extract and document schemas:
-- Get schema
DESCRIBE {database}.{table};
SHOW COLUMNS FROM {database}.{table};
Document in table format:
| Column | Type | Nullable | Description | Source | Transformation |
|---|---|---|---|---|---|
| id | BIGINT | NO | Primary key | source.id | CAST(id AS BIGINT) |
| email | VARCHAR | YES | Email address | source.email | LOWER(TRIM(email)) |
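Turning the DESCRIBE output into the table above can be automated. A minimal sketch, assuming each schema row arrives as a dict with these six keys:

```python
def schema_table(rows: list[dict]) -> str:
    """Render schema rows as the Markdown table used in layer docs."""
    header = "| Column | Type | Nullable | Description | Source | Transformation |"
    sep = "|---|---|---|---|---|---|"
    body = [
        "| {column} | {type} | {nullable} | {description} | {source} | {transformation} |".format(**r)
        for r in rows
    ]
    return "\n".join([header, sep, *body])
```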
Volume Metrics
SELECT
'{table}' as table_name,
COUNT(*) as total_rows,
COUNT(DISTINCT primary_key) as unique_records,
MIN(time) as earliest_record,
MAX(time) as latest_record
FROM {database}.{table};
Column-Level Lineage
column_name:
Source: source_system.source_table.source_column
→ Raw: raw_db.raw_table.column (as-is)
→ Staging: stg_db.stg_table.column_std (UPPER(TRIM(column)))
→ Unified: unif_db.unif_table.column (from staging)
→ Golden: gld_db.gld_table.column (SCD Type 2)
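A hop-by-hop chain like the one above can be assembled from an ordered list of (layer, location, note) hops. A small sketch, with the tuple shape as an assumption:

```python
def lineage_chain(column: str, hops: list[tuple[str, str, str]]) -> str:
    """Render a column's lineage as 'Layer: location (note)' lines, one per hop."""
    lines = [f"{column}:"]
    for layer, location, note in hops:
        lines.append(f"  -> {layer}: {location} ({note})")
    return "\n".join(lines)
```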
Quality Validation Framework
60+ Quality Checks
Before publishing, validate:
Content Accuracy (8 checks)
- All file paths exist
- All table names match database
- All column names exist in schemas
- Configuration examples are valid
- SQL queries are syntactically correct
- Database names are accurate
- Data types correctly documented
- Incremental fields verified
Functional Validation (7 checks)
- Monitoring queries execute successfully
- Example commands run without errors
- Configuration examples are copy-paste ready
- SQL transformations produce expected results
- Incremental logic correctly documented
- Deduplication logic verified
- Error handling documented and tested
Structure & Formatting (7 checks)
- Section headings match template
- Code blocks have syntax highlighting
- Tables properly formatted
- Links work correctly
- Mermaid diagrams render
- Collapsible sections work
- Table of contents complete
Completeness (8 checks)
- All required sections included
- Troubleshooting section complete
- Examples actionable and realistic
- All dependencies documented
- All parameters explained
- Performance considerations included
- Security aspects documented
- SLA/freshness requirements documented
Metadata Validation (6 checks)
- Table schemas included
- Data lineage diagram included
- Dependency graph complete
- Volume metrics documented
- Processing SLAs documented
- Data retention policies noted
User Experience (6 checks)
- Easy to navigate
- Technical terms explained
- Examples relevant
- Next steps clear
- Related documentation linked
- Contact information provided
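The check categories above can be driven by a simple runner that executes named predicate functions and reports failures. The two sample checks are illustrative stand-ins for the real 60+ checks:

```python
from typing import Callable

def run_checks(doc: str, checks: dict[str, Callable[[str], bool]]) -> list[str]:
    """Run each named check against the document text; return names of failing checks."""
    return [name for name, check in checks.items() if not check(doc)]

# Two illustrative checks (placeholders for the full suite).
sample_checks = {
    "has_overview_section": lambda doc: "## Overview" in doc,
    "no_placeholder_values": lambda doc: "your_" not in doc and "placeholder" not in doc,
}
```

Publishing proceeds only when `run_checks` returns an empty list.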
Documentation Testing Framework
Test 1: Code Examples Validation
-- Test monitoring queries
SELECT * FROM {database}.{log_table}
WHERE source = '{source}'
ORDER BY time DESC
LIMIT 10;
-- Test schema queries
DESCRIBE {database}.{table};
-- Test volume queries
SELECT COUNT(*) FROM {database}.{table};
Test 2: Configuration Validation
# Validate YAML syntax
python3 -c "import yaml; yaml.safe_load(open('config.yml'))"
# Check for placeholders
grep -r "your_\|example_\|placeholder" config.yml
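The placeholder scan above can also be done in Python when validating generated documentation in-process. The token pattern mirrors the grep expression and is an assumption about what counts as a placeholder:

```python
import re

def find_placeholders(text: str) -> list[str]:
    """Flag tokens that look like unfilled placeholders (mirrors the grep pattern)."""
    return re.findall(r"\b(?:your_\w+|example_\w+|placeholder\w*)\b", text)
```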
Test 3: Link Validation
- Extract all Confluence links
- Test each returns valid page
- Verify hierarchy correct
- Check cross-references
Test 4: Diagram Rendering
- Extract Mermaid blocks
- Validate syntax
- Test rendering in Confluence
- Verify accuracy
Test 5: Accuracy Verification
-- Verify table exists
SHOW TABLES IN {database} LIKE '{table}';
-- Verify columns exist
SELECT column_name FROM information_schema.columns
WHERE table_schema = '{database}'
AND table_name = '{table}';
Test 6: Completeness Check
- Compare with template structure
- Verify mandatory sections present
- Check layer-specific requirements
- Validate examples for each section
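The completeness comparison can be sketched by diffing the template's headings against the generated document's. Matching sections by exact heading text is an assumption here:

```python
import re

def missing_sections(template_md: str, doc_md: str) -> list[str]:
    """Return template headings that do not appear in the generated document."""
    def headings(md: str) -> list[str]:
        return re.findall(r"^#{1,6}\s+(.*\S)\s*$", md, flags=re.MULTILINE)
    have = set(headings(doc_md))
    return [h for h in headings(template_md) if h not in have]
```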
Confluence Integration
Creating Pages
Tool: mcp__atlassian__createConfluencePage
Parameters:
cloudId: "https://treasure-data.atlassian.net"
spaceId: "{numeric space ID}"
title: "Clear, descriptive title"
body: "Complete Markdown content"
parentId: "{parent page ID}" (optional, for hierarchy)
Updating Pages
Tool: mcp__atlassian__updateConfluencePage
Parameters:
cloudId: "https://treasure-data.atlassian.net"
pageId: "{existing page ID}"
body: "Updated Markdown content"
title: "New title" (optional)
versionMessage: "Description of changes" (optional)
Creating Child Pages
For complex layers with multiple components:
- Create parent overview page
- Create child pages for each component
- Link from parent using Confluence URLs
## Components
1. [**Component 1**](https://treasure-data.atlassian.net/wiki/spaces/.../pages/.../Component+1)
- Description
2. [**Component 2**](https://treasure-data.atlassian.net/wiki/spaces/.../pages/.../Component+2)
- Description
Common Patterns
Pattern 1: Multi-Component Documentation
For layers with multiple workflows/tables:
- Create parent page with overview
- List all components in summary table
- Create child page per component
- Link child pages from parent
- Add cross-references between related components
Pattern 2: Performance Metrics
Document performance characteristics:
| Metric | Value | Benchmark |
|---|---|---|
| Avg Processing Time | 15 min | < 30 min SLA |
| Peak Memory Usage | 8 GB | 12 GB limit |
| Avg Rows/Day | 2.5M | Growing 10% monthly |
Pattern 3: Security Documentation
Document PII and compliance:
| Table | PII Columns | Protection | Retention | Access |
|---|---|---|---|---|
| table_a | email, phone | SHA256 | 7 years | Restricted |
| table_b | ip_address | Anonymization | 90 days | Internal |
Pattern 4: Version History
Track documentation changes:
| Version | Date | Changed By | Changes | Impact |
|---|---|---|---|---|
| v2.1 | 2025-11-27 | Claude | Added 3 tables | Low |
| v2.0 | 2025-11-15 | Team | Migrated engine | High |
Troubleshooting
Issue: Cannot Find Files
Solutions:
- Verify directory path with `ls`
- Search recursively with `find . -name "*.dig"`
- Check git branch
- Ask user for exact location
Issue: Configuration Unclear
Solutions:
- Read multiple example configs
- Look for schema documentation
- Analyze workflow references
- Ask user for clarification
Issue: Complex Transformations
Solutions:
- Break down analysis by section
- Document each CTE separately
- Create transformation flow diagram
- Extract column mapping matrix
Issue: Template Mismatch
Solutions:
- Confirm with user which template to follow
- Identify adaptable sections
- Get approval for deviations
- Document why structure differs
Best Practices
- Always read codebase first - Never document based on assumptions
- Use actual examples - No placeholders or generic values
- Validate everything - Test SQL, validate YAML, check links
- Follow template exactly - Match structure, headings, formatting
- Include visuals - Diagrams make complex systems understandable
- Document security - Always include PII and compliance details
- Test before publishing - Run all 60+ quality checks
- Keep it actionable - Every example should be copy-paste ready
Resources
- Atlassian MCP Documentation: For Confluence integration
- Digdag Documentation: https://docs.digdag.io/ - For workflow syntax
- Treasure Data Documentation: https://docs.treasuredata.com/ - For TD patterns
- Presto SQL Reference: https://prestodb.io/docs/current/ - For SQL transformations
Summary
The APS Documentation Core provides:
- ✅ Template-based generation for consistency
- ✅ Three-phase workflow (analyze, explore, generate)
- ✅ Visual diagram generation (4 Mermaid types)
- ✅ Metadata extraction (schemas, lineage, volumes)
- ✅ Quality validation (60+ checks)
- ✅ Documentation testing (6 test categories)
- ✅ Confluence integration (create, update, hierarchy)
- ✅ Common patterns (multi-component, performance, security)
This core framework is used by all layer-specific skills to ensure consistent, high-quality documentation across all Treasure Data pipeline layers.