Analyze

by nuggetswise

data

Unified analysis skill - Python data analysis (--data) or KB gap identification (--kb)


---
name: analyze
description: Unified analysis skill - Python data analysis (--data) or KB gap identification (--kb)
---

Overview

The analysis skill provides two modes for generating PM insights from structured sources:

| Mode | Purpose | Output |
|------|---------|--------|
| --data | Python-based analysis of CSV/Excel files (retention, funnel, segmentation) | outputs/insights/data-analysis-YYYY-MM-DD.md |
| --kb | Knowledge Base gap analysis (pain points, missing articles, AI opportunities) | outputs/insights/kb-gaps-YYYY-MM-DD.md |

When to Use

--data Mode

  • User provides CSV, Excel, or structured data files
  • User asks for "analysis", "insights", "metrics", "trends", or "charts"
  • Data exists in inputs/data/ folder
  • User wants to understand product performance

--kb Mode

  • User has KB article exports to analyze
  • User wants to understand what users struggle with most
  • User is exploring AI assistant opportunities
  • User is planning documentation improvements

Mode: --data

Process

Step 1: Choose Analysis Method

Your goal is to provide the most accurate analysis. Autonomously select the best method based on the data and the user's query.

  1. Examine the data source and the user's query.
  2. If the data is simple (e.g., < 500 rows, clear headers) and the query is a straightforward aggregation (counting, sorting, grouping), you may perform the analysis directly via LLM calculation.
  3. Otherwise (the analysis requires complex calculations, statistics, or visualizations, or the data is large or complex), you must generate and execute a Python script to ensure accuracy.
  4. When in doubt, default to using a Python script.
  5. Announce which method you are choosing and why before proceeding with the analysis.
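
The selection rule above can be sketched as a small function. This is an illustration only, not part of the skill: the name `choose_method`, its parameters, and the `SIMPLE_OPERATIONS` labels are assumptions. Only the 500-row threshold, the list of simple aggregations, and the default-to-Python rule come from the steps above.

```python
from dataclasses import dataclass

# Hypothetical labels for the "straightforward aggregations" named in step 2.
SIMPLE_OPERATIONS = {"count", "sort", "group"}
ROW_THRESHOLD = 500  # step 2 describes simple data as fewer than 500 rows


@dataclass
class MethodChoice:
    method: str  # value later written to the analysis_method field
    reason: str  # announced to the user before the analysis starts


def choose_method(n_rows: int, clear_headers: bool, operation: str) -> MethodChoice:
    """Pick an analysis method; anything uncertain falls through to Python."""
    if n_rows < ROW_THRESHOLD and clear_headers and operation in SIMPLE_OPERATIONS:
        return MethodChoice(
            "Direct LLM Calculation",
            f"{n_rows} rows with clear headers and a simple '{operation}' query",
        )
    return MethodChoice(
        "Python Script Execution",
        "data or query is large, complex, or ambiguous, so a script is safer",
    )


if __name__ == "__main__":
    choice = choose_method(n_rows=120, clear_headers=True, operation="group")
    print(f"Method: {choice.method} ({choice.reason})")
```

Returning the reason together with the method keeps step 5 (announce the choice and why) from being skipped.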

Step 2: Execute Analysis


IF Direct LLM Calculation was chosen:

  1. Read the content of the data file.
  2. Perform the requested calculations directly.
  3. Proceed to Step 3, ensuring all findings and claims are based on your direct calculations.

IF Python Script Execution was chosen:

  1. Exploratory Data Analysis (EDA): Write and execute a Python script to get basic info.
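    import pandas as pd
    df = pd.read_csv("inputs/data/file.csv")  # placeholder path; use pd.read_excel for .xlsx
    # Print shape, dtypes, first rows, summary statistics, and null counts per column.
    print(df.shape, df.dtypes, df.head(), df.describe(), df.isnull().sum(), sep="\n\n")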
    
  2. Data Dictionary: Create a markdown table for the data dictionary. Ask the user to clarify any unknown column meanings before proceeding.
  3. Analysis Plan: Propose an analysis plan to the user.
  4. Execution & Visualization: Upon approval, write and execute the Python script to perform the analysis and generate any required visualizations (e.g., charts saved to outputs/insights/).
  5. Proceed to Step 3, ensuring all findings and claims are based on the Python script's output.
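
As a rough sketch of the data dictionary step, the helper below drafts the markdown table from a CSV. It uses only the standard library so it runs without pandas. The function names, the coarse type heuristic, and the sample columns are all hypothetical. Every meaning starts as "Unknown" so that nothing is assumed before the user confirms it.

```python
import csv
import io


def infer_type(values):
    """Guess a coarse type from string values (a heuristic, not pandas dtypes)."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "unknown"
    for caster, name in ((int, "integer"), (float, "float")):
        try:
            for v in non_empty:
                caster(v)
            return name
        except ValueError:
            continue
    return "text"


def data_dictionary(csv_text, max_examples=3):
    """Return a markdown data dictionary with every meaning left as 'Unknown'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    lines = [
        "| Column | Type | Example Values | Meaning |",
        "|--------|------|----------------|---------|",
    ]
    for col in reader.fieldnames or []:
        values = [row[col] or "" for row in rows]
        # Unique non-empty values, in first-seen order.
        unique = list(dict.fromkeys(v for v in values if v))[:max_examples]
        lines.append(f"| {col} | {infer_type(values)} | {', '.join(unique)} | Unknown |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up sample data for illustration.
    sample = "user_id,plan,mrr\n1,pro,49.0\n2,free,\n3,pro,49.0\n"
    print(data_dictionary(sample))
```

The "Unknown" cells are meant to be replaced only after the user has clarified each column.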

Step 3: Generate Output

Write to outputs/insights/data-analysis-YYYY-MM-DD.md. The output must be structured as follows and must include the analysis_method field in the YAML frontmatter.

---
generated: YYYY-MM-DD HH:MM
skill: analyze --data
analysis_method: "Direct LLM Calculation" # or "Python Script Execution"
sources:
  - inputs/data/filename.csv (modified: YYYY-MM-DD)
downstream: []
---

# Data Analysis: [Dataset Name]

## Analysis Method
This analysis was performed via **[Direct LLM Calculation / Python Script Execution]**.

## Dataset Overview
| Attribute | Value |
|-----------|-------|
| Rows | N |
| Columns | N |
| Date range | [if applicable] |

## Data Dictionary
| Column | Type | Example Values | Meaning |
|--------|------|----------------|---------|
| ... | ... | ... | Explicit/Unknown |

## Key Metrics
| Metric | Value | Source |
|--------|-------|--------|
| [Metric name] | [Number] | [Direct Calculation / Python output] |

## Findings
1. **[Finding]** - Evidence: [Direct Calculation / Python output]

## Hypotheses (require validation)
1. **[Hypothesis]** - Based on: [observation]

## Visualizations
- [Chart description]: outputs/insights/[filename].png

## Sources Used
- [file paths]

## Claims Ledger
| Claim | Type | Source |
|-------|------|--------|
| [Metric] | Evidence | [Direct Calculation / Python output] |
| [Trend interpretation] | Hypothesis | [Based on metric X] |
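
The frontmatter and output path for this template could be generated along these lines. This is a minimal sketch: `build_frontmatter` and `output_path` are hypothetical helper names, and the timestamp and file dates in the example are made up.

```python
from datetime import datetime

ALLOWED_METHODS = {"Direct LLM Calculation", "Python Script Execution"}


def build_frontmatter(method, sources, generated):
    """Render the YAML frontmatter for a --data report.

    sources is a list of (path, modified_date) string pairs.
    """
    if method not in ALLOWED_METHODS:
        raise ValueError(f"analysis_method must be one of {sorted(ALLOWED_METHODS)}")
    lines = [
        "---",
        f"generated: {generated:%Y-%m-%d %H:%M}",
        "skill: analyze --data",
        f'analysis_method: "{method}"',
        "sources:",
    ]
    lines += [f"  - {path} (modified: {modified})" for path, modified in sources]
    lines += ["downstream: []", "---"]
    return "\n".join(lines)


def output_path(generated):
    """Return the dated report path used by --data mode."""
    return f"outputs/insights/data-analysis-{generated:%Y-%m-%d}.md"


if __name__ == "__main__":
    now = datetime(2026, 1, 26, 9, 30)  # example timestamp
    print(output_path(now))
    print(build_frontmatter(
        "Python Script Execution",
        [("inputs/data/filename.csv", "2026-01-20")],
        now,
    ))
```

Rejecting any other `analysis_method` value keeps the required field from being left blank or filled with free text.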

Mode: --kb

Process

Step 1: Gather Sources

Read files in:

  • inputs/knowledge_base/ - KB article exports
  • outputs/insights/voc-synthesis-*.md - VOC insights (if available, for correlation)

Step 2: Analyze Article Coverage

For each KB article (or category), note:

  • Topic / Category
  • Article count
  • Last updated date
  • Estimated complexity (simple how-to vs. complex troubleshooting)
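
A possible way to tally coverage per category is sketched below. It assumes each article has already been parsed into a dict with `title`, `category`, and `updated` keys. That record shape is hypothetical, since the export format is not specified here. Complexity is left out because it is a judgment call.

```python
from collections import defaultdict
from datetime import date


def coverage_by_category(articles):
    """Aggregate article count and most recent update date per category."""
    summary = defaultdict(lambda: {"count": 0, "last_updated": date.min})
    for article in articles:
        entry = summary[article["category"]]
        entry["count"] += 1
        entry["last_updated"] = max(entry["last_updated"], article["updated"])
    return dict(summary)


if __name__ == "__main__":
    # Made-up articles for illustration.
    articles = [
        {"title": "Reset password", "category": "Account", "updated": date(2025, 11, 3)},
        {"title": "Change email", "category": "Account", "updated": date(2025, 6, 14)},
        {"title": "Export report", "category": "Reports", "updated": date(2025, 2, 1)},
    ]
    for category, info in coverage_by_category(articles).items():
        print(category, info["count"], info["last_updated"].isoformat())
```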

Step 3: Identify Gaps

Look for:

  1. High-volume topics - Many articles = users struggle here
  2. Outdated articles - Not updated in 6+ months
  3. Missing topics - VOC mentions issues with no KB coverage
  4. Complex troubleshooting - Multi-step processes that could be simplified
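
Two of these checks are mechanical enough to sketch in code. The snippet assumes "6+ months" can be approximated as 183 days and that VOC and KB topics are available as plain string lists. Both are simplifying assumptions, and the function names are hypothetical.

```python
from datetime import date, timedelta

STALE_AFTER = timedelta(days=183)  # assumption: "6+ months" approximated as 183 days


def outdated_articles(articles, today):
    """Return titles whose last update is at least STALE_AFTER before today."""
    return [a["title"] for a in articles if today - a["updated"] >= STALE_AFTER]


def missing_topics(voc_topics, kb_topics):
    """Return VOC topics with no KB coverage, compared case-insensitively."""
    covered = {t.lower() for t in kb_topics}
    return sorted(t for t in voc_topics if t.lower() not in covered)


if __name__ == "__main__":
    # Made-up data for illustration.
    today = date(2026, 1, 26)
    articles = [
        {"title": "Reset password", "updated": date(2025, 12, 1)},
        {"title": "Export data", "updated": date(2025, 3, 1)},
    ]
    print(outdated_articles(articles, today))
    print(missing_topics(["SSO", "Export"], ["export", "billing"]))
```

Exact string matching will miss topics that are worded differently in VOC and KB sources, so treat the result as a starting list to review rather than a complete one.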

Step 4: Assess AI Opportunities

For each gap, evaluate:

| Opportunity Type | Criteria | Risk Level |
|------------------|----------|------------|
| Better search/IA | Hard to find articles | Low |
| Guided resolution | Multi-step process | Low-Medium |
| AI-assisted | Can be automated with citations | Medium |
| DO NOT automate | Compliance, billing, trust-sensitive | High |
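
The table above can be read as a lookup with one override: sensitive topics are never automated, whatever was proposed. The sketch below encodes that reading. The keyword match is a crude stand-in for human judgment, and `assess` and `SENSITIVE_KEYWORDS` are hypothetical names.

```python
RISK_BY_OPPORTUNITY = {
    "Better search/IA": "Low",
    "Guided resolution": "Low-Medium",
    "AI-assisted": "Medium",
    "DO NOT automate": "High",
}

# Assumption: a simple keyword match flags sensitive topics for human handling.
SENSITIVE_KEYWORDS = ("compliance", "billing", "trust", "security")


def assess(topic, proposed_type):
    """Return (opportunity_type, risk); sensitive topics override the proposal."""
    if any(word in topic.lower() for word in SENSITIVE_KEYWORDS):
        proposed_type = "DO NOT automate"
    return proposed_type, RISK_BY_OPPORTUNITY[proposed_type]


if __name__ == "__main__":
    print(assess("Billing disputes", "AI-assisted"))
    print(assess("Finding export settings", "Better search/IA"))
```

Putting the override ahead of the lookup means a sensitive topic cannot be downgraded by an optimistic proposal.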

Step 5: Generate Output

Write to outputs/insights/kb-gaps-YYYY-MM-DD.md:

---
generated: YYYY-MM-DD HH:MM
skill: analyze --kb
sources:
  - inputs/knowledge_base/*.md
  - outputs/insights/voc-synthesis-*.md (if used)
downstream:
  - outputs/roadmap/Qx-YYYY-charters.md
---

# KB Gap Analysis: [Date]

## Executive Summary
[2-3 sentences: What's the state of KB? Where are the biggest gaps?]

## Coverage Overview

| Category | Article Count | Last Updated | Complexity | Gap Score |
|----------|---------------|--------------|------------|-----------|
| [Category 1] | N | YYYY-MM-DD | Simple/Complex | High/Med/Low |

## High-Volume Topics
*Categories with most articles (signal: users struggle here)*

| Topic | Article Count | Sample Titles | VOC Correlation |
|-------|---------------|---------------|-----------------|
| [Topic] | N | [title1, title2] | [Yes/No/Unknown] |

## Missing / Outdated Articles

| Gap | Type | Evidence | Priority |
|-----|------|----------|----------|
| [Topic with no article] | Missing | VOC mentions in [file] | High |
| [Article X] | Outdated | Last updated [date] | Medium |

## AI Opportunity Assessment

### Safe to Automate (Low Risk)
| Opportunity | Type | Rationale |
|-------------|------|-----------|
| [Better search for X] | Search/IA | Articles exist but hard to find |
| [Guided wizard for Y] | Guided resolution | Clear steps, no judgment needed |

### Automate with Caution (Medium Risk)
| Opportunity | Type | Guardrails Needed |
|-------------|------|-------------------|
| [AI assist for Z] | AI-assisted | Must cite source article, human review |

### DO NOT Automate (High Risk)
| Topic | Reason |
|-------|--------|
| [Billing disputes] | Financial, requires human judgment |
| [Data deletion] | Compliance, irreversible |
| [Access control] | Trust/security sensitive |

## Recommendations
1. **[Recommendation]** - Evidence: [source]

## Sources Used
- [file paths]

## Claims Ledger
| Claim | Type | Source |
|-------|------|--------|
| [High volume in X] | Evidence | [article count] |
| [Users struggle with Y] | Evidence | [VOC file] |

Quick Reference

--data Mode

| Action | Command |
|--------|---------|
| Load CSV | pd.read_csv('inputs/data/file.csv') |
| Load Excel | pd.read_excel('inputs/data/file.xlsx') |
| Save chart | plt.savefig('outputs/insights/output.png') |
| Check nulls | df.isnull().sum() |

--kb Mode

| Risk Level | Examples | Action |
|------------|----------|--------|
| Low | Search improvements, FAQ bots | Safe to build |
| Medium | Troubleshooting assistants | Build with guardrails |
| High | Billing, compliance, security | Human only |

Common Mistakes

--data Mode

  • Assuming column meanings: "user_id probably means..." -> Ask user to confirm
  • Stating implications as facts: "Users are churning because..." -> Label as hypothesis
  • Using sample data for conclusions: "Based on 10 rows..." -> Ensure representative data
  • Ignoring missing data: 50% nulls in key column -> Report this prominently
  • No data dictionary: Jumping to analysis -> Always document columns first

--kb Mode

  • Vague counts: "Many articles" -> Give the exact count: "47 articles"
  • Missing VOC correlation: KB analysis in isolation -> Cross-reference with VOC
  • Underestimating risk: "AI can handle billing" -> Compliance topics need humans
  • No priorities: "Everything is a gap" -> Rank by impact
  • Stale analysis: Using old VOC -> Check VOC synthesis date

Verification Checklist

--data

  • Data dictionary created with all columns
  • Unknown meanings explicitly marked
  • User confirmed column semantics before analysis
  • Metrics separated from hypotheses
  • Missing data reported
  • Charts saved to outputs/insights/
  • All code executed successfully
  • Metadata header complete
  • Copied to history, tracker updated

--kb

  • All KB files read
  • Article counts accurate
  • Outdated articles identified (6+ months)
  • VOC correlation checked (if available)
  • AI opportunities categorized by risk
  • DO NOT automate list includes compliance/billing/trust topics
  • Recommendations backed by evidence
  • Metadata header complete
  • Copied to history, tracker updated

Output Locations

| Mode | Primary Output | History |
|------|----------------|---------|
| --data | outputs/insights/data-analysis-YYYY-MM-DD.md | history/analyze/data/ |
| --kb | outputs/insights/kb-gaps-YYYY-MM-DD.md | history/analyze/kb/ |
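
Copying a finished report into its history folder might look like the sketch below. It assumes the file keeps its name in the history directory, which is not stated here. `archive_report` is a hypothetical helper, and the tracker update is not covered.

```python
import shutil
import tempfile
from pathlib import Path

HISTORY_DIRS = {"data": "history/analyze/data", "kb": "history/analyze/kb"}


def archive_report(report, mode, root):
    """Copy a finished report into the history folder for its mode."""
    target_dir = Path(root) / HISTORY_DIRS[mode]
    target_dir.mkdir(parents=True, exist_ok=True)
    return Path(shutil.copy2(report, target_dir / Path(report).name))


if __name__ == "__main__":
    # Demonstrate inside a throwaway directory so no real files are touched.
    with tempfile.TemporaryDirectory() as root:
        report = Path(root) / "outputs" / "insights" / "kb-gaps-2026-01-26.md"
        report.parent.mkdir(parents=True)
        report.write_text("# KB Gap Analysis\n", encoding="utf-8")
        print(archive_report(report, "kb", root))
```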

Evidence Tracking

| Claim | Type | Source |
|-------|------|--------|
| [Metric] | Evidence | [Python output] |
| [Trend interpretation] | Hypothesis | [Based on metric X] |
| [Column meaning] | Evidence/Unknown | [User confirmed / Not stated] |
| [47 articles on X] | Evidence | [KB export count] |
| [Users complain about Y] | Evidence | [VOC file:line] |
| [Safe to automate Z] | Assumption | [no compliance concern identified] |
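
One way to keep the ledger consistent is to validate each claim before rendering it. The sketch below is illustrative: the `Claim` class and `render_ledger` function are hypothetical, and the allowed types are the ones that appear in the table above.

```python
from dataclasses import dataclass

CLAIM_TYPES = {"Evidence", "Hypothesis", "Assumption", "Unknown"}


@dataclass(frozen=True)
class Claim:
    text: str
    type: str
    source: str

    def __post_init__(self):
        # Reject claims with an unrecognized type or no stated source.
        if self.type not in CLAIM_TYPES:
            raise ValueError(f"unknown claim type: {self.type!r}")
        if not self.source:
            raise ValueError("every claim needs a source")


def render_ledger(claims):
    """Render claims as the markdown Claims Ledger table."""
    lines = ["| Claim | Type | Source |", "|-------|------|--------|"]
    lines += [f"| {c.text} | {c.type} | {c.source} |" for c in claims]
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_ledger([
        Claim("47 articles on X", "Evidence", "KB export count"),
        Claim("Safe to automate Z", "Assumption", "no compliance concern identified"),
    ]))
```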

Skill Information

Category: Data
Last Updated: 1/26/2026