Context Analysis

by nilenso



---
name: context-analysis
description: Analyze plain text documents to understand their semantic structure and token distribution. Use when asked to analyze context, visualize token usage, segment text, identify components, create waffle charts, or compare multiple documents.
---

Context Analysis

Analyze plain text documents to understand their semantic structure and token distribution. This skill helps visualize how tokens are distributed across different semantic components of a document.

Workflow

When analyzing a document, follow these steps:

1. Parse the Text

Convert the input text file to JSON format:

./scripts/parse.sh input.txt > parsed.json
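Conceptually, the parse step produces the JSON shape described in the Data Format section. As an illustration only (the blank-line splitting heuristic here is an assumption; the real parse.sh may segment differently):

```python
import json

def parse_text(text, source="input.txt"):
    """Split plain text into parts and emit the skill's JSON shape.
    NOTE: splitting on blank lines is an illustrative assumption,
    not necessarily what parse.sh actually does."""
    chunks = [c.strip() for c in text.split("\n\n") if c.strip()]
    return {
        "source": source,
        "parts": [
            {"id": f"part-{i}", "text": chunk}
            for i, chunk in enumerate(chunks, start=1)
        ],
    }

if __name__ == "__main__":
    doc = parse_text("First paragraph.\n\nSecond paragraph.")
    print(json.dumps(doc, indent=2))
```

Each part gets a sequential id (part-1, part-2, ...), which the segmentation step later extends hierarchically.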

2. Count Tokens

Add token counts using tiktoken (GPT-4o encoding):

./scripts/count-tokens.sh parsed.json > counted.json

3. Segment Large Parts

For parts with more than 500 tokens, identify semantic breakpoints and split them.

You (Claude) should do this directly:

  • Read the text of each large part
  • Identify natural semantic boundaries (topic changes, section breaks, logical divisions)
  • Split into coherent chunks
  • Update the JSON with new parts (use IDs like part-1.1, part-1.2, etc.)
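Choosing the breakpoints is a judgment call, but the mechanical JSON update can be sketched as follows (split_part is a hypothetical helper, not one of this skill's scripts; you supply the text chunks you chose at semantic boundaries):

```python
def split_part(doc, part_id, chunks):
    """Replace one oversized part with sub-parts using hierarchical IDs
    (part-1 -> part-1.1, part-1.2, ...), preserving document order."""
    new_parts = []
    for part in doc["parts"]:
        if part["id"] == part_id:
            for i, chunk in enumerate(chunks, start=1):
                new_parts.append({"id": f"{part_id}.{i}", "text": chunk})
        else:
            new_parts.append(part)
    doc["parts"] = new_parts
    return doc

doc = {"parts": [{"id": "part-1", "text": "Intro. Methods."}, {"id": "part-2", "text": "Results."}]}
split_part(doc, "part-1", ["Intro.", "Methods."])
```

After this, doc["parts"] holds part-1.1 and part-1.2 in place of part-1, with part-2 untouched.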

After segmenting, recount tokens:

./scripts/count-tokens.sh segmented.json > recounted.json

4. Identify Components

Analyze the document and identify the distinct semantic components it contains.

You (Claude) should do this directly:

  • Read all parts
  • Identify categories/themes (e.g., "introduction", "methodology", "examples", "conclusion")
  • Assign each part to a component
  • Add to JSON: components array and component field on each part
  • Calculate component_tokens totals

5. Assign Colors

Run the colorization script to assign consistent colors:

./scripts/colorise.sh componentised.json > colored.json

6. Generate Visualizations

./scripts/waffle-chart.sh colored.json > waffle.html
./scripts/bar-chart.sh colored.json > bar-chart.html
./scripts/text-view.sh colored.json > text-view.html

Quick Commands

For a complete analysis pipeline:

# Parse and count
./scripts/parse.sh input.txt > /tmp/1-parsed.json
./scripts/count-tokens.sh /tmp/1-parsed.json > /tmp/2-counted.json

# You segment and componentise the JSON directly, then:
./scripts/colorise.sh /tmp/3-componentised.json > /tmp/4-colored.json
./scripts/waffle-chart.sh /tmp/4-colored.json > waffle.html

For multiple files:

./scripts/group.sh input_folder/ output_dir/

Data Format

The JSON format used throughout:

{
  "source": "filename.txt",
  "parts": [
    {
      "id": "part-1",
      "text": "The actual text content...",
      "token_count": 150,
      "component": "introduction"
    }
  ],
  "components": ["introduction", "methodology", "results"],
  "component_tokens": {
    "introduction": 500,
    "methodology": 1200,
    "results": 800
  },
  "total_tokens": 2500
}
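Before handing a file to the chart scripts, it can be worth sanity-checking that the totals are internally consistent. A small illustrative check (not part of the skill's scripts):

```python
def validate(doc):
    """Check the invariants implied by the format above."""
    part_sum = sum(p["token_count"] for p in doc["parts"])
    assert part_sum == doc["total_tokens"], "part counts don't sum to total_tokens"
    assert sum(doc["component_tokens"].values()) == doc["total_tokens"], \
        "component_tokens don't sum to total_tokens"
    for part in doc["parts"]:
        assert part["component"] in doc["components"], f"unknown component on {part['id']}"
    return True

doc = {
    "source": "filename.txt",
    "parts": [{"id": "part-1", "text": "...", "token_count": 150, "component": "introduction"}],
    "components": ["introduction"],
    "component_tokens": {"introduction": 150},
    "total_tokens": 150,
}
validate(doc)
```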

Segmentation Guidelines

When segmenting large parts (>500 tokens):

  1. Look for natural breakpoints:

    • Markdown headings (#, ##, etc.)
    • Topic transitions
    • Logical section boundaries
    • List groupings
  2. Create semantically coherent chunks:

    • Each segment should cover one topic/concept
    • Aim for 100-500 tokens per segment
    • Preserve context within segments
  3. Update IDs hierarchically:

    • part-1 splits into part-1.1, part-1.2, etc.

Component Identification Guidelines

When identifying components:

  1. Read all parts to understand the document structure

  2. Identify 3-10 distinct semantic categories

  3. Use descriptive, lowercase names with underscores

  4. Common patterns:

    • Documents: introduction, methodology, results, conclusion
    • Technical: configuration, implementation, examples, api_reference
    • Instructional: overview, prerequisites, steps, troubleshooting
  5. Assign each part to exactly one component

  6. Use "other" for parts that don't fit elsewhere

Available Colors

Components are assigned these colors:

  • blue - Primary content, introductions
  • emerald - Workflows, processes, methodology
  • purple - Style, personality, guidelines
  • orange - Context, examples, highlights
  • indigo - Code, technical content
  • slate - Environment, configuration
  • gray - Tools, utilities, other

References

See the references/ folder for detailed documentation on each step.

Skill Information

Category: Creative
Last Updated: 12/24/2025