---
name: context-analysis
description: Analyze plain text documents to understand their semantic structure and token distribution. Use when asked to analyze context, visualize token usage, segment text, identify components, create waffle charts, or compare multiple documents.
---
Context Analysis
Analyze plain text documents to understand their semantic structure and token distribution. This skill helps visualize how tokens are distributed across different semantic components of a document.
Workflow
When analyzing a document, follow these steps:
1. Parse the Text
Convert the input text file to JSON format:
./scripts/parse.sh input.txt > parsed.json
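The internals of parse.sh aren't documented here; as a rough mental model, a minimal Python equivalent might look like the sketch below. The blank-line splitting rule is an assumption, not the script's actual behaviour.

```python
import json
import sys

# Assumption: one part per blank-line-delimited block; the real
# parse.sh may choose different boundaries.
with open(sys.argv[1]) as f:
    text = f.read()

blocks = [b.strip() for b in text.split("\n\n") if b.strip()]
doc = {
    "source": sys.argv[1],
    "parts": [
        {"id": f"part-{i}", "text": block}
        for i, block in enumerate(blocks, start=1)
    ],
}
json.dump(doc, sys.stdout, indent=2)
```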
2. Count Tokens
Add token counts using tiktoken (GPT-4o encoding):
./scripts/count-tokens.sh parsed.json > counted.json
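Conceptually the script does something like the following Python sketch; only the tiktoken encoding is stated by this skill, the file handling details are assumptions.

```python
import json
import sys

import tiktoken

# o200k_base is the encoding used by GPT-4o.
enc = tiktoken.get_encoding("o200k_base")

with open(sys.argv[1]) as f:
    doc = json.load(f)

# Count tokens for each part and accumulate the document total.
total = 0
for part in doc["parts"]:
    part["token_count"] = len(enc.encode(part["text"]))
    total += part["token_count"]
doc["total_tokens"] = total

json.dump(doc, sys.stdout, indent=2)
```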
3. Segment Large Parts
For parts with more than 500 tokens, identify semantic breakpoints and split them.
You (Claude) should do this directly:
- Read the text of each large part
- Identify natural semantic boundaries (topic changes, section breaks, logical divisions)
- Split into coherent chunks
- Update the JSON with new parts (use IDs like `part-1.1`, `part-1.2`, etc.)
After segmenting, recount tokens:
./scripts/count-tokens.sh segmented.json > recounted.json
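There is no script for this step. When you split an oversized part, the resulting JSON edit looks roughly like the hypothetical sketch below; the real breakpoints come from reading the text, not from the placeholder midpoint split used here.

```python
import json

with open("counted.json") as f:
    doc = json.load(f)

def split_part(part, chunks):
    """Replace one oversized part with hierarchical sub-parts
    (part-1 becomes part-1.1, part-1.2, ...)."""
    return [
        {"id": f"{part['id']}.{i}", "text": chunk}
        for i, chunk in enumerate(chunks, start=1)
    ]

new_parts = []
for part in doc["parts"]:
    if part.get("token_count", 0) > 500:
        # Placeholder only: a real split uses the semantic boundaries
        # you identified by reading, not a character midpoint.
        mid = len(part["text"]) // 2
        chunks = [part["text"][:mid], part["text"][mid:]]
        new_parts.extend(split_part(part, chunks))
    else:
        new_parts.append(part)
doc["parts"] = new_parts

with open("segmented.json", "w") as f:
    json.dump(doc, f, indent=2)
```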
4. Identify Components
Analyze the document and identify the distinct semantic components it contains.
You (Claude) should do this directly:
- Read all parts
- Identify categories/themes (e.g., "introduction", "methodology", "examples", "conclusion")
- Assign each part to a component
- Add to JSON: a `components` array and a `component` field on each part
- Calculate `component_tokens` totals
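A minimal sketch of the bookkeeping once each part has been assigned a component by hand; the field names follow the data format described below, everything else (file names, ordering) is an assumption.

```python
import json
from collections import defaultdict

with open("recounted.json") as f:
    doc = json.load(f)

# Assumes a "component" field has already been added to every part.
component_tokens = defaultdict(int)
for part in doc["parts"]:
    component_tokens[part["component"]] += part["token_count"]

doc["components"] = list(component_tokens)
doc["component_tokens"] = dict(component_tokens)
doc["total_tokens"] = sum(component_tokens.values())

with open("componentised.json", "w") as f:
    json.dump(doc, f, indent=2)
```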
5. Assign Colors
Run the colorization script to assign consistent colors:
./scripts/colorise.sh componentised.json > colored.json
6. Generate Visualizations
./scripts/waffle-chart.sh colored.json > waffle.html
./scripts/bar-chart.sh colored.json > bar-chart.html
./scripts/text-view.sh colored.json > text-view.html
Quick Commands
For a complete analysis pipeline:
# Parse and count
./scripts/parse.sh input.txt > /tmp/1-parsed.json
./scripts/count-tokens.sh /tmp/1-parsed.json > /tmp/2-counted.json
# You segment and componentise the JSON directly, then:
./scripts/colorise.sh /tmp/3-componentised.json > /tmp/4-colored.json
./scripts/waffle-chart.sh /tmp/4-colored.json > waffle.html
For multiple files:
./scripts/group.sh input_folder/ output_dir/
Data Format
The JSON format used throughout:
{
"source": "filename.txt",
"parts": [
{
"id": "part-1",
"text": "The actual text content...",
"token_count": 150,
"component": "introduction"
}
],
"components": ["introduction", "methodology", "results"],
"component_tokens": {
"introduction": 500,
"methodology": 1200,
"results": 800
},
"total_tokens": 2500
}
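A quick sanity check for files in this format, written as a Python sketch; the checks simply mirror the fields above and are not part of the skill's scripts.

```python
import json
import sys

with open(sys.argv[1]) as f:
    doc = json.load(f)

# Every part should reference a declared component.
for part in doc["parts"]:
    assert part["component"] in doc["components"], part["id"]

# Per-component totals should add up to the document total.
assert sum(doc["component_tokens"].values()) == doc["total_tokens"]

print(f"{doc['source']}: {doc['total_tokens']} tokens across "
      f"{len(doc['components'])} components")
```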
Segmentation Guidelines
When segmenting large parts (>500 tokens):
- Look for natural breakpoints:
  - Markdown headings (`#`, `##`, etc.)
  - Topic transitions
  - Logical section boundaries
  - List groupings
- Create semantically coherent chunks:
  - Each segment should cover one topic/concept
  - Aim for 100-500 tokens per segment
  - Preserve context within segments
- Update IDs hierarchically: `part-1` splits into `part-1.1`, `part-1.2`, etc.
Component Identification Guidelines
When identifying components:
- Read all parts to understand the document structure
- Identify 3-10 distinct semantic categories
- Use descriptive, lowercase names with underscores
- Common patterns:
  - Documents: `introduction`, `methodology`, `results`, `conclusion`
  - Technical: `configuration`, `implementation`, `examples`, `api_reference`
  - Instructional: `overview`, `prerequisites`, `steps`, `troubleshooting`
- Assign each part to exactly one component
- Use `other` for parts that don't fit elsewhere
Available Colors
Components are assigned these colors:
- `blue` - Primary content, introductions
- `emerald` - Workflows, processes, methodology
- `purple` - Style, personality, guidelines
- `orange` - Context, examples, highlights
- `indigo` - Code, technical content
- `slate` - Environment, configuration
- `gray` - Tools, utilities, other
References
See the references/ folder for detailed documentation on each step.