Cobol Kg

by vin082


COBOL Knowledge Graph analysis and debugging toolkit. Use this skill when working with Neo4j queries, debugging pipeline issues, analyzing COBOL programs, or running common development tasks for this project.


---
name: cobol-kg
description: COBOL Knowledge Graph analysis and debugging toolkit. Use this skill when working with Neo4j queries, debugging pipeline issues, analyzing COBOL programs, or running common development tasks for this project.
---

COBOL Knowledge Graph Skill

This skill provides commands and workflows for the COBOL Agentic Knowledge Graph project.

Project Overview

This project builds a knowledge graph from legacy COBOL codebases using:

  • Neo4j for graph storage
  • LangChain/LangGraph for agent orchestration
  • Multiple LLM providers (OpenAI, Groq, Gemini)

Quick Reference Commands

Check Neo4j Connection & Counts

uv run python -c "
from cobol_agentic_kg.utils.neo4j_client import Neo4jClient
client = Neo4jClient()
result = client.run_query('MATCH (n) RETURN labels(n)[0] as label, count(*) as count ORDER BY count DESC')
for r in result: print(f'{r[\"label\"]}: {r[\"count\"]}')
client.close()
"

Common Cypher Queries

List all programs:

MATCH (p:Program) RETURN p.name, p.type, p.complexity ORDER BY p.name

Find program dependencies:

MATCH (p:Program)-[:CALLS]->(called:Program)
WHERE p.name = 'PROGRAM_NAME'
RETURN p.name, called.name

Find data flow:

MATCH (p:Program)-[:USES]->(d:DataItem)
WHERE p.name = 'PROGRAM_NAME'
RETURN d.name, d.type, d.level

Find JCL jobs for a program:

MATCH (j:Job)-[:EXECUTES]->(p:Program)
WHERE p.name = 'PROGRAM_NAME'
RETURN j.name, j.description

Find copybooks used:

MATCH (p:Program)-[:INCLUDES]->(c:Copybook)
WHERE p.name = 'PROGRAM_NAME'
RETURN c.name
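The WHERE p.name = 'PROGRAM_NAME' queries above can be driven from Python with a $name parameter instead of editing the literal each time. A sketch; the run_query(query, params) call shape is an assumption about the project's Neo4jClient wrapper (it matches how the raw driver takes parameters):

```python
# Parameterized versions of the per-program lookups above, keyed by purpose.
PROGRAM_QUERIES = {
    "calls": "MATCH (p:Program)-[:CALLS]->(called:Program) "
             "WHERE p.name = $name RETURN p.name, called.name",
    "data_items": "MATCH (p:Program)-[:USES]->(d:DataItem) "
                  "WHERE p.name = $name RETURN d.name, d.type, d.level",
    "jobs": "MATCH (j:Job)-[:EXECUTES]->(p:Program) "
            "WHERE p.name = $name RETURN j.name, j.description",
    "copybooks": "MATCH (p:Program)-[:INCLUDES]->(c:Copybook) "
                 "WHERE p.name = $name RETURN c.name",
}

def program_query(kind, program_name):
    """Return a (query, params) pair for one of the lookups above."""
    return PROGRAM_QUERIES[kind], {"name": program_name}

# Usage (program name is hypothetical):
# client.run_query(*program_query("calls", "PAYROLL01"))
```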

Run the Streamlit UI

uv run streamlit run cobol_agentic_kg/ui/app.py

Run Tests

uv run pytest cobol_agentic_kg/tests/ -v

Run Evals

uv run python cobol_agentic_kg/evals/run_all_evals.py

Debugging Common Issues

Neo4j Connection Failed

  1. Check Neo4j is running: neo4j status or check Docker
  2. Verify .env has correct NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD
  3. Default URI: bolt://localhost:7687

LLM API Errors

  1. Check .env for OPENAI_API_KEY, GROQ_API_KEY, or GOOGLE_API_KEY
  2. For Groq (free tier), use the llama-3.1-70b-versatile model
  3. Test connection: uv run python cobol_agentic_kg/test_groq_setup.py
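Step 1 can be scripted the same way: report which providers have a key set. The env var names are the ones listed above; the provider-to-key mapping is an assumption about how utils/llm_factory.py selects a backend:

```python
# Report which LLM providers appear configured, based on .env keys.
import os

PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "groq": "GROQ_API_KEY",
    "gemini": "GOOGLE_API_KEY",
}

def configured_providers(env=None):
    """Return the providers whose API key env var is non-empty."""
    env = os.environ if env is None else env
    return [provider for provider, var in PROVIDER_KEYS.items() if env.get(var)]
```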

Parser Issues

  • Check cobol_agentic_kg/agents/parsing.py for COBOL parser
  • Check cobol_agentic_kg/agents/jcl_parser.py for JCL parser
  • Check cobol_agentic_kg/agents/copybook_parser.py for copybook parser

Project Structure

cobol_agentic_kg/
├── agents/           # LangChain agents
│   ├── parsing.py        # COBOL parser
│   ├── jcl_parser.py     # JCL parser
│   ├── copybook_parser.py # Copybook parser
│   ├── validation.py     # Validation agent
│   ├── enrichment.py     # Enrichment agent
│   ├── graph_builder.py  # Neo4j graph builder
│   ├── cypher_gen.py     # Cypher query generator
│   ├── retrieval.py      # RAG retrieval
│   ├── tech_debt_analyzer.py # Tech debt analysis
│   ├── modernization.py  # Modernization recommendations
│   └── translation.py    # Code translation
├── config/
│   └── settings.py       # Configuration
├── utils/
│   ├── neo4j_client.py   # Neo4j connection
│   ├── llm_factory.py    # Multi-LLM support
│   └── state.py          # LangGraph state
├── workflows/
│   └── orchestrator.py   # Pipeline orchestration
├── ui/
│   └── app.py            # Streamlit UI
└── evals/                # Evaluation suite

Key Files to Check

When debugging or making changes:

  1. Pipeline issues: workflows/orchestrator.py
  2. Graph schema: agents/graph_builder.py
  3. Query generation: agents/cypher_gen.py
  4. State management: utils/state.py
  5. UI problems: ui/app.py


Skill Information

Category: Technical
Last Updated: 1/30/2026