COBOL KG
by vin082
COBOL Knowledge Graph analysis and debugging toolkit. Use this skill when working with Neo4j queries, debugging pipeline issues, analyzing COBOL programs, or running common development tasks for this project.
COBOL Knowledge Graph Skill
This skill provides commands and workflows for the COBOL Agentic Knowledge Graph project.
Project Overview
This project builds a knowledge graph from legacy COBOL codebases using:
- Neo4j for graph storage
- LangChain/LangGraph for agent orchestration
- Multiple LLM providers (OpenAI, Groq, Gemini)
Quick Reference Commands
Check Neo4j Connection & Counts
uv run python -c "
from cobol_agentic_kg.utils.neo4j_client import Neo4jClient
client = Neo4jClient()
result = client.run_query('MATCH (n) RETURN labels(n)[0] as label, count(*) as count ORDER BY count DESC')
for r in result: print(f'{r[\"label\"]}: {r[\"count\"]}')
client.close()
"
Common Cypher Queries
List all programs:
MATCH (p:Program) RETURN p.name, p.type, p.complexity ORDER BY p.name
Find program dependencies:
MATCH (p:Program)-[:CALLS]->(called:Program)
WHERE p.name = 'PROGRAM_NAME'
RETURN p.name, called.name
Find data flow:
MATCH (p:Program)-[:USES]->(d:DataItem)
WHERE p.name = 'PROGRAM_NAME'
RETURN d.name, d.type, d.level
Find JCL jobs for a program:
MATCH (j:Job)-[:EXECUTES]->(p:Program)
WHERE p.name = 'PROGRAM_NAME'
RETURN j.name, j.description
Find copybooks used:
MATCH (p:Program)-[:INCLUDES]->(c:Copybook)
WHERE p.name = 'PROGRAM_NAME'
RETURN c.name
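The queries above return raw rows. As an illustration of post-processing those rows in Python, here is a sketch that builds an adjacency map from the CALLS query's results and flags recursive call chains. The row shape (`caller`/`callee` keys) and the program names are hypothetical; adapt them to whatever `Neo4jClient.run_query` actually returns.

```python
# Hypothetical post-processing of CALLS query results -- not part of
# the project's codebase. Detects cycles (recursive call chains) in
# the program call graph.
from collections import defaultdict

def build_call_graph(rows):
    """Turn rows like {"caller": "A", "callee": "B"} into an adjacency map."""
    graph = defaultdict(set)
    for row in rows:
        graph[row["caller"]].add(row["callee"])
    return graph

def has_cycle(graph):
    """Depth-first search with an explicit recursion stack."""
    visited, stack = set(), set()

    def visit(node):
        if node in stack:
            return True  # back-edge found: cycle
        if node in visited:
            return False
        visited.add(node)
        stack.add(node)
        if any(visit(n) for n in graph.get(node, ())):
            return True
        stack.discard(node)
        return False

    return any(visit(n) for n in list(graph))

# Sample rows with invented program names:
rows = [
    {"caller": "A", "callee": "B"},
    {"caller": "B", "callee": "C"},
    {"caller": "C", "callee": "A"},
]
print(has_cycle(build_call_graph(rows)))  # True for this sample
```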
Run the Streamlit UI
uv run streamlit run cobol_agentic_kg/ui/app.py
Run Tests
uv run pytest cobol_agentic_kg/tests/ -v
Run Evals
uv run python cobol_agentic_kg/evals/run_all_evals.py
Debugging Common Issues
Neo4j Connection Failed
- Check Neo4j is running: `neo4j status`, or check the Docker container
- Verify `.env` has correct `NEO4J_URI`, `NEO4J_USER`, `NEO4J_PASSWORD`
- Default URI: `bolt://localhost:7687`
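If the connection still fails, it helps to confirm the URI actually points where you think. A minimal stdlib-only sketch for splitting the URI into host and port (the default shown above is assumed when no port is given):

```python
# Helper sketch: parse NEO4J_URI from .env into (host, port) so a
# reachability check (e.g. a socket probe) can target the right address.
from urllib.parse import urlparse

def parse_bolt_uri(uri):
    parsed = urlparse(uri)
    # Fall back to the default bolt port when the URI omits it.
    return parsed.hostname, parsed.port or 7687

print(parse_bolt_uri("bolt://localhost:7687"))  # ('localhost', 7687)
```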
LLM API Errors
- Check `.env` for `OPENAI_API_KEY`, `GROQ_API_KEY`, or `GOOGLE_API_KEY`
- For Groq (free tier): use the `llama-3.1-70b-versatile` model
- Test the connection: `uv run python cobol_agentic_kg/test_groq_setup.py`
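Before blaming a provider, confirm its key is actually set. A minimal sketch using the key names listed above (the provider labels are illustrative, not the project's configuration API):

```python
# Sketch: report which provider API keys are present in the environment.
# Key names match the .env entries listed above.
import os

def available_providers():
    keys = {
        "openai": "OPENAI_API_KEY",
        "groq": "GROQ_API_KEY",
        "gemini": "GOOGLE_API_KEY",
    }
    return [name for name, var in keys.items() if os.environ.get(var)]
```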
Parser Issues
- COBOL parser: `cobol_agentic_kg/agents/parsing.py`
- JCL parser: `cobol_agentic_kg/agents/jcl_parser.py`
- Copybook parser: `cobol_agentic_kg/agents/copybook_parser.py`
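For a quick sense of what the COBOL parser has to extract, here is an illustrative regex over static CALL statements. This is not the project's parser (that lives in `agents/parsing.py`), and real COBOL needs far more than a regex (dynamic `CALL identifier`, continuation lines, comment columns), but it shows the kind of fact the graph's CALLS edges encode.

```python
# Illustrative only -- a toy extractor for static CALL targets.
# The project's actual parser is cobol_agentic_kg/agents/parsing.py.
import re

# Static calls use a quoted literal: CALL 'SUBPROG'. COBOL program
# names may contain hyphens, hence the character class.
CALL_RE = re.compile(r"CALL\s+'([A-Z0-9-]+)'", re.IGNORECASE)

def find_static_calls(source):
    return CALL_RE.findall(source)

sample = """
       PROCEDURE DIVISION.
           CALL 'SUBPROG1' USING WS-REC.
           CALL 'SUBPROG2'.
"""
print(find_static_calls(sample))  # ['SUBPROG1', 'SUBPROG2']
```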
Project Structure
cobol_agentic_kg/
├── agents/ # LangChain agents
│ ├── parsing.py # COBOL parser
│ ├── jcl_parser.py # JCL parser
│ ├── copybook_parser.py # Copybook parser
│ ├── validation.py # Validation agent
│ ├── enrichment.py # Enrichment agent
│ ├── graph_builder.py # Neo4j graph builder
│ ├── cypher_gen.py # Cypher query generator
│ ├── retrieval.py # RAG retrieval
│ ├── tech_debt_analyzer.py # Tech debt analysis
│ ├── modernization.py # Modernization recommendations
│ └── translation.py # Code translation
├── config/
│ └── settings.py # Configuration
├── utils/
│ ├── neo4j_client.py # Neo4j connection
│ ├── llm_factory.py # Multi-LLM support
│ └── state.py # LangGraph state
├── workflows/
│ └── orchestrator.py # Pipeline orchestration
├── ui/
│ └── app.py # Streamlit UI
└── evals/ # Evaluation suite
Key Files to Check
When debugging or making changes:
- Pipeline issues: `workflows/orchestrator.py`
- Graph schema: `agents/graph_builder.py`
- Query generation: `agents/cypher_gen.py`
- State management: `utils/state.py`
- UI problems: `ui/app.py`
