Data Model Reporter

by lastdays03

data

Analyzes the results of data analysis / machine learning notebooks and automatically generates a standardized Model Card report (Markdown).

Skill Details

Repository Files: 2 files in this skill directory


name: data-model-reporter
description: Analyzes the results of data analysis / machine learning notebooks and automatically generates a standardized Model Card report (Markdown).

Model Card Reporter Workflow

This workflow extracts the analysis results from a Jupyter Notebook and generates a Model Card document conforming to Hugging Face/Google standards.

Step 1: Target Selection

  1. Input: Receive the path of the notebook file (*.ipynb) the user wants to analyze.
  2. Context Loading: Read this document to load the extraction rules and standards.

Step 2: Extraction & Analysis

  1. Read Notebook: Read the notebook contents with read_file (parse the JSON format).
  2. Analyze:
    • Model: Identify the model used (algorithm, framework version).
    • Metrics: Find the execution results of quantitative metrics such as accuracy_score and f1_score.
    • Features: Extract the key variables via X.columns, feature_importances_, and similar attributes.
    • Visuals: Identify the core visualizations (Confusion Matrix, SHAP Summary, ROC Curve, etc.) and capture their image paths. (If none exist, suggest generating them.)

Step 3: Report Drafting

  1. Template Load: Load resources/report-template.md.
  2. Fill: Fill the template placeholders ({...}) with the extracted information.
    • Visualizations: Insert the extracted image file paths as Markdown image tags (![Description](path)).
    • Warning: For content that cannot be derived from code, such as "Ethical Considerations" or "Intended Use", either mark it as "User input required" or infer the context from the notebook's Markdown cells.
  3. Create Artifact: Create the artifact at docs/reports/[Topic]_Report.md (or another consistent naming convention, e.g. Model_Card_[Topic].md).
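The fill step can be sketched as a simple placeholder substitution. The function name and the assumption that the template marks blanks as {snake_case_name} are illustrative; a plain str.format would raise on missing keys, so a regex with an explicit fallback matches the "User input required" rule better:

```python
import re

def fill_template(template, values):
    """Replace {placeholder} fields; unknown fields become 'User input required'.

    Sketch only: assumes placeholders are single {word} tokens, as in the
    report template's blanks ({...}).
    """
    def sub(match):
        return str(values.get(match.group(1), "User input required"))
    return re.sub(r"\{(\w+)\}", sub, template)
```

For example, fill_template("Accuracy: {accuracy}", {"accuracy": 0.91}) fills the known field, while an unfilled {intended_use} is flagged for the user rather than invented.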

Step 4: Review & Completion

  1. Notify: Give the user the path of the generated report and instruct them to manually review the "Ethical Considerations" section.

Standards & Rules

Skill: Model Card Reporter

This skill defines the standard for generating Model Cards from data analysis notebooks. It aligns with Hugging Face and Google standards to ensure transparency, reproducibility, and ethical reporting.


💎 1. Core Principles

  1. Standard Alignment:
    • Follows the Hugging Face Model Card structure (YAML Metadata + Markdown Sections).
    • Must include "Ethical Considerations" and "Limitations".
  2. Evidence-Based:
    • All metrics (Accuracy, F1, etc.) must be extracted directly from the notebook execution results.
    • No hallucinated metrics.
  3. Neutral Tone:
    • Use objective language. Avoid marketing buzzwords like "Superb", "Perfect".
    • Acknowledge biases and limitations honestly.
  4. Visual Evidence:
    • A picture is worth a thousand words. Prefer charts (SHAP, ROC, Confusion Matrix) over raw numbers where possible.
    • All visualizations must have captions explaining "what this means".

🏗️ 2. Report Structure

The output must follow report-template.md.

Metadata (YAML Frontmatter)

Essential for machine readability (Hugging Face Hub compatibility).

  • language: (e.g., en, ko)
  • library_name: (e.g., sklearn, pytorch)
  • tags: (e.g., tabular-classification, finance)
  • metrics: (e.g., accuracy, f1)
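Assuming Hugging Face Hub conventions, a frontmatter block with these fields might look like this (all values are illustrative):

```yaml
---
language: en
library_name: sklearn
tags:
  - tabular-classification
  - finance
metrics:
  - accuracy
  - f1
---
```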

Section 1: Model Details

  • Architecture: Algorithm used (e.g., Random Forest, BERT).
  • Framework: Version info (e.g., Scikit-Learn 1.0.2).
  • Author: Developer or Team name.

Section 2: Intended Use

  • Primary Use: What specific problem does this solve?
  • Out of Scope: When should this model NOT be used? (Crucial for safety).

Section 3: Factors & Metrics

  • Factors: Input features used. Highlight key drivers (SHAP values, feature importance).
  • Metrics: Quantitative performance on Test/Validation sets.
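One way to render the extracted test-set metrics for this section is a small Markdown table helper. This is a sketch, not part of the skill; the function name and fixed three-decimal formatting are assumptions:

```python
def metrics_table(metrics):
    """Render test-set metrics as a Markdown table for Section 3.

    Non-numeric values (e.g. 'N/A' for failed runs) pass through unchanged.
    """
    lines = ["| Metric | Value |", "|--------|-------|"]
    for name, value in metrics.items():
        cell = f"{value:.3f}" if isinstance(value, float) else str(value)
        lines.append(f"| {name} | {cell} |")
    return "\n".join(lines)
```

Passing the literal string "N/A" for a metric whose cell failed to run keeps the table honest, per the Metric Integrity rule below.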

Section 4: Ethical Considerations (Critical)

  • Bias: Are there protected groups (gender, race) that might be unfairly treated?
  • Fairness: Disparate impact analysis results.

🏆 3. Quality Standards

  1. Metric Integrity:
    • REPORTED metrics MUST MATCH valid execution outputs.
    • If code failed to run, do NOT guess the number. Mark as "N/A".
  2. Disclosure:
    • Always disclose the 'Out of Scope' use cases to prevent misuse.
    • Always mention the framework version for reproducibility.
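The version-disclosure rule can be automated with the standard library; a minimal sketch, assuming the packages of interest are named explicitly (the function name and default package list are illustrative):

```python
import sys
from importlib import metadata

def framework_versions(packages=("scikit-learn", "pandas", "numpy")):
    """Collect installed package versions for the reproducibility section.

    Packages that are not installed are reported as 'N/A' rather than guessed,
    mirroring the Metric Integrity rule.
    """
    versions = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    for pkg in packages:
        try:
            versions[pkg] = metadata.version(pkg)
        except metadata.PackageNotFoundError:
            versions[pkg] = "N/A"
    return versions
```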

✅ 4. Checklist

  • Extraction: Did you find the model object and training metrics?
  • Completeness: Are all 5 sections of the template filled?
  • Safety Check: Is 'Out of Scope' clearly defined?
  • Verification: Did you explicitly warn the user to review the 'Ethical Considerations'?

Related Skills

Xlsx

Comprehensive spreadsheet creation, editing, and analysis with support for formulas, formatting, data analysis, and visualization. When Claude needs to work with spreadsheets (.xlsx, .xlsm, .csv, .tsv, etc.) for: (1) Creating new spreadsheets with formulas and formatting, (2) Reading or analyzing data, (3) Modifying existing spreadsheets while preserving formulas, (4) Data analysis and visualization in spreadsheets, or (5) Recalculating formulas

data

Clickhouse Io

ClickHouse database patterns, query optimization, analytics, and data engineering best practices for high-performance analytical workloads.

data, cli


Analyzing Financial Statements

This skill calculates key financial ratios and metrics from financial statement data for investment analysis

data

Data Storytelling

Transform data into compelling narratives using visualization, context, and persuasive structure. Use when presenting analytics to stakeholders, creating data reports, or building executive presentations.

data

Kpi Dashboard Design

Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.

design, data

Dbt Transformation Patterns

Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.

testing, document, tool

Sql Optimization Patterns

Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.

design, data

Anndata

This skill should be used when working with annotated data matrices in Python, particularly for single-cell genomics analysis, managing experimental measurements with metadata, or handling large-scale biological datasets. Use when tasks involve AnnData objects, h5ad files, single-cell RNA-seq data, or integration with scanpy/scverse tools.

art, tool, data


Skill Information

Category: Data
Last Updated: 1/20/2026