Td Glm

by teradata-labs

Comprehensive Generalized Linear Model analytics for regression and classification

Repository Files

11 files in this skill directory


name: td-glm
description: Comprehensive Generalized Linear Model analytics for regression and classification

Teradata GLM Analytics

Skill Name: Teradata GLM Analytics
Description: Comprehensive Generalized Linear Model analytics for regression and classification
Category: Regression Analytics
Function: TD_GLM

Core Capabilities

  • Complete analytical workflow from data exploration to model deployment
  • Automated preprocessing including scaling, encoding, and train-test splitting
  • Advanced TD_GLM implementation with parameter optimization
  • Comprehensive evaluation metrics and model validation
  • Production-ready SQL generation with proper table management
  • Error handling and data quality checks throughout the pipeline
  • Business-focused interpretation of analytical results

Table Analysis Workflow

This skill automatically analyzes your provided table to generate optimized SQL workflows. Here's how it works:

1. Table Structure Analysis

  • Column Detection: Automatically identifies all columns and their data types
  • Data Type Classification: Distinguishes between numeric, categorical, and text columns
  • Primary Key Identification: Detects unique identifier columns
  • Missing Value Assessment: Analyzes data completeness
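
The structure analysis above could be approximated with the Teradata data dictionary. The query below is a minimal sketch, not the skill's own script: it assumes the source is a base table (for view columns, DBC.ColumnsV may not report a ColumnType), and your_database and your_table are placeholders.

```sql
-- Column names, type codes (for example I = INTEGER, F = FLOAT,
-- D = DECIMAL, CV = VARCHAR, DA = DATE), and nullability for one table,
-- listed in definition order
SELECT ColumnName,
       ColumnType,
       Nullable
FROM DBC.ColumnsV
WHERE DatabaseName = 'your_database'
  AND TableName    = 'your_table'
ORDER BY ColumnId;
```

Numeric type codes point to candidate features for scaling, while character codes point to columns that would need encoding before TD_GLM can use them.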

2. Feature Engineering Recommendations

  • Numeric Features: Identifies columns suitable for scaling and normalization
  • Categorical Features: Detects columns requiring encoding (one-hot, label encoding)
  • Target Variable: Helps identify the dependent variable for modeling
  • Feature Selection: Recommends relevant features based on data types
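
As an illustration of the scaling and encoding recommendations above, the sketch below standardizes one numeric column and one-hot encodes one categorical column in plain SQL. The names num_feature and cat_feature and the category values 'A' and 'B' are hypothetical. ClearScape Analytics also ships dedicated functions such as TD_ScaleFit and TD_OneHotEncodingFit, which the skill may use instead.

```sql
-- Standardize a numeric feature (z-score) and one-hot encode a categorical
-- feature. NULLIF guards against division by zero for constant columns.
SELECT your_id_column,
       (num_feature - AVG(num_feature) OVER ())
         / NULLIF(STDDEV_SAMP(num_feature) OVER (), 0) AS num_feature_scaled,
       CASE WHEN cat_feature = 'A' THEN 1 ELSE 0 END    AS cat_feature_a,
       CASE WHEN cat_feature = 'B' THEN 1 ELSE 0 END    AS cat_feature_b,
       your_target_column
FROM your_database.your_table;
```

In a real pipeline the mean and standard deviation would normally be computed on the training split only and then applied to the test split, so that test data does not leak into preprocessing.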

3. SQL Generation Process

  • Dynamic Column Lists: Generates column lists based on your table structure
  • Parameterized Queries: Creates flexible SQL templates using your table schema
  • Table Name Integration: Replaces placeholders with your actual table names
  • Database Context: Adapts to your database and schema naming conventions

How to Use This Skill

  1. Provide Your Table Information:

    "Analyze table: database_name.table_name"
    or
    "Use table: my_data with target column: target_var"
    
  2. The Skill Will:

    • Query your table structure (on Teradata, with HELP TABLE database_name.table_name or the DBC.ColumnsV dictionary view; SHOW COLUMNS FROM is MySQL syntax)
    • Analyze data types and suggest appropriate preprocessing
    • Generate complete SQL workflow with your specific column names
    • Provide optimized parameters based on your data characteristics

Input Requirements

Data Requirements

  • Source table: Teradata table with analytical data
  • Target column: Dependent variable (numeric for regression, binary for classification)
  • Feature columns: Independent variables (numeric and categorical)
  • ID column: Unique identifier for record tracking
  • Minimum sample size: 100+ records for reliable regression modeling

Technical Requirements

  • Teradata Vantage with ClearScape Analytics enabled
  • Database permissions: CREATE TABLE, DROP TABLE, and SELECT on the working database
  • Function access: TD_GLM, TD_GLMPredict
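
If the permissions above are missing, a database administrator could grant them along these lines. This is a sketch only; your_database and your_user are placeholders, and your site's security policy may call for roles rather than direct grants.

```sql
-- Allow your_user to create, drop, and read tables in the working database
GRANT CREATE TABLE, DROP TABLE, SELECT
ON your_database
TO your_user;
```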

Output Formats

Generated Tables

  • Preprocessed data tables with proper scaling and encoding
  • Train/test split tables for model validation
  • Model table containing trained TD_GLM parameters
  • Prediction results with confidence metrics
  • Evaluation metrics table with performance statistics
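
One way the train/test split tables could be produced is with Teradata's SAMPLE clause. The table names glm_split, glm_train, and glm_test are hypothetical, and the 80/20 ratio is only an example. This sketch is a simple random split, so it is neither stratified nor reproducible across runs.

```sql
-- Tag each row with a random 80/20 split; SAMPLEID is 1 for the first
-- fraction and 2 for the second
CREATE TABLE your_database.glm_split AS (
  SELECT t.*,
         SAMPLEID AS split_id
  FROM your_database.your_table AS t
  SAMPLE 0.8, 0.2
) WITH DATA;

-- Materialize the two splits as separate tables
CREATE TABLE your_database.glm_train AS (
  SELECT * FROM your_database.glm_split WHERE split_id = 1
) WITH DATA;

CREATE TABLE your_database.glm_test AS (
  SELECT * FROM your_database.glm_split WHERE split_id = 2
) WITH DATA;
```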

SQL Scripts

  • Complete workflow scripts ready for execution
  • Parameterized queries for different datasets
  • Table management with proper cleanup procedures

Regression Use Cases Supported

  1. Linear regression: Continuous target (TD_GLM Gaussian family)
  2. Logistic regression: Binary target (TD_GLM Binomial family)
  3. Poisson regression: Count target
  4. Statistical modeling: Coefficient estimation and interpretation
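
For the linear and logistic cases, a training call might look like the sketch below. The table glm_train, the model table glm_model, and the columns feature_1 to feature_3 are hypothetical. The hyperparameter values are illustrative rather than tuned recommendations. TD_GLM expects numeric input columns.

```sql
-- Train a GLM and store the fitted coefficients in a model table
CREATE TABLE your_database.glm_model AS (
  SELECT * FROM TD_GLM (
    ON your_database.glm_train AS InputTable
    USING
    InputColumns ('feature_1', 'feature_2', 'feature_3')
    ResponseColumn ('your_target_column')
    Family ('Gaussian')        -- use 'Binomial' for a binary (0/1) target
    MaxIterNum (300)           -- illustrative value
    Tolerance (0.001)          -- illustrative value
    Intercept ('true')
  ) AS dt
) WITH DATA;
```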

Best Practices Applied

  • Data validation before analysis execution
  • Proper feature scaling and categorical encoding
  • Train-test splitting with stratification when appropriate
  • Cross-validation for robust model evaluation
  • Parameter optimization using systematic approaches
  • Residual analysis and diagnostic checks
  • Business interpretation of statistical results
  • Documentation of methodology and assumptions

Example Usage

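-- Example workflow for Teradata GLM Analytics
-- Replace your_database, your_table, your_id_column, and your_target_column
-- with your actual object names

-- 1. Data exploration and validation
SELECT COUNT(*)                        AS row_count,
       COUNT(DISTINCT your_id_column)  AS distinct_ids,      -- should equal row_count
       COUNT(your_target_column)       AS non_null_targets,  -- COUNT(col) skips NULLs
       AVG(your_target_column)         AS target_mean,
       STDDEV_SAMP(your_target_column) AS target_stddev      -- Teradata provides STDDEV_SAMP / STDDEV_POP, not STDDEV
FROM your_database.your_table;

-- 2. Execute complete regression workflow
-- (Detailed SQL provided by the skill)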

Scripts Included

Core Analytics Scripts

  • preprocessing.sql: Data preparation and feature engineering
  • table_analysis.sql: Automatic table structure analysis
  • complete_workflow_template.sql: End-to-end workflow template
  • model_training.sql: TD_GLM training procedures
  • prediction.sql: TD_GLMPredict execution
  • evaluation.sql: Model validation and metrics calculation
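
The contents of prediction.sql and evaluation.sql are not shown here. As a rough sketch of what scoring and evaluating a regression model involve, the statements below assume hypothetical tables glm_test, glm_model, and glm_predictions, and assume that TD_GLMPredict writes its output to a column named prediction. ClearScape Analytics also provides TD_RegressionEvaluator for these metrics. Plain SQL is used here to keep the arithmetic visible.

```sql
-- Score the held-out rows, carrying the actual target along for comparison
CREATE TABLE your_database.glm_predictions AS (
  SELECT * FROM TD_GLMPredict (
    ON your_database.glm_test AS InputTable PARTITION BY ANY
    ON your_database.glm_model AS ModelTable DIMENSION
    USING
    IDColumn ('your_id_column')
    Accumulate ('your_target_column')
  ) AS dt
) WITH DATA;

-- Root mean squared error and mean absolute error on the test split
SELECT SQRT(AVG((your_target_column - prediction)
              * (your_target_column - prediction))) AS rmse,
       AVG(ABS(your_target_column - prediction))    AS mae
FROM your_database.glm_predictions;
```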

Utility Scripts

  • data_quality_checks.sql: Comprehensive data validation
  • parameter_tuning.sql: Systematic parameter optimization
  • diagnostic_queries.sql: Model diagnostics and interpretation

Limitations and Disclaimers

  • Data quality: Results depend on input data quality and completeness
  • Sample size: Tables with fewer than the 100 records recommended above may give unreliable results
  • Feature selection: Manual feature engineering may be required
  • Computational resources: Large datasets may require optimization
  • Business context: Statistical results require domain expertise for interpretation
  • Model assumptions: Understand underlying mathematical assumptions

Quality Checks

Automated Validations

  • Data completeness verification before analysis
  • Statistical assumptions testing where applicable
  • Model convergence monitoring during training
  • Prediction quality assessment using validation data
  • Performance metrics calculation and interpretation
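
The completeness check could be expressed as a single gate query run before training. This is an assumed shape, not the content of data_quality_checks.sql. The 100-row threshold comes from the minimum sample size stated under Data Requirements.

```sql
-- One-row summary: any non-zero count or a failed size check
-- is a reason to stop before training
SELECT COUNT(*)                                  AS row_count,
       COUNT(*) - COUNT(DISTINCT your_id_column) AS duplicate_or_null_ids,
       COUNT(*) - COUNT(your_target_column)      AS missing_targets,
       CASE WHEN COUNT(*) >= 100
            THEN 'OK'
            ELSE 'TOO FEW ROWS'
       END                                       AS sample_size_check
FROM your_database.your_table;
```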

Manual Review Points

  • Feature selection appropriateness for business problem
  • Model interpretation alignment with domain knowledge
  • Results validation against business expectations
  • Documentation completeness for reproducibility

Updates and Maintenance

  • Version compatibility: Tested with latest Teradata Vantage releases
  • Performance optimization: Regular query performance reviews
  • Best practices: Updated based on analytics community feedback
  • Documentation: Maintained with latest ClearScape Analytics features
  • Examples: Updated with real-world use cases and scenarios

This skill provides production-ready regression analytics using Teradata ClearScape Analytics TD_GLM with comprehensive data science best practices.

Skill Information

Category: Skill
Last Updated: 12/10/2025