---
name: td-glm
description: Comprehensive Generalized Linear Model analytics for regression and classification
---

# Teradata GLM Analytics

| Skill Name | Teradata GLM Analytics |
|---|---|
| Author | teradata-labs |
| Description | Comprehensive Generalized Linear Model analytics for regression and classification |
| Category | Regression Analytics |
| Function | TD_GLM |
## Core Capabilities
- Complete analytical workflow from data exploration to model deployment
- Automated preprocessing including scaling, encoding, and train-test splitting
- Advanced TD_GLM implementation with parameter optimization
- Comprehensive evaluation metrics and model validation
- Production-ready SQL generation with proper table management
- Error handling and data quality checks throughout the pipeline
- Business-focused interpretation of analytical results
## Table Analysis Workflow
This skill automatically analyzes your provided table to generate optimized SQL workflows. Here's how it works:
### 1. Table Structure Analysis
- Column Detection: Automatically identifies all columns and their data types
- Data Type Classification: Distinguishes between numeric, categorical, and text columns
- Primary Key Identification: Detects unique identifier columns
- Missing Value Assessment: Analyzes data completeness
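The structure analysis above can be sketched directly against Teradata's data dictionary; `DBC.ColumnsV` is a standard dictionary view, while `your_database`, `your_table`, and `your_column` are placeholders:

```sql
-- List columns and their data types from the data dictionary
SELECT ColumnName, ColumnType, Nullable
FROM DBC.ColumnsV
WHERE DatabaseName = 'your_database'
  AND TableName = 'your_table'
ORDER BY ColumnId;

-- Missing-value assessment for one candidate column
SELECT COUNT(*) AS total_rows,
       COUNT(*) - COUNT(your_column) AS null_rows
FROM your_database.your_table;
```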
### 2. Feature Engineering Recommendations
- Numeric Features: Identifies columns suitable for scaling and normalization
- Categorical Features: Detects columns requiring encoding (one-hot, label encoding)
- Target Variable: Helps identify the dependent variable for modeling
- Feature Selection: Recommends relevant features based on data types
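As one illustration of the scaling step, ClearScape Analytics ships fit/transform functions such as TD_ScaleFit; the sketch below standardizes two hypothetical numeric features (argument names should be verified against your Vantage release):

```sql
-- Fit z-score scaling parameters for numeric features (illustrative sketch)
CREATE TABLE scale_params AS (
  SELECT * FROM TD_ScaleFit (
    ON your_database.your_table AS InputTable
    USING
      TargetColumns ('age', 'income')  -- hypothetical numeric features
      ScaleMethod ('STD')              -- standardization (mean 0, std 1)
  ) AS dt
) WITH DATA;
```

The fitted parameters would then be applied with the matching transform function before training.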
### 3. SQL Generation Process
- Dynamic Column Lists: Generates column lists based on your table structure
- Parameterized Queries: Creates flexible SQL templates using your table schema
- Table Name Integration: Replaces placeholders with your actual table names
- Database Context: Adapts to your database and schema naming conventions
## How to Use This Skill

1. Provide your table information:
   "Analyze table: database_name.table_name" or "Use table: my_data with target column: target_var"
2. The skill will:
   - Query your table structure (e.g. `HELP TABLE table_name` or the `DBC.ColumnsV` dictionary view)
   - Analyze data types and suggest appropriate preprocessing
   - Generate a complete SQL workflow with your specific column names
   - Provide optimized parameters based on your data characteristics
## Input Requirements

### Data Requirements
- Source table: Teradata table with analytical data
- Target column: Dependent variable for regression analysis
- Feature columns: Independent variables (numeric and categorical)
- ID column: Unique identifier for record tracking
- Minimum sample size: 100+ records for reliable regression modeling
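These requirements can be checked up front with a single query; the table and column names below are placeholders:

```sql
-- Verify sample size and target completeness before modeling
SELECT COUNT(*) AS row_count,
       CASE WHEN COUNT(*) >= 100 THEN 'OK' ELSE 'TOO FEW ROWS' END AS sample_check,
       COUNT(*) - COUNT(your_target_column) AS missing_targets,
       COUNT(DISTINCT your_id_column) AS distinct_ids
FROM your_database.your_table;
```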
### Technical Requirements
- Teradata Vantage with ClearScape Analytics enabled
- Database permissions: CREATE, DROP, SELECT on working database
- Function access: TD_GLM, TD_GLMPredict
## Output Formats

### Generated Tables
- Preprocessed data tables with proper scaling and encoding
- Train/test split tables for model validation
- Model table containing trained TD_GLM parameters
- Prediction results with confidence metrics
- Evaluation metrics table with performance statistics
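The prediction-results table is typically produced by scoring the held-out split with TD_GLMPredict against the trained model table; a hedged sketch (argument names may vary by Vantage release, and all table and column names are placeholders):

```sql
-- Score the held-out data with the trained GLM model (illustrative sketch)
CREATE TABLE glm_predictions AS (
  SELECT * FROM TD_GLMPredict (
    ON test_data AS InputTable PARTITION BY ANY
    ON glm_model AS ModelTable DIMENSION
    USING
      IDColumn ('row_id')
      Accumulate ('your_target_column')  -- carry the actual value alongside predictions
  ) AS dt
) WITH DATA;
```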
### SQL Scripts
- Complete workflow scripts ready for execution
- Parameterized queries for different datasets
- Table management with proper cleanup procedures
## Regression Use Cases Supported

- Linear regression: continuous targets (Gaussian family)
- Logistic regression: binary classification (binomial family)
- Poisson regression: count targets
- General statistical modeling with GLM link functions
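These use cases map to the Family argument of TD_GLM; a minimal logistic-regression training sketch (argument names should be checked against the TD_GLM documentation for your release; table and column names are placeholders):

```sql
-- Train a logistic regression model with TD_GLM (illustrative sketch)
CREATE TABLE glm_model AS (
  SELECT * FROM TD_GLM (
    ON train_data AS InputTable
    USING
      InputColumns ('feature1', 'feature2')  -- hypothetical features
      ResponseColumn ('your_target_column')
      Family ('BINOMIAL')  -- 'GAUSSIAN' for linear regression
      MaxIterNum (300)
  ) AS dt
) WITH DATA;
```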
## Best Practices Applied
- Data validation before analysis execution
- Proper feature scaling and categorical encoding
- Train-test splitting with stratification when appropriate
- Cross-validation for robust model evaluation
- Parameter optimization using systematic approaches
- Residual analysis and diagnostic checks
- Business interpretation of statistical results
- Documentation of methodology and assumptions
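The train-test split above can be done with Teradata's SAMPLE clause; a simple 80/20 sketch using a placeholder ID column (stratified splitting would need an additional grouping step):

```sql
-- 80% random sample for training
CREATE TABLE train_data AS (
  SELECT * FROM your_database.your_table SAMPLE 0.8
) WITH DATA;

-- Remaining 20% for testing (anti-join on the ID column)
CREATE TABLE test_data AS (
  SELECT t.*
  FROM your_database.your_table t
  WHERE t.row_id NOT IN (SELECT row_id FROM train_data)
) WITH DATA;
```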
## Example Usage

```sql
-- Example workflow for Teradata GLM Analytics
-- Replace 'your_table' with the actual table name

-- 1. Data exploration and validation
SELECT COUNT(*) AS row_count,
       COUNT(DISTINCT your_id_column) AS distinct_ids,
       AVG(your_target_column) AS target_mean,
       STDDEV_SAMP(your_target_column) AS target_stddev
FROM your_database.your_table;

-- 2. Execute the complete regression workflow
-- (Detailed SQL provided by the skill)
```
## Scripts Included

### Core Analytics Scripts

- `preprocessing.sql`: Data preparation and feature engineering
- `table_analysis.sql`: Automatic table structure analysis
- `complete_workflow_template.sql`: End-to-end workflow template
- `model_training.sql`: TD_GLM training procedures
- `prediction.sql`: TD_GLMPredict execution
- `evaluation.sql`: Model validation and metrics calculation
### Utility Scripts

- `data_quality_checks.sql`: Comprehensive data validation
- `parameter_tuning.sql`: Systematic parameter optimization
- `diagnostic_queries.sql`: Model diagnostics and interpretation
## Limitations and Disclaimers

- Data quality: Results depend on input data quality and completeness
- Sample size: At least ~100 records are recommended for reliable results
- Feature selection: Manual feature engineering may be required
- Computational resources: Large datasets may require query and resource tuning
- Business context: Statistical results require domain expertise for interpretation
- Model assumptions: Results are only reliable when the chosen family and link function match the data
## Quality Checks

### Automated Validations
- Data completeness verification before analysis
- Statistical assumptions testing where applicable
- Model convergence monitoring during training
- Prediction quality assessment using validation data
- Performance metrics calculation and interpretation
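The completeness verification above can be expressed as a per-column check; placeholder names again, with an illustrative 95% threshold:

```sql
-- Flag a feature column whose completeness falls below a threshold
SELECT 'your_feature_column' AS column_name,
       CAST(COUNT(your_feature_column) AS FLOAT) / NULLIF(COUNT(*), 0) AS completeness,
       CASE WHEN COUNT(your_feature_column) * 1.0 / NULLIF(COUNT(*), 0) >= 0.95
            THEN 'PASS' ELSE 'REVIEW' END AS status
FROM your_database.your_table;
```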
### Manual Review Points
- Feature selection appropriateness for business problem
- Model interpretation alignment with domain knowledge
- Results validation against business expectations
- Documentation completeness for reproducibility
## Updates and Maintenance
- Version compatibility: Tested with latest Teradata Vantage releases
- Performance optimization: Regular query performance reviews
- Best practices: Updated based on analytics community feedback
- Documentation: Maintained with latest ClearScape Analytics features
- Examples: Updated with real-world use cases and scenarios
This skill provides production-ready regression analytics using Teradata ClearScape Analytics TD_GLM with comprehensive data science best practices.
