Engineering Features For Machine Learning
by jeremylongshore
name: engineering-features-for-machine-learning
description: |
  Create, select, and transform features to improve machine learning model performance. Handles feature scaling, encoding, and importance analysis. Use when asked to "engineer features" or "select features". Trigger with relevant phrases based on skill purpose.
allowed-tools: Read, Write, Edit, Grep, Glob, Bash(cmd:*)
version: 1.0.0
author: Jeremy Longshore jeremy@intentsolutions.io
license: MIT
Feature Engineering Toolkit
Automated assistance for creating, selecting, and transforming machine learning features.
Overview
This skill lets Claude use the feature-engineering-toolkit plugin to improve machine learning models. It creates new features, selects the most relevant ones, and transforms existing features to suit the model. Use it to improve a model's accuracy, efficiency, and interpretability.
How It Works
- Analyzing Requirements: Claude analyzes the user's request and identifies the specific feature engineering task required.
- Generating Code: Claude generates Python code that uses the feature-engineering-toolkit plugin to perform the requested task. The generated code includes data validation and error handling.
- Executing Task: The generated code is executed, creating, selecting, or transforming features as requested.
- Providing Insights: Claude provides performance metrics and insights related to the feature engineering process, such as the importance of newly created features or the impact of transformations on model performance.
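The last step, measuring the impact of a new feature on model performance, can be sketched as a cross-validated comparison. This is a minimal illustration and not the plugin's own API, which the source does not show: it assumes scikit-learn and NumPy are installed, uses synthetic data, and the helper name `mean_cv_accuracy` is hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for illustration): the label depends on the
# sign of x1 * x2, which a linear model cannot learn from x1 and x2 alone.
rng = np.random.default_rng(0)
x1 = rng.normal(size=600)
x2 = rng.normal(size=600)
y = (x1 * x2 > 0).astype(int)

X_base = np.column_stack([x1, x2])           # original features only
X_eng = np.column_stack([x1, x2, x1 * x2])   # plus one engineered interaction


def mean_cv_accuracy(X, y):
    """Mean 5-fold cross-validated accuracy of a scaled logistic regression."""
    model = make_pipeline(StandardScaler(), LogisticRegression())
    return cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()


baseline_acc = mean_cv_accuracy(X_base, y)
engineered_acc = mean_cv_accuracy(X_eng, y)
print(f"baseline:   {baseline_acc:.3f}")
print(f"engineered: {engineered_acc:.3f}")
```

Scoring both feature sets with the same model and the same folds keeps the comparison fair; putting the scaler inside the pipeline means it is refit on each training fold, so no information leaks from the held-out fold.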
When to Use This Skill
This skill activates when you need to:
- Create new features from existing data to improve model accuracy.
- Select the most relevant features from a dataset to reduce model complexity and improve efficiency.
- Transform features to better suit the assumptions of a machine learning model (e.g., scaling, normalization, encoding).
Examples
Example 1: Improving Model Accuracy
User request: "Create new features from the existing 'age' and 'income' columns to improve the accuracy of a customer churn prediction model."
The skill will:
- Generate code to create interaction terms between 'age' and 'income' (e.g., age * income, age / income).
- Execute the code and evaluate the impact of the new features on model performance.
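A minimal sketch of the interaction terms described above, assuming pandas and NumPy are available. The function name `add_interaction_features`, the new column names, and the toy data are hypothetical; the code the skill actually generates may differ.

```python
import numpy as np
import pandas as pd


def add_interaction_features(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with age/income interaction columns added."""
    out = df.copy()
    out["age_x_income"] = out["age"] * out["income"]
    # Guard against division by zero: a zero income yields NaN instead of inf.
    out["age_per_income"] = out["age"] / out["income"].replace(0, np.nan)
    return out


# Hypothetical toy data; a real churn dataset would be loaded from a file.
customers = pd.DataFrame({"age": [25, 40, 60], "income": [50000, 0, 120000]})
engineered = add_interaction_features(customers)
print(engineered)
```

The function works on a copy, so the original frame stays available for a before/after comparison of model performance.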
Example 2: Reducing Model Complexity
User request: "Select the top 10 most important features from the dataset to reduce the complexity of a fraud detection model."
The skill will:
- Generate code to calculate feature importance using a suitable method (e.g., Random Forest, SelectKBest).
- Execute the code and select the top 10 features based on their importance scores.
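One way the top-10 selection could look, sketched with scikit-learn's Random Forest importances. The synthetic dataset stands in for a real fraud dataset, and the column names `f0`, `f1`, and so on are placeholders, not anything from the source.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a fraud dataset (assumption for illustration only).
X_arr, y = make_classification(
    n_samples=500, n_features=25, n_informative=8, random_state=0
)
X = pd.DataFrame(X_arr, columns=[f"f{i}" for i in range(X_arr.shape[1])])

# Fit a forest and read its impurity-based importance scores.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X, y)
importances = pd.Series(forest.feature_importances_, index=X.columns)

# Keep the ten highest-scoring columns.
top10 = importances.sort_values(ascending=False).head(10)
X_top = X[top10.index]
print(top10)
```

Impurity-based importances tend to favor high-cardinality features, so a univariate filter such as `SelectKBest` or permutation importance may be a better fit for some datasets. In a real pipeline the ranking should be computed on the training split only.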
Best Practices
- Data Validation: Always validate the input data to ensure it is clean and consistent before performing feature engineering.
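- Feature Scaling: Scale numerical features so that features with larger ranges do not dominate the model. This matters most for distance-based and gradient-based models; tree-based models are largely insensitive to feature scale.
- Encoding Categorical Features: Encode categorical features so that models can consume them, for example one-hot encoding for unordered categories and label (integer) encoding where an integer code is acceptable, such as for ordered categories.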
Integration
This skill works through the feature-engineering-toolkit plugin to create, select, and transform features for machine learning models. It can be combined with other Claude Code skills to build complete machine learning pipelines.
Prerequisites
- Appropriate file access permissions
- Required dependencies installed
Instructions
- Invoke this skill when the trigger conditions are met
- Provide necessary context and parameters
- Review the generated output
- Apply modifications as needed
Output
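The skill produces the generated Python code, the resulting created, selected, or transformed features, and performance metrics and insights such as feature importance.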
Error Handling
- Invalid input: Prompts for correction
- Missing dependencies: Lists required components
- Permission errors: Suggests remediation steps
Resources
- Project documentation
- Related skills and commands
