Statistical Analysis
by benchflow-ai
---
name: statistical-analysis
description: Probability, distributions, hypothesis testing, and statistical inference. Use for A/B testing, experimental design, or statistical validation.
sasmp_version: "1.3.0"
bonded_agent: 02-mathematics-statistics
bond_type: PRIMARY_BOND
---
# Statistical Analysis

Apply statistical methods to understand data and validate findings.
## Quick Start

```python
from scipy import stats
import numpy as np

# Descriptive statistics
data = np.array([1, 2, 3, 4, 5])
print(f"Mean: {np.mean(data)}")
print(f"Std: {np.std(data)}")

# Hypothesis testing: compare two independent groups
group1 = [23, 25, 27, 29, 31]
group2 = [20, 22, 24, 26, 28]
t_stat, p_value = stats.ttest_ind(group1, group2)
print(f"P-value: {p_value}")
```
## Core Tests

### T-Test (Compare Means)

```python
# One-sample: compare a sample mean to a population mean
stats.ttest_1samp(data, 100)

# Two-sample: compare two independent groups
stats.ttest_ind(group1, group2)

# Paired: before/after comparison on the same subjects
stats.ttest_rel(before, after)
```
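A quick sketch of all three variants on synthetic data (the means, spreads, and seed are illustrative, not from the text above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# One-sample: does the sample mean differ from 100?
sample = rng.normal(loc=103, scale=5, size=30)
t1, p1 = stats.ttest_1samp(sample, 100)

# Two-sample: do two independent groups differ?
group1 = rng.normal(loc=50, scale=10, size=30)
group2 = rng.normal(loc=55, scale=10, size=30)
t2, p2 = stats.ttest_ind(group1, group2)

# Paired: the same subjects measured before and after a treatment
before = rng.normal(loc=70, scale=8, size=20)
after = before + rng.normal(loc=3, scale=2, size=20)
t3, p3 = stats.ttest_rel(before, after)

print(f"one-sample p={p1:.4f}, two-sample p={p2:.4f}, paired p={p3:.4f}")
```

The paired test is the most sensitive here because it removes between-subject variability: each subject serves as its own control.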
### Chi-Square (Categorical Data)

```python
from scipy.stats import chi2_contingency

observed = np.array([[10, 20], [15, 25]])
chi2, p_value, dof, expected = chi2_contingency(observed)
```
### ANOVA (Multiple Groups)

```python
f_stat, p_value = stats.f_oneway(group1, group2, group3)
```
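A runnable sketch with three synthetic groups (values and seed are illustrative). If the overall F-test is significant, follow up with corrected pairwise comparisons (e.g. Tukey's HSD) rather than uncorrected t-tests:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group1 = rng.normal(10, 2, 30)
group2 = rng.normal(12, 2, 30)
group3 = rng.normal(14, 2, 30)

# Null hypothesis: all three group means are equal
f_stat, p_value = stats.f_oneway(group1, group2, group3)
print(f"F={f_stat:.2f}, p={p_value:.4g}")
```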
## Confidence Intervals

```python
from scipy import stats

confidence_level = 0.95
mean = np.mean(data)
se = stats.sem(data)  # standard error of the mean
ci = stats.t.interval(confidence_level, df=len(data) - 1, loc=mean, scale=se)
print(f"95% CI: [{ci[0]:.2f}, {ci[1]:.2f}]")
```
## Correlation

```python
# Pearson (linear)
r, p_value = stats.pearsonr(x, y)

# Spearman (rank-based)
rho, p_value = stats.spearmanr(x, y)
```
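A sketch on synthetic data where both coefficients agree on a strong linear relationship (the data-generating process is illustrative):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
x = rng.normal(size=100)
y = 2 * x + rng.normal(scale=0.5, size=100)  # strong linear relationship

r, p_r = stats.pearsonr(x, y)       # sensitive to linear association
rho, p_rho = stats.spearmanr(x, y)  # rank-based, robust to monotone nonlinearity
print(f"Pearson r={r:.3f}, Spearman rho={rho:.3f}")
```

When the relationship is monotone but curved, Spearman stays high while Pearson drops; comparing the two is a cheap diagnostic for nonlinearity.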
## Distributions

```python
# Normal distribution PDF
x = np.linspace(-3, 3, 100)
pdf = stats.norm.pdf(x, loc=0, scale=1)

# Sampling
samples = np.random.normal(0, 1, 1000)

# Test normality (Shapiro-Wilk)
stat, p_value = stats.shapiro(data)
```
## A/B Testing Framework

```python
def ab_test(control, treatment, alpha=0.05):
    """
    Run an A/B test using a two-sample t-test.

    Returns a dict with 'significant' (bool), 'p_value' (float),
    and 'improvement' (str, percent change vs. control).
    """
    t_stat, p_value = stats.ttest_ind(control, treatment)
    significant = p_value < alpha
    improvement = (np.mean(treatment) - np.mean(control)) / np.mean(control) * 100
    return {
        'significant': significant,
        'p_value': p_value,
        'improvement': f"{improvement:.2f}%"
    }
```
## Interpretation

- P-value < 0.05: reject the null hypothesis (statistically significant)
- P-value >= 0.05: fail to reject the null hypothesis (not significant)
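A significant p-value says a difference exists, not how big it is; report an effect size alongside it. A pooled-SD Cohen's d sketch (the `cohens_d` helper is ours, not a SciPy function; the groups reuse the Quick Start values):

```python
import numpy as np

def cohens_d(group1, group2):
    """Cohen's d using the pooled standard deviation."""
    n1, n2 = len(group1), len(group2)
    v1, v2 = np.var(group1, ddof=1), np.var(group2, ddof=1)
    pooled_sd = np.sqrt(((n1 - 1) * v1 + (n2 - 1) * v2) / (n1 + n2 - 2))
    return (np.mean(group1) - np.mean(group2)) / pooled_sd

group1 = [23, 25, 27, 29, 31]
group2 = [20, 22, 24, 26, 28]
d = cohens_d(group1, group2)
print(f"Cohen's d = {d:.2f}")  # ~0.2 small, ~0.5 medium, ~0.8 large (conventional)
```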
## Common Pitfalls
- Multiple testing without correction
- Small sample sizes
- Ignoring assumptions (normality, independence)
- Confusing correlation with causation
- p-hacking (searching for significance)
## Troubleshooting

### Common Issues
**Problem: Non-normal data for t-test**

```python
# Check normality first
stat, p = stats.shapiro(data)
if p < 0.05:
    # Use a non-parametric alternative instead of ttest_ind
    stat, p = stats.mannwhitneyu(group1, group2)
```
**Problem: Multiple comparisons inflating false positives**

```python
from statsmodels.stats.multitest import multipletests

# Apply Bonferroni correction
p_values = [0.01, 0.03, 0.04, 0.02, 0.06]
rejected, p_adjusted, _, _ = multipletests(p_values, method='bonferroni')
```
**Problem: Underpowered study (sample too small)**

```python
import numpy as np
from statsmodels.stats.power import TTestIndPower

# Calculate required sample size per group
power_analysis = TTestIndPower()
sample_size = power_analysis.solve_power(
    effect_size=0.5,  # Medium effect (Cohen's d)
    power=0.8,        # 80% power
    alpha=0.05,       # 5% significance
)
# Round up: rounding down would leave the study underpowered
print(f"Required n per group: {np.ceil(sample_size):.0f}")
```
**Problem: Heterogeneous variances**

```python
# Check with Levene's test
stat, p = stats.levene(group1, group2)
if p < 0.05:
    # Use Welch's t-test; scipy assumes equal variances unless equal_var=False
    t, p = stats.ttest_ind(group1, group2, equal_var=False)
```
**Problem: Outliers affecting results**

```python
from scipy.stats import zscore

# Detect outliers (|z| > 3)
z_scores = np.abs(zscore(data))
clean_data = data[z_scores < 3]

# Or use robust statistics
median = np.median(data)
mad = np.median(np.abs(data - median))  # Median Absolute Deviation
```
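The MAD above can drive a robust outlier filter via the modified z-score; the 0.6745 scaling and 3.5 cutoff follow a common convention, and the data are illustrative:

```python
import numpy as np

data = np.array([10.1, 9.8, 10.3, 9.9, 10.0, 10.2, 25.0])  # 25.0 is an outlier

median = np.median(data)
mad = np.median(np.abs(data - median))

# Modified z-score: unlike mean/std z-scores, it is not inflated by the outlier itself
modified_z = 0.6745 * (data - median) / mad
clean = data[np.abs(modified_z) <= 3.5]
print(clean)  # the 25.0 point is dropped
```

Plain z-scores can miss extreme points because the outlier inflates the standard deviation used to score it; the MAD-based version avoids that masking effect.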
### Debug Checklist
- Check sample size adequacy (power analysis)
- Test normality assumption (Shapiro-Wilk)
- Test homogeneity of variance (Levene's)
- Check for outliers (z-scores, IQR)
- Apply multiple testing correction if needed
- Report effect sizes, not just p-values
## Related Skills

**Team Composition Analysis**
This skill should be used when the user asks to "plan team structure", "determine hiring needs", "design org chart", "calculate compensation", "plan equity allocation", or requests organizational design and headcount planning for a startup.

**Kpi Dashboard Design**
Design effective KPI dashboards with metrics selection, visualization best practices, and real-time monitoring patterns. Use when building business dashboards, selecting metrics, or designing data visualization layouts.

**Dbt Transformation Patterns**
Master dbt (data build tool) for analytics engineering with model organization, testing, documentation, and incremental strategies. Use when building data transformations, creating data models, or implementing analytics engineering best practices.

**Sql Optimization Patterns**
Master SQL query optimization, indexing strategies, and EXPLAIN analysis to dramatically improve database performance and eliminate slow queries. Use when debugging slow queries, designing database schemas, or optimizing application performance.

**Senior Data Scientist**
World-class data science skill for statistical modeling, experimentation, causal inference, and advanced analytics. Expertise in Python (NumPy, Pandas, Scikit-learn), R, SQL, statistical methods, A/B testing, time series, and business intelligence. Includes experiment design, feature engineering, model evaluation, and stakeholder communication. Use when designing experiments, building predictive models, performing causal analysis, or driving data-driven decisions.

**Hypogenic**
Automated hypothesis generation and testing using large language models. Use this skill when generating scientific hypotheses from datasets, combining literature insights with empirical data, testing hypotheses against observational data, or conducting systematic hypothesis exploration for research discovery in domains like deception detection, AI content detection, mental health analysis, or other empirical research tasks.

**Mermaid Diagrams**
Comprehensive guide for creating software diagrams using Mermaid syntax. Use when users need to create, visualize, or document software through diagrams including class diagrams (domain modeling, object-oriented design), sequence diagrams (application flows, API interactions, code execution), flowcharts (processes, algorithms, user journeys), entity relationship diagrams (database schemas), C4 architecture diagrams (system context, containers, components), state diagrams, git graphs, pie charts,

**Ux Researcher Designer**
UX research and design toolkit for Senior UX Designer/Researcher including data-driven persona generation, journey mapping, usability testing frameworks, and research synthesis. Use for user research, persona creation, journey mapping, and design validation.

**Supabase Postgres Best Practices**
Postgres performance optimization and best practices from Supabase. Use this skill when writing, reviewing, or optimizing Postgres queries, schema designs, or database configurations.
