Pycse

by jkitchin


Use when performing regression analysis with confidence intervals, solving ODEs, fitting models to experimental data, or caching expensive scientific computations - provides convenient wrappers around scipy that automatically calculate confidence intervals and prediction bounds for linear, nonlinear, and polynomial regression


pycse - Python Computations in Science and Engineering

Overview

pycse extends numpy/scipy with convenience functions that automatically return confidence intervals for regression, making statistical analysis faster and less error-prone. Instead of manually extracting covariance matrices and calculating confidence intervals, pycse returns them directly.

Core value: Turn 100+ lines of scipy boilerplate into 10 lines of clear, reusable code.

When to Use

Use pycse when:

  • Fitting models to experimental data and need parameter confidence intervals
  • Performing regression analysis (linear, nonlinear, polynomial)
  • Comparing models with statistical criteria (BIC, R²)
  • Generating predictions with error bounds
  • Caching expensive computational results
  • Reading data from Google Sheets into pandas
  • Solving ODEs (wraps scipy with convenient interface)

Don't use when:

  • scipy alone meets your needs (both are valid)
  • You need custom optimization beyond least squares
  • Working with models pycse doesn't support

Quick Reference

| Task | pycse Function | Returns |
| --- | --- | --- |
| Linear regression | regress(A, y, alpha=0.05) | p, pint, se |
| Nonlinear regression | nlinfit(model, x, y, p0, alpha=0.05) | p, pint, se |
| Polynomial fit | polyfit(x, y, deg, alpha=0.05) | p, pint, se |
| Prediction intervals | predict(X, y, pars, XX, alpha=0.05) | prediction, intervals |
| Nonlinear predict | nlpredict(X, y, model, loss, popt, xnew) | prediction, bounds |
| Model comparison | bic(x, y, model, popt) | bic_value |
| Linear BIC | lbic(X, y, popt) | bic_value |
| R-squared | Rsquared(y, Y) | r2_value |
| ODE solver | ivp(f, tspan, y0, **kwargs) | solution |

All regression functions return: (p, pint, se) where:

  • p = fitted parameters
  • pint = confidence intervals for parameters
  • se = standard errors
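The ODE solver in the table is the one entry not demonstrated below. Since the overview notes that ivp wraps scipy, here is a plain scipy.integrate.solve_ivp sketch of the kind of problem it targets (first-order decay, matching the data used in Common Patterns); the pycse call should look similar, per the signature in the table:

```python
import numpy as np
from scipy.integrate import solve_ivp

# dC/dt = -k * C, with C(0) = 100
k = 0.2

def f(t, C):
    return -k * C

sol = solve_ivp(f, (0, 10), [100.0], t_eval=np.linspace(0, 10, 11))
print(sol.t)     # requested time points
print(sol.y[0])  # concentration at each time point
```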

Common Patterns

Nonlinear Regression with Confidence Intervals

import numpy as np
import pycse

# Data
time = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
concentration = np.array([100, 82, 67, 55, 45, 37, 30, 25, 20, 17, 14])

# Model: C(t) = C0 * exp(-k * t)
def model(t, C0, k):
    return C0 * np.exp(-k * t)

# Fit with 95% confidence intervals
p, pint, se = pycse.nlinfit(model, time, concentration, [100, 0.1])

print(f"C0 = {p[0]:.2f} ± {pint[0,1] - p[0]:.2f}")
print(f"k = {p[1]:.4f} ± {pint[1,1] - p[1]:.4f}")

# That's it! No manual covariance extraction or t-distribution calculations.

Compare to scipy: Would require extracting covariance, calculating standard errors, looking up t-distribution, computing intervals manually (~50+ lines).
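To make that comparison concrete, here is a condensed sketch of the manual scipy route — curve_fit plus a t-distribution interval. This is a minimal version of what pycse automates; the full version with checks and plotting is what grows to ~50 lines:

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy import stats

# Same data and model as above
time = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
concentration = np.array([100, 82, 67, 55, 45, 37, 30, 25, 20, 17, 14])

def model(t, C0, k):
    return C0 * np.exp(-k * t)

# Fit, then derive 95% confidence intervals by hand
popt, pcov = curve_fit(model, time, concentration, p0=[100, 0.1])
se = np.sqrt(np.diag(pcov))            # standard errors from the covariance
dof = len(time) - len(popt)            # degrees of freedom
tval = stats.t.ppf(1 - 0.05 / 2, dof)  # two-sided t critical value
pint = np.column_stack([popt - tval * se, popt + tval * se])
```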

Linear Regression

import numpy as np
import pycse

# Data matrix A and observations y
A = np.array([[1, 2], [1, 3], [1, 4], [1, 5]])  # [intercept, x]
y = np.array([3, 5, 7, 9])

# Fit: y = p[0] + p[1]*x
p, pint, se = pycse.regress(A, y)

print(f"Intercept: {p[0]:.2f}, 95% CI: [{pint[0,0]:.2f}, {pint[0,1]:.2f}]")
print(f"Slope: {p[1]:.2f}, 95% CI: [{pint[1,0]:.2f}, {pint[1,1]:.2f}]")

Polynomial Fitting

import numpy as np
import pycse

x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([1.5, 3.8, 8.2, 14.9, 23.5, 34.8, 48.2, 64.1])

# Fit quadratic: y = p[0] + p[1]*x + p[2]*x^2
p, pint, se = pycse.polyfit(x, y, deg=2)

print(f"Coefficients: {p}")
print(f"95% CI: {pint}")

Prediction with Error Bounds

import numpy as np
import pycse

# Reuses A, y and the fitted p from the Linear Regression example above
x_new = np.array([11, 12, 13])

# Linear prediction
X_new = np.column_stack([np.ones(len(x_new)), x_new])
y_pred, intervals = pycse.predict(A, y, p, X_new)

print(f"Predictions: {y_pred}")
print(f"95% intervals: {intervals}")

# Nonlinear prediction: reuses time, concentration, model and the
# fitted parameters p from the Nonlinear Regression example above
y_pred_nl, bounds = pycse.nlpredict(time, concentration, model,
                                     lambda p: np.sum((concentration - model(time, *p))**2),
                                     p, x_new)

Model Comparison

import numpy as np
import pycse

# Fit two models
p1, _, _ = pycse.polyfit(x, y, deg=1)  # Linear
p2, _, _ = pycse.polyfit(x, y, deg=2)  # Quadratic

# Build the design matrices lbic expects
X1 = np.column_stack([np.ones_like(x), x])
X2 = np.column_stack([np.ones_like(x), x, x**2])

# Compare with BIC (lower is better)
bic1 = pycse.lbic(X1, y, p1)
bic2 = pycse.lbic(X2, y, p2)

print(f"Linear BIC: {bic1:.2f}")
print(f"Quadratic BIC: {bic2:.2f}")
print(f"Better model: {'Quadratic' if bic2 < bic1 else 'Linear'}")

# R-squared for goodness of fit (model, x, y and p from the nonlinear fit above)
r2 = pycse.Rsquared(y, model(x, *p))
print(f"R² = {r2:.4f}")
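For intuition about what bic and lbic report, the textbook least-squares BIC can be computed by hand. This is the standard formula, not necessarily pycse's exact implementation (BIC variants differ by additive constants, which cancel when comparing models on the same data):

```python
import math

def bic_ls(residuals, n_params):
    """Textbook BIC for a least-squares fit: n*ln(SSE/n) + k*ln(n)."""
    n = len(residuals)
    sse = sum(r * r for r in residuals)
    return n * math.log(sse / n) + n_params * math.log(n)

# A model with smaller residuals but one more parameter can still lose:
print(bic_ls([1.0, -1.0, 1.0, -1.0], n_params=2))   # 2-parameter model
print(bic_ls([0.9, -0.9, 0.9, -0.9], n_params=3))   # 3-parameter model, higher BIC
```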

Unique Features

Persistent Hash-based Caching

Cache expensive computations to disk - especially valuable for molecular dynamics, DFT calculations, or long-running simulations.

from pycse.hashcache import HashCache, JsonCache, SqlCache

# Decorator approach
@HashCache()
def expensive_simulation(param1, param2):
    # Long-running computation
    result = complex_calculation(param1, param2)
    return result

# First call: runs computation and caches
result1 = expensive_simulation(1.0, 2.0)

# Second call with same args: retrieves from cache (instant)
result2 = expensive_simulation(1.0, 2.0)

# SqlCache supports searching cached results
@SqlCache(name='my_sim_cache')
def simulation(x, y):
    return complex_calc(x, y)

# Search cache
cache = SqlCache(name='my_sim_cache')
results = cache.search({'x': 1.0})  # Find all cached results where x=1.0

Cache types:

  • HashCache: Pickle-based (fastest)
  • JsonCache: JSON format (human-readable, maggma-compatible)
  • SqlCache: SQLite with search() capability
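The mechanism behind all three caches can be sketched in a few lines of stdlib Python. This is a toy illustration of hash-based disk caching, not pycse's actual implementation:

```python
import functools
import hashlib
import pickle
import tempfile
from pathlib import Path

def toy_hashcache(cache_dir=None):
    """Minimal disk cache keyed on a hash of the function name and arguments."""
    cache_dir = Path(cache_dir or tempfile.mkdtemp())

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            key = hashlib.sha256(
                pickle.dumps((func.__name__, args, sorted(kwargs.items())))
            ).hexdigest()
            path = cache_dir / f"{key}.pkl"
            if path.exists():                       # cache hit: skip the computation
                return pickle.loads(path.read_bytes())
            result = func(*args, **kwargs)
            path.write_bytes(pickle.dumps(result))  # cache miss: compute and store
            return result
        return wrapper
    return decorator

calls = []

@toy_hashcache()
def slow_square(x):
    calls.append(x)   # track how often the body actually runs
    return x * x

print(slow_square(4), slow_square(4), len(calls))  # body ran only once
```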

Google Sheets Integration

from pycse.utils import read_gsheet

# Read Google Sheet directly into pandas DataFrame
url = "https://docs.google.com/spreadsheets/d/YOUR_SHEET_ID/edit"
df = read_gsheet(url)

# Now use with pycse functions
x = df['time'].values
y = df['concentration'].values
p, pint, se = pycse.nlinfit(model, x, y, p0)

Fuzzy Comparisons

For floating-point comparisons with tolerance:

import numpy as np
from pycse.utils import feq, fgt, flt, fge, fle

# Check if a value is "close enough" to a target
if feq(calculated_pi, np.pi, epsilon=1e-6):
    print("Converged!")

# Fuzzy greater-than
if fgt(value, threshold, epsilon=1e-8):
    print("Value exceeds threshold (within tolerance)")
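The idea behind these helpers is simple enough to sketch. A toy version with an absolute tolerance (pycse's own definitions may differ in detail):

```python
def feq_toy(a, b, epsilon=1e-6):
    """Fuzzy equality: |a - b| <= epsilon."""
    return abs(a - b) <= epsilon

def fgt_toy(a, b, epsilon=1e-6):
    """Fuzzy greater-than: a exceeds b by more than epsilon."""
    return a - b > epsilon

print(feq_toy(0.1 + 0.2, 0.3))   # True, even though 0.1 + 0.2 != 0.3 exactly
print(fgt_toy(1.0 + 1e-9, 1.0))  # False: within tolerance, not "really" greater
```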

Installation

pip install pycse

Requirements: Python 3.6+, numpy, scipy

Common Mistakes

❌ Forgetting initial guess for nonlinear fit:

# Will fail - nlinfit needs initial parameter guess
p, pint, se = pycse.nlinfit(model, x, y)  # Missing p0!

✅ Correct:

p, pint, se = pycse.nlinfit(model, x, y, p0=[100, 0.1])

❌ Wrong shape for regress():

# regress expects A to be 2D with shape (n_observations, n_parameters)
A = x  # 1D array - wrong!
p, pint, se = pycse.regress(A, y)

✅ Correct:

# Add column for intercept
A = np.column_stack([np.ones(len(x)), x])  # Shape: (n, 2)
p, pint, se = pycse.regress(A, y)
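With the design matrix built this way, the point estimates regress returns should match an ordinary least-squares solve. numpy's lstsq gives the same p, just without the intervals (data here reuses the Linear Regression example):

```python
import numpy as np

x = np.array([2, 3, 4, 5])
y = np.array([3, 5, 7, 9])

A = np.column_stack([np.ones(len(x)), x])   # shape (4, 2): [intercept, slope]
p, *_ = np.linalg.lstsq(A, y, rcond=None)
print(p)   # approximately [-1.  2.], i.e. y = -1 + 2x
```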

When pycse vs scipy

Use pycse when:

  • You need confidence intervals (pycse returns them automatically)
  • Doing many regressions in a workflow (consistent interface)
  • Want prediction intervals with error bounds
  • Need caching for expensive computations
  • Integrating with Google Sheets

Use scipy when:

  • You need custom optimization methods
  • Doing complex constrained optimization
  • Need features pycse doesn't expose
  • Building low-level computational tools

Both are valid! pycse wraps scipy for convenience; it is not a replacement.

Skill Information

Category: Data
Last Updated: 11/8/2025