Choosing Calibration Strategies

This guide helps you select the optimal calibration strategy for your conformal anomaly detection task based on dataset characteristics, computational constraints, and accuracy requirements.

Strategy Overview

nonconform provides four calibration strategies, each with distinct trade-offs:

| Strategy | Speed | Accuracy | Data Efficiency | Best For |
|---|---|---|---|---|
| Split | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐ | Large datasets, real-time |
| Jackknife+ | ⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐ | General purpose, balanced |
| Cross-Validation | ⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐ | Small datasets, maximum accuracy |
| JackknifeBootstrap (JaB+) | ⭐⭐⭐⭐ | ⭐⭐⭐ | | Uncertainty quantification |

Guarantee note: Strict finite-sample/theoretical guarantees are tied to "plus" variants (for example CV+, Jackknife+, JaB+).
Non-plus (mode="single_model") variants can be close in practice and lighter at inference time, but they do not provide the same strict guarantees.

Detailed Strategy Characteristics

Split Conformal

When to use:

  • Large training datasets (>5,000 samples)
  • Real-time or production environments requiring fast inference
  • When computational resources are limited
  • Initial prototyping and development

Advantages:

  • Fastest training and inference
  • Minimal memory usage
  • Simple to understand and implement
  • Predictable computational cost

Disadvantages:

  • Uses only a subset of data for calibration
  • May be less reliable with small datasets
  • No theoretical optimality guarantees

Configuration example:

from nonconform import Split

# For large datasets
strategy = Split(n_calib=0.2)  # Use 20% for calibration

# For fixed calibration size
strategy = Split(n_calib=2000)  # Use exactly 2000 samples
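To make the mechanics concrete, here is a minimal, library-free sketch of what Split calibration computes: nonconformity scores on a held-out calibration split, then a rank-based conformal p-value for each test point. The score function and data below are toy stand-ins, not nonconform internals.

```python
import numpy as np

def split_conformal_p_value(cal_scores, test_score):
    """Conformal p-value: rank of the test score among calibration scores.

    Higher anomaly scores are treated as more extreme.
    """
    n = len(cal_scores)
    return (1 + np.sum(cal_scores >= test_score)) / (n + 1)

rng = np.random.default_rng(42)
train = rng.normal(size=800)   # "training" split for the toy score function
calib = rng.normal(size=200)   # held-out calibration split

# Toy nonconformity score: absolute distance from the training mean
mu = train.mean()
cal_scores = np.abs(calib - mu)

p_inlier = split_conformal_p_value(cal_scores, np.abs(0.1 - mu))   # typical point
p_outlier = split_conformal_p_value(cal_scores, np.abs(6.0 - mu))  # far-out point
```

The outlying point receives a p-value near the floor of 1/(n_calib + 1), which is why very small calibration sets limit how small p-values can get.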

Jackknife+ Conformal

When to use:

  • Medium-sized datasets (1,000-10,000 samples)
  • When you need good accuracy without excessive computation
  • Production systems with moderate performance requirements
  • General-purpose applications

Advantages:

  • Uses all training data efficiently
  • Provides theoretical finite-sample guarantees
  • Good balance of speed and accuracy
  • Automatic calibration set sizing

Disadvantages:

  • More computationally expensive than Split
  • Memory usage scales with training set size
  • Cannot easily parallelize calibration

Configuration example:

from nonconform import CrossValidation

# Standard Jackknife+ (recommended) - use factory method
strategy = CrossValidation.jackknife(mode="plus")

# Regular Jackknife (less conservative)
strategy = CrossValidation.jackknife(mode="single_model")

Cross-Validation Conformal

When to use:

  • Small to medium datasets (<5,000 samples)
  • When maximum data efficiency is crucial
  • Research applications requiring robust results
  • When you have sufficient computational budget

Advantages:

  • Most efficient use of available data
  • Provides robust calibration estimates
  • Works well with limited training data
  • Theoretical guarantees with finite-sample corrections

Disadvantages:

  • Highest computational cost
  • Memory intensive for large datasets
  • Longer training times
  • Complex implementation

Configuration example:

from nonconform import CrossValidation

# Standard 5-fold CV+ (recommended)
strategy = CrossValidation(k=5, mode="plus")

# More folds for smaller datasets
strategy = CrossValidation(k=10, mode="plus")

# Faster alternative without plus correction
strategy = CrossValidation(k=3, mode="single_model")

JackknifeBootstrap (JaB+) Conformal

When to use:

  • When uncertainty quantification is critical
  • Research applications requiring statistical robustness
  • Noisy or heterogeneous training data
  • When computational cost is not a primary concern

Advantages:

  • Most robust calibration under model uncertainty
  • Provides distribution of calibration estimates
  • Works well with complex data distributions
  • Best theoretical properties

Disadvantages:

  • Highest computational cost
  • Requires careful tuning of bootstrap parameters
  • Memory intensive
  • Longest training times

Configuration example:

from nonconform import JackknifeBootstrap

# Standard JaB+ (typically 100+ bootstraps)
strategy = JackknifeBootstrap(n_bootstraps=100)

# High-precision JaB+ for research
strategy = JackknifeBootstrap(n_bootstraps=200)

# Fast JaB+ for prototyping
strategy = JackknifeBootstrap(n_bootstraps=50)
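The memory note later in this guide mentions an out-of-bag (OOB) mask; a sketch of the standard jackknife+-after-bootstrap bookkeeping shows where the O(n_train × n_bootstraps) cost comes from. Each bootstrap resamples the training set with replacement, and a sample is out-of-bag for exactly the models that never drew it. The names below are illustrative, not nonconform's internals.

```python
import numpy as np

def build_oob_mask(n_samples, n_bootstraps, seed=0):
    """mask[b, i] is True when sample i is out-of-bag for bootstrap b."""
    rng = np.random.default_rng(seed)
    mask = np.ones((n_bootstraps, n_samples), dtype=bool)
    for b in range(n_bootstraps):
        # Sample n indices with replacement; those indices are in-bag
        drawn = rng.integers(0, n_samples, size=n_samples)
        mask[b, drawn] = False
    return mask

mask = build_oob_mask(n_samples=500, n_bootstraps=100)
oob_rate = mask.mean()  # expected around (1 - 1/n)^n, roughly e^-1
```

Each sample is out-of-bag for roughly 37% of the bootstraps, so with too few bootstraps some samples get very few OOB models, which is one reason 100+ bootstraps is the usual recommendation.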

Decision Framework

1. Dataset Size Considerations

Large datasets (>10,000 samples):

  • Primary choice: Split (fast, efficient)
  • Alternative: JackknifeBootstrap (if speed is not the top priority)

Medium datasets (1,000-10,000 samples):

  • Primary choice: JackknifeBootstrap (balanced robustness and practicality)
  • Alternative: Jackknife+ (if you want lower compute than larger-bootstrap setups)

Small datasets (<1,000 samples):

  • Primary choice: Jackknife+
  • Alternative: Jackknife (for the smallest datasets)

2. Performance Requirements

Real-time applications (latency <100ms):

  • Use Split conformal
  • Pre-compute calibration sets where possible
  • Consider caching fitted detectors

Batch processing (latency <10s):

  • Jackknife+ or JackknifeBootstrap
  • Optimize based on accuracy requirements

Offline analysis (no latency constraints):

  • Any strategy based on accuracy needs
  • JackknifeBootstrap for maximum robustness
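One way to realize the "pre-compute calibration sets" advice for real-time use: sort the calibration scores once at deployment, then answer each p-value query with a binary search instead of a full scan. This is an illustrative sketch, not nonconform's API; the class name is hypothetical.

```python
import bisect

class FastConformalScorer:
    """Precomputes a sorted calibration set so each query is O(log n)."""

    def __init__(self, cal_scores):
        self._sorted = sorted(cal_scores)
        self._n = len(cal_scores)

    def p_value(self, test_score):
        # Count calibration scores >= test_score via binary search
        n_ge = self._n - bisect.bisect_left(self._sorted, test_score)
        return (1 + n_ge) / (self._n + 1)

scorer = FastConformalScorer([0.1, 0.4, 0.2, 0.9, 0.3])
p = scorer.p_value(0.5)  # one calibration score (0.9) is >= 0.5 -> (1 + 1) / 6
```

The one-time sort costs O(n log n), after which per-query latency is logarithmic and independent of batch size, which is usually what matters under a <100ms budget.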

3. Accuracy vs Speed Trade-offs

Maximum speed (production systems):

# Fastest configuration
strategy = Split(n_calib=1000)  # Fixed size for predictable performance

Balanced (general applications):

# Good robustness with practical defaults
strategy = JackknifeBootstrap(n_bootstraps=100)

Maximum accuracy (research/critical applications):

# Most robust but slower
strategy = JackknifeBootstrap(n_bootstraps=200)

Advanced Considerations

Data Distribution Properties

Exchangeable data (IID assumption holds):

  • All strategies work well
  • Choose based on computational constraints

Non-exchangeable data (distribution shift):

  • Consider weighted conformal detection
  • The JackknifeBootstrap strategy may provide additional robustness
  • Monitor calibration performance over time

Heterogeneous data (mixed distributions):

  • JackknifeBootstrap recommended
  • Jackknife+ as an alternative
  • Avoid Split with very diverse training sets

Computational Resource Planning

Memory constraints:

  • Split: O(n_calib) memory usage
  • Jackknife+: O(n_train) memory usage
  • Cross-Validation: O(k × n_test) inference peak; O(k) stored models + O(n_train) calibration scores
  • JackknifeBootstrap: O(n_train × n_bootstraps) memory usage (includes permanent _oob_mask storage)

CPU considerations:

  • Split: single model training
  • Jackknife+: n_train + 1 model trainings
  • Cross-Validation: n_folds model trainings
  • JackknifeBootstrap: n_bootstraps model trainings
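The per-strategy training counts above can be captured in a small budgeting helper. This is a sketch that simply mirrors the counts listed in this guide; the function name and string keys are illustrative.

```python
def model_fits(strategy, n_train=None, n_folds=None, n_bootstraps=None):
    """Number of detector trainings implied by each calibration strategy."""
    if strategy == "split":
        return 1                 # one model on the training split
    if strategy == "jackknife+":
        return n_train + 1       # one leave-one-out model per sample, plus one
    if strategy == "cross_validation":
        return n_folds           # one model per fold
    if strategy == "jackknife_bootstrap":
        return n_bootstraps      # one model per bootstrap resample
    raise ValueError(f"unknown strategy: {strategy}")
```

For example, Jackknife+ on 10,000 samples implies 10,001 fits, which is why it is listed for medium rather than large datasets.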

Strategy Transition Guide

From Research to Production

  1. Development phase: Use JackknifeBootstrap for robust results
  2. Validation phase: Compare with Jackknife+ for speed assessment
  3. Production phase: Deploy with Split for optimal performance
  4. Monitoring phase: Validate that Split maintains required accuracy

Handling Performance Degradation

If you observe degraded performance after strategy changes:

  1. Check calibration set size: Ensure adequate samples for reliable calibration
  2. Validate data assumptions: Verify exchangeability hasn't changed
  3. Monitor drift: Use weighted conformal if distribution shift detected
  4. Adjust parameters: Tune strategy-specific parameters
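Step 3's drift monitoring can be approximated without extra dependencies: compare the empirical CDFs of a baseline batch of scores against a recent batch with a two-sample Kolmogorov-Smirnov statistic and flag when it exceeds a threshold. This is a heuristic sketch; the threshold below is illustrative, not a calibrated critical value.

```python
import numpy as np

def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: max gap between empirical CDFs."""
    a, b = np.sort(a), np.sort(b)
    grid = np.concatenate([a, b])
    cdf_a = np.searchsorted(a, grid, side="right") / len(a)
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    return np.max(np.abs(cdf_a - cdf_b))

rng = np.random.default_rng(7)
baseline = rng.normal(0.0, 1.0, size=1000)  # scores at deployment time
stable = rng.normal(0.0, 1.0, size=1000)    # later batch, same distribution
shifted = rng.normal(1.0, 1.0, size=1000)   # later batch after drift

drift_detected = ks_statistic(baseline, shifted) > 0.15  # illustrative threshold
```

If the statistic stays small the exchangeability assumption is at least not visibly violated; a large, persistent gap is the cue to switch to weighted conformal methods.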

Common Pitfalls

Split Conformal

  • Don't: Use with very small datasets (<500 samples)
  • Don't: Use fixed small calibration sets with varying dataset sizes
  • Do: Use proportional calibration sizing for consistency

Jackknife+ Conformal

  • Don't: Use with extremely large datasets if memory is constrained
  • Don't: Forget that it requires n+1 model fits
  • Do: Enable parallel processing where available

Cross-Validation Conformal

  • Don't: Use too many folds with small datasets (overfitting risk)
  • Don't: Use without plus correction in critical applications
  • Do: Balance n_folds with computational budget

JackknifeBootstrap (JaB+) Conformal

  • Don't: Use too few bootstraps (<20) for robust estimates
  • Don't: Ignore bootstrap variance in interpretation
  • Do: Monitor convergence of bootstrap estimates
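One simple way to follow the last "Do" above: track the running mean of a per-bootstrap statistic (for example, a score quantile) and declare convergence once recent running means stop moving. A sketch with a toy statistic; the window and tolerance are illustrative choices, not library defaults.

```python
import numpy as np

def running_mean_converged(estimates, window=10, tol=0.01):
    """True if the last `window` running means vary by less than `tol`."""
    running = np.cumsum(estimates) / np.arange(1, len(estimates) + 1)
    tail = running[-window:]
    return (tail.max() - tail.min()) < tol

rng = np.random.default_rng(0)
# Toy stand-in: per-bootstrap 90th-percentile score estimates
estimates = rng.normal(loc=2.0, scale=0.2, size=200)

converged = running_mean_converged(estimates, window=20, tol=0.05)
```

If the running mean is still drifting at your chosen n_bootstraps, increase it; if it flattens early, a smaller, cheaper setting may suffice.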

Benchmarking Your Choice

Always validate your strategy choice with performance metrics:

from nonconform import ConformalDetector, CrossValidation, JackknifeBootstrap, Split
from nonconform.metrics import false_discovery_rate, statistical_power

# Compare strategies on your data
strategies = {
    'Split': Split(n_calib=0.2),
    'Jackknife+': CrossValidation.jackknife(mode="plus"),
    'JaB+': JackknifeBootstrap(n_bootstraps=100)
}

for name, strategy in strategies.items():
    detector = ConformalDetector(
        detector=your_detector,
        strategy=strategy,
        seed=42
    )
    detector.fit(X_train)
    decisions = detector.select(X_test, alpha=0.1)

    # Evaluate FDR-controlled decisions
    fdr = false_discovery_rate(y_test, decisions)
    power = statistical_power(y_test, decisions)

    print(f"{name}: FDR={fdr:.3f}, Power={power:.3f}")

Choose the strategy that best meets your specific requirements for FDR control, statistical power, and computational performance.