Weighted Conformal P-values¶
Handle distribution shift between training and test data while maintaining statistical guarantees.
Executive Summary
When to use: Your test data comes from a different distribution than your training data (e.g., different time period, different sensor, different domain).
How it works: Weighted conformal prediction estimates how much the distributions differ and reweights the calibration data accordingly.
Quick start:
from nonconform import ConformalDetector, Split, logistic_weight_estimator
detector = ConformalDetector(
detector=your_detector,
strategy=Split(n_calib=0.3),
weight_estimator=logistic_weight_estimator(), # Add this
seed=42
)
Key assumption: Only the feature distribution P(X) changes—the relationship between features and anomaly status P(Y|X) must stay the same. You also need sufficient feature-support overlap between calibration and test data; if distributions are too far apart, weighting can become unstable and guarantees can degrade.
Overview¶
Weighted conformal p-values extend classical conformal prediction to handle covariate shift scenarios [Jin & Candès, 2023; Tibshirani et al., 2019]. Key assumption: the marginal distribution P(X) may change between calibration and test data, while the conditional distribution P(Y|X) – the relationship between features and anomaly status – remains constant. This assumption is crucial for the validity of weighted conformal inference. A second practical requirement is sufficient support overlap between calibration and test feature distributions; when shift is too extreme, estimated density ratios become unstable and weighted conformal adjustment may fail. When these assumptions hold, you can pair the p-values with Weighted Conformal Selection (WCS) to obtain rigorous False Discovery Rate control under distribution shift [Jin & Candès, 2023].
The ConformalDetector with a weight_estimator parameter automatically estimates importance weights to distinguish between calibration and test samples, then uses these weights to compute adjusted p-values.
Basic Usage¶
import numpy as np
from nonconform import ConformalDetector, Split, logistic_weight_estimator
from pyod.models.lof import LOF
# Initialize base detector
base_detector = LOF()
strategy = Split(n_calib=0.2)
# Create weighted conformal detector
detector = ConformalDetector(
detector=base_detector,
strategy=strategy,
aggregation="median",
weight_estimator=logistic_weight_estimator(),
seed=42,
)
# Fit on training data and get weighted p-values
# By default, prediction refits the weight model for each batch
p_values = detector.fit(X_train).compute_p_values(X_test)
How It Works¶
The weighted conformal method works through the following steps:
1. Calibration¶
During fitting, the detector:
- Uses the specified strategy to split data and train models
- Computes calibration scores on held-out calibration data
- Stores calibration samples for later weight computation
2. Weight Estimation¶
During prediction, the detector:
- Fits the configured likelihood-ratio estimator (typically a probabilistic binary domain classifier) to distinguish calibration from test samples
- Uses predicted probabilities/scores from that estimator to compute importance weights
- Applies weights to both calibration and test instances
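As a sketch of the underlying idea (not the library's internal implementation), importance weights can be derived from any probabilistic domain classifier via the density-ratio identity; the function name `estimate_importance_weights` and the prior correction here are illustrative:

```python
# Sketch: importance weights from a probabilistic domain classifier.
# Labels: 0 = calibration sample, 1 = test sample. The weight for a point x
# approximates the density ratio p_test(x) / p_calib(x), recovered from the
# classifier's predicted probability via Bayes' rule, with a prior correction
# for unequal class sizes.
import numpy as np
from sklearn.linear_model import LogisticRegression

def estimate_importance_weights(X_calib, X_test):
    X = np.vstack([X_calib, X_test])
    y = np.concatenate([np.zeros(len(X_calib)), np.ones(len(X_test))])
    clf = LogisticRegression(max_iter=1000).fit(X, y)
    p_test = clf.predict_proba(X)[:, 1]
    # Correct for the calibration/test size imbalance in the training labels
    prior_ratio = len(X_calib) / len(X_test)
    weights = prior_ratio * p_test / np.clip(1.0 - p_test, 1e-12, None)
    return weights[: len(X_calib)], weights[len(X_calib):]
```

Points lying in regions over-represented in the test batch receive weights above 1, increasing their influence on the calibration distribution.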
For explicit control of this state transition, you can precompute weights once and reuse them:
detector.fit(X_train)
detector.prepare_weights_for(X_test_shifted)
p_values = detector.compute_p_values(X_test_shifted, refit_weights=False)
By default, this reuse path verifies exact batch content identity. If you need
maximum throughput on very large batches and can guarantee your own batch identity
discipline, set verify_prepared_batch_content=False when constructing
ConformalDetector to validate only batch size.
3. Weighted P-value Calculation¶
The p-values are computed using weighted empirical distribution functions. By default, nonconform uses the classical (non-randomized) formula. The randomized variant [Jin & Candès, 2023] handles ties more gracefully:
# Randomized weighted p-value calculation (Jin & Candes 2023)
import numpy as np
def weighted_p_value(test_score, calibration_scores, calibration_weights, test_weight):
"""
Calculate weighted conformal p-value with randomized tie handling.
"""
# Count calibration scores strictly greater than test score
weighted_greater = np.sum(calibration_weights[calibration_scores > test_score])
# Handle ties: add random fraction of tied weights
tied_weights = np.sum(calibration_weights[calibration_scores == test_score])
u = np.random.uniform(0, 1)
# Randomized formula: strictly greater + U * (tied + test weight)
numerator = weighted_greater + u * (tied_weights + test_weight)
denominator = np.sum(calibration_weights) + test_weight
return numerator / denominator
Classical vs. Randomized
By default, Empirical() uses tie_break="classical" (non-randomized formula). Valid values are "classical" and "randomized" (or TieBreakMode.CLASSICAL / TieBreakMode.RANDOMIZED). None is not valid. For randomized smoothing as shown above, use Empirical(tie_break="randomized"). Note that with small calibration sets, randomized smoothing can produce anti-conservative p-values.
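For comparison, here is a sketch of the classical rule, where ties and the test point's own weight are counted in full; this is conceptually what tie_break="classical" does, though the library's exact implementation may differ:

```python
# Sketch: classical (non-randomized) weighted p-value. Counting tied
# calibration mass and the test point's own weight in full is conservative
# but fully deterministic.
import numpy as np

def classical_weighted_p_value(test_score, calibration_scores,
                               calibration_weights, test_weight):
    mass_geq = np.sum(calibration_weights[calibration_scores >= test_score])
    return (mass_geq + test_weight) / (np.sum(calibration_weights) + test_weight)
```

With equal weights this reduces to the familiar unweighted formula (k + 1) / (n + 1), where k is the number of calibration scores at or above the test score.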
When to Use Weighted Conformal¶
Covariate Shift Scenarios¶
Use weighted conformal detection when the shift is primarily in P(X) and not in P(Y|X), for example:
- Domain Adaptation: Training on one domain, testing on another with stable anomaly mechanism
- Sampling/Selection Shift: Deployment sampling differs from calibration sampling (population mix changes)
- Subgroup Mixture Shift: Different subgroup prevalence between calibration and test data
- Time-based Deployment Changes: Different time periods, but only if the change is mostly covariate shift and P(Y|X) is still approximately stable
Do not treat generic temporal drift as automatically suitable for weighted conformal. If the anomaly mechanism itself changes (P(Y|X) shift), weighting alone is insufficient.
Examples Where Covariate Shift May Occur¶
# Example 1: Time-separated data
# Use this only if P(Y|X) is approximately stable across periods
detector.fit(X_train_2020)
p_values_2024 = detector.compute_p_values(X_test_2024)
# Example 2: Geographic shift
# Training on US data, testing on European data
detector.fit(X_us)
p_values_europe = detector.compute_p_values(X_europe)
# Example 3: Sensor/population shift
# Suitable when feature distribution changed but anomaly semantics stayed stable
detector.fit(X_before_drift)
p_values_after_drift = detector.compute_p_values(X_after_drift)
Comparison with Standard Conformal¶
# Standard conformal detector
standard_detector = ConformalDetector(
detector=base_detector,
strategy=strategy,
aggregation="median",
seed=42
)
# Weighted conformal detector
weighted_detector = ConformalDetector(
detector=base_detector,
strategy=strategy,
aggregation="median",
weight_estimator=logistic_weight_estimator(),
seed=42,
)
# Fit both on training data
standard_detector.fit(X_train)
weighted_detector.fit(X_train)
# Compare on shifted test data using the current one-step API
from nonconform.enums import Pruning
standard_mask = standard_detector.select(X_test_shifted, alpha=0.05)
weighted_mask = weighted_detector.select(
X_test_shifted,
alpha=0.05,
pruning=Pruning.DETERMINISTIC,
seed=42,
)
print(f"Standard conformal detections: {standard_mask.sum()}")
print(f"Weighted conformal detections: {weighted_mask.sum()}")
Different Aggregation Strategies¶
The choice of aggregation method can affect performance under distribution shift:
# Compare different aggregation methods
from nonconform.enums import Pruning
from nonconform.fdr import weighted_false_discovery_control
aggregation_methods = [
"mean",
"median",
"maximum",
]
for agg_method in aggregation_methods:
detector = ConformalDetector(
detector=base_detector,
strategy=strategy,
aggregation=agg_method,
weight_estimator=logistic_weight_estimator(),
seed=42,
)
detector.fit(X_train)
_ = detector.compute_p_values(X_test_shifted)
wcs_mask = weighted_false_discovery_control(
result=detector.last_result,
alpha=0.05,
pruning=Pruning.DETERMINISTIC,
seed=42,
)
print(f"{agg_method}: {wcs_mask.sum()} discoveries")
Note: Aggregation is applied to the raw anomaly scores from each model before conformal p-values are computed. P-values are not averaged; the aggregated score is turned into a single p-value per point.
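A tiny numeric illustration of this order of operations, using hypothetical per-model scores and the simple unweighted classical p-value for clarity:

```python
import numpy as np

# Hypothetical raw scores from a 3-model ensemble for two test points
model_scores = np.array([[0.2, 0.9, 0.4],    # point 1
                         [0.8, 0.7, 0.6]])   # point 2
# aggregation="median": one aggregated score per point, computed first
agg = np.median(model_scores, axis=1)
# Then a single p-value per point against the calibration scores
calib_scores = np.array([0.1, 0.3, 0.5, 0.7])
p_vals = np.array([(np.sum(calib_scores >= s) + 1) / (len(calib_scores) + 1)
                   for s in agg])
```

Note that the per-model p-values are never formed, so no averaging of p-values takes place.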
Weight Estimators¶
nonconform provides two weight estimator factory functions for handling covariate shift:
logistic_weight_estimator¶
Uses logistic regression to estimate likelihood ratios between calibration and test distributions:
from nonconform import logistic_weight_estimator
detector = ConformalDetector(
detector=base_detector,
strategy=strategy,
aggregation="median",
weight_estimator=logistic_weight_estimator(),
seed=42,
)
When to use:
- Linear or moderately complex distribution shifts
- High-dimensional data where interpretability matters
- Fast weight estimation is needed
- Default choice for most applications
Parameters:
- regularization: Regularization strength ('auto' or float C value)
- clip_quantile: Quantile for weight clipping (default: 0.05). Set to None to disable clipping.
- class_weight: Class weights for LogisticRegression (default: 'balanced')
- max_iter: Maximum iterations (default: 1000)
forest_weight_estimator¶
Uses random forest classification to estimate likelihood ratios:
from nonconform import forest_weight_estimator
detector = ConformalDetector(
detector=base_detector,
strategy=strategy,
aggregation="median",
weight_estimator=forest_weight_estimator(n_estimators=100, max_depth=10),
seed=42,
)
When to use:
- Complex, non-linear distribution shifts
- Feature interactions are important
- More robust to outliers in feature space
- When you have sufficient calibration data (hundreds+ samples)
Parameters:
- n_estimators: Number of trees (default: 100)
- max_depth: Maximum tree depth (default: 5)
- min_samples_leaf: Minimum samples in leaf (default: 10)
- clip_quantile: Quantile for weight clipping (default: 0.05). Set to None to disable clipping.
Comparison¶
# Compare weight estimators on complex shift
from nonconform import logistic_weight_estimator, forest_weight_estimator
estimators = {
'Logistic': logistic_weight_estimator(),
'Forest': forest_weight_estimator(n_estimators=100),
}
for name, weight_est in estimators.items():
detector = ConformalDetector(
detector=base_detector,
strategy=Split(n_calib=0.2),
aggregation="median",
weight_estimator=weight_est,
seed=42,
)
detector.fit(X_train)
_ = detector.compute_p_values(X_test_shifted)
wcs_mask = weighted_false_discovery_control(
result=detector.last_result,
alpha=0.05,
pruning=Pruning.DETERMINISTIC,
seed=42,
)
print(f"{name}: {wcs_mask.sum()} discoveries")
General recommendations:
- Start with logistic_weight_estimator() (faster, more interpretable)
- Switch to forest_weight_estimator() if:
- Distribution shift is highly non-linear
- You have >500 calibration samples
- Logistic weights show poor discrimination
BootstrapBaggedWeightEstimator¶
Wraps any base weight estimator with bootstrap bagging for improved stability in extreme imbalance scenarios. It is most relevant when the calibration set is much larger than the test batch, where standalone importance weights can become spiky and overly influential:
BootstrapBaggedWeightEstimator currently uses scoring_mode="frozen" (default and only supported mode). After fit(calibration_samples, test_samples), it can return stored weights only for that exact calibration/test batch pair; scoring arbitrary new batches requires refitting.
from nonconform import forest_weight_estimator
from nonconform.weighting import BootstrapBaggedWeightEstimator
# Bootstrap bagging with forest base (best for extreme imbalance)
weight_est = BootstrapBaggedWeightEstimator(
base_estimator=forest_weight_estimator(n_estimators=50),
n_bootstraps=50,
clip_quantile=0.05,
)
detector = ConformalDetector(
detector=base_detector,
strategy=Split(n_calib=1000),
aggregation="median",
weight_estimator=weight_est,
seed=42,
)
How It Works¶
Bootstrap bagging creates an ensemble of weight estimators:
1. For each bootstrap iteration (n_bootstraps times):
   - Resample both calibration and test sets to balanced size
   - Fit the base estimator on the bootstrap sample
   - Score ALL original instances (perfect coverage)
   - Store log-weights for each instance
2. After all iterations:
   - Aggregate using geometric mean (exp of mean log-weights)
   - Apply clipping to maintain bounded weights
Every instance receives exactly n_bootstraps weight estimates, ensuring symmetric coverage regardless of set size ratios.
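The loop above can be sketched as follows. Here fit_and_score is an assumed callable that fits the base estimator on a balanced bootstrap pair and returns raw weights for all original instances; the real BootstrapBaggedWeightEstimator manages this internally:

```python
# Sketch of the bootstrap-bagging loop with geometric-mean aggregation.
import numpy as np

def bagged_weights(X_calib, X_test, fit_and_score, n_bootstraps=50, seed=0):
    rng = np.random.default_rng(seed)
    m = min(len(X_calib), len(X_test))          # balanced resample size
    log_w = np.zeros((n_bootstraps, len(X_calib) + len(X_test)))
    for b in range(n_bootstraps):
        calib_b = X_calib[rng.integers(0, len(X_calib), size=m)]
        test_b = X_test[rng.integers(0, len(X_test), size=m)]
        # fit_and_score must score ALL original instances, not the resample
        w = fit_and_score(calib_b, test_b)
        log_w[b] = np.log(np.clip(w, 1e-12, None))
    # Geometric mean = exp of the mean log-weight per instance
    return np.exp(log_w.mean(axis=0))
```

Averaging in log space is what makes a single extreme weight from one bootstrap iteration unable to dominate the final estimate.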
When to Use¶
DO use BootstrapBaggedWeightEstimator when:
- Extreme imbalance: Large calibration set (>1000) with small test batches (<50)
  - Common in online/streaming detection
  - Example: 1000 calibration samples, 25 test instances
- High-stakes applications: Where weight quality is critical
  - Medical diagnosis with small patient batches
  - Fraud detection with limited transactions
  - Safety-critical systems
- Severe distribution shift: When base estimators produce extreme weights
DO NOT use for:
- Balanced or moderate imbalance: Marginal benefit (2-3% improvement) doesn't justify 2-5x computational overhead
- Large test sets: Benefits diminish with larger batches
- Latency-sensitive production: Significant computational cost (20-50x slower)
Performance Benchmarks¶
Empirical testing shows context-dependent value:
Balanced Scenario (1000 calib vs 1000 test)¶
| Metric | Base | Bagged-50 | Improvement |
|---|---|---|---|
| Weight Std | 2.884 | 2.957 | -2.5% (worse) |
| Extreme Weights | 0 | 0 | No change |
| Time | 0.14s | 0.34s | 2.4x slower |
Verdict: Not recommended for balanced sets.
Extreme Imbalance (1000 calib vs 25 test)¶
| Metric | Logistic Base | Logistic Bagged-50 | Improvement |
|---|---|---|---|
| Weight Std | 1.604 | 0.841 | 48% better |
| Extreme Weights | 612 | 385 | 37% reduction |
| Recall | 0.067 | 0.200 | 3x better |
| Time | 0.14s | 0.34s | 2.4x slower |
| Metric | Forest Base | Forest Bagged-50 | Improvement |
|---|---|---|---|
| Weight Std | 0.153 | 0.259 | Slightly higher but stable |
| Extreme Weights | 599 | 0 | 100% elimination |
| Recall | 0.333 | 1.000 | Perfect detection |
| FDR | 0.000 | 0.190 | Acceptable trade-off |
| Time | 0.24s | 6.4s | 27x slower |
Verdict: Strongly recommended for extreme imbalance. Best combination: forest_weight_estimator + Bagging.
Configuration Parameters¶
n_bootstraps (default: 100):
- Number of bootstrap iterations
- Higher = more stable, but slower
- Recommended: 20-50 for small test batches, 50-100 for critical applications
clip_quantile (default: 0.05):
- Adaptive quantile-based clipping
- Clips to (quantile, 1-quantile) percentiles
- Use when the weight distribution is unknown
- Set to None to disable clipping
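Quantile-based clipping can be sketched as follows (illustrative, not the library's exact implementation):

```python
# Sketch: clip weights to their empirical (q, 1-q) percentiles. With
# clip_quantile=0.05 any weight beyond the 5th/95th percentile is pulled in,
# bounding the influence of any single instance.
import numpy as np

def clip_weights(weights, clip_quantile=0.05):
    if clip_quantile is None:
        return weights
    lo, hi = np.quantile(weights, [clip_quantile, 1 - clip_quantile])
    return np.clip(weights, lo, hi)
```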
Advanced Example: Streaming Detection¶
For online/streaming anomaly detection with small batches:
from nonconform import ConformalDetector, Split, forest_weight_estimator
from nonconform.enums import Pruning
from nonconform.fdr import weighted_false_discovery_control
from nonconform.weighting import BootstrapBaggedWeightEstimator
from pyod.models.iforest import IForest
# Configuration for small batch streaming
weight_est = BootstrapBaggedWeightEstimator(
base_estimator=forest_weight_estimator(n_estimators=50, max_depth=10),
n_bootstraps=50,
clip_quantile=0.05, # Adaptive clipping
)
detector = ConformalDetector(
detector=IForest(),
strategy=Split(n_calib=1000), # Large calibration set
aggregation="median",
weight_estimator=weight_est,
seed=42,
)
# Train on historical data
detector.fit(X_historical)
# Process small incoming batches
for X_batch in stream_data(batch_size=25):
p_values = detector.compute_p_values(X_batch)
# Apply weighted FDR control
discoveries = weighted_false_discovery_control(
result=detector.last_result,
alpha=0.1,
pruning=Pruning.DETERMINISTIC,
seed=42
)
print(f"Detected {discoveries.sum()} anomalies in batch of {len(X_batch)}")
Cost-Benefit Analysis¶
| Configuration | Time | Quality | Use Case |
|---|---|---|---|
| Logistic (Base) | 0.14s | Baseline | Standard balanced scenarios |
| Logistic + Bagging(50) | 0.34s | +48% weight stability | Moderate imbalance, quality focus |
| Forest (Base) | 0.24s | Good for non-linear | Standard scenarios |
| Forest + Bagging(50) | 6.4s | Perfect detection | Extreme imbalance, premium quality |
Recommendation: Use forest_weight_estimator + BootstrapBaggedWeightEstimator when:
- Calibration set is 40x larger than test batch (e.g., 1000:25)
- Missing anomalies is very costly
- Computational budget allows 20-50x overhead
- Online/streaming detection with small batches
Decision Guide¶
Which weight estimator should I use?
┌─ Is your test batch very small (<50) AND calibration large (>1000)?
│
├─ YES → BootstrapBaggedWeightEstimator(
│            forest_weight_estimator(n_estimators=50), n_bootstraps=50
│        )
│        Cost: High (6-7s), Quality: Best (perfect detection)
│
└─ NO → Standard weight estimators
│
├─ Linear/moderate shift → logistic_weight_estimator()
│ Cost: Low (0.14s), Quality: Good
│
    └─ Complex/non-linear shift → forest_weight_estimator(n_estimators=50)
       Cost: Medium (0.24s), Quality: Better
Strategy Selection¶
Different strategies can be used with weighted conformal detection:
from nonconform import CrossValidation, JackknifeBootstrap
# JaB+ strategy for stability
jab_strategy = JackknifeBootstrap(n_bootstraps=50)
jab_detector = ConformalDetector(
detector=base_detector,
strategy=jab_strategy,
aggregation="median",
weight_estimator=logistic_weight_estimator(),
seed=42
)
# Cross-validation strategy for efficiency
cv_strategy = CrossValidation(k=5)
cv_detector = ConformalDetector(
detector=base_detector,
strategy=cv_strategy,
aggregation="median",
weight_estimator=logistic_weight_estimator(),
seed=42
)
Weighted Conformal Selection¶
Weighted conformal p-values are valid on their own. To obtain finite-sample FDR control under covariate shift, combine them with Weighted Conformal Selection (WCS) [Jin & Candès, 2023]:
from nonconform.enums import Pruning
from nonconform.fdr import weighted_false_discovery_control
# Collect weighted p-values and cached statistics
weighted_detector.compute_p_values(X_test_shifted)
wcs_mask = weighted_false_discovery_control(
result=weighted_detector.last_result,
alpha=0.05,
pruning=Pruning.DETERMINISTIC,
seed=42,
)
print(f"WCS-selected anomalies: {wcs_mask.sum()} of {len(wcs_mask)}")
After any call to compute_p_values() or score_samples(), the detector caches
the relevant arrays (p_values, scores, weights) inside detector.last_result.
Passing this object to weighted_false_discovery_control avoids plumbing the raw
arrays manually.
For explicit array-first workflows, use:
from nonconform.enums import Pruning
from nonconform.fdr import (
weighted_false_discovery_control_from_arrays,
)
from nonconform.scoring import calculate_weighted_p_val
# WCS from precomputed p-values + arrays
wcs_from_arrays = weighted_false_discovery_control_from_arrays(
p_values=weighted_detector.last_result.p_values,
test_scores=weighted_detector.last_result.test_scores,
calib_scores=weighted_detector.last_result.calib_scores,
test_weights=weighted_detector.last_result.test_weights,
calib_weights=weighted_detector.last_result.calib_weights,
alpha=0.05,
pruning=Pruning.DETERMINISTIC,
seed=42,
)
# WCS with explicit empirical p-value computation
p_values_empirical = calculate_weighted_p_val(
scores=weighted_detector.last_result.test_scores,
calibration_set=weighted_detector.last_result.calib_scores,
test_weights=weighted_detector.last_result.test_weights,
calib_weights=weighted_detector.last_result.calib_weights,
tie_break="classical",
)
wcs_empirical = weighted_false_discovery_control_from_arrays(
p_values=p_values_empirical,
test_scores=weighted_detector.last_result.test_scores,
calib_scores=weighted_detector.last_result.calib_scores,
test_weights=weighted_detector.last_result.test_weights,
calib_weights=weighted_detector.last_result.calib_weights,
alpha=0.05,
pruning=Pruning.DETERMINISTIC,
)
Pruning Modes¶
The pruning parameter controls how ties and randomization are handled in the WCS procedure [Jin & Candès, 2023]:
Pruning.DETERMINISTIC¶
wcs_mask = weighted_false_discovery_control(
result=weighted_detector.last_result,
alpha=0.05,
pruning=Pruning.DETERMINISTIC,
seed=42, # seed has no effect for deterministic mode
)
Behavior: Uses a deterministic threshold without randomization. When p-values are tied at the threshold, all or none of the tied instances are included based on a deterministic rule.
When to use:
- Reproducibility is critical
- You don't want any randomness in selections
- Reporting results that must be exactly reproducible
Trade-off: May be slightly conservative (reject fewer hypotheses) compared to randomized methods.
Pruning.HOMOGENEOUS¶
wcs_mask = weighted_false_discovery_control(
result=weighted_detector.last_result,
alpha=0.05,
pruning=Pruning.HOMOGENEOUS,
seed=42, # controls randomization
)
Behavior: Draws a single uniform random variable and applies the same randomized threshold to all test instances. Handles ties by probabilistically including tied instances.
When to use:
- Default randomized method
- Want exact FDR control in expectation
- Acceptable to have some randomness
Trade-off: Less conservative than DETERMINISTIC, but results vary across random seeds.
Pruning.HETEROGENEOUS¶
wcs_mask = weighted_false_discovery_control(
result=weighted_detector.last_result,
alpha=0.05,
pruning=Pruning.HETEROGENEOUS,
seed=42, # controls randomization
)
Behavior: Draws independent uniform random variables for each test instance. Provides the most flexible randomization.
When to use:
- Maximum power (fewer false negatives)
- Most aggressive FDR control
- Research settings where slight variance is acceptable
Trade-off: Highest variance across random seeds, but best expected power.
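Conceptually, the three modes differ only in how the pruning randomness is drawn. This toy sketch (not the library's internals) assumes each candidate j carries a hypothetical survival fraction frac[j] in [0, 1] produced by the first-stage selection:

```python
# Toy sketch of the three randomization schemes. DETERMINISTIC keeps only
# candidates that survive with certainty; HOMOGENEOUS shares one uniform draw
# across all candidates; HETEROGENEOUS draws an independent uniform per
# candidate, which yields the most power but the most seed-to-seed variance.
import numpy as np

def prune(frac, mode, seed=0):
    frac = np.asarray(frac, dtype=float)
    rng = np.random.default_rng(seed)
    if mode == "deterministic":
        return frac >= 1.0                      # no randomness at all
    if mode == "homogeneous":
        u = rng.uniform()                       # one draw shared by all
        return u < frac
    if mode == "heterogeneous":
        u = rng.uniform(size=frac.shape)        # independent draw per candidate
        return u < frac
    raise ValueError(mode)
```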
Comparison of Pruning Methods¶
from nonconform.enums import Pruning
from nonconform.fdr import weighted_false_discovery_control
pruning_methods = [
Pruning.DETERMINISTIC,
Pruning.HOMOGENEOUS,
Pruning.HETEROGENEOUS
]
weighted_detector.compute_p_values(X_test_shifted)
for pruning_method in pruning_methods:
wcs_mask = weighted_false_discovery_control(
result=weighted_detector.last_result,
alpha=0.05,
pruning=pruning_method,
seed=42,
)
print(f"{pruning_method.name}: {wcs_mask.sum()} detections")
Expected relationship: Typically HETEROGENEOUS ≥ HOMOGENEOUS ≥ DETERMINISTIC in terms of number of detections, though this can vary with data.
Performance Considerations¶
Computational Cost¶
Weighted conformal detection has additional overhead:
- Weight estimation via the configured estimator (e.g., logistic regression)
- Weighted p-value computation
import time
# Compare computation times
def time_detector(detector, X_train, X_test):
start_time = time.time()
detector.fit(X_train)
fit_time = time.time() - start_time
start_time = time.time()
p_values = detector.compute_p_values(X_test)
predict_time = time.time() - start_time
return fit_time, predict_time
# Standard vs Weighted timing
standard_fit, standard_pred = time_detector(standard_detector, X_train, X_test)
weighted_fit, weighted_pred = time_detector(weighted_detector, X_train, X_test)
print(f"Standard: Fit={standard_fit:.2f}s, Predict={standard_pred:.2f}s")
print(f"Weighted: Fit={weighted_fit:.2f}s, Predict={weighted_pred:.2f}s")
print(f"Overhead: {((weighted_fit + weighted_pred) / (standard_fit + standard_pred) - 1) * 100:.1f}%")
Memory Usage¶
Weighted conformal detection requires storing:
- Calibration samples for weight computation
- Calibration scores for p-value calculation
For large datasets, consider:
- Using a subset of calibration samples for weight estimation
- Implementing online/streaming versions
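A minimal sketch of the subsampling idea (the helper subsample_for_weights is hypothetical, not part of nonconform):

```python
# Sketch: cap the number of calibration samples fed to the weight estimator.
# A uniform random subsample keeps the domain classifier's memory and time
# bounded while approximately preserving the calibration distribution.
import numpy as np

def subsample_for_weights(X_calib, max_samples=5000, seed=0):
    if len(X_calib) <= max_samples:
        return X_calib
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(X_calib), size=max_samples, replace=False)
    return X_calib[idx]
```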
Best Practices¶
1. Validate Distribution Shift¶
Always check if distribution shift is actually present:
# Use statistical tests to detect shift
from scipy.stats import ks_2samp
def detect_feature_shift(X_train, X_test):
"""Detect distribution shift in individual features."""
shift_detected = []
p_values = []
for i in range(X_train.shape[1]):
statistic, p_value = ks_2samp(X_train[:, i], X_test[:, i])
shift_detected.append(p_value < 0.05)
p_values.append(p_value)
print(f"Features with significant shift: {sum(shift_detected)}/{len(shift_detected)}")
return shift_detected, p_values
shift_features, shift_p_values = detect_feature_shift(X_train, X_test_shifted)
2. Combine with Weighted Conformal Selection¶
from nonconform.enums import Pruning
from nonconform.fdr import weighted_false_discovery_control
weighted_p_values = weighted_detector.compute_p_values(X_test_shifted)
wcs_mask = weighted_false_discovery_control(
result=weighted_detector.last_result,
alpha=0.05,
pruning=Pruning.DETERMINISTIC,
seed=42,
)
print(f"WCS-controlled discoveries: {wcs_mask.sum()}")
3. Monitor Weight Quality¶
Extreme weights can indicate poor weight estimation:
def check_weight_quality(detector):
    """Check for extreme weights that might indicate poor estimation."""
    # Assumes compute_p_values() has been called, so detector.last_result
    # holds the calibration and test weights
    weights = np.concatenate([
        detector.last_result.calib_weights,
        detector.last_result.test_weights,
    ])
    # Rule of thumb: weights should typically be between 0.1 and 10;
    # extreme weights (< 0.01 or > 100) suggest problems
    n_extreme = int(np.sum((weights < 0.01) | (weights > 100)))
    if n_extreme > 0:
        print(f"WARNING: {n_extreme} extreme weights detected")
    return n_extreme
4. Use Appropriate Base Detectors¶
Some detectors work better with weighted conformal:
- Good: Distance-based methods (LOF, KNN) that are sensitive to distribution
- Moderate: Tree-based methods (Isolation Forest) that are somewhat robust
- Challenging: Neural networks that might already adapt to shift
Advanced Applications¶
Multi-domain Adaptation¶
# Handle multiple domains with different shift patterns
domains = ['domain_A', 'domain_B', 'domain_C']
domain_detectors = {}
for domain in domains:
detector = ConformalDetector(
detector=base_detector,
strategy=strategy,
aggregation="median",
weight_estimator=logistic_weight_estimator(),
seed=42
)
detector.fit(X_train) # Common training set
domain_detectors[domain] = detector
# Predict on domain-specific test sets with WCS
from nonconform.enums import Pruning
from nonconform.fdr import weighted_false_discovery_control
for domain in domains:
X_test_domain = load_domain_data(domain) # Load domain-specific test data
_ = domain_detectors[domain].compute_p_values(X_test_domain)
wcs_mask = weighted_false_discovery_control(
result=domain_detectors[domain].last_result,
alpha=0.05,
pruning=Pruning.DETERMINISTIC,
seed=42,
)
print(f"{domain}: {wcs_mask.sum()} discoveries")
Online Adaptation¶
from nonconform.enums import Pruning
from nonconform.fdr import weighted_false_discovery_control
# Adapt to gradual distribution shift over time
def online_weighted_detection(detector, data_stream, window_size=1000):
"""Online weighted conformal detection with sliding window."""
detections = []
for i, (X_batch, _) in enumerate(data_stream):
if i == 0:
# Initialize with first batch
detector.fit(X_batch)
else:
# Use sliding window for calibration
if i * len(X_batch) > window_size:
start_idx = (i * len(X_batch)) - window_size
X_calib = get_recent_data(start_idx, window_size)
detector.fit(X_calib)
# Predict on current batch with WCS
_ = detector.compute_p_values(X_batch)
wcs_mask = weighted_false_discovery_control(
result=detector.last_result,
alpha=0.05,
pruning=Pruning.DETERMINISTIC,
seed=42,
)
detections.append(wcs_mask.sum())
return detections
Troubleshooting¶
Common Issues¶
1. Poor Weight Estimation
   - Insufficient calibration data
   - High-dimensional data with small samples
   - Solution: Increase calibration size or use dimensionality reduction
2. Extreme P-values
   - All p-values near 0 or 1
   - Solution: Check for severe distribution shift or model mismatch
3. Inconsistent Results
   - High variance in detection counts
   - Solution: Use bootstrap strategy or increase sample size
Debugging Tools¶
def debug_weighted_conformal(detector, X_train, X_test):
"""Debug weighted conformal detection issues."""
print("=== Weighted Conformal Debug Report ===")
# Check data properties
print(f"Training samples: {len(X_train)}")
print(f"Test samples: {len(X_test)}")
print(f"Feature dimensions: {X_train.shape[1]}")
# Fit detector
detector.fit(X_train)
# Check calibration set size
print(f"Calibration samples: {len(detector.calibration_set)}")
if len(detector.calibration_set) < 50:
print("WARNING: Small calibration set may lead to unreliable weights")
# Get predictions
p_values = detector.compute_p_values(X_test)
# Check p-value distribution
print(f"P-value range: [{p_values.min():.4f}, {p_values.max():.4f}]")
print(f"P-value mean: {p_values.mean():.4f}")
print(f"P-value std: {p_values.std():.4f}")
if p_values.std() < 0.01:
print("WARNING: Very low p-value variance - check for issues")
print("=== End Debug Report ===")
# Example usage
debug_weighted_conformal(weighted_detector, X_train, X_test_shifted)
References¶
- Jin, Y., & Candès, E. J. (2023). Model-free Selective Inference Under Covariate Shift via Weighted Conformal p-values. Biometrika, 110(4), 1090-1106. arXiv:2307.09291. [Foundational paper on weighted conformal inference and the WCS procedure]
- Tibshirani, R. J., Barber, R. F., Candès, E., & Ramdas, A. (2019). Conformal Prediction Under Covariate Shift. Advances in Neural Information Processing Systems, 32. arXiv:1904.06019. [Early work on conformal prediction with covariate shift]
- Genovese, C. R., Roeder, K., & Wasserman, L. (2006). False Discovery Control with p-value Weighting. Biometrika, 93(3), 509-524. [Theoretical foundation for weighted FDR control]
Next Steps¶
- Learn about FDR control for multiple testing scenarios
- Explore different conformalization strategies for various use cases
- Read about best practices for robust anomaly detection
- Check the troubleshooting guide for common issues
- See input validation for parameter constraints and edge cases