Weighted Conformal P-values

This guide explains how to use weighted conformal p-values in nonconform to handle covariate shift, a common form of distribution shift, between calibration and test data.

Overview

Weighted conformal p-values extend classical conformal prediction to covariate shift settings. The key assumption is that only the marginal distribution P(X) changes between calibration and test data, while the conditional distribution P(Y|X), the relationship between features and anomaly status, remains constant. This assumption is crucial for the validity of weighted conformal inference.
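Under this assumption, each calibration point x is weighted by the likelihood ratio between the test and calibration feature densities (standard density-ratio notation, not specific to nonconform):

w(x) = p_test(x) / p_calib(x)

The logistic-regression weight estimator described below approximates this ratio without estimating either density directly.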

When configured with a weight_estimator, the ConformalDetector automatically estimates importance weights, here via logistic regression that distinguishes calibration from test samples, and then uses these weights to compute adjusted p-values.

Basic Usage

import numpy as np
from nonconform.estimation import ConformalDetector
from nonconform.estimation.weight import LogisticWeightEstimator
from nonconform.strategy.split import Split
from nonconform.utils.func.enums import Aggregation
from pyod.models.lof import LOF

# Initialize base detector
base_detector = LOF()
strategy = Split(n_calib=0.2)

# Create weighted conformal detector
detector = ConformalDetector(
    detector=base_detector,
    strategy=strategy,
    aggregation=Aggregation.MEDIAN,
    weight_estimator=LogisticWeightEstimator(seed=42),
    seed=42
)

# Fit on training data
detector.fit(X_train)

# Get weighted p-values for test data
# The detector automatically computes importance weights
p_values = detector.predict(X_test, raw=False)

How It Works

The weighted conformal method works through the following steps:

1. Calibration

During fitting, the detector:

  - Uses the specified strategy to split data and train models
  - Computes calibration scores on held-out calibration data
  - Stores calibration samples for later weight computation
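Conceptually, for the Split strategy from the Basic Usage example, calibration looks roughly like the sketch below. This is a simplification for illustration, not nonconform's internal code; it reuses X_train and base_detector from above.

# Simplified sketch of split calibration (illustrative, not library internals)
rng = np.random.default_rng(42)
n_calib = int(0.2 * len(X_train))  # matches Split(n_calib=0.2)
perm = rng.permutation(len(X_train))
X_fit, X_calib = X_train[perm[n_calib:]], X_train[perm[:n_calib]]

base_detector.fit(X_fit)  # train on the fit split
calibration_scores = base_detector.decision_function(X_calib)  # held-out scores
# X_calib is retained so importance weights can be estimated at predict time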

2. Weight Estimation

During prediction, the detector:

  - Trains a logistic regression model to distinguish calibration from test samples
  - Uses the predicted probabilities to estimate importance weights
  - Applies weights to both calibration and test instances
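The weights come from the classic density-ratio trick: the odds of a probabilistic classifier trained to separate calibration from test samples approximate the likelihood ratio p_test(x) / p_calib(x). A minimal sketch, assuming X_calib from the calibration step above (again illustrative, not nonconform's internal implementation):

from sklearn.linear_model import LogisticRegression

# Label calibration samples 0 and test samples 1
X_all = np.vstack([X_calib, X_test])
y_all = np.r_[np.zeros(len(X_calib)), np.ones(len(X_test))]

clf = LogisticRegression(max_iter=1000).fit(X_all, y_all)

# Classifier odds approximate the density ratio p_test(x) / p_calib(x)
proba = clf.predict_proba(X_all)[:, 1]
weights = proba / (1.0 - proba)
calib_weights = weights[:len(X_calib)]
test_weights = weights[len(X_calib):]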

3. Weighted P-value Calculation

The p-values are computed using weighted empirical distribution functions:

# Simplified version of the weighted p-value calculation
def weighted_p_value(test_score, calibration_scores, calibration_weights, test_weight):
    """
    Calculate weighted conformal p-value with proper tie handling.

    The p-value represents the probability of observing a score
    at least as extreme as the test score under the weighted
    calibration distribution.
    """
    # Count calibration scores strictly greater than test score
    weighted_rank = np.sum(calibration_weights[calibration_scores > test_score])

    # Handle ties: add a uniform random fraction of the tied weights (randomized tie-breaking)
    tied_weights = np.sum(calibration_weights[calibration_scores == test_score])
    weighted_rank += np.random.uniform(0, 1) * tied_weights

    # Add test instance weight (always included for conformal guarantee)
    weighted_rank += test_weight
    total_weight = np.sum(calibration_weights) + test_weight

    return weighted_rank / total_weight
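With uniform weights, the weighted p-value reduces to the standard conformal rank statistic, which makes for a quick sanity check:

# Sanity check: uniform weights recover standard conformal behavior
rng = np.random.default_rng(0)
calibration_scores = rng.normal(size=200)
calibration_weights = np.ones(200)

p = weighted_p_value(
    test_score=2.0,
    calibration_scores=calibration_scores,
    calibration_weights=calibration_weights,
    test_weight=1.0,
)
print(f"p-value: {p:.3f}")  # small, since 2.0 sits in the upper tail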

When to Use Weighted Conformal

Covariate Shift Scenarios

Use weighted conformal detection when:

  1. Domain Adaptation: Training on one domain, testing on another
  2. Temporal Shift: Data distribution changes over time
  3. Sample Selection Bias: Test data is not representative of training data
  4. Stratified Sampling: Different sampling rates for different subgroups

Examples of Distribution Shift

# Example 1: Temporal shift
# Training data from 2020, test data from 2024
detector.fit(X_train_2020)
p_values_2024 = detector.predict(X_test_2024, raw=False)

# Example 2: Geographic shift
# Training on US data, testing on European data
detector.fit(X_us)
p_values_europe = detector.predict(X_europe, raw=False)

# Example 3: Sensor drift
# Calibration data before sensor drift, test data after
detector.fit(X_before_drift)
p_values_after_drift = detector.predict(X_after_drift, raw=False)

Comparison with Standard Conformal

# Standard conformal detector
standard_detector = ConformalDetector(
    detector=base_detector,
    strategy=strategy,
    aggregation=Aggregation.MEDIAN,
    seed=42
)

# Weighted conformal detector
weighted_detector = ConformalDetector(
    detector=base_detector,
    strategy=strategy,
    aggregation=Aggregation.MEDIAN,
    weight_estimator=LogisticWeightEstimator(seed=42),
    seed=42
)

# Fit both on training data
standard_detector.fit(X_train)
weighted_detector.fit(X_train)

# Compare on shifted test data
standard_p_values = standard_detector.predict(X_test_shifted, raw=False)
weighted_p_values = weighted_detector.predict(X_test_shifted, raw=False)

# Apply FDR control for proper comparison
from scipy.stats import false_discovery_control

standard_fdr = false_discovery_control(standard_p_values, method='bh')
weighted_fdr = false_discovery_control(weighted_p_values, method='bh')

print(f"Standard conformal detections: {(standard_fdr < 0.05).sum()}")
print(f"Weighted conformal detections: {(weighted_fdr < 0.05).sum()}")

Different Aggregation Strategies

The choice of aggregation method can affect performance under distribution shift:

# Compare different aggregation methods
aggregation_methods = [Aggregation.MEAN, Aggregation.MEDIAN, Aggregation.MAX]

for agg_method in aggregation_methods:
    detector = ConformalDetector(
        detector=base_detector,
        strategy=strategy,
        aggregation=agg_method,
        weight_estimator=LogisticWeightEstimator(seed=42),
        seed=42
    )
    detector.fit(X_train)
    p_vals = detector.predict(X_test_shifted, raw=False)

    print(f"{agg_method.value}: {(p_vals < 0.05).sum()} detections")

Strategy Selection

Different strategies can be used with weighted conformal detection:

from nonconform.strategy.bootstrap import Bootstrap
from nonconform.strategy.cross_val import CrossValidation

# Bootstrap strategy for stability
bootstrap_strategy = Bootstrap(n_bootstraps=100, resampling_ratio=0.8)
bootstrap_detector = ConformalDetector(
    detector=base_detector,
    strategy=bootstrap_strategy,
    aggregation=Aggregation.MEDIAN,
    weight_estimator=LogisticWeightEstimator(seed=42),
    seed=42
)

# Cross-validation strategy for efficiency
cv_strategy = CrossValidation(k=5)
cv_detector = ConformalDetector(
    detector=base_detector,
    strategy=cv_strategy,
    aggregation=Aggregation.MEDIAN,
    weight_estimator=LogisticWeightEstimator(seed=42),
    seed=42
)

Performance Considerations

Computational Cost

Weighted conformal detection has additional overhead:

  - Weight estimation via logistic regression
  - Weighted p-value computation

import time

# Compare computation times
def time_detector(detector, X_train, X_test):
    start_time = time.time()
    detector.fit(X_train)
    fit_time = time.time() - start_time

    start_time = time.time()
    p_values = detector.predict(X_test, raw=False)
    predict_time = time.time() - start_time

    return fit_time, predict_time

# Standard vs Weighted timing
standard_fit, standard_pred = time_detector(standard_detector, X_train, X_test)
weighted_fit, weighted_pred = time_detector(weighted_detector, X_train, X_test)

print(f"Standard: Fit={standard_fit:.2f}s, Predict={standard_pred:.2f}s")
print(f"Weighted: Fit={weighted_fit:.2f}s, Predict={weighted_pred:.2f}s")
print(f"Overhead: {((weighted_fit + weighted_pred) / (standard_fit + standard_pred) - 1) * 100:.1f}%")

Memory Usage

Weighted conformal detection requires storing:

  - Calibration samples for weight computation
  - Calibration scores for p-value calculation

For large datasets, consider:

  - Using a subset of calibration samples for weight estimation (see the sketch below)
  - Implementing online/streaming versions
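A hypothetical subsampling step, managed outside the detector (nonconform does not do this automatically; max_calib is an assumption to tune):

# Cap the number of calibration samples used for weight estimation
rng = np.random.default_rng(42)
max_calib = 2000
if len(X_calib) > max_calib:
    idx = rng.choice(len(X_calib), size=max_calib, replace=False)
    X_calib = X_calib[idx]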

Best Practices

1. Validate Distribution Shift

Always check if distribution shift is actually present:

# Use statistical tests to detect shift
from scipy.stats import ks_2samp

def detect_feature_shift(X_train, X_test):
    """Detect distribution shift in individual features."""
    shift_detected = []
    p_values = []

    for i in range(X_train.shape[1]):
        statistic, p_value = ks_2samp(X_train[:, i], X_test[:, i])
        shift_detected.append(p_value < 0.05)
        p_values.append(p_value)

    print(f"Features with significant shift: {sum(shift_detected)}/{len(shift_detected)}")
    return shift_detected, p_values

shift_features, shift_p_values = detect_feature_shift(X_train, X_test_shifted)

2. Combine with FDR Control

from scipy.stats import false_discovery_control

# Apply FDR control to weighted p-values
adjusted_p_values = false_discovery_control(weighted_p_values, method='bh')
discoveries = adjusted_p_values < 0.05

print(f"Raw detections: {(weighted_p_values < 0.05).sum()}")
print(f"FDR-controlled discoveries: {discoveries.sum()}")

3. Monitor Weight Quality

Extreme weights can indicate poor weight estimation. nonconform does not necessarily expose its internal weights, so the check below operates on a weights array computed externally, for example with the density-ratio sketch from the Weight Estimation section:

def check_weight_quality(weights):
    """Flag extreme importance weights that suggest poor estimation."""
    # Rule of thumb: weights should typically fall between 0.1 and 10;
    # values below 0.01 or above 100 suggest problems
    extreme = (weights < 0.01) | (weights > 100)
    print(f"Weight range: [{weights.min():.3f}, {weights.max():.3f}]")
    print(f"Extreme weights: {extreme.sum()}/{len(weights)}")

    # Effective sample size: low values mean a few points dominate
    ess = weights.sum() ** 2 / (weights ** 2).sum()
    print(f"Effective sample size: {ess:.1f} of {len(weights)}")

check_weight_quality(calib_weights)  # e.g. weights from the earlier sketch

4. Use Appropriate Base Detectors

Some detectors work better with weighted conformal than others:

  - Good: Distance-based methods (LOF, KNN) that are sensitive to distribution
  - Moderate: Tree-based methods (Isolation Forest) that are somewhat robust
  - Challenging: Neural networks that might already adapt to shift
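Swapping base detectors only requires changing the detector argument. KNN and IForest below are other pyod models, used here for illustration:

from pyod.models.iforest import IForest
from pyod.models.knn import KNN

for base in [LOF(), KNN(), IForest()]:
    det = ConformalDetector(
        detector=base,
        strategy=strategy,
        aggregation=Aggregation.MEDIAN,
        weight_estimator=LogisticWeightEstimator(seed=42),
        seed=42,
    )
    det.fit(X_train)
    p_vals = det.predict(X_test_shifted, raw=False)
    print(f"{type(base).__name__}: {(p_vals < 0.05).sum()} detections")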

Advanced Applications

Multi-domain Adaptation

# Handle multiple domains with different shift patterns
domains = ['domain_A', 'domain_B', 'domain_C']
domain_detectors = {}

for domain in domains:
    detector = ConformalDetector(
        detector=base_detector,
        strategy=strategy,
        aggregation=Aggregation.MEDIAN,
        weight_estimator=LogisticWeightEstimator(seed=42),
        seed=42
    )
    detector.fit(X_train)  # Common training set
    domain_detectors[domain] = detector

# Predict on domain-specific test sets
for domain in domains:
    X_test_domain = load_domain_data(domain)  # Load domain-specific test data
    p_values = domain_detectors[domain].predict(X_test_domain, raw=False)
    print(f"{domain}: {(p_values < 0.05).sum()} detections")

Online Adaptation

# Adapt to gradual distribution shift over time
def online_weighted_detection(detector, data_stream, window_size=1000):
    """Online weighted conformal detection with a sliding calibration window.

    Assumes `data_stream` yields (X_batch, y_batch) tuples; labels are ignored.
    """
    detections = []
    window = None  # buffer of the most recent calibration samples

    for X_batch, _ in data_stream:
        if window is None:
            # Initialize the calibration window with the first batch
            window = X_batch
        else:
            # Predict on the current batch using the most recent fit
            p_values = detector.predict(X_batch, raw=False)
            detections.append((p_values < 0.05).sum())

            # Slide the window: keep only the most recent samples
            window = np.vstack([window, X_batch])[-window_size:]

        # Refit so calibration tracks the drifting distribution
        detector.fit(window)

    return detections

Troubleshooting

Common Issues

  1. Poor Weight Estimation
     - Insufficient calibration data
     - High-dimensional data with small samples
     - Solution: Increase calibration size or use dimensionality reduction (see the sketch after this list)

  2. Extreme P-values
     - All p-values near 0 or 1
     - Solution: Check for severe distribution shift or model mismatch

  3. Inconsistent Results
     - High variance in detection counts
     - Solution: Use bootstrap strategy or increase sample size
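For the high-dimensional case, one mitigation sketch is to reduce dimensionality before fitting. PCA is used here for illustration, and the number of components is an assumption to tune:

from sklearn.decomposition import PCA

# Project to a lower-dimensional space before conformal detection
pca = PCA(n_components=10, random_state=42)
X_train_red = pca.fit_transform(X_train)
X_test_red = pca.transform(X_test_shifted)

weighted_detector.fit(X_train_red)
p_values_red = weighted_detector.predict(X_test_red, raw=False)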

Debugging Tools

def debug_weighted_conformal(detector, X_train, X_test):
    """Debug weighted conformal detection issues."""
    print("=== Weighted Conformal Debug Report ===")

    # Check data properties
    print(f"Training samples: {len(X_train)}")
    print(f"Test samples: {len(X_test)}")
    print(f"Feature dimensions: {X_train.shape[1]}")

    # Fit detector
    detector.fit(X_train)

    # Check calibration set size
    print(f"Calibration samples: {len(detector.calibration_set)}")

    if len(detector.calibration_set) < 50:
        print("WARNING: Small calibration set may lead to unreliable weights")

    # Get predictions
    p_values = detector.predict(X_test, raw=False)

    # Check p-value distribution
    print(f"P-value range: [{p_values.min():.4f}, {p_values.max():.4f}]")
    print(f"P-value mean: {p_values.mean():.4f}")
    print(f"P-value std: {p_values.std():.4f}")

    if p_values.std() < 0.01:
        print("WARNING: Very low p-value variance - check for issues")

    print("=== End Debug Report ===")

# Example usage
debug_weighted_conformal(weighted_detector, X_train, X_test_shifted)

Next Steps