Troubleshooting Guide

Common issues and solutions for nonconform.

Common Issues and Solutions

1. ImportError: Cannot import symbols from nonconform

Problem: Getting import errors for detector or strategy classes.

Solution: Import public classes from the package root:

from nonconform import ConformalDetector, Split, CrossValidation, JackknifeBootstrap

from pyod.models.lof import LOF

detector = ConformalDetector(
    detector=LOF(),
    strategy=Split(n_calib=0.2),
    aggregation="median",
    seed=42
)

2. AttributeError: ConformalDetector has no method predict

Problem: Calling methods or parameters that are not part of the detector interface.

Solution: Use compute_p_values(...) for conformal p-values and score_samples(...) for raw detector scores:

p_values = detector.compute_p_values(X)  # Conformal p-values
scores = detector.score_samples(X)       # Raw anomaly scores

3. Memory Issues

Problem: Out of memory with large datasets or certain detectors.

Solutions:

  • Use batch processing for large datasets
  • Consider more memory-efficient detectors (e.g., IsolationForest instead of KNN)
  • Reduce the calibration set size
  • Use sparse data structures when possible

import numpy as np

def process_in_batches(detector, X, batch_size=1000):
    """Process large datasets in batches to limit peak memory use."""
    results = []
    for start in range(0, len(X), batch_size):
        batch = X[start:start + batch_size]  # slicing preserves the array type
        results.append(detector.compute_p_values(batch))
    return np.concatenate(results)

4. Slow Performance

Problem: Slow processing, especially with large datasets.

Solutions:

  • Use faster detectors (e.g., IsolationForest, LOF)
  • Reduce the calibration set size
  • Use batch processing
  • Configure logging to hide progress bars: logging.getLogger('nonconform').setLevel(logging.WARNING)
  • Profile your code to identify bottlenecks

import time

# Time your detector
start_time = time.time()
detector.fit(X_train)
fit_time = time.time() - start_time

start_time = time.time()
p_values = detector.compute_p_values(X_test)
predict_time = time.time() - start_time

print(f"Fit time: {fit_time:.2f}s, Predict time: {predict_time:.2f}s")
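The timing snippet above locates slow stages but not slow functions. For the profiling suggestion, here is a minimal sketch using the standard library's cProfile (profile_call is a hypothetical helper, not part of nonconform; pass it detector.fit or detector.compute_p_values):

```python
import cProfile
import io
import pstats

def profile_call(func, *args, **kwargs):
    """Profile one call and print the top functions by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = func(*args, **kwargs)
    profiler.disable()
    stream = io.StringIO()
    pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(10)
    print(stream.getvalue())  # top 10 entries by cumulative time
    return result

# Example: p_values = profile_call(detector.compute_p_values, X_test)
```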

5. Invalid P-values

Problem: P-values are miscalibrated or extreme.

Solutions:

  • Ensure your calibration data is representative of the normal class
  • Check for data leakage between training and calibration sets
  • Verify that the detector is properly fitted
  • Consider using a different conformal strategy
  • Check for violations of the exchangeability assumption

def validate_p_values(p_values):
    """Validate p-value distribution."""
    print(f"P-value range: [{p_values.min():.4f}, {p_values.max():.4f}]")
    print(f"P-value mean: {p_values.mean():.4f}")
    print(f"P-value std: {p_values.std():.4f}")

    # Check for uniform distribution (expected under null hypothesis)
    from scipy.stats import kstest
    ks_stat, ks_p = kstest(p_values, 'uniform')
    print(f"KS test for uniformity: stat={ks_stat:.4f}, p={ks_p:.4f}")

    if ks_p < 0.05:
        print("WARNING: P-values may not be well-calibrated")

6. High False Discovery Rate

Problem: Too many false positives even with FDR control.

Solutions:

  • Increase the calibration set size
  • Use a more conservative alpha level for FDR control
  • Consider weighted conformal p-values if there is covariate shift
  • Try different detectors
  • Check for data quality issues

# Use more conservative FDR control
discoveries = detector.select(X_test, alpha=0.01)

# Monitor empirical FDR if ground truth is available
if y_true is not None:
    false_positives = np.sum(discoveries & (y_true == 0))
    empirical_fdr = false_positives / max(1, discoveries.sum())
    print(f"Empirical FDR: {empirical_fdr:.3f}")

7. Low Detection Power

Problem: Missing too many anomalies.

Solutions:

  • Use less conservative alpha levels
  • Use more powerful detectors
  • Consider using ensemble methods
  • Try different conformal strategies (e.g., bootstrap, cross-validation)
  • Check whether the anomalies are well separated from normal data

# Try multiple strategies for comparison
from nonconform import CrossValidation, JackknifeBootstrap, Split

strategies = {
    'Split': Split(n_calib=0.2),
    'JaB+': JackknifeBootstrap(n_bootstraps=50),
    'CV': CrossValidation(k=5)
}

for name, strategy in strategies.items():
    detector = ConformalDetector(
        detector=base_detector,
        strategy=strategy,
        aggregation="median",
        seed=42
    )
    detector.fit(X_train)
    detections = detector.select(X_test, alpha=0.05).sum()
    print(f"{name}: {detections} discoveries")

8. Strategy Import Issues

Problem: Cannot import strategy classes.

Solution: Import all strategies from the package root:

from nonconform import Split, CrossValidation, JackknifeBootstrap

Available Strategies

  • Split - Simple train/calibration split
  • CrossValidation - K-fold cross-validation (use high k for leave-one-out)
  • JackknifeBootstrap - Jackknife+-after-Bootstrap (JaB+)

9. Invalid Strategy Parameters

Problem: Passing unsupported keyword arguments to strategy constructors.

Solution: Use the supported constructor parameters:

Split(n_calib=0.2)
CrossValidation(k=5)
JackknifeBootstrap(n_bootstraps=50)

10. Integration Issues

Problem: Problems integrating with other libraries or custom detectors.

Solutions:

  • Ensure your detector implements the AnomalyDetector protocol (fit, decision_function, get_params, set_params)
  • Verify that the detector's output format matches expectations
  • Use a valid aggregation string ("mean", "median", "minimum", "maximum")
  • Use score_polarity to define score direction before conformalization
  • Valid score_polarity values are "higher_is_anomalous", "higher_is_normal", and "auto" (or omit it)
  • If omitted, known sklearn normality detector families default to "higher_is_normal", while PyOD and custom detectors outside recognized families default to "higher_is_anomalous"
  • Set score_polarity explicitly for custom detectors when you want deterministic behavior; use "auto" for strict family validation

# Correct usage of aggregation strings
detector = ConformalDetector(
    detector=custom_detector,
    strategy=strategy,
    aggregation="median",
    score_polarity="higher_is_anomalous",
    seed=42
)
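To illustrate the protocol item above, here is a minimal sketch of a custom detector (MeanDistanceDetector is hypothetical; it only demonstrates the fit / decision_function / get_params / set_params surface, with higher scores meaning more anomalous):

```python
import numpy as np

class MeanDistanceDetector:
    """Toy detector: score = Euclidean distance from the training mean."""

    def __init__(self, metric="euclidean"):
        self.metric = metric
        self._center = None

    def fit(self, X, y=None):
        self._center = np.asarray(X).mean(axis=0)
        return self

    def decision_function(self, X):
        # Higher score = farther from the training mean = more anomalous
        return np.linalg.norm(np.asarray(X) - self._center, axis=1)

    def get_params(self, deep=True):
        return {"metric": self.metric}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self
```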

Debugging Tips

1. Enable Verbose Mode

import logging

# Enable progress bars and detailed output
logging.getLogger('nonconform').setLevel(logging.INFO)

detector = ConformalDetector(
    detector=base_detector,
    strategy=strategy,
    aggregation="median",
    seed=42
)

# For debugging, use DEBUG level for maximum verbosity
logging.getLogger('nonconform').setLevel(logging.DEBUG)

2. Check Intermediate Results

# Get raw scores before p-value conversion
raw_scores = detector.score_samples(X_test)
p_values = detector.compute_p_values(X_test)

print(f"Raw scores range: [{raw_scores.min():.4f}, {raw_scores.max():.4f}]")
print(f"P-values range: [{p_values.min():.4f}, {p_values.max():.4f}]")

# Check calibration set
print(f"Calibration set size: {len(detector.calibration_set)}")
print(f"Calibration scores range: [{min(detector.calibration_set):.4f}, {max(detector.calibration_set):.4f}]")

3. Validate Data

import numpy as np

def validate_input_data(X):
    """Validate input data for common issues."""
    print("=== Data Validation ===")
    print(f"Shape: {X.shape}")
    print(f"Data type: {X.dtype}")

    # Check for NaN values
    nan_count = np.isnan(X).sum()
    print(f"NaN values: {nan_count}")

    # Check for infinite values
    inf_count = np.isinf(X).sum()
    print(f"Infinite values: {inf_count}")

    # Check data ranges
    print(f"Data range: [{X.min():.4f}, {X.max():.4f}]")

    # Check for constant features
    constant_features = np.sum(X.std(axis=0) == 0)
    print(f"Constant features: {constant_features}")

    if nan_count > 0 or inf_count > 0:
        print("WARNING: Data contains NaN or infinite values")

    if constant_features > 0:
        print("WARNING: Data contains constant features")

# Example usage
validate_input_data(X_train)
validate_input_data(X_test)

4. Monitor Memory Usage

import psutil
import os

def print_memory_usage(label=""):
    """Print current memory usage."""
    process = psutil.Process(os.getpid())
    memory_mb = process.memory_info().rss / 1024 / 1024
    print(f"Memory usage {label}: {memory_mb:.2f} MB")

# Monitor memory during processing
print_memory_usage("before fitting")
detector.fit(X_train)
print_memory_usage("after fitting")
p_values = detector.compute_p_values(X_test)
print_memory_usage("after prediction")

5. Debug Weighted Conformal Issues

import numpy as np

def debug_weighted_conformal(detector, X_train, X_test):
    """Debug weighted conformal detection specifically."""
    print("=== Weighted Conformal Debug ===")

    # Check if it's actually a conformal detector
    from nonconform import ConformalDetector
    if not isinstance(detector, ConformalDetector):
        print("WARNING: Not a ConformalDetector")
        return

    # Fit and check calibration samples
    detector.fit(X_train)

    if hasattr(detector, 'calibration_samples'):
        print(f"Calibration samples stored: {len(detector.calibration_samples)}")
        if len(detector.calibration_samples) == 0:
            print("ERROR: No calibration samples stored")

    # Check for distribution shift
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import cross_val_score

    X_combined = np.vstack([X_train, X_test])
    y_combined = np.hstack([np.zeros(len(X_train)), np.ones(len(X_test))])

    clf = LogisticRegression(random_state=42)
    scores = cross_val_score(clf, X_combined, y_combined, cv=5)
    shift_score = scores.mean()

    print(f"Distribution shift score: {shift_score:.3f}")
    if shift_score > 0.7:
        print("Significant shift detected - weighted conformal recommended")
    elif shift_score < 0.6:
        print("Minimal shift - standard conformal may suffice")

Performance Optimization

1. Batch Processing

import numpy as np

def optimized_batch_processing(detector, X, batch_size=1000):
    """Optimized batch processing that writes into a preallocated array."""
    n_samples = len(X)
    results = np.empty(n_samples)

    for i, start in enumerate(range(0, n_samples, batch_size)):
        batch = X[start:start + batch_size]
        results[start:start + len(batch)] = detector.compute_p_values(batch)

        if i % 10 == 0:  # Progress update
            print(f"Processed {i + 1} batches")

    return results

2. Strategy-Specific Optimizations

from nonconform import CrossValidation, JackknifeBootstrap, Split

# For large datasets, use split strategy with smaller calibration
strategy = Split(n_calib=0.1)  # Smaller calibration set

# For small datasets, use bootstrap for stability
strategy = JackknifeBootstrap(n_bootstraps=50)

# For medium datasets, use cross-validation
strategy = CrossValidation(k=5)

3. Detector Selection for Performance

from pyod.models.iforest import IForest
from pyod.models.lof import LOF
from pyod.models.ocsvm import OCSVM

# Fast detectors for large datasets
fast_detectors = [
    IForest(contamination=0.1, n_jobs=-1),  # Parallel tree ensemble
    LOF(contamination=0.1, n_jobs=-1),
    OCSVM(contamination=0.1)
]

# Avoid expensive detectors for large datasets
# - KNN with large k
# - Complex neural networks
# - High-dimensional methods without dimensionality reduction

Getting Help

If you encounter other issues:

  1. Verify imports: Use package-root imports for detector and strategy classes
  2. Verify parameters: Ensure constructor argument names are valid
  3. Check the GitHub Issues for similar problems
  4. Search the Discussions for solutions
  5. Create a new issue with:
     • A minimal reproducible example
     • Expected vs. actual behavior
     • System information (Python version, nonconform version, etc.)
     • Relevant error messages

Logging Configuration

nonconform uses Python's standard logging framework to control progress bars and informational output; the verbose flag on ConformalDetector additionally controls aggregation progress.

Basic Logging Setup

import logging

# Show progress bars and info messages (default for development)
logging.getLogger('nonconform').setLevel(logging.INFO)

# Hide progress bars (recommended for production)
logging.getLogger('nonconform').setLevel(logging.WARNING)

# Show everything including debug info
logging.getLogger('nonconform').setLevel(logging.DEBUG)

Common Logging Scenarios

Production Environment:

import logging

# Hide all progress bars and info messages
logging.getLogger('nonconform').setLevel(logging.WARNING)

detector = ConformalDetector(detector=IForest(), strategy=Split())
detector.fit(X_train)  # No progress output

Development Environment:

import logging

# Show progress bars for monitoring
logging.basicConfig(level=logging.INFO)

detector = ConformalDetector(detector=IForest(), strategy=CrossValidation(k=5))
detector.fit(X_train)  # Shows CV fold progress

Selective Logging:

import logging

# Show progress but hide specific module info
logging.getLogger('nonconform').setLevel(logging.INFO)
logging.getLogger('nonconform.resampling.bootstrap').setLevel(logging.WARNING)

# This will show aggregation progress but hide bootstrap configuration details

Debug Mode:

import logging

# Maximum verbosity for troubleshooting
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)

# This will show all internal operations and warnings
detector = ConformalDetector(detector=LOF(), strategy=JackknifeBootstrap(n_bootstraps=50))
detector.fit(X_train)

Logger Hierarchy

nonconform uses the following logger hierarchy:

  • nonconform: Root logger for all nonconform output
  • nonconform.resampling.*: Strategy-specific logging
  • nonconform.weighting.*: Weight-estimation logging
  • nonconform.fdr: Weighted FDR control logging
  • nonconform.adapters: Detector adapter and score-polarity logging
  • nonconform._internal.*: Internal utility logging

You can configure specific loggers for fine-grained control over output.