Troubleshooting Guide¶
Common issues and solutions for nonconform.
Common Issues and Solutions¶
1. ImportError: Cannot import symbols from nonconform¶
Problem: Getting import errors for detector or strategy classes.
Solution: Import public classes from the package root:
from nonconform import ConformalDetector, Split, CrossValidation, JackknifeBootstrap
from pyod.models.lof import LOF
detector = ConformalDetector(
    detector=LOF(),
    strategy=Split(n_calib=0.2),
    aggregation="median",
    seed=42
)
2. AttributeError: ConformalDetector has no method predict¶
Problem: Calling methods or parameters that are not part of the detector interface.
Solution: Use compute_p_values(...) for conformal p-values and score_samples(...) for raw detector scores:
p_values = detector.compute_p_values(X) # Conformal p-values
scores = detector.score_samples(X) # Raw anomaly scores
3. Memory Issues¶
Problem: Out of memory with large datasets or certain detectors.
Solutions:
- Use batch processing for large datasets
- Consider more memory-efficient detectors (e.g., IsolationForest instead of KNN)
- Reduce the calibration set size
- Use sparse data structures when possible
import numpy as np

def process_in_batches(detector, X, batch_size=1000):
    """Process a large dataset in fixed-size batches."""
    results = []
    for start in range(0, len(X), batch_size):
        batch = X[start:start + batch_size]  # Slicing keeps each batch a 2D array
        results.extend(detector.compute_p_values(batch))
    return np.array(results)
4. Slow Performance¶
Problem: Slow processing, especially with large datasets.
Solutions:
- Use faster detectors (e.g., IsolationForest, LOF)
- Reduce the calibration set size
- Use batch processing
- Configure logging to hide progress bars: logging.getLogger('nonconform').setLevel(logging.WARNING)
- Profile your code to identify bottlenecks
import time

# Time your detector
start_time = time.perf_counter()
detector.fit(X_train)
fit_time = time.perf_counter() - start_time

start_time = time.perf_counter()
p_values = detector.compute_p_values(X_test)
predict_time = time.perf_counter() - start_time

print(f"Fit time: {fit_time:.2f}s, Predict time: {predict_time:.2f}s")
5. Invalid P-values¶
Problem: P-values are miscalibrated or extreme.
Solutions:
- Ensure your calibration data is representative of the normal class
- Check for data leakage between training and calibration sets
- Verify that the detector is properly fitted
- Consider using a different conformal strategy
- Check for violations of the exchangeability assumption
from scipy.stats import kstest

def validate_p_values(p_values):
    """Validate p-value distribution."""
    print(f"P-value range: [{p_values.min():.4f}, {p_values.max():.4f}]")
    print(f"P-value mean: {p_values.mean():.4f}")
    print(f"P-value std: {p_values.std():.4f}")

    # Under the null hypothesis, p-values should be roughly uniform
    ks_stat, ks_p = kstest(p_values, 'uniform')
    print(f"KS test for uniformity: stat={ks_stat:.4f}, p={ks_p:.4f}")
    if ks_p < 0.05:
        print("WARNING: P-values may not be well-calibrated")
6. High False Discovery Rate¶
Problem: Too many false positives even with FDR control.
Solutions:
- Increase the calibration set size
- Use a more conservative alpha level for FDR control
- Consider weighted conformal p-values if there is covariate shift
- Try different detectors
- Check for data quality issues
# Use more conservative FDR control
discoveries = detector.select(X_test, alpha=0.01)

# Monitor empirical FDR if ground truth is available
if y_true is not None:
    false_positives = np.sum(discoveries & (y_true == 0))
    empirical_fdr = false_positives / max(1, discoveries.sum())
    print(f"Empirical FDR: {empirical_fdr:.3f}")
7. Low Detection Power¶
Problem: Missing too many anomalies.
Solutions:
- Use less conservative alpha levels
- Use more powerful detectors
- Consider ensemble methods
- Try different conformal strategies (e.g., bootstrap, cross-validation)
- Check whether the anomalies are well separated from normal data
# Try multiple strategies for comparison
from nonconform import ConformalDetector, CrossValidation, JackknifeBootstrap, Split

strategies = {
    'Split': Split(n_calib=0.2),
    'JaB+': JackknifeBootstrap(n_bootstraps=50),
    'CV': CrossValidation(k=5)
}

for name, strategy in strategies.items():
    detector = ConformalDetector(
        detector=base_detector,
        strategy=strategy,
        aggregation="median",
        seed=42
    )
    detector.fit(X_train)
    detections = detector.select(X_test, alpha=0.05).sum()
    print(f"{name}: {detections} discoveries")
8. Strategy Import Issues¶
Problem: Cannot import strategy classes.
Solution: Import all strategies from the package root:
Available Strategies:
- Split: Simple train/calibration split
- CrossValidation: K-fold cross-validation (use a high k for leave-one-out)
- JackknifeBootstrap: Jackknife+-after-Bootstrap (JaB+)
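Mirroring the import used earlier in this guide, all three strategies come from the package root:

```python
from nonconform import Split, CrossValidation, JackknifeBootstrap
```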
9. Invalid Strategy Parameters¶
Problem: Passing unsupported keyword arguments to strategy constructors.
Solution: Use the supported constructor parameters:
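As a sketch, using the parameter names and values that appear elsewhere in this guide (consult the API reference for each strategy's full signature):

```python
from nonconform import Split, CrossValidation, JackknifeBootstrap

split = Split(n_calib=0.2)                 # fraction (or count) of calibration samples
cv = CrossValidation(k=5)                  # number of folds
jab = JackknifeBootstrap(n_bootstraps=50)  # number of bootstrap resamples
```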
10. Integration Issues¶
Problem: Problems integrating with other libraries or custom detectors.
Solutions:
- Ensure your detector implements the AnomalyDetector protocol (fit, decision_function, get_params, set_params)
- Verify that the detector's output format matches expectations
- Use a valid aggregation string ("mean", "median", "minimum", "maximum")
- Use score_polarity to define score direction before conformalization.
- Valid score_polarity values are "higher_is_anomalous", "higher_is_normal", and "auto" (or omit it).
- If omitted, known sklearn normality detector families default to "higher_is_normal", while PyOD and custom detectors outside recognized families default to "higher_is_anomalous".
- Set score_polarity explicitly for custom detectors when you want deterministic behavior; use "auto" for strict family validation.
# Correct usage of aggregation strings
detector = ConformalDetector(
    detector=custom_detector,
    strategy=strategy,
    aggregation="median",
    score_polarity="higher_is_anomalous",
    seed=42
)
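For custom detectors, a minimal class satisfying the protocol can look like the following sketch (the class name and the distance-to-mean score are illustrative, not part of nonconform):

```python
import numpy as np

class MeanDistanceDetector:
    """Toy detector implementing the AnomalyDetector protocol:
    fit, decision_function, get_params, set_params."""

    def __init__(self, metric="euclidean"):
        self.metric = metric
        self._center = None

    def fit(self, X, y=None):
        # Remember the training mean as the "normal" reference point
        self._center = np.asarray(X).mean(axis=0)
        return self

    def decision_function(self, X):
        # Higher score = farther from the training mean = more anomalous,
        # matching score_polarity="higher_is_anomalous"
        return np.linalg.norm(np.asarray(X) - self._center, axis=1)

    def get_params(self, deep=True):
        return {"metric": self.metric}

    def set_params(self, **params):
        for key, value in params.items():
            setattr(self, key, value)
        return self
```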
Debugging Tips¶
1. Enable Verbose Mode¶
import logging
# Enable progress bars and detailed output
logging.getLogger('nonconform').setLevel(logging.INFO)
detector = ConformalDetector(
    detector=base_detector,
    strategy=strategy,
    aggregation="median",
    seed=42
)
# For debugging, use DEBUG level for maximum verbosity
logging.getLogger('nonconform').setLevel(logging.DEBUG)
2. Check Intermediate Results¶
# Get raw scores before p-value conversion
raw_scores = detector.score_samples(X_test)
p_values = detector.compute_p_values(X_test)
print(f"Raw scores range: [{raw_scores.min():.4f}, {raw_scores.max():.4f}]")
print(f"P-values range: [{p_values.min():.4f}, {p_values.max():.4f}]")
# Check calibration set
print(f"Calibration set size: {len(detector.calibration_set)}")
print(f"Calibration scores range: [{min(detector.calibration_set):.4f}, {max(detector.calibration_set):.4f}]")
3. Validate Data¶
import numpy as np

def validate_input_data(X):
    """Validate input data for common issues."""
    print("=== Data Validation ===")
    print(f"Shape: {X.shape}")
    print(f"Data type: {X.dtype}")

    # Check for NaN and infinite values
    nan_count = np.isnan(X).sum()
    print(f"NaN values: {nan_count}")
    inf_count = np.isinf(X).sum()
    print(f"Infinite values: {inf_count}")

    # Check data ranges
    print(f"Data range: [{X.min():.4f}, {X.max():.4f}]")

    # Check for constant features
    constant_features = np.sum(X.std(axis=0) == 0)
    print(f"Constant features: {constant_features}")

    if nan_count > 0 or inf_count > 0:
        print("WARNING: Data contains NaN or infinite values")
    if constant_features > 0:
        print("WARNING: Data contains constant features")

# Example usage
validate_input_data(X_train)
validate_input_data(X_test)
4. Monitor Memory Usage¶
import psutil
import os
def print_memory_usage(label=""):
    """Print current memory usage."""
    process = psutil.Process(os.getpid())
    memory_mb = process.memory_info().rss / 1024 / 1024
    print(f"Memory usage {label}: {memory_mb:.2f} MB")
# Monitor memory during processing
print_memory_usage("before fitting")
detector.fit(X_train)
print_memory_usage("after fitting")
p_values = detector.compute_p_values(X_test)
print_memory_usage("after prediction")
5. Debug Weighted Conformal Issues¶
import numpy as np
from nonconform import ConformalDetector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def debug_weighted_conformal(detector, X_train, X_test):
    """Debug weighted conformal detection specifically."""
    print("=== Weighted Conformal Debug ===")

    # Check if it's actually a conformal detector
    if not isinstance(detector, ConformalDetector):
        print("WARNING: Not a ConformalDetector")
        return

    # Fit and check calibration samples
    detector.fit(X_train)
    if hasattr(detector, 'calibration_samples'):
        print(f"Calibration samples stored: {len(detector.calibration_samples)}")
        if len(detector.calibration_samples) == 0:
            print("ERROR: No calibration samples stored")

    # Check for distribution shift with a classifier two-sample test:
    # the better a classifier separates train from test, the stronger the shift
    X_combined = np.vstack([X_train, X_test])
    y_combined = np.hstack([np.zeros(len(X_train)), np.ones(len(X_test))])
    clf = LogisticRegression(random_state=42)
    shift_score = cross_val_score(clf, X_combined, y_combined, cv=5).mean()
    print(f"Distribution shift score: {shift_score:.3f}")
    if shift_score > 0.7:
        print("Significant shift detected - weighted conformal recommended")
    elif shift_score < 0.6:
        print("Minimal shift - standard conformal may suffice")
Performance Optimization¶
1. Batch Processing¶
import numpy as np

def optimized_batch_processing(detector, X, batch_size=1000):
    """Optimized batch processing for large datasets."""
    n_samples = len(X)
    results = np.empty(n_samples)
    for i, start in enumerate(range(0, n_samples, batch_size)):
        batch = X[start:start + batch_size]
        results[start:start + len(batch)] = detector.compute_p_values(batch)
        if i % 10 == 0:  # Progress update
            print(f"Processed {i + 1} batches")
    return results
2. Strategy-Specific Optimizations¶
from nonconform import CrossValidation, JackknifeBootstrap, Split

# For large datasets, use the split strategy with a smaller calibration set
strategy = Split(n_calib=0.1)

# For small datasets, use bootstrap for stability
strategy = JackknifeBootstrap(n_bootstraps=50)

# For medium datasets, use cross-validation
strategy = CrossValidation(k=5)
3. Detector Selection for Performance¶
from pyod.models.iforest import IForest
from pyod.models.lof import LOF
from pyod.models.ocsvm import OCSVM

# Fast detectors for large datasets
fast_detectors = [
    IForest(contamination=0.1, n_jobs=-1),  # Parallel processing
    LOF(contamination=0.1, n_jobs=-1),
    OCSVM(contamination=0.1)
]
# Avoid expensive detectors for large datasets
# - KNN with large k
# - Complex neural networks
# - High-dimensional methods without dimensionality reduction
Getting Help¶
If you encounter other issues:
- Verify imports: Use package-root imports for detector and strategy classes
- Verify parameters: Ensure constructor argument names are valid
- Check the GitHub Issues for similar problems
- Search the Discussions for solutions
- Create a new issue with:
- A minimal reproducible example
- Expected vs actual behavior
- System information (Python version, nonconform version, etc.)
- Relevant error messages
Logging Configuration¶
nonconform uses Python's standard logging framework to control progress bars and informational output, plus the verbose flag on ConformalDetector for aggregation progress.
Basic Logging Setup¶
import logging
# Show progress bars and info messages (default for development)
logging.getLogger('nonconform').setLevel(logging.INFO)
# Hide progress bars (recommended for production)
logging.getLogger('nonconform').setLevel(logging.WARNING)
# Show everything including debug info
logging.getLogger('nonconform').setLevel(logging.DEBUG)
Common Logging Scenarios¶
Production Environment:
import logging
# Hide all progress bars and info messages
logging.getLogger('nonconform').setLevel(logging.WARNING)
detector = ConformalDetector(detector=IForest(), strategy=Split())
detector.fit(X_train) # No progress output
Development Environment:
import logging
# Show progress bars for monitoring
logging.basicConfig(level=logging.INFO)
detector = ConformalDetector(detector=IForest(), strategy=CrossValidation(k=5))
detector.fit(X_train) # Shows CV fold progress
Selective Logging:
import logging
# Show progress but hide specific module info
logging.getLogger('nonconform').setLevel(logging.INFO)
logging.getLogger('nonconform.resampling.bootstrap').setLevel(logging.WARNING)
# This will show aggregation progress but hide bootstrap configuration details
Debug Mode:
import logging
# Maximum verbosity for troubleshooting
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
# This will show all internal operations and warnings
detector = ConformalDetector(detector=LOF(), strategy=JackknifeBootstrap(n_bootstraps=50))
detector.fit(X_train)
Logger Hierarchy¶
nonconform uses the following logger hierarchy:
- nonconform: Root logger for all nonconform output
- nonconform.resampling.*: Strategy-specific logging
- nonconform.weighting.*: Weight-estimation logging
- nonconform.fdr: Weighted FDR control logging
- nonconform.adapters: Detector adapter and score-polarity logging
- nonconform._internal.*: Internal utility logging
You can configure specific loggers for fine-grained control over output.
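For example, the hierarchy lets one line silence a whole subtree while the rest stays verbose (logger names taken from the list above):

```python
import logging

# INFO on the root 'nonconform' logger is inherited by every child...
logging.getLogger('nonconform').setLevel(logging.INFO)
# ...except children that set their own level explicitly
logging.getLogger('nonconform.resampling.bootstrap').setLevel(logging.WARNING)

# 'nonconform.fdr' sets no level of its own, so it inherits INFO
print(logging.getLogger('nonconform.fdr').getEffectiveLevel() == logging.INFO)  # True
```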