Troubleshooting Guide¶
This guide addresses common issues you might encounter while using nonconform and provides solutions.
Common Issues and Solutions¶
1. ImportError: Cannot import DetectorConfig¶
Problem: Getting import errors when trying to use DetectorConfig.
Solution: DetectorConfig has been removed. Use direct parameters instead:
# Old API (deprecated)
from nonconform.estimation.configuration import DetectorConfig

detector = ConformalDetector(
    detector=LOF(),
    strategy=Split(calib_size=0.2),
    config=DetectorConfig(alpha=0.1)
)
# New API
from nonconform.estimation.standard import ConformalDetector
from nonconform.strategy.split import Split
from nonconform.utils.func.enums import Aggregation
from pyod.models.lof import LOF

detector = ConformalDetector(
    detector=LOF(),
    strategy=Split(calib_size=0.2),
    aggregation=Aggregation.MEDIAN,
    seed=42
)
2. AttributeError: predict() has no parameter 'output'¶
Problem: Using the old output parameter in predict() method.
Solution: Replace the output parameter with raw:
# Old API (deprecated)
p_values = detector.predict(X, output="p-value")
scores = detector.predict(X, output="score")
# New API
p_values = detector.predict(X, raw=False) # Get p-values
scores = detector.predict(X, raw=True) # Get raw scores
3. Memory Issues¶
Problem: Running out of memory when using large datasets or certain detectors.
Solutions:
- Use batch processing for large datasets
- Consider using more memory-efficient detectors (e.g., IsolationForest instead of KNN)
- Reduce the calibration set size
- Use sparse data structures when possible (see the sparse example after the batch-processing snippet below)
import itertools

import numpy as np

def process_in_batches(detector, X, batch_size=1000):
    """Process large datasets in batches."""
    results = []
    for batch in itertools.batched(X, batch_size):  # itertools.batched requires Python 3.12+
        batch_results = detector.predict(np.asarray(batch), raw=False)
        results.extend(batch_results)
    return np.array(results)
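If your data is mostly zeros, converting it to a sparse format can cut memory use substantially. A minimal sketch using scipy.sparse, assuming X_train is a dense NumPy array; note that not every PyOD detector accepts sparse input, so verify this for your detector first:
from scipy.sparse import csr_matrix

# Convert a mostly-zero dense array to compressed sparse row format.
# Only worthwhile when the data is genuinely sparse, and only if the
# underlying detector accepts scipy.sparse matrices.
X_sparse = csr_matrix(X_train)
sparse_mb = (X_sparse.data.nbytes + X_sparse.indices.nbytes + X_sparse.indptr.nbytes) / 1e6
print(f"Dense: {X_train.nbytes / 1e6:.1f} MB, sparse: {sparse_mb:.1f} MB")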
4. Slow Performance¶
Problem: Processing takes too long, especially with large datasets.
Solutions:
- Use faster detectors (e.g., IsolationForest, LOF)
- Reduce the calibration set size
- Use batch processing
- Configure logging to hide progress bars: logging.getLogger('nonconform').setLevel(logging.WARNING)
- Profile your code to identify bottlenecks (see the cProfile sketch after the timing example below)
import time
# Time your detector
start_time = time.time()
detector.fit(X_train)
fit_time = time.time() - start_time
start_time = time.time()
p_values = detector.predict(X_test, raw=False)
predict_time = time.time() - start_time
print(f"Fit time: {fit_time:.2f}s, Predict time: {predict_time:.2f}s")
5. Invalid P-values¶
Problem: P-values don't seem to be properly calibrated or all values are extreme.
Solutions:
- Ensure your calibration data is representative of the normal class
- Check for data leakage between training and calibration sets (see the leakage check after the validation helper below)
- Verify that the detector is properly fitted
- Consider using a different conformal strategy
- Check for violations of the exchangeability assumption
from scipy.stats import kstest

def validate_p_values(p_values):
    """Validate p-value distribution."""
    print(f"P-value range: [{p_values.min():.4f}, {p_values.max():.4f}]")
    print(f"P-value mean: {p_values.mean():.4f}")
    print(f"P-value std: {p_values.std():.4f}")

    # Check for uniform distribution (expected under the null hypothesis)
    ks_stat, ks_p = kstest(p_values, 'uniform')
    print(f"KS test for uniformity: stat={ks_stat:.4f}, p={ks_p:.4f}")
    if ks_p < 0.05:
        print("WARNING: P-values may not be well-calibrated")
6. High False Discovery Rate¶
Problem: Too many false positives even with FDR control.
Solutions:
- Increase the calibration set size
- Use a more conservative α level for FDR control
- Consider using weighted conformal p-values if there's covariate shift (see the sketch at the end of this section)
- Try different detectors
- Check for data quality issues
import numpy as np
from scipy.stats import false_discovery_control

# Benjamini-Yekutieli ('by') is more conservative than the default
# Benjamini-Hochberg; note that false_discovery_control takes no alpha
# argument - compare the adjusted p-values to your target level yourself
adjusted_p_values = false_discovery_control(p_values, method='by')
discoveries = adjusted_p_values < 0.01

# Monitor empirical FDR if ground truth is available
if y_true is not None:
    false_positives = np.sum(discoveries & (y_true == 0))
    empirical_fdr = false_positives / max(1, discoveries.sum())
    print(f"Empirical FDR: {empirical_fdr:.3f}")
7. Low Detection Power¶
Problem: Missing too many anomalies.
Solutions:
- Use less conservative α levels
- Use more powerful detectors
- Consider using ensemble methods
- Try different conformal strategies (e.g., bootstrap, cross-validation)
- Check if the anomalies are well-separated from normal data
# Try multiple strategies for comparison
from nonconform.strategy.split import Split
from nonconform.strategy.bootstrap import Bootstrap
from nonconform.strategy.cross_val import CrossValidation

strategies = {
    'Split': Split(calib_size=0.2),
    'Bootstrap': Bootstrap(n_bootstraps=100, resampling_ratio=0.8),
    'CV': CrossValidation(k=5)
}

for name, strategy in strategies.items():
    detector = ConformalDetector(
        detector=base_detector,
        strategy=strategy,
        aggregation=Aggregation.MEDIAN,
        seed=42
    )
    detector.fit(X_train)
    p_vals = detector.predict(X_test, raw=False)
    detections = (p_vals < 0.05).sum()
    print(f"{name}: {detections} detections")
8. Strategy Import Issues¶
Problem: Cannot import strategy classes with old import paths.
Solution: Update import statements to use new module structure:
# Old combined imports (deprecated) have been removed -
# use the individual module imports below
# New imports
from nonconform.strategy.split import Split
from nonconform.strategy.cross_val import CrossValidation
from nonconform.strategy.jackknife import Jackknife
from nonconform.strategy.bootstrap import Bootstrap
9. Parameter Name Changes¶
Problem: Using old parameter names that have been renamed.
Solution: Update parameter names:
# Old parameter names
Split(calibration_size=0.2) # -> calib_size=0.2
CrossValidation(n_splits=5) # -> k=5
Bootstrap(sample_ratio=0.8) # -> resampling_ratio=0.8
10. Integration Issues¶
Problem: Problems integrating with other libraries or custom detectors.
Solutions:
- Ensure your detector implements the PyOD BaseDetector interface (a minimal sketch follows the example below)
- Check for version compatibility
- Verify that the detector's output format matches expectations
- Use the correct aggregation enum values
from nonconform.utils.func.enums import Aggregation

# Correct usage of aggregation enums
detector = ConformalDetector(
    detector=custom_detector,
    strategy=strategy,
    aggregation=Aggregation.MEDIAN,  # Not the string "median"
    seed=42
)
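If you are wrapping your own scoring logic, a PyOD-compatible detector only needs fit() and decision_function(). This is a toy sketch (MeanDistanceDetector is illustrative, not part of any library) that scores points by their distance from the training mean:
import numpy as np
from pyod.models.base import BaseDetector

class MeanDistanceDetector(BaseDetector):
    """Toy detector: anomaly score = distance from the training mean."""

    def __init__(self, contamination=0.1):
        super().__init__(contamination=contamination)

    def fit(self, X, y=None):
        X = np.asarray(X)
        self.mean_ = X.mean(axis=0)
        # PyOD convention: store training scores, then derive threshold_ and labels_
        self.decision_scores_ = self.decision_function(X)
        self._process_decision_scores()
        return self

    def decision_function(self, X):
        return np.linalg.norm(np.asarray(X) - self.mean_, axis=1)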
Debugging Tips¶
1. Enable Verbose Mode¶
import logging
# Enable progress bars and detailed output
logging.getLogger('nonconform').setLevel(logging.INFO)
detector = ConformalDetector(
    detector=base_detector,
    strategy=strategy,
    aggregation=Aggregation.MEDIAN,
    seed=42
)
# For debugging, use DEBUG level for maximum verbosity
logging.getLogger('nonconform').setLevel(logging.DEBUG)
2. Check Intermediate Results¶
# Get raw scores before p-value conversion
raw_scores = detector.predict(X_test, raw=True)
p_values = detector.predict(X_test, raw=False)
print(f"Raw scores range: [{raw_scores.min():.4f}, {raw_scores.max():.4f}]")
print(f"P-values range: [{p_values.min():.4f}, {p_values.max():.4f}]")
# Check calibration set
print(f"Calibration set size: {len(detector.calibration_set)}")
print(f"Calibration scores range: [{min(detector.calibration_set):.4f}, {max(detector.calibration_set):.4f}]")
3. Validate Data¶
import numpy as np

def validate_input_data(X):
    """Validate input data for common issues."""
    print("=== Data Validation ===")
    print(f"Shape: {X.shape}")
    print(f"Data type: {X.dtype}")

    # Check for NaN values
    nan_count = np.isnan(X).sum()
    print(f"NaN values: {nan_count}")

    # Check for infinite values
    inf_count = np.isinf(X).sum()
    print(f"Infinite values: {inf_count}")

    # Check data ranges
    print(f"Data range: [{X.min():.4f}, {X.max():.4f}]")

    # Check for constant features
    constant_features = np.sum(X.std(axis=0) == 0)
    print(f"Constant features: {constant_features}")

    if nan_count > 0 or inf_count > 0:
        print("WARNING: Data contains NaN or infinite values")
    if constant_features > 0:
        print("WARNING: Data contains constant features")

# Example usage
validate_input_data(X_train)
validate_input_data(X_test)
4. Monitor Memory Usage¶
import os
import psutil

def print_memory_usage(label=""):
    """Print current memory usage of this process."""
    process = psutil.Process(os.getpid())
    memory_mb = process.memory_info().rss / 1024 / 1024
    print(f"Memory usage {label}: {memory_mb:.2f} MB")

# Monitor memory during processing
print_memory_usage("before fitting")
detector.fit(X_train)
print_memory_usage("after fitting")
p_values = detector.predict(X_test, raw=False)
print_memory_usage("after prediction")
5. Debug Weighted Conformal Issues¶
import numpy as np
from nonconform.estimation.weighted import ConformalDetector as WeightedConformalDetector
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def debug_weighted_conformal(detector, X_train, X_test):
    """Debug weighted conformal detection specifically."""
    print("=== Weighted Conformal Debug ===")

    # Check if it's actually a weighted detector
    if not isinstance(detector, WeightedConformalDetector):
        print("WARNING: Not a weighted ConformalDetector")
        return

    # Fit and check calibration samples
    detector.fit(X_train)
    if hasattr(detector, 'calibration_samples'):
        print(f"Calibration samples stored: {len(detector.calibration_samples)}")
        if len(detector.calibration_samples) == 0:
            print("ERROR: No calibration samples stored")

    # Check for distribution shift: if a classifier can tell training and
    # test samples apart, the two sets come from different distributions
    X_combined = np.vstack([X_train, X_test])
    y_combined = np.hstack([np.zeros(len(X_train)), np.ones(len(X_test))])
    clf = LogisticRegression(random_state=42)
    scores = cross_val_score(clf, X_combined, y_combined, cv=5)
    shift_score = scores.mean()
    print(f"Distribution shift score: {shift_score:.3f}")
    if shift_score > 0.7:
        print("Significant shift detected - weighted conformal recommended")
    elif shift_score < 0.6:
        print("Minimal shift - standard conformal may suffice")
Performance Optimization¶
1. Batch Processing¶
import itertools

import numpy as np

def optimized_batch_processing(detector, X, batch_size=1000):
    """Optimized batch processing for large datasets."""
    n_samples = len(X)
    results = np.empty(n_samples)
    start_idx = 0
    for i, batch in enumerate(itertools.batched(X, batch_size)):  # Python 3.12+
        batch_results = detector.predict(np.asarray(batch), raw=False)
        end_idx = start_idx + len(batch)
        results[start_idx:end_idx] = batch_results
        start_idx = end_idx
        if i % 10 == 0:  # Progress update
            print(f"Processed {i + 1} batches")
    return results
2. Strategy-Specific Optimizations¶
from nonconform.strategy.split import Split
from nonconform.strategy.bootstrap import Bootstrap
from nonconform.strategy.cross_val import CrossValidation

# For large datasets, use the split strategy with a smaller calibration set
strategy = Split(calib_size=0.1)

# For small datasets, use bootstrap for stability
strategy = Bootstrap(n_bootstraps=50, resampling_ratio=0.8)

# For medium datasets, use cross-validation
strategy = CrossValidation(k=5)
3. Detector Selection for Performance¶
from pyod.models.iforest import IForest
from pyod.models.lof import LOF
from pyod.models.ocsvm import OCSVM

# Fast detectors for large datasets
fast_detectors = [
    IForest(contamination=0.1, n_jobs=-1),  # Parallelized isolation forest
    LOF(contamination=0.1, n_jobs=-1),
    OCSVM(contamination=0.1)
]

# Avoid expensive detectors for large datasets:
# - KNN with large k
# - Complex neural networks
# - High-dimensional methods without dimensionality reduction
#   (see the PCA sketch below)
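For the high-dimensional case, reducing dimensionality before detection is often the single biggest speedup. A minimal sketch using scikit-learn's PCA; n_components=20 is illustrative and should be tuned to your data:
from sklearn.decomposition import PCA

# Project onto the leading principal components before fitting the detector
pca = PCA(n_components=20, random_state=42)
X_train_reduced = pca.fit_transform(X_train)
X_test_reduced = pca.transform(X_test)

detector.fit(X_train_reduced)
p_values = detector.predict(X_test_reduced, raw=False)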
Getting Help¶
If you encounter issues not covered in this guide:
- Check the New API: Ensure you're using the updated API with direct parameters instead of DetectorConfig
- Update Import Statements: Use the new module structure for strategy imports
- Verify Parameter Names: Check that parameter names match the new API
- Check the GitHub Issues for similar problems
- Search the Discussions for solutions
- Create a new issue with:
- A minimal reproducible example using the new API
- Expected vs actual behavior
- System information (Python version, nonconform version, etc.)
- Relevant error messages
- Whether you're migrating from the old API
Migration Checklist¶
When migrating from older versions of nonconform:
- [ ] Remove DetectorConfig imports and usage
- [ ] Update detector initialization to use direct parameters
- [ ] Change output="p-value" to raw=False
- [ ] Change output="score" to raw=True
- [ ] Update strategy imports to use the new module structure
- [ ] Replace old parameter names with new ones
- [ ] Add FDR control using scipy.stats.false_discovery_control
- [ ] Test with small datasets first
- [ ] Update any custom code that depends on the old API
- [ ] Replace silent=True/False with logging configuration
Logging Configuration¶
nonconform uses Python's standard logging framework to control progress bars and informational output. This provides more flexibility than the old silent parameter.
Basic Logging Setup¶
import logging
# Show progress bars and info messages (default for development)
logging.getLogger('nonconform').setLevel(logging.INFO)
# Hide progress bars (recommended for production)
logging.getLogger('nonconform').setLevel(logging.WARNING)
# Show everything including debug info
logging.getLogger('nonconform').setLevel(logging.DEBUG)
Common Logging Scenarios¶
Production Environment:
import logging
# Hide all progress bars and info messages
logging.getLogger('nonconform').setLevel(logging.WARNING)
detector = ConformalDetector(detector=IForest(), strategy=Split())
detector.fit(X_train) # No progress output
Development Environment:
import logging
# Show progress bars for monitoring
logging.basicConfig(level=logging.INFO)
detector = ConformalDetector(detector=IForest(), strategy=CrossValidation(k=5))
detector.fit(X_train) # Shows CV fold progress
Selective Logging:
import logging
# Show progress but hide specific module info
logging.getLogger('nonconform').setLevel(logging.INFO)
logging.getLogger('nonconform.strategy.bootstrap').setLevel(logging.WARNING)
# This will show aggregation progress but hide bootstrap configuration details
Debug Mode:
import logging
# Maximum verbosity for troubleshooting
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s'
)
# This will show all internal operations and warnings
detector = ConformalDetector(detector=LOF(), strategy=Bootstrap())
detector.fit(X_train)
Logger Hierarchy¶
nonconform uses the following logger hierarchy:
- nonconform: root logger for all nonconform output
- nonconform.estimation.*: detector-specific logging
- nonconform.strategy.*: strategy-specific logging
- nonconform.utils.*: utility function logging
You can configure specific loggers for fine-grained control over output.
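For example, to keep a full debug trace in a file while showing only warnings on the console, attach two handlers to the nonconform logger; this uses only the standard library logging API:
import logging

logger = logging.getLogger('nonconform')
logger.setLevel(logging.DEBUG)  # capture everything at the logger level

# Console handler: warnings and above only
console = logging.StreamHandler()
console.setLevel(logging.WARNING)
logger.addHandler(console)

# File handler: full debug trace for later inspection
file_handler = logging.FileHandler('nonconform_debug.log')
file_handler.setLevel(logging.DEBUG)
file_handler.setFormatter(
    logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
)
logger.addHandler(file_handler)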