False Discovery Rate Control

This guide explains how to use False Discovery Rate (FDR) control in nonconform for multiple testing scenarios using scipy.stats.false_discovery_control.

Overview

FDR control is a statistical method for handling multiple hypothesis testing. In anomaly detection, it helps control the expected proportion of false positives among all detected anomalies. Instead of using a fixed significance level α for every test, FDR control adjusts the rejection threshold to maintain a desired false discovery rate.
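
To make the mechanics concrete, here is a minimal sketch of the Benjamini-Hochberg step-up procedure that scipy implements (for intuition only; prefer scipy.stats.false_discovery_control in practice):

import numpy as np

def bh_discoveries(p_values, alpha=0.05):
    """Minimal BH step-up sketch: reject the k smallest p-values,
    where k is the largest rank with p_(k) <= (k / m) * alpha."""
    p = np.asarray(p_values)
    m = p.size
    order = np.argsort(p)
    below = p[order] <= np.arange(1, m + 1) / m * alpha
    k = np.nonzero(below)[0].max() + 1 if below.any() else 0
    reject = np.zeros(m, dtype=bool)
    reject[order[:k]] = True
    # Benjamini-Yekutieli is the same step-up with alpha divided by
    # the harmonic sum c(m) = 1 + 1/2 + ... + 1/m
    return reject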

Basic Usage

import numpy as np
from scipy.stats import false_discovery_control
from nonconform.estimation.standard import ConformalDetector
from nonconform.strategy.split import Split
from nonconform.utils.func.enums import Aggregation
from pyod.models.lof import LOF

# Initialize detector
base_detector = LOF()
strategy = Split(n_calib=0.2)

detector = ConformalDetector(
    detector=base_detector,
    strategy=strategy,
    aggregation=Aggregation.MEDIAN,
    seed=42
)

# Fit detector and get p-values
detector.fit(X_train)
p_values = detector.predict(X_test, raw=False)

# Apply FDR control (scipy adjusts the p-values; compare them to the target FDR level)
adjusted_p_values = false_discovery_control(p_values, method='bh')
discoveries = adjusted_p_values < 0.05

print(f"Original detections: {(p_values < 0.05).sum()}")
print(f"FDR-controlled discoveries: {discoveries.sum()}")

Available Methods

The scipy.stats.false_discovery_control function supports several methods:

Benjamini-Hochberg (BH)

  • Method: 'bh'
  • Description: Most commonly used FDR control method
  • Assumptions: Independent tests, or tests satisfying positive regression dependence on a subset (PRDS). Note that PRDS is a specific technical form of positive dependence: the joint distribution of the p-values must be positively regression dependent on each p-value corresponding to a true null hypothesis.
  • Usage: false_discovery_control(p_values, method='bh')

Benjamini-Yekutieli (BY)

  • Method: 'by'
  • Description: More conservative method for arbitrary dependence
  • Assumptions: Works under any dependency structure
  • Usage: false_discovery_control(p_values, method='by')

A quick comparison of the two methods:

# Compare different methods
bh_adjusted = false_discovery_control(p_values, method='bh')
by_adjusted = false_discovery_control(p_values, method='by')

bh_discoveries = (bh_adjusted < 0.05).sum()
by_discoveries = (by_adjusted < 0.05).sum()

print(f"BH discoveries: {bh_discoveries}")
print(f"BY discoveries: {by_discoveries}")

Setting FDR Levels

The FDR level is set by the threshold you compare the adjusted p-values against; the adjustment itself does not depend on the target level:

# Adjust once, then vary the threshold
adjusted_p_vals = false_discovery_control(p_values, method='bh')

for alpha in [0.01, 0.05, 0.1, 0.2]:
    discoveries = (adjusted_p_vals < alpha).sum()
    print(f"FDR level {alpha}: {discoveries} discoveries")

When to Use FDR Control

Multiple Testing Scenarios

Use FDR control when:

  • Testing multiple hypotheses simultaneously
  • Analyzing high-dimensional data
  • Processing multiple datasets or time series
  • Running ensemble methods with multiple detectors

Benefits

  1. Controlled False Discovery Rate: Maintains the expected proportion of false positives
  2. Increased Power: Often more powerful than family-wise error rate (FWER) control (see the comparison below)
  3. Scalability: Works well with large numbers of tests
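
The power gain in point 2 is easy to see on synthetic data. The sketch below (simulated p-values, purely illustrative) compares BH against a Bonferroni-style FWER correction at the same level:

import numpy as np
from scipy.stats import false_discovery_control

rng = np.random.default_rng(42)
# 900 null p-values (uniform) plus 100 signals concentrated near zero
p_values_demo = np.concatenate([rng.uniform(size=900), rng.beta(1, 500, size=100)])

alpha = 0.05
bonferroni_count = (p_values_demo < alpha / p_values_demo.size).sum()
bh_count = (false_discovery_control(p_values_demo, method='bh') < alpha).sum()

print(f"Bonferroni (FWER) discoveries: {bonferroni_count}")
print(f"BH (FDR) discoveries: {bh_count}")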

Practical Examples

High-dimensional Anomaly Detection

# When analyzing many features independently
n_features = X.shape[1]
feature_p_values = []

for i in range(n_features):
    # Analyze each feature separately
    X_feature = X[:, [i]]
    detector.fit(X_feature)
    p_vals = detector.predict(X_feature, raw=False)
    feature_p_values.extend(p_vals)

# Apply FDR control across all features
all_adjusted = false_discovery_control(feature_p_values, method='bh')
feature_discoveries = all_adjusted < 0.05

Multiple Time Series

# When analyzing multiple time series
time_series_data = [ts1, ts2, ts3, ...]  # Multiple time series
all_p_values = []

for ts in time_series_data:
    detector.fit(ts)
    p_vals = detector.predict(ts, raw=False)
    all_p_values.extend(p_vals)

# Control FDR across all time series
adjusted_p_vals = false_discovery_control(all_p_values, method='bh')
discoveries = adjusted_p_vals < 0.05

Integration with Conformal Prediction

FDR control works naturally with conformal prediction p-values:

from nonconform.estimation.weighted import ConformalDetector as WeightedConformalDetector

# Use with weighted conformal detection
weighted_detector = WeightedConformalDetector(
    detector=base_detector,
    strategy=strategy,
    aggregation=Aggregation.MEDIAN,
    seed=42
)

weighted_detector.fit(X_train)
weighted_p_values = weighted_detector.predict(X_test, raw=False)

# Apply FDR control to weighted p-values
weighted_adjusted = false_discovery_control(weighted_p_values, method='bh')
weighted_discoveries = weighted_adjusted < 0.05

Performance Evaluation

Evaluate the effectiveness of FDR control:

def evaluate_fdr_control(p_values, true_labels, alpha=0.05):
    """Evaluate FDR control performance."""
    # Apply FDR control
    adjusted_p_vals = false_discovery_control(p_values, method='bh')
    discoveries = adjusted_p_vals < alpha

    # Calculate metrics
    true_positives = np.sum(discoveries & (true_labels == 1))
    false_positives = np.sum(discoveries & (true_labels == 0))

    if discoveries.sum() > 0:
        empirical_fdr = false_positives / discoveries.sum()
        precision = true_positives / discoveries.sum()
    else:
        empirical_fdr = 0
        precision = 0

    recall = true_positives / np.sum(true_labels == 1) if np.sum(true_labels == 1) > 0 else 0

    return {
        'discoveries': discoveries.sum(),
        'true_positives': true_positives,
        'false_positives': false_positives,
        'empirical_fdr': empirical_fdr,
        'precision': precision,
        'recall': recall
    }

# Example usage
results = evaluate_fdr_control(p_values, y_true, alpha=0.05)
print(f"Empirical FDR: {results['empirical_fdr']:.3f}")
print(f"Precision: {results['precision']:.3f}")
print(f"Recall: {results['recall']:.3f}")

Best Practices

1. Choose Appropriate FDR Level

  • Conservative: α = 0.01 for critical applications
  • Standard: α = 0.05 for most applications
  • Liberal: α = 0.1 when false positives are less costly

2. Method Selection

  • Use BH for most applications (independent or positively dependent tests)
  • Use BY when tests may have negative dependence or when more conservative control is needed

3. Combine with Domain Knowledge

# Incorporate prior knowledge about anomaly prevalence
expected_anomaly_rate = 0.02  # 2% expected anomalies
adjusted_alpha = min(0.05, expected_anomaly_rate * 2)  # heuristic: tie the FDR level to expected prevalence

adjusted_p_vals = false_discovery_control(p_values, method='bh')
discoveries = adjusted_p_vals < adjusted_alpha

4. Monitor Performance

# Track FDR control performance over time
fdr_history = []
for batch, true_labels_batch in data_batches:  # batches paired with labels where available
    p_vals = detector.predict(batch, raw=False)
    adj_p_vals = false_discovery_control(p_vals, method='bh')
    discoveries = adj_p_vals < 0.05

    if len(true_labels_batch) > 0:  # if ground truth is available
        metrics = evaluate_fdr_control(p_vals, true_labels_batch)
        fdr_history.append(metrics['empirical_fdr'])

Common Pitfalls

1. Inappropriate Independence Assumptions

  • BH assumes independence or positive dependence
  • Use BY if negative dependence is suspected

2. Multiple Rounds of Testing

  • Don't apply FDR control multiple times to the same data
  • If doing sequential testing, use specialized methods

3. Ignoring Effect Sizes

  • FDR control doesn't consider magnitude of anomalies
  • Consider combining with effect size thresholds, as in the sketch below
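
A sketch of that combination, assuming raw=True returns the detector's underlying anomaly scores (the calls above use raw=False for p-values) and using an illustrative score cutoff:

# Require both statistical significance and a minimum anomaly-score magnitude.
# The 0.95 quantile cutoff is an illustrative choice, not a library default.
scores = detector.predict(X_test, raw=True)
adjusted_p_vals = false_discovery_control(p_values, method='bh')

score_threshold = np.quantile(scores, 0.95)
flagged = (adjusted_p_vals < 0.05) & (scores > score_threshold)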

Advanced Usage

Combining Multiple Detection Methods

from scipy.stats import combine_pvalues
from pyod.models.knn import KNN
from pyod.models.ocsvm import OCSVM

# Get p-values from multiple detectors
detectors = [LOF(), KNN(), OCSVM()]
p_values_list = []

for base in detectors:
    conf_detector = ConformalDetector(
        detector=base,
        strategy=strategy,
        aggregation=Aggregation.MEDIAN,
        seed=42
    )
    conf_detector.fit(X_train)
    p_vals = conf_detector.predict(X_test, raw=False)
    p_values_list.append(p_vals)

# Combine p-values across detectors with Fisher's method (combine_pvalues
# expects a 1-D array, so combine one test point at a time; Fisher's method
# formally assumes the combined p-values are independent)
stacked = np.array(p_values_list)  # shape: (n_detectors, n_samples)
combined_p_values = np.array([
    combine_pvalues(stacked[:, i], method='fisher').pvalue
    for i in range(stacked.shape[1])
])

# Apply FDR control to combined p-values
final_adjusted = false_discovery_control(combined_p_values, method='bh')
final_discoveries = final_adjusted < 0.05

Online FDR Control for Streaming Data

For dynamic settings with streaming data batches, the optional online-fdr package provides methods that adapt to temporal dependencies while maintaining FDR control.

Installation and Basic Usage

# Install FDR dependencies
# pip install nonconform[fdr]

from onlinefdr import Alpha_investing, LORD

# Example with streaming conformal p-values
def streaming_anomaly_detection(data_stream, detector, alpha=0.05):
    """Online FDR control for streaming anomaly detection."""

    # Initialize online FDR method
    # Alpha-investing: adapts alpha based on discoveries
    online_fdr = Alpha_investing(alpha=alpha, w0=0.05)

    discoveries = []

    for batch in data_stream:
        # Get p-values for current batch
        p_values = detector.predict(batch, raw=False)

        # Apply online FDR control
        for p_val in p_values:
            decision = online_fdr.run_single(p_val)
            discoveries.append(decision)

    return discoveries
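
A hypothetical call, assuming data_stream yields batches of observations and detector is already fitted:

decisions = streaming_anomaly_detection(data_stream, detector, alpha=0.05)
print(f"Total discoveries: {sum(decisions)}")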

LORD (Levels based On Recent Discovery) Method

# LORD method: more aggressive after recent discoveries
lord_fdr = LORD(alpha=0.05, tau=0.5)

# Process streaming data with temporal adaptation
for t, (batch, p_values) in enumerate(stream_with_pvalues):
    for p_val in p_values:
        # LORD adapts rejection threshold based on recent discoveries
        reject = lord_fdr.run_single(p_val)

        if reject:
            print(f"Anomaly detected at time {t} with p-value {p_val:.4f}")

Statistical Assumptions for Online FDR

Key requirements:

  • Independence assumption: test statistics should be independent or satisfy specific dependency structures
  • Sequential testing: methods designed for sequential hypothesis testing scenarios
  • Temporal stability: the underlying anomaly detection model should be reasonably stable

When NOT to use online FDR:

  • Strong temporal dependencies in p-values without proper correction
  • Concept drift affecting p-value calibration
  • Non-stationary data streams requiring model retraining

Best practice: combine online FDR with windowed model retraining and exchangeability monitoring for robust streaming anomaly detection, as sketched below.
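
A minimal sketch of that pattern, reusing the Alpha_investing interface from above; drift_detected is a hypothetical placeholder for whatever exchangeability or drift monitor you use:

# Windowed retraining around online FDR control.
# `drift_detected` is a placeholder, not part of nonconform or online-fdr.
window = []
online_fdr = Alpha_investing(alpha=0.05, w0=0.05)

for batch in data_stream:
    window.extend(batch)
    window = window[-5000:]  # sliding window of recent observations

    if drift_detected(window):
        detector.fit(np.array(window))  # retrain on recent data only

    for p_val in detector.predict(batch, raw=False):
        if online_fdr.run_single(p_val):
            print(f"Anomaly flagged with p-value {p_val:.4f}")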

Next Steps