False Discovery Rate Control¶

FDR control is the decision layer for batch anomaly detection. Use it when you turn many conformal p-values into many anomaly flags and want to control the expected fraction of false flags among the points you investigate.

What is FDR and Why Does It Matter?¶

When you test many observations for anomalies, some will look anomalous by chance even if they are truly normal. If you tested 1,000 truly normal observations one by one at alpha = 0.05 using well-calibrated p-values, you would expect about 50 false positives (1,000 × 0.05) before any multiple-testing correction.
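The arithmetic above can be checked with a small simulation. This is a minimal sketch that assumes null p-values behave like independent Uniform(0, 1) draws; real conformal p-values are discrete and share calibration data, so treat it as an illustration only.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

n_tests = 1_000
alpha = 0.05

# Assumption: under the null, a well-calibrated p-value is roughly Uniform(0, 1).
null_p_values = [random.random() for _ in range(n_tests)]

# Per-test thresholding with no multiple-testing correction.
false_positives = sum(p <= alpha for p in null_p_values)

print(f"Flagged {false_positives} of {n_tests} truly normal points")
```

The count varies from run to run but stays close to 50, and every one of those flags is a false positive.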

The false discovery proportion in one realized batch is the fraction of false positives among all observations you flag as anomalies:

\[\text{FDP} = \frac{\text{False Positives}}{\text{Total Discoveries}}\]

More precisely, that displayed fraction is the realized false discovery proportion (FDP); FDR is the expected FDP over repeated data draws.

An equivalent operational interpretation of the expected proportion is:

\[\text{FDR} \approx \frac{\text{Wasted Effort (chasing false positives)}}{\text{Total Investigation Effort}}\]

False Discovery Rate (FDR) control adjusts the selection threshold so that the expected false-positive proportion among discoveries stays below a target level, such as 5%, when the p-values and dependence assumptions are valid. This differs from controlling false positives per individual test: FDR controls the average error proportion among the points you actually flag.
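To make "adjusts the selection threshold" concrete, here is a dependency-free sketch of the Benjamini-Hochberg step-up rule. The function name `bh_select` and the example p-values are illustrative and are not part of nonconform or SciPy.

```python
def bh_select(p_values, alpha=0.05):
    """Return a list of booleans: True where BH rejects (flags) the point."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank

    # Reject the k_max smallest p-values (none if k_max == 0).
    mask = [False] * m
    for idx in order[:k_max]:
        mask[idx] = True
    return mask


p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
mask = bh_select(p_values, alpha=0.05)
print(sum(mask))  # 2: only 0.001 and 0.008 pass; per-test p <= 0.05 would flag 5
```

The cutoff is data-dependent: it grows with the number of small p-values in the batch, which is why the rule differs from a fixed per-test threshold.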

Example

Suppose your pipeline flags 100 observations as anomalies with alpha = 0.05 FDR control and the statistical assumptions hold.

Target: expected false discovery proportion at or below 5%
Realized false alarms in one batch can be lower or higher than 5 of the 100 flags; the guarantee applies to the average over repeated batches

Now compare this to an uncontrolled setup that flags 200 observations, where 50 are false positives:

False positives: 50/200 = 25% realized FDP
This means 1 in 4 investigations is wasted effort
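When ground-truth labels are available, the realized FDP from the comparison above can be computed directly. The helper name `realized_fdp` is illustrative, and returning 0.0 when nothing is flagged is a common convention rather than something stated on this page.

```python
def realized_fdp(flags, is_anomaly):
    """Fraction of flagged points that are actually normal."""
    n_flagged = sum(flags)
    if n_flagged == 0:
        return 0.0  # convention: no discoveries means no false discoveries
    n_false = sum(f and not a for f, a in zip(flags, is_anomaly))
    return n_false / n_flagged


# Uncontrolled setup from the example: 200 flags, 50 of them false positives.
flags = [True] * 200
is_anomaly = [True] * 150 + [False] * 50
print(realized_fdp(flags, is_anomaly))  # 0.25
```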

Quick Start¶

detector.select() is the recommended single-call entry point. It combines p-value computation with the appropriate FDR-controlled selection procedure, automatically dispatching to weighted selection when a weight_estimator is configured:

detector.fit(X_train)
mask = detector.select(X_test, alpha=0.05)

For the weighted case with custom pruning:

from nonconform.enums import Pruning

mask = detector.select(
    X_test,
    alpha=0.05,
    pruning=Pruning.DETERMINISTIC,
    seed=42,
)

When you need raw p-values for custom downstream analysis (multi-alpha sweeps, diagnostics, or a separately justified combination workflow), use compute_p_values(...) plus SciPy BH:

from scipy.stats import false_discovery_control

p_values = detector.compute_p_values(X_test)
decisions = false_discovery_control(p_values, method="bh") <= 0.05

Note

detector.last_result is populated by the most recent detector.compute_p_values(...) or detector.select(...) call. See Weighted Conformal Selection below for a complete runnable example.

Selection Entry Points¶

Primary (recommended): detector.select(X_test, alpha=...) - dispatches automatically based on detector configuration; no manual result-bundle handling required.

Advanced/low-level options (for custom workflows):

Standard (exchangeable): apply BH directly via scipy.stats.false_discovery_control(...) to conformal p-values.
Weighted (covariate shift with importance weights): weighted_false_discovery_control(result=...) or weighted_false_discovery_control_from_arrays(...).

Parameter Roles (`delta` vs `alpha`)¶

When using ConditionalEmpirical, keep these roles separate:

delta: calibration confidence/failure budget inside the conditional p-value map.
alpha: target FDR level in the final selection rule.

They do not need to be equal. A common pattern is to tune delta for p-value calibration behavior and alpha for operational false discovery tolerance.

Guarantee Scope for BH-Style Selection¶

BH-style selection applied to conformal p-values has guarantees that depend on:

how valid/calibrated those p-values are,
exchangeability (or the relevant data-shift assumptions for weighted methods),
and BH dependence assumptions (independence or PRDS).

For standard split conformal outlier p-values, Bates et al. prove the PRDS property needed for BH under their assumptions. This does not mean arbitrary post-processing is safe: shared calibration data can make generic p-value combination procedures invalid without additional justification.

In other words, the selection routine itself does not create validity from invalid inputs; it preserves guarantees under the assumptions above.

Input situation	Recommended path
Standard exchangeable conformal p-values	`detector.select(...)` or SciPy BH on `compute_p_values(...)`
Weighted covariate-shift workflow	`detector.select(...)` with a `weight_estimator` so WCS is used
Arbitrary dependent or post-processed p-values	Do not assume BH validity without a separate justification
Streaming decisions over time	Use an online FDR method, not a fixed-batch BH shortcut
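The guarantee is a statement about the average over repeated batches, which a simulation can illustrate. The sketch below assumes independent p-values (uniform for normal points, pushed toward zero for anomalies); the batch sizes and the `u ** 8` signal model are arbitrary illustrative choices, not conformal p-values from a shared calibration set.

```python
import random


def bh_count(p_values, alpha):
    """Number of BH rejections: largest k with p_(k) <= k * alpha / m."""
    m = len(p_values)
    k_max = 0
    for rank, p in enumerate(sorted(p_values), start=1):
        if p <= rank * alpha / m:
            k_max = rank
    return k_max


def one_batch_fdp(rng, n_normal=180, n_anomalous=20, alpha=0.05):
    # Normal points: Uniform(0, 1) p-values. Anomalies: p-values pushed toward 0.
    labelled = [(rng.random(), False) for _ in range(n_normal)]
    labelled += [(rng.random() ** 8, True) for _ in range(n_anomalous)]
    labelled.sort(key=lambda pair: pair[0])

    # BH rejects the k smallest p-values; count how many of those are normal.
    k = bh_count([p for p, _ in labelled], alpha)
    if k == 0:
        return 0.0
    false_flags = sum(not is_anomaly for _, is_anomaly in labelled[:k])
    return false_flags / k


rng = random.Random(0)
fdps = [one_batch_fdp(rng) for _ in range(500)]
mean_fdp = sum(fdps) / len(fdps)

print(f"mean FDP over batches: {mean_fdp:.3f}")
print(f"largest single-batch FDP: {max(fdps):.3f}")
```

Individual batches can exceed the target, while the mean stays near or below alpha in this idealized setting. If the inputs were not valid p-values, the same selection code would run without complaint and the average could drift above alpha.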

Strict validation for weighted inputs

Weighted FDR routines fail fast on invalid inputs. They raise ValueError when:

score/weight arrays are not 1D numeric arrays of matching lengths
any score/weight/p-value contains non-finite values
any weight is negative
total calibration weight is not strictly positive
result.metadata["kde"] is present but malformed (missing keys, invalid shapes, non-monotone grid/CDF, or non-positive total weight)

from scipy.stats import false_discovery_control
from nonconform.fdr import (
    weighted_false_discovery_control,
    weighted_false_discovery_control_from_arrays,
)

# Standard BH selection from explicit p-values
cs_mask = false_discovery_control(result.p_values, method="bh") <= 0.05

# Strict WCS from cached result bundle
wcs_from_result = weighted_false_discovery_control(
    result=result,
    alpha=0.05,
)

# Strict WCS from explicit arrays
wcs_mask = weighted_false_discovery_control_from_arrays(
    p_values=result.p_values,
    test_scores=result.test_scores,
    calib_scores=result.calib_scores,
    test_weights=result.test_weights,
    calib_weights=result.calib_weights,
    alpha=0.05,
)
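If you assemble score and weight arrays yourself, a pre-check that mirrors the validation conditions listed above can surface problems before the selection call. The helper below is a hypothetical sketch for plain Python sequences, not the library's own validation code, and it does not cover the result.metadata["kde"] checks.

```python
import math


def check_scores_and_weights(scores, weights, name="calibration"):
    """Hypothetical pre-check; raises ValueError on the conditions listed above."""
    if len(scores) != len(weights):
        raise ValueError(f"{name}: scores and weights must have matching lengths")
    for value in list(scores) + list(weights):
        # Rejects nested sequences, strings, and booleans in one pass.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise ValueError(f"{name}: scores and weights must be numeric scalars")
        if not math.isfinite(value):
            raise ValueError(f"{name}: non-finite value {value!r}")
    if any(w < 0 for w in weights):
        raise ValueError(f"{name}: weights must be non-negative")
    if sum(weights) <= 0:
        raise ValueError(f"{name}: total weight must be strictly positive")


check_scores_and_weights([0.2, 0.7, 1.3], [1.0, 0.5, 2.0])  # passes silently

try:
    check_scores_and_weights([0.2, 0.7], [1.0, -0.5])
except ValueError as exc:
    print(exc)  # calibration: weights must be non-negative
```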

Basic Usage¶

from nonconform import ConformalDetector, Split

from pyod.models.lof import LOF

detector = ConformalDetector(
    detector=LOF(),
    strategy=Split(n_calib=0.2),
    aggregation="median",
    seed=42,
)

detector.fit(X_train)

# FDR-controlled selection at 5% - single call
discoveries = detector.select(X_test, alpha=0.05)

print(f"FDR-controlled discoveries: {discoveries.sum()}")

Weighted Conformal Selection¶

When calibration and test distributions differ in a way that matches the covariate-shift assumptions, configure a weight_estimator and call select() - it automatically dispatches to Weighted Conformalized Selection (WCS):

from nonconform import ConformalDetector, JackknifeBootstrap, logistic_weight_estimator
from nonconform.enums import Pruning
from pyod.models.iforest import IForest

detector = ConformalDetector(
    detector=IForest(random_state=1),
    strategy=JackknifeBootstrap(n_bootstraps=50),
    weight_estimator=logistic_weight_estimator(),
    seed=1,
)

detector.fit(X_train)

selected = detector.select(
    X_test,
    alpha=0.1,
    pruning=Pruning.DETERMINISTIC,
    seed=1,
)

print(f"Selected points: {selected.sum()} / {len(selected)}")

The pruning parameter controls the second-stage WCS pruning rule. DETERMINISTIC uses a fixed rule with no randomness. HOMOGENEOUS and HETEROGENEOUS are randomized rules that use shared and independent random draws, respectively. Set seed for reproducible randomized pruning decisions.

Available Methods¶

For direct BH control on conformal p-values, use scipy.stats.false_discovery_control. SciPy documents method="bh" for Benjamini-Hochberg and method="by" for the more conservative Benjamini-Yekutieli dependency-robust adjustment.

Benjamini-Hochberg (BH)¶

Method: 'bh'
Description: Most commonly used FDR control method
Assumptions: Independent tests, or tests satisfying positive regression dependence on subsets (PRDS). In plain terms, PRDS means small p-values tend to occur together in a positively dependent way; it is stricter than generic "positive dependence." Standard split conformal outlier p-values satisfy PRDS in the Bates et al. setting.
Usage: false_discovery_control(p_values, method='bh')

from scipy.stats import false_discovery_control

# BH control on conformal p-values
bh_adjusted = false_discovery_control(p_values, method='bh')
bh_discoveries = (bh_adjusted <= 0.05).sum()

print(f"BH discoveries: {bh_discoveries}")
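To see why comparing adjusted p-values with alpha gives a BH decision, and what the BY option changes, here is a dependency-free sketch of the usual adjusted-p-value formulation. It is an illustration rather than SciPy's source; the function name `adjusted_p_values` and the example values are assumptions.

```python
def adjusted_p_values(p_values, method="bh"):
    """Step-up adjusted p-values for BH, or BY with its extra harmonic factor."""
    m = len(p_values)
    # BY multiplies by c(m) = 1 + 1/2 + ... + 1/m; BH uses a factor of 1.
    factor = sum(1.0 / i for i in range(1, m + 1)) if method == "by" else 1.0

    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0  # also caps adjusted values at 1
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, factor * m * p_values[idx] / rank)
        adjusted[idx] = running_min
    return adjusted


p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.060, 0.074, 0.205, 0.212, 0.216]
alpha = 0.05
bh_hits = sum(q <= alpha for q in adjusted_p_values(p_values, "bh"))
by_hits = sum(q <= alpha for q in adjusted_p_values(p_values, "by"))
print(bh_hits, by_hits)  # 2 1
```

With ten tests the BY factor is about 2.93, so every adjusted value is inflated by that amount and the second discovery is lost. That is the power cost of the dependency-robust option.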

Setting FDR Levels¶

You can control the desired FDR level using the alpha parameter:

from scipy.stats import false_discovery_control

# Different FDR levels
fdr_levels = [0.01, 0.05, 0.1, 0.2]

for alpha in fdr_levels:
    discoveries = (false_discovery_control(p_values, method="bh") <= alpha).sum()
    print(f"FDR level {alpha}: {discoveries} discoveries")

When to Use FDR Control¶

Use FDR control whenever you make more than one test-level anomaly decision. This includes both batch decisions made simultaneously and decisions accumulated over time.

Core Rule¶

One test: a per-test threshold may be enough.
Multiple tests: control FDR to bound the expected fraction of false discoveries among flagged points.

Why¶

Controlled false discoveries: bounds expected false-positive proportion among detections.
Practical power trade-off: usually more powerful than stricter family-wise error control.
Scales to many tests: suitable for modern high-throughput anomaly workflows.
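The power trade-off can be illustrated by comparing BH with a Bonferroni cutoff, a standard family-wise error method. The batch below is simulated under illustrative assumptions (independent p-values, 950 normal points, 50 anomalies with p-values drawn as `u ** 10`), so the exact counts carry no meaning beyond the comparison.

```python
import random

rng = random.Random(1)
alpha = 0.05

# Hypothetical batch: 950 normal points and 50 anomalies with small p-values.
p_values = [rng.random() for _ in range(950)]
p_values += [rng.random() ** 10 for _ in range(50)]
m = len(p_values)

# Bonferroni (family-wise control): every p-value faces the same cutoff alpha / m.
bonferroni_hits = sum(p <= alpha / m for p in p_values)

# BH (FDR control): the cutoff k * alpha / m grows with the number of rejections.
bh_hits = 0
for rank, p in enumerate(sorted(p_values), start=1):
    if p <= rank * alpha / m:
        bh_hits = rank

print(f"Bonferroni flags: {bonferroni_hits}, BH flags: {bh_hits}")
```

BH never flags fewer points than Bonferroni at the same alpha, because any p-value below alpha / m also satisfies the BH condition at its rank.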

Sequential Note¶

If decisions are made over time (not a fixed batch), use procedures designed for online settings (see Online FDR Control for Streaming Data).

Integration with Conformal Prediction¶

select() dispatches automatically - standard or weighted - based on the detector's configuration:

from nonconform import ConformalDetector, Split, logistic_weight_estimator
from nonconform.enums import Pruning
from pyod.models.lof import LOF

base_detector = LOF()
strategy = Split(n_calib=0.2)

# Standard: BH-style FDR selection on conformal p-values
standard_detector = ConformalDetector(
    detector=base_detector,
    strategy=strategy,
    aggregation="median",
    seed=42,
)
standard_detector.fit(X_train)
standard_mask = standard_detector.select(X_test, alpha=0.05)

# Weighted: WCS (handles covariate shift via importance weights)
weighted_detector = ConformalDetector(
    detector=base_detector,
    strategy=strategy,
    aggregation="median",
    weight_estimator=logistic_weight_estimator(),
    seed=42,
)
weighted_detector.fit(X_train)
weighted_mask = weighted_detector.select(
    X_test,
    alpha=0.05,
    pruning=Pruning.DETERMINISTIC,
    seed=42,
)

print(f"Standard detections: {standard_mask.sum()}")
print(f"Weighted detections: {weighted_mask.sum()}")

Performance Evaluation¶

Evaluate the effectiveness of FDR control using nonconform's built-in metrics:

from scipy.stats import false_discovery_control
from nonconform.metrics import false_discovery_rate, statistical_power

def evaluate_fdr_control(p_values, true_labels, alpha=0.05):
    """Evaluate FDR control performance."""
    # Apply FDR control
    discoveries = false_discovery_control(p_values, method="bh") <= alpha

    # Calculate metrics using nonconform functions
    empirical_fdr = false_discovery_rate(true_labels, discoveries)
    power = statistical_power(true_labels, discoveries)

    return {
        'discoveries': discoveries.sum(),
        'empirical_fdr': empirical_fdr,
        'power': power
    }

# Example usage
results = evaluate_fdr_control(p_values, y_true, alpha=0.05)
print(f"Discoveries: {results['discoveries']}")
print(f"Empirical FDR: {results['empirical_fdr']:.3f}")
print(f"Statistical Power: {results['power']:.3f}")

Best Practices¶

1. Choose Appropriate FDR Level¶

Very strict: alpha = 0.01 only when false positives are extremely costly (often too strict for exploratory workflows)
Standard: alpha = 0.05 for most applications
Exploratory / higher-recall: alpha = 0.10 when missing anomalies is costlier than investigating additional false positives

2. Method Selection¶

Use detector.select(...) for most conformal workflows
Use BH via SciPy for manual p-value thresholding workflows
Use BY only when you need a conservative fallback for dependence that is not covered by the BH assumptions and you accept reduced power

3. Combine with Domain Knowledge¶

from scipy.stats import false_discovery_control

# Incorporate prior knowledge about anomaly prevalence
expected_anomaly_rate = 0.02  # 2% expected anomalies
adjusted_alpha = min(0.05, expected_anomaly_rate * 2)  # Adjust FDR level

discoveries = false_discovery_control(p_values, method="bh") <= adjusted_alpha

4. Monitor Performance¶

from scipy.stats import false_discovery_control

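# Track FDR control performance over time
# Assumes data_batches yields (batch, true_labels_batch) pairs, where
# true_labels_batch is None or empty when ground truth is unavailable,
# and that evaluate_fdr_control is defined as in Performance Evaluation
fdr_history = []
for batch, true_labels_batch in data_batches:
    p_vals = detector.compute_p_values(batch)
    discoveries = false_discovery_control(p_vals, method="bh") <= 0.05

    if true_labels_batch is not None and len(true_labels_batch) > 0:
        metrics = evaluate_fdr_control(p_vals, true_labels_batch)
        fdr_history.append(metrics['empirical_fdr'])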

Common Pitfalls¶

1. Inappropriate Independence Assumptions¶

BH assumes independence or PRDS; generic positive dependence alone is not enough
If neither holds, re-check assumptions, fall back to BY, or move to methods designed for your dependence structure

2. Multiple Rounds of Testing¶

Don't apply FDR control multiple times to the same data
If doing sequential testing, use an online FDR method (see Online FDR Control for Streaming Data)

Online FDR Control for Streaming Data¶

For dynamic settings with streaming data batches, the optional online-fdr package provides methods that make decisions one hypothesis at a time while maintaining FDR control under their own assumptions (see Statistical Assumptions for Online FDR).

Do not conflate this with martingale alarm thresholds such as ville_threshold or restarted_ville_threshold in Exchangeability Martingales: those provide anytime false-alarm control on evidence processes, not FDR control across multiple tested hypotheses.
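To show how an online rule differs from fixed-batch BH, here is a dependency-free sketch of a LOND-style procedure, in which each test level depends only on the position in the stream and the number of earlier discoveries. The class `Lond` is written for illustration only; it is not the online-fdr package's API, and it is a simpler rule than alpha-investing or LORD 3. FDR control for rules of this kind is usually stated under independence of the p-values.

```python
import math


class Lond:
    """Minimal LOND sketch: reject p_t when p_t <= gamma_t * alpha * (discoveries + 1)."""

    def __init__(self, alpha=0.05):
        self.alpha = alpha
        self.t = 0            # number of hypotheses tested so far
        self.discoveries = 0  # number of rejections so far

    def test_one(self, p_value):
        self.t += 1
        # gamma_t = 6 / (pi^2 t^2) is non-negative and sums to 1 over t = 1, 2, ...
        gamma_t = 6.0 / (math.pi ** 2 * self.t ** 2)
        level = gamma_t * self.alpha * (self.discoveries + 1)
        reject = p_value <= level
        if reject:
            self.discoveries += 1
        return reject


lond = Lond(alpha=0.05)
stream = [0.0001, 0.3, 0.004, 0.8, 0.0005, 0.6]
decisions = [lond.test_one(p) for p in stream]
print(decisions)  # [True, False, True, False, True, False]
```

Each decision is final when it is made and never uses later p-values, which is the property a fixed-batch BH shortcut lacks.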

Installation and Basic Usage¶

# Install FDR dependencies
# pip install nonconform[fdr]

from online_fdr.investing.alpha.alpha import Gai

# Example with streaming conformal p-values
def streaming_anomaly_detection(data_stream, detector, alpha=0.05):
    """Online FDR control for streaming anomaly detection."""

    # Initialize online FDR method
    # GAI: alpha-investing style online FDR control
    online_fdr = Gai(alpha=alpha, wealth=alpha / 2)

    discoveries = []

    for batch in data_stream:
        # Get p-values for current batch
        p_values = detector.compute_p_values(batch)

        # Apply online FDR control
        for p_val in p_values:
            decision = online_fdr.test_one(float(p_val))
            discoveries.append(decision)

    return discoveries

LORD (Levels based On Recent Discovery) Method¶

from online_fdr.investing.lord.three import LordThree

# LORD 3: alpha allocation adapts over the testing stream
lord_fdr = LordThree(alpha=0.05, wealth=0.04, reward=0.05)

# Process streaming data with temporal adaptation
for t, (batch, p_values) in enumerate(stream_with_pvalues):
    for p_val in p_values:
        # LORD adapts rejection threshold based on recent discoveries
        reject = lord_fdr.test_one(float(p_val))

        if reject:
            print(f"Anomaly detected at time {t} with p-value {p_val:.4f}")

Statistical Assumptions for Online FDR¶

Key Requirements:

Independence assumption: Test statistics should be independent or satisfy specific dependency structures
Sequential testing: Methods designed for sequential hypothesis testing scenarios
Temporal stability: Underlying anomaly detection model should be reasonably stable

When NOT to use online FDR:

Strong temporal dependencies in p-values without proper correction
Concept drift affecting p-value calibration
Non-stationary data streams requiring model retraining

Best practice: Combine with windowed model retraining and exchangeability monitoring for robust streaming anomaly detection.

References¶

Benjamini, Y., & Hochberg, Y. (1995). Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. Journal of the Royal Statistical Society: Series B, 57(1), 289-300.
Benjamini, Y., & Yekutieli, D. (2001). The Control of the False Discovery Rate in Multiple Testing under Dependency. The Annals of Statistics, 29(4), 1165-1188.
Bates, S., Candès, E., Lei, L., Romano, Y., & Sesia, M. (2023). Testing for Outliers with Conformal p-values. The Annals of Statistics, 51(1), 149-178.
Jin, Y., & Candès, E. J. (2023). Model-free Selective Inference Under Covariate Shift via Weighted Conformal p-values. Biometrika, 110(4), 1090-1106.
SciPy documentation. scipy.stats.false_discovery_control.

Next Steps¶

Learn about weighted conformal p-values for handling distribution shift
Explore different conformalization strategies for various scenarios
Read about best practices for robust anomaly detection