
Online FDR: Online False Discovery Rate Control Algorithms


online-fdr is a comprehensive Python library for controlling False Discovery Rate (FDR) and Family-Wise Error Rate (FWER) in online multiple hypothesis testing scenarios. Unlike traditional methods that require all p-values upfront, this library provides truly online algorithms that make decisions sequentially as data arrives.

Why Online FDR Control?

In many modern applications, hypotheses arrive sequentially and decisions must be made in real time:

  • Interim analyses as patient data accumulates, allowing early stopping or protocol modifications while maintaining statistical validity.
  • Continuous experimentation in tech companies, where new variants are tested as they are developed and require immediate go/no-go decisions.
  • Sequential gene discovery studies, where new candidates are evaluated as they are identified through various screening methods.
  • Real-time anomaly detection in trading systems, where suspicious patterns must be flagged as they occur.
  • Ongoing feature testing and optimization, where changes in user behavior need rapid assessment for business decisions.

Key Features

  • ✅ True Online Processing: Make immediate decisions without waiting for future data
  • 🔒 Rigorous Statistical Guarantees: Maintain FDR (or FWER) control under the independence or dependence assumptions stated for each method
  • 🔄 Unified API: Consistent interface across all methods, with test_one() for sequential testing
  • 📊 Comprehensive Method Coverage: State-of-the-art algorithms from recent literature
  • 🚀 Performance Optimized: Efficient implementations suitable for high-throughput applications
  • 📚 Rich Documentation: Detailed mathematical explanations and practical examples
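To illustrate the test_one() calling convention, here is a minimal, self-contained sketch of online Bonferroni (alpha spending): a stateful object that receives one p-value at a time and immediately returns a reject/accept decision. The class name OnlineBonferroniSketch and the spending sequence gamma_t = 6 / (pi^2 t^2) are assumptions of this sketch, not the library's API or defaults.

```python
import math


class OnlineBonferroniSketch:
    """Hypothetical test_one()-style procedure: online Bonferroni (alpha spending).

    Test t (1-indexed) is run at level alpha_t = alpha * gamma_t with
    gamma_t = 6 / (pi^2 * t^2). Since sum_t gamma_t = 1, the total level spent
    never exceeds alpha, which bounds the FWER by a union bound.
    """

    def __init__(self, alpha: float = 0.05) -> None:
        self.alpha = alpha
        self.t = 0  # number of hypotheses tested so far

    @staticmethod
    def gamma(t: int) -> float:
        # Assumed spending sequence; any nonnegative sequence summing to <= 1 works.
        return 6.0 / (math.pi ** 2 * t ** 2)

    def test_one(self, p_value: float) -> bool:
        """Test the next hypothesis; return True if it is rejected (a discovery)."""
        self.t += 1
        alpha_t = self.alpha * self.gamma(self.t)
        return p_value <= alpha_t


# Usage: feed p-values one at a time, in the order they arrive.
proc = OnlineBonferroniSketch(alpha=0.05)
decisions = [proc.test_one(p) for p in [0.001, 0.2, 0.0001]]
print(decisions)  # [True, False, True]
```

Because the decision depends only on the current p-value and the object's internal state, any procedure exposing the same method can be dropped into the same sequential loop.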

Quick Installation

pip install online-fdr

Quick Start Example

from online_fdr.investing.addis.addis import Addis
from online_fdr.utils.generation import DataGenerator, GaussianLocationModel

# Initialize a data generator for demonstration
dgp = GaussianLocationModel(alt_mean=3.0, alt_std=1.0, one_sided=True)
generator = DataGenerator(n=1000, pi0=0.9, dgp=dgp)  # 10% alternatives

# Create an online FDR procedure  
addis = Addis(alpha=0.05, wealth=0.025, lambda_=0.25, tau=0.5)

# Test hypotheses sequentially
discoveries = []
for i in range(100):
    p_value, label = generator.sample_one()  # label: simulated ground truth, not used by the procedure
    is_discovery = addis.test_one(p_value)

    if is_discovery:
        discoveries.append(i)
        print(f"Discovery at test {i}: p-value = {p_value:.4f}")

print(f"Made {len(discoveries)} discoveries")
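The Quick Start loop ignores label. In a simulation you can keep it to measure how a procedure actually performs. The sketch below assumes that label is truthy exactly when the simulated hypothesis is a true alternative (an assumption about DataGenerator that this page does not confirm) and computes the realized false discovery proportion (FDP) and power for one run; averaging the FDP over many independent runs estimates the FDR. The function name fdp_and_power is hypothetical.

```python
from typing import Sequence, Tuple


def fdp_and_power(decisions: Sequence[bool], is_alternative: Sequence[bool]) -> Tuple[float, float]:
    """Realized FDP and power for one simulated run (hypothetical helper).

    decisions:      True where the procedure rejected (e.g. results of test_one()).
    is_alternative: ground-truth labels, assumed True for alternatives
                    (e.g. the label returned by the data generator).
    """
    if len(decisions) != len(is_alternative):
        raise ValueError("decisions and labels must have the same length")
    rejections = sum(bool(d) for d in decisions)
    # A false discovery is a rejection of a true null hypothesis.
    false_discoveries = sum(bool(d) and not bool(alt) for d, alt in zip(decisions, is_alternative))
    true_discoveries = rejections - false_discoveries
    n_alternatives = sum(bool(alt) for alt in is_alternative)
    fdp = false_discoveries / max(rejections, 1)  # FDP is 0 when nothing is rejected
    power = true_discoveries / n_alternatives if n_alternatives else 0.0
    return fdp, power


fdp, power = fdp_and_power([True, True, False, True, False], [True, False, False, True, True])
print(round(fdp, 3), round(power, 3))  # 0.333 0.667
```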

Available Methods

Sequential Testing (One-by-One)

| Method Family | Methods | Best For |
|---|---|---|
| Alpha Investing | GAI, SAFFRON, ADDIS | High-throughput screening |
| LORD | LORD3, LORD++, D-LORD, Discard, Memory Decay | Time series with trends |
| LOND | LOND | Independent/weakly dependent p-values |
| Alpha Spending | Bonferroni, LORD3 spending | Conservative control |
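As an example of how a sequential rule adapts to past outcomes, here is a minimal sketch of the LOND rule (Javanmard and Montanari): test t is run at level alpha_t = alpha * gamma_t * (D(t-1) + 1), where D(t-1) is the number of discoveries made before test t. Under independent p-values this controls the FDR. The class name LondSketch and the sequence gamma_t = 6 / (pi^2 t^2) are assumptions of this sketch, not the library's implementation or defaults.

```python
import math


class LondSketch:
    """Minimal LOND sketch (hypothetical class, not the library's implementation)."""

    def __init__(self, alpha: float = 0.05) -> None:
        self.alpha = alpha
        self.t = 0            # hypotheses tested so far
        self.discoveries = 0  # D(t-1): rejections made before the current test

    @staticmethod
    def gamma(t: int) -> float:
        # Assumed nonnegative sequence summing to 1 over t = 1, 2, ...
        return 6.0 / (math.pi ** 2 * t ** 2)

    def test_one(self, p_value: float) -> bool:
        """Test the next hypothesis; earlier discoveries raise the current level."""
        self.t += 1
        alpha_t = self.alpha * self.gamma(self.t) * (self.discoveries + 1)
        reject = p_value <= alpha_t
        if reject:
            self.discoveries += 1
        return reject


lond = LondSketch(alpha=0.05)
print([lond.test_one(p) for p in [0.001, 0.01, 0.009, 0.5]])  # [True, True, True, False]
```

With the same gamma sequence, plain alpha spending would test the second hypothesis at about 0.0076 and miss p = 0.01; LOND doubles that level after the first discovery, which is why it tends to be more powerful when discoveries occur early.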

Batch Testing

| Method | Description | Best For |
|---|---|---|
| BatchBH | Classic Benjamini-Hochberg | Independent p-values |
| BatchStoreyBH | Adaptive Storey-BH procedure | Unknown null proportion |
| BatchPRDS | Positive regression dependency | Positively correlated tests |
| BatchBY | Benjamini-Yekutieli | Arbitrary dependence |
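The batch methods build on step-up rules applied to a group of p-values. The sketch below shows only that within-batch step, on a single batch at a fixed level: Benjamini-Hochberg, the Benjamini-Yekutieli correction (BH at level alpha / H_m), and Storey-BH (BH at level alpha / pi0_hat). How the library allocates levels across successive batches is not modeled here, and the function names and the truncation of pi0_hat at 1 are assumptions of this sketch.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float) -> List[bool]:
    """Step-up BH: reject the k smallest p-values, k = max{i : p_(i) <= i * alpha / m}."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k = rank  # keep the largest passing rank (step-up)
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True
    return rejected


def benjamini_yekutieli(p_values: Sequence[float], alpha: float) -> List[bool]:
    """BY: BH at level alpha / H_m, with H_m = 1 + 1/2 + ... + 1/m (arbitrary dependence)."""
    m = len(p_values)
    if m == 0:
        return []
    harmonic = sum(1.0 / i for i in range(1, m + 1))
    return benjamini_hochberg(p_values, alpha / harmonic)


def storey_bh(p_values: Sequence[float], alpha: float, lam: float = 0.5) -> List[bool]:
    """Storey-BH: estimate the null proportion pi0, then run BH at level alpha / pi0_hat."""
    m = len(p_values)
    if m == 0:
        return []
    pi0_hat = (1 + sum(p > lam for p in p_values)) / (m * (1 - lam))
    pi0_hat = min(pi0_hat, 1.0)  # truncation at 1: an assumption of this sketch
    return benjamini_hochberg(p_values, alpha / pi0_hat)


p = [0.035, 0.9, 0.01, 0.03]
print(benjamini_hochberg(p, 0.05))   # [True, False, True, True]
print(benjamini_yekutieli(p, 0.05))  # [False, False, False, False]
```

The step-up behavior shows in the BH result: p = 0.03 fails its own threshold (2 x 0.05 / 4 = 0.025) but is still rejected because the larger p = 0.035 passes at rank 3. BY pays for robustness to arbitrary dependence with a smaller level and rejects nothing here.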

Mathematical Guarantees

All implemented methods provide rigorous theoretical guarantees:

FDR Control

For FDR control methods: \(\text{FDR} = \mathbb{E}[\text{FDP}] \leq \alpha\) under the conditions specified for each method, where FDP is the false discovery proportion

FWER Control

For alpha spending methods: \(\text{FWER} = \mathbb{P}(V \geq 1) \leq \alpha\), where \(V\) is the number of false discoveries (rejected true null hypotheses)
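For reference, these quantities are commonly defined as follows for the first \(t\) hypotheses tested, with the online guarantees usually stated to hold at every time \(t\). The last line is a short worked step showing why FWER control is the stronger of the two guarantees.

```latex
\begin{aligned}
V(t) &= \#\{\text{true nulls rejected among } H_1,\dots,H_t\},
\qquad R(t) = \#\{\text{rejections among } H_1,\dots,H_t\},\\
\text{FDP}(t) &= \frac{V(t)}{\max\{R(t),\,1\}},
\qquad \text{FDR}(t) = \mathbb{E}\big[\text{FDP}(t)\big],
\qquad \text{FWER}(t) = \mathbb{P}\big(V(t) \geq 1\big),\\
\text{FDP}(t) &\leq \mathbf{1}\{V(t) \geq 1\}
\;\Longrightarrow\; \text{FDR}(t) \leq \text{FWER}(t).
\end{aligned}
```

The inequality holds because the FDP is zero when \(V(t) = 0\) and never exceeds one otherwise, so any procedure with \(\text{FWER}(t) \leq \alpha\) also has \(\text{FDR}(t) \leq \alpha\).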

Getting Started

Start with our Quick Start Guide for a hands-on introduction to the library.

Explore the Theory Section for mathematical foundations and algorithm details.

Jump to Examples for real-world use cases and method comparisons.

Check the API Reference for detailed class and method documentation.

Acknowledgements

This library is inspired by and validated against the R package onlineFDR.

Key differentiator: Our implementation provides a truly online API with test_one() method calls, enabling real-time sequential applications. The R package requires pre-collected data arrays.

Support

  • 📖 Documentation: Comprehensive guides and API reference
  • 🐛 Issues: Report bugs on GitHub Issues
  • 💬 Discussions: Ask questions in GitHub Discussions
  • 📧 Contact: Reach out to the maintainers for collaboration opportunities

License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.