Online FDR: Online False Discovery Rate Control Algorithms

online-fdr is a Python library for controlling the False Discovery Rate (FDR) and the Family-Wise Error Rate (FWER) in online multiple hypothesis testing. Unlike traditional methods, which require all p-values upfront, the library provides online algorithms that decide on each hypothesis sequentially, as its data arrives.

Why Online FDR Control?

In many modern applications, hypotheses arrive sequentially and decisions must be made in real time:

Clinical Trials: Interim analyses as patient data accumulates, allowing for early stopping or protocol modifications while maintaining statistical validity.
A/B Testing: Continuous experimentation in tech companies where new variants are tested as they're developed, requiring immediate go/no-go decisions.
Genomics: Sequential gene discovery studies where new candidates are evaluated as they're identified through various screening methods.
Finance: Real-time anomaly detection in trading systems where suspicious patterns must be flagged immediately as they occur.
Web Analytics: Ongoing feature testing and optimization where user behavior changes need rapid assessment for business decisions.

Key Features

✅ True Online Processing: Make immediate decisions without waiting for future data
🔒 Rigorous Statistical Guarantees: Maintain FDR control under various dependency structures
🔄 Unified API: Consistent interface across all methods with test_one() for sequential testing
📊 Comprehensive Method Coverage: State-of-the-art algorithms from recent literature
🚀 Performance Optimized: Efficient implementations suitable for high-throughput applications
📚 Rich Documentation: Detailed mathematical explanations and practical examples
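
The unified interface mentioned above can be pictured as a small structural type. The sketch below is an illustration under stated assumptions: `SequentialTest`, `FixedThreshold`, and `run_stream` are hypothetical names that do not come from online-fdr. The only idea taken from the library is a `test_one()` method that receives one p-value and returns a decision.

```python
from typing import Iterable, List, Protocol


class SequentialTest(Protocol):
    """Hypothetical structural type: anything with test_one(p_value) -> bool."""

    def test_one(self, p_value: float) -> bool: ...


class FixedThreshold:
    """Toy stand-in (NOT an FDR-controlling method): rejects when p <= level."""

    def __init__(self, level: float) -> None:
        self.level = level

    def test_one(self, p_value: float) -> bool:
        return p_value <= self.level


def run_stream(procedure: SequentialTest, p_values: Iterable[float]) -> List[int]:
    """Feed p-values one at a time and return the indices that were rejected."""
    rejected = []
    for i, p in enumerate(p_values):
        if procedure.test_one(p):
            rejected.append(i)
    return rejected


indices = run_stream(FixedThreshold(level=0.01), [0.5, 0.004, 0.02, 0.0001])
print(indices)  # [1, 3]
```

Because `run_stream` depends only on the shape of `test_one()`, any procedure exposing that method could be swapped in without changing the loop.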

Quick Installation

pip install online-fdr

Quick Start Example

from online_fdr.investing.addis.addis import Addis
from online_fdr.utils.generation import DataGenerator, GaussianLocationModel

# Initialize a data generator for demonstration
dgp = GaussianLocationModel(alt_mean=3.0, alt_std=1.0, one_sided=True)
generator = DataGenerator(n=1000, pi0=0.9, dgp=dgp)  # 10% alternatives

# Create an online FDR procedure  
addis = Addis(alpha=0.05, wealth=0.025, lambda_=0.25, tau=0.5)

# Test hypotheses sequentially
discoveries = []
for i in range(100):
    p_value, label = generator.sample_one()
    is_discovery = addis.test_one(p_value)

    if is_discovery:
        discoveries.append(i)
        print(f"Discovery at test {i}: p-value = {p_value:.4f}")

print(f"Made {len(discoveries)} discoveries")

Available Methods

Sequential Testing (One-by-One)

Method Family	Methods	Best For
Alpha Investing	GAI, SAFFRON, ADDIS	High-throughput screening
LORD	LORD3, LORD++, D-LORD, Discard, Memory Decay	Time series with trends
LOND	LOND	Independent/weakly dependent p-values
Alpha Spending	Bonferroni, LORD3 spending	Conservative FWER control
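
To make the one-by-one setting concrete, here is a minimal from-scratch sketch of the LOND rule, which tests the t-th p-value at level alpha * gamma_t * (number of discoveries so far + 1). It is not the library's implementation: the class name `LondSketch` and the choice gamma_t = 6 / (pi^2 t^2) are assumptions made for illustration. Any non-negative sequence summing to 1 could be used instead.

```python
import math


class LondSketch:
    """Illustrative LOND: alpha_t = alpha * gamma_t * (discoveries so far + 1)."""

    def __init__(self, alpha: float) -> None:
        self.alpha = alpha
        self.t = 0            # number of hypotheses tested so far
        self.discoveries = 0  # number of rejections so far

    def _gamma(self, t: int) -> float:
        # Assumed spending sequence: gamma_t = 6 / (pi^2 * t^2), which sums to 1.
        return 6.0 / (math.pi ** 2 * t ** 2)

    def test_one(self, p_value: float) -> bool:
        self.t += 1
        # The level uses only past discoveries, so the decision is truly online.
        level = self.alpha * self._gamma(self.t) * (self.discoveries + 1)
        reject = p_value <= level
        if reject:
            self.discoveries += 1
        return reject


lond = LondSketch(alpha=0.05)
decisions = [lond.test_one(p) for p in [0.001, 0.2, 0.004, 0.5]]
print(decisions)  # [True, False, True, False]
```

Each discovery raises the multiplier on later test levels, which is what separates this rule from plain alpha spending, where the level alpha * gamma_t never grows.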

Batch Testing

Method	Description	Best For
BatchBH	Classic Benjamini-Hochberg	Independent p-values
BatchStoreyBH	Adaptive Storey-BH procedure	Unknown null proportion
BatchPRDS	Positive regression dependency	Positively correlated tests
BatchBY	Benjamini-Yekutieli	Arbitrary dependence
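
For comparison with the sequential methods, the classic Benjamini-Hochberg step-up rule applied to a single batch can be sketched as follows. This is a standalone illustration, not the library's `BatchBH` class; the function name `benjamini_hochberg` is hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], alpha: float) -> List[bool]:
    """Return one rejection decision per p-value using the BH step-up rule."""
    m = len(p_values)
    # Indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max smallest p-values.
    decisions = [False] * m
    for idx in order[:k_max]:
        decisions[idx] = True
    return decisions


batch = [0.001, 0.045, 0.012, 0.3, 0.02]
print(benjamini_hochberg(batch, alpha=0.05))  # [True, False, True, False, True]
```

The rule needs the whole batch before it can sort and compare, which is why it cannot be applied one p-value at a time.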

Mathematical Guarantees

All implemented methods provide rigorous theoretical guarantees:

FDR Control

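For FDR control methods: \(\text{FDR} = \mathbb{E}[\text{FDP}] \leq \alpha\) under specified conditions, where FDP is the false discovery proportion (false discoveries divided by total discoveries, taken as 0 when there are no discoveries)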

FWER Control

For alpha spending methods: \(\text{FWER} = \mathbb{P}(V \geq 1) \leq \alpha\), where \(V\) is the number of false discoveries
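
Both guarantees are statements about averages over repeated experiments, so a single run only yields one realization of the underlying quantities. In a simulation where the true null status of each hypothesis is known, those realizations can be computed as sketched below. The helper names `false_discovery_proportion` and `any_false_rejection` are hypothetical and not part of the library.

```python
from typing import Sequence


def false_discovery_proportion(rejected: Sequence[bool], is_null: Sequence[bool]) -> float:
    """FDP = (rejections of true nulls) / max(total rejections, 1)."""
    false_rejections = sum(r and n for r, n in zip(rejected, is_null))
    total_rejections = sum(rejected)
    return false_rejections / max(total_rejections, 1)


def any_false_rejection(rejected: Sequence[bool], is_null: Sequence[bool]) -> bool:
    """Indicator of the FWER event: at least one true null was rejected."""
    return any(r and n for r, n in zip(rejected, is_null))


rejected = [True, False, True, True]
is_null = [False, True, True, False]  # ground truth, available only in simulation
print(false_discovery_proportion(rejected, is_null))  # one false rejection out of three
print(any_false_rejection(rejected, is_null))
```

Averaging the first quantity over many simulated runs estimates the FDR, and averaging the second estimates the FWER.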

Getting Started

New Users: Start with our Quick Start Guide for a hands-on introduction to the library.
Researchers: Explore the Theory Section for mathematical foundations and algorithm details.
Practitioners: Jump to Examples for real-world use cases and method comparisons.
Developers: Check the API Reference for detailed class and method documentation.

Acknowledgements

This library is inspired by and validated against the R package onlineFDR.

Key differentiator: Our implementation provides a truly online API with test_one() method calls, enabling real-time sequential applications. The R package requires pre-collected data arrays.

Support

📖 Documentation: Comprehensive guides and API reference
🐛 Issues: Report bugs on GitHub Issues
💬 Discussions: Ask questions in GitHub Discussions
📧 Contact: Reach out to the maintainers for collaboration opportunities

License

This project is licensed under the BSD 3-Clause License - see the LICENSE file for details.