Mathematical Theory¶
This section provides the mathematical foundations underlying online false discovery rate (FDR) control methods. Understanding the theory helps in choosing appropriate methods, setting parameters correctly, and interpreting results.
Overview¶
Online FDR control addresses the fundamental challenge of multiple hypothesis testing in sequential settings where:
- Hypotheses arrive over time: \(H_1, H_2, H_3, \ldots\)
- Decisions must be immediate: Can't wait for future p-values
- Statistical guarantees must hold: FDR ≤ α at all stopping times
This creates a fundamentally different problem from traditional batch multiple testing correction.
Core Mathematical Framework¶
Problem Setup¶
Let \((H_i, p_i)\) be the sequence of null hypotheses and corresponding p-values arriving at times \(i = 1, 2, 3, \ldots\).
Online Testing Rule: A sequence of (possibly random) thresholds \((\alpha_i)_{i \geq 1}\) where: - \(\alpha_i\) depends only on \((p_1, R_1), \ldots, (p_{i-1}, R_{i-1})\) - \(R_i = \mathbf{1}(p_i \leq \alpha_i)\) is the rejection indicator
False Discovery Rate¶
For any stopping time \(T\), define:
where: - \(V(T) = \sum_{i=1}^T (1-H_i) R_i\) (false discoveries) - \(R(T) = \sum_{i=1}^T R_i\) (total discoveries) - \(H_i = 1\) if \(H_i\) is false (alternative), 0 if true (null)
Goal: Design \((\alpha_i)\) such that \(\text{FDR}(T) \leq \alpha\) for all \(T\).
Theoretical Foundations¶
FDR Control Theory¶
Mathematical principles of FDR control
- Benjamini-Hochberg procedure and extensions
- Online vs batch FDR control differences
- Stopping time considerations
- Dependence structures and their impact
Algorithm Design¶
Core algorithmic approaches and their analysis
- Alpha investing framework
- Wealth dynamics and martingale theory
- Sequential testing with discarding
- Adaptive estimation techniques
Statistical Guarantees¶
Theoretical guarantees and their assumptions
- Independence and dependence assumptions
- Uniform and conservative null hypotheses
- Finite-sample vs asymptotic results
- Robustness to model misspecification
Key Theoretical Results¶
Alpha Investing Martingale¶
The foundation of alpha investing methods relies on a wealth martingale:
Theorem (Foster & Stine, 2008): Define wealth process \(W_i\) where: - \(W_0 = w_0 \geq 0\) (initial wealth)
- \(W_i = W_{i-1} + \psi_i\) where \(\psi_i\) is the wealth change from test \(i\)
If \(\mathbb{E}[\psi_i | \mathcal{F}_{i-1}] \leq 0\) for all \(i\), then \((W_i)\) is a supermartingale, and:
This leads to FDR control when wealth is properly allocated.
SAFFRON Adaptive Estimation¶
Theorem (Ramdas et al., 2018): The SAFFRON procedure with adaptive null proportion estimation \(\hat{\pi}_0(t)\) satisfies:
Under suitable conditions on the estimator, this yields \(\text{FDR}(T) \leq \alpha\).
ADDIS Discarding Analysis¶
Theorem (Tian & Ramdas, 2019): For conservative nulls with \(F_0(p) \leq p\), ADDIS with discarding threshold \(\tau\) satisfies:
where mFDR is the modified FDR among non-discarded tests.
Common Mathematical Constructions¶
1. Gamma Sequences¶
Many methods use probability sequences \((\gamma_i)\) with: - \(\gamma_i \geq 0\) and \(\sum_{i=1}^{\infty} \gamma_i = 1\) - Often \(\gamma_i = C \log(\max(i,2))/(i \exp(\sqrt{\log i}))\)
These control the allocation of alpha-wealth across tests.
2. Candidate Selection¶
Methods like SAFFRON and ADDIS use candidate sets: - \(C_i = \{j \leq i : p_j \leq \lambda\}\) (candidates by time \(i\)) - Rejection thresholds apply only to candidates - Allows adaptive focusing on promising hypotheses
3. Wealth Dynamics¶
General wealth update formula: \(\(W_i = W_{i-1} - \text{cost}_i + \text{reward}_i \cdot R_i\)\)
where: - \(\text{cost}_i\): alpha spent on test \(i\) - \(\text{reward}_i\): alpha earned from discovery \(i\)
Dependency Structures¶
Independent P-values¶
Assumption: \(p_1, p_2, \ldots\) are mutually independent
Methods: All methods work, optimal power achievable
Theory: Classical martingale arguments apply directly
Positive Dependence¶
Assumption: Test statistics exhibit positive correlation
Examples: - Genomic data from nearby regions - Related biomarkers
- Overlapping populations
Theory: FDR control often preserved under positive dependence
Arbitrary Dependence¶
Assumption: No restrictions on dependence structure
Methods: Conservative variants (Benjamini-Yekutieli, dependent LOND)
Theory: Requires more sophisticated analysis, often less powerful
Temporal Dependence¶
Special case: Sequential correlation in online settings
Methods: LORD with memory decay, time-aware procedures
Key Theoretical Insights¶
1. Online vs Batch Trade-offs¶
Batch optimal: Benjamini-Hochberg achieves minimum FDR for given power
Online constraint: Cannot achieve same optimality due to limited information
Gap: Online methods typically have 5-15% power loss vs batch
2. Conservative Nulls Advantage¶
When null p-values satisfy \(F_0(p) \leq p\) (conservative): - Traditional methods lose power significantly - ADDIS-type discarding methods maintain or gain power - Real applications often have conservative nulls
3. Wealth Management Principle¶
Optimal alpha-wealth allocation follows: - Spend cautiously early: Build up discoveries for momentum - Spend aggressively after success: Leverage discovery clusters
- Save for dry periods: Maintain ability to detect sparse signals
Advanced Topics¶
Asynchronous Testing¶
Tests that start and finish at random times: - Martingale theory extends with random stopping
- Requires careful handling of information flow - Applications: Clinical trials, A/B testing platforms
Group Sequential Methods¶
Testing families of related hypotheses: - Hierarchical testing structures - Closed testing procedures - Applications: Clinical trials with multiple endpoints
Nonparametric Methods¶
FDR control without distributional assumptions: - Rank-based procedures - Permutation methods
- Bootstrap approaches
Computational Considerations¶
Algorithm Complexity¶
Most online FDR methods have: - Time per test: \(O(1)\) to \(O(\log i)\) - Memory: \(O(1)\) to \(O(i)\) for history-dependent methods - Numerical stability: Well-behaved for reasonable parameters
Implementation Challenges¶
- Floating point precision: Wealth can become very small
- Parameter selection: Often requires domain knowledge
- Monitoring: Need real-time performance tracking
Open Research Questions¶
Theoretical Gaps¶
- Optimal online procedures: What's the fundamental limit of online FDR control?
- Dependence modeling: Better handling of complex correlation structures
- Adaptive parameter selection: Theory-guided parameter tuning
Practical Extensions¶
- Multi-objective control: Simultaneous FDR and power optimization
- Contextual information: Using covariate information in decisions
- Distributed testing: FDR control across multiple data centers
Mathematical Notation Reference¶
Symbol | Meaning |
---|---|
\(H_i\) | \(i\)-th null hypothesis |
\(p_i\) | \(i\)-th p-value |
\(\alpha_i\) | \(i\)-th significance threshold |
\(R_i\) | \(i\)-th rejection indicator |
\(V(T)\) | False discoveries by time \(T\) |
\(R(T)\) | Total discoveries by time \(T\) |
\(W_i\) | Alpha-wealth at time \(i\) |
\(\gamma_i\) | \(i\)-th element of gamma sequence |
\(\mathcal{F}_i\) | Filtration (information) up to time \(i\) |
\(\pi_0\) | Proportion of true null hypotheses |
\(\lambda\) | Candidate threshold |
\(\tau\) | Discarding threshold |
Further Reading¶
Each subsection provides detailed mathematical development:
- FDR Control Theory: Fundamental principles and classical results
- Algorithm Design: Mathematical construction of online methods
- Statistical Guarantees: Formal theorems and their proofs
For implementation details, see the API Reference.
For practical applications, explore Examples.