Concepts¶
Understanding the fundamental concepts behind online FDR control is crucial for choosing the right methods and interpreting results correctly. This section covers the key theoretical foundations and practical considerations.
Multiple Testing Problem¶
The Challenge¶
When testing multiple hypotheses simultaneously, the probability of making at least one false discovery increases rapidly:
For \(m\) independent tests at level \(\alpha = 0.05\), the probability of at least one false positive is \(1 - (1 - \alpha)^m\):

- 1 test: 5% chance of a false positive
- 10 tests: 40% chance
- 100 tests: 99.4% chance
This inflation of Type I error necessitates multiple testing corrections.
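A quick check of the numbers above, computed directly from the formula:

```python
# Chance of at least one false positive among m independent tests at level alpha
alpha = 0.05
for m in [1, 10, 100]:
    print(f"m = {m:>3}: {1 - (1 - alpha) ** m:.1%}")
# m =   1: 5.0%
# m =  10: 40.1%
# m = 100: 99.4%
```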
Error Rate Definitions¶
Family-Wise Error Rate (FWER)
The probability of making at least one false discovery:

\[\text{FWER} = P(V \geq 1)\]

where \(V\) is the number of false positives.
False Discovery Rate (FDR)
The expected proportion of false discoveries among all discoveries:

\[\text{FDR} = E\left[\frac{V}{\max(R, 1)}\right]\]

where \(R\) is the total number of discoveries.
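A small simulation makes both definitions concrete. Here every hypothesis is null, so any rejection is false; note that under this global null the FDR and FWER coincide (a minimal sketch assuming NumPy):

```python
import numpy as np

rng = np.random.default_rng(0)
n_runs, m, alpha = 10_000, 20, 0.05

any_false = 0
fdp_sum = 0.0
for _ in range(n_runs):
    p = rng.uniform(size=m)      # all hypotheses null: p ~ Uniform(0, 1)
    r = int(np.sum(p <= alpha))  # naive, uncorrected rejections
    any_false += (r >= 1)
    fdp_sum += r / max(r, 1)     # FDP = V / max(R, 1); here V = R

print(f"Estimated FWER: {any_false / n_runs:.2f}")  # ≈ 1 - 0.95**20 ≈ 0.64
print(f"Estimated FDR:  {fdp_sum / n_runs:.2f}")    # equals FWER under the global null
```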
When to Use Each¶
| Metric | Best For | Trade-off |
|---|---|---|
| FWER | Safety-critical applications (medical devices, drug approval) | Very conservative, low power |
| FDR | Exploratory research (genomics, screening studies) | More powerful, allows some false positives |
Online vs Batch Testing¶
Batch Testing Paradigm¶
Traditional approach: Collect all p-values first, then apply correction
```python
from statsmodels.stats.multitest import multipletests

# Batch approach: every p-value is collected before the correction is applied
# (statsmodels' Benjamini-Hochberg, method="fdr_bh", stands in here)
p_values = [0.01, 0.03, 0.08, 0.005, 0.12]
rejections = multipletests(p_values, alpha=0.05, method="fdr_bh")[0]
```
Characteristics:

- ✅ Optimal power for a given FDR level
- ✅ Simple to understand and implement
- ❌ Requires waiting for all tests
- ❌ No early stopping possible
Online Testing Paradigm¶
Modern approach: Make decisions immediately as each p-value arrives
```python
# Online approach: Addis as imported from this package
method = Addis(alpha=0.05, wealth=0.025, lambda_=0.25, tau=0.5)

for p_value in p_value_stream:            # process one p-value at a time
    decision = method.test_one(p_value)   # immediate accept/reject decision
    if decision:
        print(f"Significant result: p = {p_value}")
```
Characteristics:

- ✅ Immediate decisions possible
- ✅ Can stop early if needed
- ✅ Suitable for streaming data
- ❌ May have slightly lower power
- ❌ More complex parameter tuning
When to Choose Each¶
Choose **online testing** when:

- Hypotheses arrive sequentially over time
- Early stopping is valuable
- Streaming/real-time processing is needed
- Interim analyses are required

Choose **batch testing** when:

- All p-values are available upfront
- Maximum power is essential
- A simple implementation is preferred
- The analysis is post hoc, on a completed study
Alpha Spending vs Alpha Investing¶
Online FDR methods fall into two main paradigms:
Alpha Spending¶
Concept: Pre-allocate your total α budget across tests
```python
# Example: Bonferroni-style spending, splitting the budget evenly across tests
total_alpha = 0.05
expected_number_of_tests = 100
alpha_per_test = total_alpha / expected_number_of_tests

for p_value in p_values:  # p_values as in the batch example above
    if p_value <= alpha_per_test:
        print(f"Reject: p = {p_value}")
```
Properties:

- ✅ Very conservative, guarantees FWER control
- ✅ Simple to understand
- ❌ Low power, especially early in the sequence
- ❌ Requires knowing the expected number of tests
Methods: Bonferroni spending, Holm-Bonferroni, Alpha spending functions
Alpha Investing¶
Concept: Start with wealth, earn more from discoveries, spend on rejections
```python
# Conceptual alpha-investing loop (simplified sketch, not a specific method)
initial_wealth = 0.05
reward = 0.05  # wealth earned per discovery

current_wealth = initial_wealth
for p_value in p_values:
    # Adaptive threshold based on current wealth and past results;
    # the rule below is purely illustrative, real methods use richer rules
    alpha_t = current_wealth / 2
    if p_value <= alpha_t:
        current_wealth += reward                    # earn wealth from the discovery
    else:
        current_wealth -= alpha_t / (1 - alpha_t)   # pay for testing
```
Properties:

- ✅ Adaptive thresholds based on past success
- ✅ Higher power than spending methods
- ✅ FDR control (not FWER)
- ❌ More complex to understand
- ❌ More parameters to tune
Methods: GAI, SAFFRON, ADDIS, LORD family, LOND
Dependency Structures¶
The dependence between test statistics critically affects method choice and performance.
Independence¶
Assumption: Test statistics (or p-values) are mutually independent
Examples:

- Tests on completely different subjects
- Non-overlapping genomic regions
- Independent A/B test variants
Suitable methods: Most methods work well under independence
Positive Dependence¶
Assumption: Test statistics tend to be positively correlated
Examples:

- Overlapping genomic regions
- Related biomarkers
- Tests on subgroups of the same population
Suitable methods: in batch settings, Benjamini-Hochberg remains valid under PRDS (positive regression dependence on a subset), with Benjamini-Yekutieli as a conservative fallback; most online methods handle positive dependence
Arbitrary Dependence¶
Assumption: No restrictions on dependence structure
Examples:

- Complex correlation structures
- Time series with unknown dependence
- Tests with feedback loops
Conservative methods: Benjamini-Yekutieli, dependent LOND variants
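The cost of guarding against arbitrary dependence is explicit in Benjamini-Yekutieli, which shrinks the BH thresholds by the harmonic factor \(c(m) = \sum_{i=1}^{m} 1/i\). A quick computation shows how this penalty grows with the number of tests:

```python
import math

for m in [10, 100, 1000]:
    c_m = sum(1 / i for i in range(1, m + 1))  # BY harmonic correction factor
    approx = math.log(m) + 0.5772              # ln m + Euler-Mascheroni constant
    print(f"m = {m:>4}: c(m) = {c_m:.2f} (approx. {approx:.2f})")
# m =   10: c(m) = 2.93 (approx. 2.88)
# m =  100: c(m) = 5.19 (approx. 5.18)
# m = 1000: c(m) = 7.49 (approx. 7.49)
```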
Temporal Dependence¶
Special case: Sequential dependence in online settings
Examples:

- Time series analysis
- Sequential clinical trials
- Adaptive experimentation
Specialized methods: LORD with memory decay, dependent LOND
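As an illustration of sequential dependence, the sketch below (assuming NumPy and SciPy) generates p-values from an AR(1) stream of null z-scores: each p-value is marginally Uniform(0, 1), yet neighboring p-values are strongly correlated:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n, rho = 1_000, 0.8  # rho: autocorrelation of consecutive test statistics

# AR(1) stream of null z-scores: each statistic depends on the previous one
z = np.empty(n)
z[0] = rng.normal()
for t in range(1, n):
    z[t] = rho * z[t - 1] + np.sqrt(1 - rho**2) * rng.normal()

# Two-sided p-values: marginally Uniform(0, 1), but serially correlated
p = 2 * stats.norm.sf(np.abs(z))
print(f"lag-1 correlation of p-values: {np.corrcoef(p[:-1], p[1:])[0, 1]:.2f}")
```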
Key Parameters and Tuning¶
Universal Parameters¶
Alpha (α)
Meaning: Target FDR level
Typical values: 0.05, 0.1, 0.2
Tuning: Set based on application tolerance for false discoveries
Alpha Investing Parameters¶
Initial Wealth (W₀)
Meaning: Starting "budget" for rejections
Typical values: α/4 to α/2
Effect: Higher values → more early power
Lambda (λ)
Meaning: Threshold for "candidate" discoveries (ADDIS/SAFFRON)
Typical values: 0.25 to 0.5
Effect: Lower values → a more selective candidate filter (fewer tests treated as candidates)
Tau (τ)
Meaning: Discarding threshold for large p-values (ADDIS)
Typical values: 0.5 to 0.8
Effect: Higher values → fewer discarded tests
Reward/Payoff (R)
Meaning: Wealth gained from each discovery (LORD family)
Typical values: 0.05 to 0.5
Effect: Higher values → more aggressive after discoveries
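These names map directly onto the constructors used earlier. For example, the Addis call from the online-testing snippet above, with illustrative values drawn from the typical ranges listed here (not recommendations):

```python
# Parameters from this section, passed to the Addis constructor shown earlier
method = Addis(
    alpha=0.05,     # target FDR level
    wealth=0.0125,  # initial wealth W0 (here alpha / 4)
    lambda_=0.25,   # candidate threshold lambda
    tau=0.5,        # discarding threshold tau
)
```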
Performance Metrics¶
Traditional Metrics¶
False Discovery Rate (FDR)

\[\text{FDR} = E\left[\frac{V}{\max(R, 1)}\right]\]

where \(V\) = false positives, \(R\) = total rejections

Power (Sensitivity)

\[\text{Power} = E\left[\frac{\text{true positives}}{\text{total alternatives}}\right]\]
Online-Specific Metrics¶
Modified FDR (mFDR) A ratio-of-expectations variant commonly used in the online-testing literature:

\[\text{mFDR} = \frac{E[V]}{E[R] + 1}\]
Temporal FDR FDR calculated at specific time points during sequential testing
Expected Discovery Time Average time until first discovery (for streaming applications)
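In simulation studies where ground truth is known, these quantities can be estimated directly from the decision vector; a minimal sketch (the helper name is ours):

```python
import numpy as np

def empirical_fdp_and_power(decisions, is_null):
    """Compute FDP and power for one run, given ground-truth null labels."""
    decisions = np.asarray(decisions, dtype=bool)
    is_null = np.asarray(is_null, dtype=bool)
    v = np.sum(decisions & is_null)        # false positives V
    r = np.sum(decisions)                  # total rejections R
    tp = np.sum(decisions & ~is_null)      # true positives
    fdp = v / max(r, 1)                    # FDP = V / max(R, 1)
    power = tp / max(np.sum(~is_null), 1)  # fraction of alternatives found
    return fdp, power

# Averaging FDP over many simulated runs estimates the FDR
fdp, power = empirical_fdp_and_power([True, False, True], [False, False, True])
print(f"FDP = {fdp:.2f}, power = {power:.2f}")  # FDP = 0.50, power = 0.50
```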
Common Misconceptions¶
Misconception 1: Online methods are always worse
Truth: Online methods can have comparable power to batch methods, especially when designed for the specific dependency structure.
Misconception 2: More parameters = better performance
Truth: Simple methods often perform well. Complex parameterizations require careful tuning.
Misconception 3: FDR = False Positive Rate
Truth: FDR is the expected fraction of false discoveries among the discoveries made; the false positive rate is the fraction of true nulls that get rejected. A method can have a 5% false positive rate yet a far higher FDR when true effects are rare.
Misconception 4: Independence assumption is always violated
Truth: Many real applications have near-independence. Don't over-engineer for dependence.
Choosing Methods: Decision Framework¶
```mermaid
graph TD
    A[Start] --> B{Streaming data?}
    B -->|Yes| C[Online Methods]
    B -->|No| D{All p-values available?}
    D -->|Yes| E{Maximum power needed?}
    D -->|No| C
    E -->|Yes| F[Batch Methods]
    E -->|No| C
    C --> G{Independence reasonable?}
    G -->|Yes| H[ADDIS/SAFFRON]
    G -->|No| I{Known dependence structure?}
    I -->|Positive| J[LOND/LORD variants]
    I -->|Arbitrary| K[Conservative LOND]
    F --> L{Null proportion known?}
    L -->|Yes| M[Benjamini-Hochberg]
    L -->|No| N{Dependence?}
    N -->|Independent| O[Storey-BH]
    N -->|Dependent| P[Benjamini-Yekutieli]
```
Summary¶
Understanding these concepts helps you:
- Choose appropriate methods for your data structure
- Set reasonable parameters for good performance
- Interpret results correctly in your application context
- Design simulation studies for method validation
Next: Learn about specific Sequential Testing Methods or jump to Examples to see these concepts in action.