GAI: Generalized Alpha-Investing
GAI (Generalized Alpha-Investing) extends the original alpha-investing procedure of Foster and Stine (2008) for sequential control of expected false discoveries, using SAFFRON-style update rules for improved power.
Original Papers
Foster, D., and R. Stine. "α-investing: a procedure for sequential control of expected false discoveries." Journal of the Royal Statistical Society (Series B), 70(2):429-444, 2008.
Ramdas, A., T. Zrnic, M. J. Wainwright, and M. I. Jordan. "SAFFRON: an adaptive algorithm for online control of the FDR." Proceedings of the 35th International Conference on Machine Learning (ICML), 2018.
Overview
The Alpha-Investing Revolution
Alpha-investing changed how sequential hypothesis testing is done: instead of testing every hypothesis at a fixed significance level, the procedure keeps a budget of alpha-wealth, spends part of it on each test, and earns some back whenever a discovery is made. Successful discoveries therefore fund more powerful tests later in the sequence.
Key Innovation
The fundamental insight is that when you reject a null hypothesis, you gain evidence that not all hypotheses are null, justifying spending more α-wealth on future tests. This creates a virtuous cycle where discoveries beget more discoveries.
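To make the earn-back mechanism concrete, here is a minimal sketch of the original Foster and Stine (2008) rule, not of the `Gai` class documented on this page. The function name `alpha_investing`, the `spend_frac` parameter, and the choice to bet a fixed fraction of the current wealth on every test are illustrative assumptions; only the wealth update itself (gain a payout on rejection, pay α_j/(1 − α_j) otherwise) is taken from the paper.

```python
def alpha_investing(p_values, wealth=0.025, payout=0.05, spend_frac=0.5):
    """Toy alpha-investing run; returns (decisions, wealth_history).

    Assumption: each test gets a level alpha_j whose worst-case cost
    alpha_j / (1 - alpha_j) equals spend_frac * current wealth.
    """
    decisions, history = [], [wealth]
    for p in p_values:
        cost = spend_frac * wealth      # price paid if the test does not reject
        alpha_j = cost / (1.0 + cost)   # solves alpha_j / (1 - alpha_j) = cost
        reject = p <= alpha_j
        # Foster-Stine update: earn the payout on rejection, pay the cost otherwise.
        wealth = wealth + payout if reject else wealth - cost
        decisions.append(reject)
        history.append(wealth)
    return decisions, history


decisions, history = alpha_investing([0.001, 0.8, 0.02, 0.9, 0.005])
for step, (d, w) in enumerate(zip(decisions, history[1:]), start=1):
    print(f"test {step}: reject={d}, wealth after={w:.4f}")
```

In this run the first rejection triples the wealth, which raises the next threshold; the following non-rejections then halve the wealth each time, so later thresholds shrink again.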
GAI Enhancement
This implementation combines the original alpha-investing philosophy with SAFFRON's gamma sequence and update rules, providing a more principled approach to wealth allocation while maintaining the core alpha-investing benefits.
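As a rough illustration of what a gamma sequence looks like, the sketch below builds a decreasing sequence with γ_j proportional to j^(-1.6), the shape suggested in the SAFFRON paper. The function name `gamma_sequence`, the finite `horizon`, and the normalisation over that horizon are assumptions made for the example; the sequence actually used by `Gai` may differ.

```python
def gamma_sequence(horizon, exponent=1.6):
    """Return gamma_1..gamma_horizon proportional to j**(-exponent), summing to 1."""
    raw = [j ** (-exponent) for j in range(1, horizon + 1)]
    total = sum(raw)
    return [g / total for g in raw]


gammas = gamma_sequence(1000)
print("first five weights:", [round(g, 4) for g in gammas[:5]])
print("sum of weights:", round(sum(gammas), 6))
```

Because the weights decay, most of the budget is offered to tests that come soon after the start or soon after a rejection.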
Class Reference
online_fdr.investing.alpha.alpha.Gai
Bases: AbstractSequentialTest
GAI: Generalized Alpha-Investing for online FDR control with SAFFRON updates.
Generalized Alpha-Investing (GAI) extends the original alpha-investing procedure of Foster and Stine (2008) for sequential control of expected false discoveries. This implementation uses SAFFRON-style update rules for improved power while maintaining the core alpha-investing philosophy.
Alpha-investing resembles alpha-spending but with a key difference: when a test rejects a null hypothesis, the procedure earns additional probability toward subsequent tests. This allows incorporation of domain knowledge and improved power over non-adaptive methods.
The GAI framework has become fundamental for online hypothesis testing, providing a robust, computationally efficient approach that requires no parametric assumptions about underlying null and alternative distributions.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
alpha | float | Target FDR level (e.g., 0.05 for 5% FDR). Must be in (0, 1). | required |
wealth | float | Initial alpha-wealth for purchasing rejection thresholds. Must satisfy 0 ≤ wealth ≤ alpha. | required |
Attributes:
Name | Type | Description |
---|---|---|
alpha0 | float | Original target FDR level. |
wealth0 | float | Initial wealth allocation. |
num_test | int | Number of hypotheses tested so far. |
candidates | list[bool] | Boolean list indicating which tests were candidates. |
reject_idx | list[int] | Indices of rejected hypotheses. |
Examples:
>>> # Basic usage
>>> from online_fdr.investing.alpha.alpha import Gai
>>> gai = Gai(alpha=0.05, wealth=0.025)
>>> decision = gai.test_one(0.01) # Test a small p-value
>>> print(f"Rejected: {decision}")
>>> # Sequential testing with wealth dynamics
>>> p_values = [0.001, 0.3, 0.02, 0.8, 0.005]
>>> decisions = [gai.test_one(p) for p in p_values]
>>> discoveries = sum(decisions)
References
Foster, D., and R. Stine (2008). "α-investing: a procedure for sequential control of expected false discoveries." Journal of the Royal Statistical Society (Series B), 70(2):429-444.
Ramdas, A., T. Zrnic, M. J. Wainwright, and M. I. Jordan (2018). "SAFFRON: an adaptive algorithm for online control of the FDR." Proceedings of the 35th International Conference on Machine Learning (ICML), Proceedings of Machine Learning Research, vol. 80, pp. 4286-4294, PMLR.
Source code in online_fdr/investing/alpha/alpha.py
Usage Examples
Basic Alpha-Investing
from online_fdr.investing.alpha.alpha import Gai

# Create GAI instance
gai = Gai(alpha=0.05, wealth=0.025)

# Test individual p-values
p_values = [0.001, 0.15, 0.03, 0.8, 0.02, 0.45, 0.006]

print("Generalized Alpha-Investing:")
discoveries = []
for i, p_value in enumerate(p_values):
    decision = gai.test_one(p_value)
    if decision:
        discoveries.append(i + 1)
        print(f"✓ Test {i+1}: p={p_value:.3f} → DISCOVERY!")
    else:
        print(f"  Test {i+1}: p={p_value:.3f} → no rejection")

print(f"\nTotal discoveries: {len(discoveries)}")
print(f"Discovery indices: {discoveries}")
Understanding Wealth Dynamics
from online_fdr.investing.alpha.alpha import Gai


def demonstrate_gai_mechanism():
    """Show how GAI thresholds evolve compared with fixed-level testing."""
    print("GAI vs Fixed-Level Testing:")
    print("=" * 35)

    test_sequence = [0.001, 0.8, 0.02, 0.9, 0.005, 0.7, 0.015]

    # GAI with dynamic thresholds
    gai = Gai(alpha=0.1, wealth=0.05)  # Higher values for visibility

    print("GAI (Dynamic Thresholds):")
    gai_discoveries = 0
    for i, p_val in enumerate(test_sequence, 1):
        # Step the internal state by hand so the threshold can be printed
        # before the decision. This only approximates the bookkeeping that
        # test_one() would normally do.
        gai.num_test += 1
        threshold = gai.calc_alpha_t()
        decision = p_val <= threshold
        # Assumption: a test counts as a candidate when p <= alpha0. Recorded
        # for every test so the list stays aligned with num_test.
        gai.candidates.append(p_val <= gai.alpha0)
        if decision:
            gai_discoveries += 1
            gai.reject_idx.append(gai.num_test)
        print(f"Test {i}: p={p_val:.3f}, threshold={threshold:.6f} → {'REJECT' if decision else 'ACCEPT'}")

    # Fixed-level testing for comparison (note: this does not control FDR)
    print("\nFixed-Level (α=0.05):")
    fixed_discoveries = 0
    for i, p_val in enumerate(test_sequence, 1):
        decision = p_val <= 0.05
        if decision:
            fixed_discoveries += 1
        print(f"Test {i}: p={p_val:.3f}, threshold=0.050000 → {'REJECT' if decision else 'ACCEPT'}")

    print("\nComparison:")
    print(f"GAI discoveries: {gai_discoveries}")
    print(f"Fixed-level discoveries: {fixed_discoveries}")
    print(f"Difference (GAI - fixed): {gai_discoveries - fixed_discoveries}")


demonstrate_gai_mechanism()
Incorporating Prior Knowledge
from online_fdr.investing.alpha.alpha import Gai


def gai_with_prior_knowledge():
    """Demonstrate how GAI can incorporate domain knowledge."""
    print("GAI with Prior Knowledge:")
    print("=" * 30)

    # Simulate a scenario where you expect more promising tests later
    # (e.g., genomics where genes are ordered by biological relevance)

    # Early tests: mostly null
    early_tests = [0.8, 0.7, 0.9, 0.6, 0.75]
    # Later tests: mix of null and alternative
    later_tests = [0.001, 0.02, 0.8, 0.005, 0.03, 0.9, 0.007]

    # Conservative start to preserve wealth for later promising tests
    gai = Gai(alpha=0.05, wealth=0.01)  # Lower initial wealth

    print("Early phase (expected mostly nulls):")
    early_discoveries = 0
    for i, p_val in enumerate(early_tests, 1):
        decision = gai.test_one(p_val)
        if decision:
            early_discoveries += 1
            print(f"✓ Test {i}: p={p_val:.3f} → DISCOVERY")
        else:
            print(f"  Test {i}: p={p_val:.3f} → no rejection (conservative)")

    print("\nTransition to promising region...")
    print("Later phase (expected more alternatives):")
    later_discoveries = 0
    # Continue the test numbering where the early phase stopped
    for i, p_val in enumerate(later_tests, len(early_tests) + 1):
        decision = gai.test_one(p_val)
        if decision:
            later_discoveries += 1
            print(f"✓ Test {i}: p={p_val:.3f} → DISCOVERY")
        else:
            print(f"  Test {i}: p={p_val:.3f} → no rejection")

    print("\nResults:")
    print(f"Early discoveries: {early_discoveries}")
    print(f"Later discoveries: {later_discoveries}")
    print(f"Total discoveries: {early_discoveries + later_discoveries}")
    print("GAI preserved wealth for the promising region!")


gai_with_prior_knowledge()
Comparison with Other Alpha-Investing Methods
from online_fdr.investing.alpha.alpha import Gai
from online_fdr.investing.saffron.saffron import Saffron
from online_fdr.investing.lord.three import LordThree


def compare_alpha_investing_family():
    """Compare different alpha-investing approaches."""
    print("Alpha-Investing Family Comparison:")
    print("=" * 40)

    # Test sequence with varied difficulty
    p_values = [0.002, 0.8, 0.01, 0.9, 0.005, 0.7, 0.03, 0.6, 0.008]

    # Different alpha-investing approaches
    methods = {
        'GAI': Gai(alpha=0.05, wealth=0.025),
        'SAFFRON': Saffron(alpha=0.05, wealth=0.025, lambda_=0.5),
        'LORD 3': LordThree(alpha=0.05, wealth=0.025, reward=0.025),
    }

    results = {}
    for method_name, method in methods.items():
        # Each method sees the same p-values in the same order
        decisions = [method.test_one(p) for p in p_values]
        discoveries = sum(decisions)
        discovery_indices = [i + 1 for i, d in enumerate(decisions) if d]
        results[method_name] = {
            'discoveries': discoveries,
            'indices': discovery_indices,
        }
        print(f"{method_name:>8}: {discoveries} discoveries at positions {discovery_indices}")

    return results


compare_alpha_investing_family()
Simulating Industrial A/B Testing
from online_fdr.investing.alpha.alpha import Gai
from online_fdr.utils.generation import DataGenerator, GaussianLocationModel


def simulate_ab_testing_with_gai():
    """Simulate GAI in an industrial A/B testing environment."""
    print("Industrial A/B Testing with GAI:")
    print("=" * 36)

    # Simulate A/B testing scenario:
    # - Many tests arrive one after another
    # - Most are null (no real effect)
    # - Some have real but small effects
    # - Occasional large effects
    dgp = GaussianLocationModel(alt_mean=1.5, alt_std=1.0, one_sided=True)
    generator = DataGenerator(n=200, pi0=0.95, dgp=dgp)  # 95% nulls (realistic)

    gai = Gai(alpha=0.05, wealth=0.025)

    true_positives = 0
    false_positives = 0
    n_alternatives = 0  # real effects among the tests actually run
    test_count = 0

    print("Running A/B tests sequentially...")
    # Simulate first 100 tests
    for i in range(100):
        p_value, is_alternative = generator.sample_one()
        decision = gai.test_one(p_value)
        test_count += 1
        n_alternatives += int(is_alternative)

        if decision:
            if is_alternative:
                true_positives += 1
                result_type = "TRUE effect ✓"
            else:
                false_positives += 1
                result_type = "FALSE alarm ✗"
            # Show significant results
            effect_type = "REAL" if is_alternative else "NULL"
            print(f"Test {i+1:3d}: p={p_value:.4f} ({effect_type}) → SIGNIFICANT ({result_type})")

    # Calculate business metrics on the tests that were actually run
    total_discoveries = true_positives + false_positives
    empirical_fdp = false_positives / max(total_discoveries, 1)
    power = true_positives / max(n_alternatives, 1)

    print("\nA/B Testing Campaign Results:")
    print(f"Total tests run: {test_count}")
    print(f"Significant effects found: {total_discoveries}")
    print(f"True discoveries (real effects): {true_positives}")
    print(f"False alarms: {false_positives}")
    print(f"Empirical power: {power:.3f}")
    # FDR is an expectation; a single run only yields a false discovery proportion.
    print(f"False discovery proportion (this run): {empirical_fdp:.3f}")
    print(f"Target FDR: {gai.alpha0}")
    print(f"FDP at or below target: {'✓' if empirical_fdp <= gai.alpha0 else '✗'}")

    # Business interpretation
    print("\nBusiness Impact:")
    if true_positives > 0:
        print(f"✓ Found {true_positives} real improvements to implement")
    if false_positives > 0:
        print(f"⚠ {false_positives} false alarms would ship changes with no real effect")
    efficiency = true_positives / max(total_discoveries, 1)
    print(f"Discovery efficiency: {efficiency:.1%}")


simulate_ab_testing_with_gai()
Mathematical Foundation
Core Alpha-Investing Principle
The core of alpha-investing is its wealth update rule: every test is paid for out of the current alpha-wealth, and every rejection earns a payout that replenishes the wealth available for future tests.
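In the notation of Foster and Stine (2008), with $W(j)$ the wealth after test $j$, $\alpha_j$ the level spent on test $j$, $p_j$ its p-value, and $\omega$ the payout, the update can be written as below. This is the rule from the original paper, given for orientation; the `Gai` class may implement a related but different update.

$$
W(j) - W(j-1) =
\begin{cases}
\omega & \text{if } p_j \le \alpha_j \quad \text{(rejection)} \\[4pt]
-\dfrac{\alpha_j}{1 - \alpha_j} & \text{if } p_j > \alpha_j \quad \text{(no rejection)}
\end{cases}
$$

The level $\alpha_j$ must be chosen so that the wealth cannot become negative, that is, $\alpha_j / (1 - \alpha_j) \le W(j-1)$.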
GAI Threshold Formula
GAI sets each test level from a SAFFRON-style gamma sequence, so the wealth allocated to a test adapts to the history of candidates and discoveries.
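For reference, the constant-$\lambda$ SAFFRON rule of Ramdas et al. (2018) sets the test level as shown below. The symbols are those of the paper, not of this library: $W_0$ is the initial wealth, $\lambda$ the candidate threshold, $\tau_j$ the time of the $j$-th rejection (with $\tau_0 = 0$), and $C_{j+}$ the number of candidates ($p_i \le \lambda$) observed after $\tau_j$ and before $t$. Whether `Gai` uses exactly this expression, and which $\lambda$ it fixes, is not stated on this page.

$$
\alpha_t = \min\Big\{ \lambda,\; (1 - \lambda)\Big[ W_0\,\gamma_{t - C_{0+}} + (\alpha - W_0)\,\gamma_{t - \tau_1 - C_{1+}} + \alpha \sum_{j \ge 2} \gamma_{t - \tau_j - C_{j+}} \Big] \Big\}
$$

Terms belonging to rejections that have not yet occurred are omitted, so before the first rejection only the $W_0$ term is present.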
Theoretical Guarantees
Theorem (Alpha-Investing FDR Control): Under independence, GAI controls the False Discovery Rate (FDR) at level α.
The proof relies on the martingale property of the wealth process under the null hypothesis.
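A guarantee of this kind can be sanity-checked by simulation. The sketch below does so for the toy Foster and Stine rule (not for the `Gai` class): under the global null every rejection is false, so the FDR equals the probability of making at least one rejection. The function name `any_rejection_under_null`, the `spend_frac` spending rule, and the simulation sizes are assumptions chosen for illustration.

```python
import random


def any_rejection_under_null(n_tests, rng, wealth=0.025, spend_frac=0.5):
    """Run the toy rule on uniform p-values; return True if it ever rejects."""
    for _ in range(n_tests):
        cost = spend_frac * wealth
        alpha_j = cost / (1.0 + cost)   # level whose cost alpha_j / (1 - alpha_j) is `cost`
        if rng.random() <= alpha_j:     # null p-values are Uniform(0, 1)
            return True                 # under the global null this is a false discovery
        wealth -= cost                  # pay for the non-rejection
    return False


rng = random.Random(0)
n_runs = 2000
hits = sum(any_rejection_under_null(50, rng) for _ in range(n_runs))
estimated_fdr = hits / n_runs
print(f"estimated FDR under the global null: {estimated_fdr:.3f} (target 0.05)")
```

With an initial wealth of 0.025 the levels spent before any rejection sum to less than 0.025, so the estimate should sit well below the 0.05 target.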
Historical Context and Evolution
Original Alpha-Investing (Foster & Stine, 2008)
- Introduced the wealth-based paradigm
- Controlled mFDR (marginal FDR) rather than FDR
- Simple payout rules
Generalized Alpha-Investing
- Extended to various payout schemes
- Incorporated prior weights and penalties
- Better theoretical understanding
Modern Variants (GAI)
- SAFFRON-style update rules
- Improved power characteristics
- Maintains original philosophy with better performance
Best Practices
Parameter Selection
Wealth Selection Guidelines
- Conservative: W₀ = α/4 (preserves wealth for later)
- Moderate: W₀ = α/2 (balanced approach)
- Aggressive: W₀ = α (spends wealth early)
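The three presets above translate directly into an initial-wealth value. The helper below is a hypothetical convenience, not part of `online_fdr`; the names `WEALTH_FRACTIONS` and `initial_wealth` are assumptions for this example.

```python
# Fraction of alpha used as initial wealth W0 for each preset listed above.
WEALTH_FRACTIONS = {"conservative": 0.25, "moderate": 0.5, "aggressive": 1.0}


def initial_wealth(alpha, style="moderate"):
    """Return W0 = fraction * alpha for the chosen preset."""
    if style not in WEALTH_FRACTIONS:
        raise ValueError(f"unknown style {style!r}; choose from {sorted(WEALTH_FRACTIONS)}")
    return alpha * WEALTH_FRACTIONS[style]


for style in WEALTH_FRACTIONS:
    print(f"{style:>12}: W0 = {initial_wealth(0.05, style):.4f}")
```

The result can then be passed as the `wealth` argument, for example `Gai(alpha=0.05, wealth=initial_wealth(0.05, "conservative"))`.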
Domain Knowledge Integration
- Start conservatively if expecting null-heavy early tests
- Higher initial wealth if early alternatives are expected
- Consider test ordering when possible
When to Use GAI
Good Use Cases
- Industrial A/B testing: Many tests, mostly null, need efficiency
- Sequential screening: Tests arrive over time, immediate decisions needed
- Prior knowledge: Can order tests by expected promise
- Computational constraints: Simpler than adaptive methods
Consider Alternatives
- Unknown π₀: SAFFRON adapts better to null proportion
- Conservative nulls: ADDIS handles better
- Batch setting: When all p-values are available at once, the Benjamini-Hochberg (BH) procedure is the standard choice
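For the batch case, the Benjamini-Hochberg step-up procedure is short enough to sketch in full. This is a plain-Python illustration that does not depend on `online_fdr`; the function name `benjamini_hochberg` is chosen for this example.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where BH rejects at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p-value
    # Find the largest rank k (1-based) with p_(k) <= k * alpha / m.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            k_max = rank
    # Reject the k_max smallest p-values.
    rejected = [False] * m
    for idx in order[:k_max]:
        rejected[idx] = True
    return rejected


print(benjamini_hochberg([0.01, 0.02, 0.03, 0.5]))  # [True, True, True, False]
```

Unlike the online procedures on this page, BH needs the full set of p-values before it can make any decision.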
Common Pitfalls
- Over-aggressive early spending: Leaves no wealth for later discoveries
- Over-conservative start: Misses early opportunities
- Ignoring test ordering: Random order wastes the alpha-investing advantage
- Wrong wealth initialization: Mismatch with discovery expectations
References
- Foster, D. P., and R. A. Stine (2008). "α-investing: a procedure for sequential control of expected false discoveries." Journal of the Royal Statistical Society: Series B, 70(2):429-444.
- Ramdas, A., T. Zrnic, M. J. Wainwright, and M. I. Jordan (2018). "SAFFRON: an adaptive algorithm for online control of the FDR." Proceedings of the 35th International Conference on Machine Learning (ICML), PMLR, 80:4286-4294.
- Aharoni, E., and D. Rosset (2014). "Generalized α-investing: definitions, optimality results and application to public databases." Journal of the Royal Statistical Society: Series B, 76(4):771-794.
- Li, L., and J. G. Canner (2007). "Modified alpha-investing: a procedure for multiple testing with prior knowledge." Computational Statistics & Data Analysis, 51(7):3598-3607.