Quick Start Guide

This guide will get you up and running with online-fdr in just a few minutes. We'll walk through the core concepts and show you how to perform online FDR control with real examples.

Basic Concepts

Before diving into code, let's understand the key concepts:

Online vs Batch Testing

  • Batch Testing: Collect all p-values first, then apply a multiple testing correction
  • Online Testing: Make decisions immediately as each p-value arrives
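
In code, the difference shows up in how a procedure is called. Below is a minimal sketch using the BatchBH and LordThree classes introduced later in this guide (with the same parameters used there): the batch procedure needs the complete list of p-values up front, while the online procedure is queried once per p-value as it arrives.

from online_fdr.batching.bh import BatchBH
from online_fdr.investing.lord.three import LordThree

p_values = [0.001, 0.1, 0.03, 0.8, 0.02]

# Batch: a single call over the complete list of p-values
batch_decisions = BatchBH(alpha=0.1).test_batch(p_values)

# Online: one call, and one immediate decision, per arriving p-value
lord3 = LordThree(alpha=0.1, wealth=0.05, reward=0.05)
online_decisions = [lord3.test_one(p) for p in p_values]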

FDR Control

False Discovery Rate (FDR) is the expected proportion of false discoveries among all discoveries:

\[
\text{FDR} = \mathbb{E}\left[\frac{\text{False Positives}}{\max(\text{Total Discoveries}, 1)}\right]
\]
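
For example, if a procedure makes 10 discoveries and 2 of them are false positives, the realized false discovery proportion is \(2/10 = 0.2\); the FDR is the expectation of this proportion over repetitions of the experiment. Online FDR procedures are designed to keep this expectation at or below a chosen level \(\alpha\).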

Your First Online Test

Let's start with the simplest possible example using ADDIS:

from online_fdr.investing.addis.addis import Addis

# Create an ADDIS procedure with 5% FDR control
addis = Addis(alpha=0.05, wealth=0.025, lambda_=0.25, tau=0.5)

# Test individual p-values as they arrive
p_values = [0.001, 0.1, 0.03, 0.8, 0.02]

for i, p_val in enumerate(p_values):
    decision = addis.test_one(p_val)
    print(f"Test {i+1}: p={p_val:5.3f}{'REJECT' if decision else 'ACCEPT'}")

Output:

Test 1: p=0.001 → REJECT
Test 2: p=0.100 → ACCEPT  
Test 3: p=0.030 → REJECT
Test 4: p=0.800 → ACCEPT
Test 5: p=0.020 → ACCEPT

Note that the significance threshold is not fixed: ADDIS adapts its per-test level based on the testing history so far, which is why the smaller p-value 0.020 at test 5 can be accepted even though the larger p-value 0.030 was rejected at test 3.

Realistic Simulation

Let's create a more realistic scenario with simulated data:

from online_fdr.investing.addis.addis import Addis
from online_fdr.utils.generation import DataGenerator, GaussianLocationModel

# Set up data generation
# 90% null hypotheses, 10% alternatives with effect size 3
dgp = GaussianLocationModel(alt_mean=3.0, alt_std=1.0, one_sided=True)
generator = DataGenerator(n=200, pi0=0.9, dgp=dgp)

# Initialize ADDIS
addis = Addis(alpha=0.1, wealth=0.05, lambda_=0.25, tau=0.5)

# Simulate sequential testing
discoveries = []
true_discoveries = []
false_discoveries = []

print("Sequential Online Testing with ADDIS")
print("=" * 50)

for i in range(50):  # Test first 50 hypotheses
    p_value, is_alternative = generator.sample_one()
    decision = addis.test_one(p_value)

    if decision:  # We made a discovery
        discoveries.append(i + 1)
        if is_alternative:
            true_discoveries.append(i + 1)
            result = "✓ TRUE discovery"
        else:
            false_discoveries.append(i + 1)
            result = "✗ FALSE discovery"

        print(f"Test {i+1:2d}: p={p_value:.4f} → DISCOVERY {result}")

# Calculate performance metrics
n_discoveries = len(discoveries)
n_false = len(false_discoveries)
empirical_fdr = n_false / max(n_discoveries, 1)

print(f"\nResults after 50 tests:")
print(f"Total discoveries: {n_discoveries}")
print(f"False discoveries: {n_false}")
print(f"Empirical FDR: {empirical_fdr:.3f}")
print(f"Target FDR: {addis.alpha0}")

Comparing Different Methods

Let's compare several online FDR methods on the same data:

from online_fdr.investing.addis.addis import Addis
from online_fdr.investing.lord.three import LordThree
from online_fdr.investing.saffron.saffron import Saffron
from online_fdr.utils.generation import DataGenerator, GaussianLocationModel

# Setup
dgp = GaussianLocationModel(alt_mean=2.5, alt_std=1.0, one_sided=True)
alpha = 0.1

# Initialize different methods
methods = {
    'ADDIS': Addis(alpha=alpha, wealth=0.05, lambda_=0.25, tau=0.5),
    'LORD3': LordThree(alpha=alpha, wealth=0.05, reward=0.05),
    'SAFFRON': Saffron(alpha=alpha, wealth=0.05, lambda_=0.5)
}

# Test all methods on the same sequence
generator = DataGenerator(n=100, pi0=0.8, dgp=dgp)
p_values = [generator.sample_one()[0] for _ in range(30)]

print("Method Comparison")
print("=" * 60)

for name, method in methods.items():
    discoveries = 0
    for i, p_val in enumerate(p_values):
        if method.test_one(p_val):
            discoveries += 1

    print(f"{name:>8}: {discoveries:2d} discoveries")

Batch vs Online Comparison

See the difference between batch and online approaches:

from online_fdr.batching.bh import BatchBH
from online_fdr.investing.lord.three import LordThree
from online_fdr.utils.generation import DataGenerator, GaussianLocationModel

# Generate a fixed set of p-values
dgp = GaussianLocationModel(alt_mean=2.0, alt_std=1.0, one_sided=True)  
generator = DataGenerator(n=100, pi0=0.85, dgp=dgp)
p_values = [generator.sample_one()[0] for _ in range(20)]

print("Batch vs Online Comparison")
print("=" * 40)
print("P-values:", [f"{p:.3f}" for p in p_values[:10]], "...")

# Batch method: sees all p-values at once
batch_bh = BatchBH(alpha=0.1)
batch_results = batch_bh.test_batch(p_values)
batch_discoveries = sum(batch_results)

print(f"\nBatch BH discoveries: {batch_discoveries}")
print("Rejected p-values:", [f"{p:.3f}" for p, r in zip(p_values, batch_results) if r])

# Online method: sees p-values one by one
lord3 = LordThree(alpha=0.1, wealth=0.05, reward=0.05)
online_discoveries = 0
online_rejected = []

for p_val in p_values:
    if lord3.test_one(p_val):
        online_discoveries += 1
        online_rejected.append(p_val)

print(f"\nOnline LORD3 discoveries: {online_discoveries}")
print("Rejected p-values:", [f"{p:.3f}" for p in online_rejected])

Working with Real Data

Here's how to use your own p-values:

from online_fdr.investing.addis.addis import Addis

# Your p-values from real experiments
my_p_values = [0.032, 0.001, 0.145, 0.003, 0.234, 0.089, 0.012]

# Initialize method
addis = Addis(alpha=0.05, wealth=0.025, lambda_=0.25, tau=0.5)

# Test sequentially
significant_tests = []
for i, p_val in enumerate(my_p_values):
    if addis.test_one(p_val):
        significant_tests.append((i + 1, p_val))
        print(f"Significant: Test {i + 1} with p-value {p_val}")

print(f"\nFound {len(significant_tests)} significant results")

Key Takeaways

What You've Learned

  1. Online testing makes decisions immediately without waiting for future p-values
  2. All methods use the same test_one(p_value) interface
  3. Different methods have different power characteristics
  4. ADDIS is a good default choice for most applications
  5. Method parameters affect the power/conservatism trade-off

Next Steps

Now that you understand the basics:

  • Check out Real-world Examples for domain-specific applications
  • Dive into Mathematical Theory behind the algorithms
  • Explore the full API Reference for all available methods

Common Patterns

Here are some common usage patterns to get you started:

Pattern 1: Simple Online Testing

from online_fdr.investing.addis.addis import Addis

method = Addis(alpha=0.05, wealth=0.025, lambda_=0.25, tau=0.5)
for p_value in your_p_values:
    if method.test_one(p_value):
        print(f"Significant result: p = {p_value}")

Pattern 2: Performance Evaluation

from online_fdr.utils.evaluation import calculate_sfdr, calculate_power

true_positives = false_positives = false_negatives = 0
for p_value, is_true_alternative in your_labeled_data:
    decision = method.test_one(p_value)
    if decision and is_true_alternative:
        true_positives += 1       # correct discovery
    elif decision and not is_true_alternative:
        false_positives += 1      # false discovery
    elif is_true_alternative:
        false_negatives += 1      # missed alternative

sfdr = calculate_sfdr(true_positives, false_positives)
power = calculate_power(true_positives, false_negatives)

Pattern 3: Method Comparison

methods = [Addis(...), LordThree(...), Saffron(...)]
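# Each method instance maintains its own internal state (e.g. its alpha-wealth),
# so all methods are evaluated independently on the same p-value sequence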
for method in methods:
    discoveries = sum(method.test_one(p) for p in p_values)
    print(f"{method.__class__.__name__}: {discoveries} discoveries")

Ready to explore more advanced features? Check out our detailed User Guide!