Batch Testing Methods¶
Batch testing methods process all p-values simultaneously, applying a multiple testing correction after the complete set of hypotheses has been collected. They are the natural choice when all data is available upfront, and they typically achieve higher power than sequential alternatives at the same FDR level.
Overview¶
Batch methods differ from sequential methods in their information advantage:
Batch Methods
- See all p-values before making decisions
- Can sort and optimize rejection thresholds
- Typically achieve higher power at a given FDR level
- Best for post-hoc analysis
Sequential Methods
- Must decide immediately on each p-value
- Cannot use future information
- Slightly lower power but work in real-time
- Best for streaming applications
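To make the contrast concrete, here is a minimal sketch comparing the two modes of operation. The batch side uses BatchBH from this library; the sequential side is a toy alpha-spending rule written inline purely for illustration, not one of the library's sequential methods:

from online_fdr.batching.bh import BatchBH

p_values = [0.001, 0.02, 0.15, 0.03, 0.8]

# Batch mode: the whole set is visible, so the rejection threshold adapts to it
batch_decisions = BatchBH(alpha=0.05).test_batch(p_values)

# Sequential mode (toy alpha-spending rule, for illustration only): each p-value
# is judged immediately against a pre-committed budget, with no look-ahead
budget = [0.05 / 2 ** (i + 1) for i in range(len(p_values))]
sequential_decisions = [p <= a for p, a in zip(p_values, budget)]

print(f"Batch discoveries:      {sum(batch_decisions)}")
print(f"Sequential discoveries: {sum(sequential_decisions)}")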
Mathematical Foundation¶
The False Discovery Rate (FDR) is defined as:

\(\text{FDR} = \mathbb{E}\left[\frac{V}{R \vee 1}\right]\)

where:

- \(V\) = number of false discoveries (Type I errors)
- \(R\) = total number of discoveries
- \(R \vee 1 = \max(R, 1)\) to avoid division by zero
Batch methods aim to control \(\text{FDR} \leq \alpha\) while maximizing power.
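Because \(V\) is never observed in practice, the FDR can only be checked in simulations where the ground truth is known. A minimal sketch, with made-up decisions and truth labels, of computing the realized false discovery proportion \(V / (R \vee 1)\):

import numpy as np

# Hypothetical decisions and ground truth (in real data the truth is unknown,
# so V, and hence the FDP, can only be computed in simulations)
decisions  = np.array([True, True, True, False, False])
true_nulls = np.array([False, True, False, True, True])

V = np.sum(decisions & true_nulls)  # false discoveries
R = np.sum(decisions)               # total discoveries
fdp = V / max(R, 1)                 # V / (R ∨ 1)
print(f"V={V}, R={R}, FDP={fdp:.3f}")  # FDR is the expectation of the FDP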
Available Methods¶
Benjamini-Hochberg (BH) Procedure¶
The gold standard for FDR control under independence
from online_fdr.batching.bh import BatchBH
# Create Benjamini-Hochberg instance
bh = BatchBH(alpha=0.05)
# Test a batch of p-values
p_values = [0.001, 0.02, 0.15, 0.03, 0.8, 0.005, 0.4]
decisions = bh.test_batch(p_values)
# Show results
for i, (p_val, decision) in enumerate(zip(p_values, decisions)):
print(f"Test {i+1}: p={p_val:.3f} → {'REJECT' if decision else 'ACCEPT'}")
print(f"Total discoveries: {sum(decisions)}")
Mathematical Algorithm¶
The BH procedure proceeds in three steps:

1. Sort the p-values: \(p_{(1)} \leq p_{(2)} \leq \ldots \leq p_{(m)}\)
2. Find the largest rank that clears its threshold: \(k = \max\{i : p_{(i)} \leq \frac{i \alpha}{m}\}\)
3. Reject hypotheses \(H_{(1)}, \ldots, H_{(k)}\) (reject nothing if no such \(k\) exists)
# Demonstrate the BH algorithm step-by-step
import numpy as np
def demonstrate_bh_algorithm(p_values, alpha=0.05):
"""Show step-by-step Benjamini-Hochberg procedure."""
m = len(p_values)
# Step 1: Sort p-values and keep track of original indices
sorted_indices = np.argsort(p_values)
sorted_p_values = np.array(p_values)[sorted_indices]
print("Step 1: Sorted p-values")
for i, (idx, p_val) in enumerate(zip(sorted_indices, sorted_p_values)):
print(f" Rank {i+1}: p_{{{idx+1}}} = {p_val:.4f}")
# Step 2: Calculate BH thresholds
print(f"\nStep 2: BH thresholds (α={alpha})")
bh_thresholds = [(i+1) * alpha / m for i in range(m)]
significant_up_to = -1
for i, (p_val, threshold) in enumerate(zip(sorted_p_values, bh_thresholds)):
is_significant = p_val <= threshold
print(f" Rank {i+1}: p={p_val:.4f} vs {threshold:.4f} → {'✓' if is_significant else '✗'}")
if is_significant:
significant_up_to = i
# Step 3: Determine rejections
print(f"\nStep 3: Rejections")
if significant_up_to >= 0:
print(f"Reject first {significant_up_to + 1} sorted hypotheses")
        rejected_indices = sorted_indices[:significant_up_to + 1]
        # .tolist() yields plain Python ints, so the printed list is clean
        print(f"Original indices rejected: {sorted((rejected_indices + 1).tolist())}")
else:
print("No rejections")
rejected_indices = []
return rejected_indices
# Example usage
p_vals = [0.001, 0.02, 0.15, 0.03, 0.8, 0.005]
demonstrate_bh_algorithm(p_vals, alpha=0.1)
Storey's Adaptive Benjamini-Hochberg¶
Adaptive FDR control when the null proportion is unknown
The Storey procedure estimates the proportion of true null hypotheses (\(\hat{\pi}_0\)) and uses this for more powerful FDR control.
from online_fdr.batching.storey_bh import BatchStoreyBH
# Create Storey-BH instance
storey_bh = BatchStoreyBH(
alpha=0.05, # Target FDR level
lambda_=0.5 # Threshold for π₀ estimation
)
# Compare with regular BH
regular_bh = BatchBH(alpha=0.05)
# Test both methods
p_values = [0.001, 0.02, 0.15, 0.03, 0.8, 0.005, 0.4, 0.9, 0.7, 0.6]
storey_decisions = storey_bh.test_batch(p_values)
regular_decisions = regular_bh.test_batch(p_values)
print("Storey-BH vs Regular BH:")
print(f"Storey-BH discoveries: {sum(storey_decisions)}")
print(f"Regular BH discoveries: {sum(regular_decisions)}")
print(f"Storey advantage: {sum(storey_decisions) - sum(regular_decisions)} more discoveries")
Mathematical Details¶
Storey's method estimates the null proportion as

\(\hat{\pi}_0(\lambda) = \frac{\#\{i : p_i > \lambda\}}{(1-\lambda) \cdot m}\)

and then applies the BH step-up rule with relaxed thresholds: reject \(H_{(i)}\) if \(p_{(i)} \leq \frac{i \alpha}{\hat{\pi}_0 m}\).
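For intuition, \(\hat{\pi}_0\) can be computed directly in a few lines. The sketch below is a standalone illustration of the estimator, not the internal implementation of BatchStoreyBH:

import numpy as np

def estimate_pi0(p_values, lambda_=0.5):
    """Storey's plug-in estimate of the proportion of true nulls."""
    p = np.asarray(p_values)
    # Null p-values are uniform, so the share above lambda estimates
    # pi0 * (1 - lambda); rescale and cap at 1
    pi0 = np.sum(p > lambda_) / ((1 - lambda_) * len(p))
    return min(pi0, 1.0)

p_vals = [0.001, 0.02, 0.15, 0.03, 0.8, 0.005, 0.4, 0.9, 0.7, 0.6]
print(f"pi0_hat = {estimate_pi0(p_vals):.2f}")  # < 1 relaxes the BH thresholds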
Benjamini-Yekutieli (BY) Procedure¶
FDR control under arbitrary dependence
from online_fdr.batching.by import BatchBY
# Create BY instance
by = BatchBY(alpha=0.05)
# BY remains valid under arbitrary dependence, at the cost of extra conservatism
p_values = [0.01, 0.03, 0.05, 0.08, 0.12]
# Compare BH vs BY
bh = BatchBH(alpha=0.05)
bh_decisions = bh.test_batch(p_values)
by_decisions = by.test_batch(p_values)
print("BH vs BY comparison (dependent case):")
print(f"BH discoveries: {sum(bh_decisions)}")
print(f"BY discoveries: {sum(by_decisions)}")
print("BY is more conservative for dependent tests")
Mathematical Formula¶
BY uses the harmonic-series correction

\(c(m) = \sum_{i=1}^m \frac{1}{i} \approx \ln(m) + \gamma\)

where \(\gamma \approx 0.577\) is the Euler-Mascheroni constant. Reject \(H_{(i)}\) if \(p_{(i)} \leq \frac{i \alpha}{m \cdot c(m)}\).
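Because \(c(m)\) grows roughly like \(\ln(m)\), the BY thresholds shrink noticeably as the number of tests grows. A quick standalone calculation of the shrinkage:

import math

def by_correction(m):
    """Harmonic-series correction factor c(m) used by Benjamini-Yekutieli."""
    return sum(1.0 / i for i in range(1, m + 1))

m, alpha = 100, 0.05
c_m = by_correction(m)
print(f"c({m}) = {c_m:.3f}  (≈ ln({m}) + γ = {math.log(m) + 0.5772:.3f})")
# The BY threshold at rank i is the BH threshold divided by c(m)
for i in (1, 10, 100):
    print(f"rank {i:3d}: BH={i * alpha / m:.5f}  BY={i * alpha / (m * c_m):.5f}")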
PRDS: Positive Regression Dependency on Subsets¶
Optimized for positively dependent test statistics
from online_fdr.batching.prds import BatchPRDS
# Create PRDS instance
prds = BatchPRDS(alpha=0.05)
# PRDS works well when test statistics are positively correlated
# (e.g., overlapping genomic regions, related biomarkers)
p_values = [0.002, 0.01, 0.04, 0.06, 0.3, 0.7]
prds_decisions = prds.test_batch(p_values)
bh_decisions = BatchBH(alpha=0.05).test_batch(p_values)
print("PRDS vs BH (positive dependence case):")
print(f"PRDS discoveries: {sum(prds_decisions)}")
print(f"BH discoveries: {sum(bh_decisions)}")
Method Comparison¶
When to Use Each Method¶
| Method | Best For | Assumptions | Power | Conservatism |
|---|---|---|---|---|
| BH | Independent tests | Independence | High | Moderate |
| Storey-BH | Unknown π₀ | Independence | Highest | Low |
| BY | Arbitrary dependence | Any dependence | Low | High |
| PRDS | Positive dependence | PRDS condition | High | Moderate |
Performance Simulation¶
import numpy as np
from online_fdr.batching.bh import BatchBH
from online_fdr.batching.storey_bh import BatchStoreyBH
from online_fdr.batching.by import BatchBY
def compare_batch_methods(n_tests=1000, n_alternatives=100, alpha=0.05):
"""Compare batch methods on simulated data."""
# Simulate p-values
np.random.seed(42)
# Null p-values (uniform)
null_p_values = np.random.uniform(0, 1, n_tests - n_alternatives)
# Alternative p-values (beta distribution, skewed toward 0)
alt_p_values = np.random.beta(0.5, 10, n_alternatives)
# Combine and shuffle
all_p_values = np.concatenate([null_p_values, alt_p_values])
true_alternatives = np.concatenate([
np.zeros(n_tests - n_alternatives, dtype=bool),
np.ones(n_alternatives, dtype=bool)
])
# Shuffle to mix nulls and alternatives
shuffle_idx = np.random.permutation(n_tests)
all_p_values = all_p_values[shuffle_idx]
true_alternatives = true_alternatives[shuffle_idx]
# Test all methods
methods = {
'BH': BatchBH(alpha=alpha),
'Storey-BH': BatchStoreyBH(alpha=alpha, lambda_=0.5),
'BY': BatchBY(alpha=alpha)
}
results = {}
for name, method in methods.items():
        # Convert to an array so the element-wise comparisons below work cleanly
        decisions = np.asarray(method.test_batch(all_p_values.tolist()))
# Calculate metrics
true_positives = np.sum(decisions & true_alternatives)
false_positives = np.sum(decisions & ~true_alternatives)
total_discoveries = true_positives + false_positives
empirical_fdr = false_positives / max(total_discoveries, 1)
power = true_positives / n_alternatives
results[name] = {
'discoveries': total_discoveries,
'true_positives': true_positives,
'false_positives': false_positives,
'empirical_fdr': empirical_fdr,
'power': power
}
return results
# Run comparison
comparison_results = compare_batch_methods()
print("Batch Method Comparison Results:")
print("=" * 50)
for method, metrics in comparison_results.items():
print(f"{method:>10}: {metrics['discoveries']:3d} discoveries, "
f"FDR={metrics['empirical_fdr']:.3f}, "
f"Power={metrics['power']:.3f}")
Advanced Usage¶
Custom Threshold Calculations¶
def custom_bh_analysis(p_values, alpha=0.05):
"""Perform detailed BH analysis with visualization."""
import matplotlib.pyplot as plt
import numpy as np
m = len(p_values)
sorted_p = np.sort(p_values)
# Calculate BH line
ranks = np.arange(1, m + 1)
bh_line = ranks * alpha / m
# Find rejections
rejections = sorted_p <= bh_line
if np.any(rejections):
last_rejection = np.max(np.where(rejections)[0])
n_rejections = last_rejection + 1
else:
n_rejections = 0
# Create visualization
plt.figure(figsize=(10, 6))
plt.plot(ranks, sorted_p, 'bo-', label='Sorted p-values', markersize=4)
plt.plot(ranks, bh_line, 'r--', label=f'BH line (slope = α/m = {alpha/m:.4f})')
if n_rejections > 0:
plt.fill_between(ranks[:n_rejections], 0, sorted_p[:n_rejections],
alpha=0.3, color='green', label=f'Rejections ({n_rejections})')
plt.xlabel('Rank (i)')
plt.ylabel('p-value')
plt.title('Benjamini-Hochberg Procedure Visualization')
plt.legend()
plt.grid(True, alpha=0.3)
plt.show()
return n_rejections
# Example usage (requires matplotlib)
# p_vals = [0.001, 0.01, 0.03, 0.05, 0.08, 0.15, 0.3, 0.6]
# rejections = custom_bh_analysis(p_vals, alpha=0.1)
Batch Processing Pipeline¶
from online_fdr.batching.bh import BatchBH
from online_fdr.batching.storey_bh import BatchStoreyBH
from online_fdr.batching.by import BatchBY

def batch_testing_pipeline(data, test_function, method='BH', alpha=0.05):
"""Complete pipeline for batch hypothesis testing."""
# Step 1: Calculate p-values
print("Step 1: Calculating p-values...")
p_values = []
for i, sample in enumerate(data):
p_val = test_function(sample)
p_values.append(p_val)
if i < 5: # Show first few
print(f" Sample {i+1}: p = {p_val:.4f}")
print(f" ... calculated {len(p_values)} p-values total")
# Step 2: Apply FDR correction
print(f"\nStep 2: Applying {method} correction (α={alpha})...")
if method == 'BH':
correction_method = BatchBH(alpha=alpha)
elif method == 'Storey-BH':
correction_method = BatchStoreyBH(alpha=alpha, lambda_=0.5)
elif method == 'BY':
correction_method = BatchBY(alpha=alpha)
else:
raise ValueError(f"Unknown method: {method}")
decisions = correction_method.test_batch(p_values)
# Step 3: Summary statistics
total_tests = len(p_values)
discoveries = sum(decisions)
discovery_rate = discoveries / total_tests
print(f" Total tests: {total_tests}")
print(f" Discoveries: {discoveries}")
print(f" Discovery rate: {discovery_rate:.3f}")
# Step 4: Return results
results = {
'p_values': p_values,
'decisions': decisions,
'method': method,
'alpha': alpha,
'discoveries': discoveries,
'discovery_rate': discovery_rate
}
return results
# Example usage
def dummy_test_function(sample):
"""Dummy test function returning random p-value."""
import random
return random.uniform(0, 1)
# Simulate some data
dummy_data = [f"sample_{i}" for i in range(100)]
# Run pipeline
results = batch_testing_pipeline(
dummy_data,
dummy_test_function,
method='BH',
alpha=0.1
)
Handling Special Cases¶
Empty Results¶
from online_fdr.batching.bh import BatchBH

def safe_batch_testing(p_values, alpha=0.05):
"""Safely handle edge cases in batch testing."""
if not p_values:
print("Warning: No p-values provided")
return []
if any(p < 0 or p > 1 for p in p_values):
print("Error: Invalid p-values detected")
return None
# Proceed with testing
bh = BatchBH(alpha=alpha)
decisions = bh.test_batch(p_values)
if not any(decisions):
print(f"No discoveries at α={alpha} level")
return decisions
# Test edge cases
print("Testing edge cases:")
print("Empty list:", safe_batch_testing([]))
print("All large p-values:", safe_batch_testing([0.9, 0.8, 0.7]))
print("All small p-values:", safe_batch_testing([0.001, 0.002, 0.003]))
Multiple Testing Correction with Covariates¶
from online_fdr.batching.bh import BatchBH

def stratified_fdr_control(p_values, groups, alpha=0.05):
    """Apply FDR control within groups/strata."""
    unique_groups = sorted(set(groups))  # deterministic group order
all_decisions = [False] * len(p_values)
print(f"Applying stratified FDR control across {len(unique_groups)} groups:")
for group in unique_groups:
# Get indices for this group
group_indices = [i for i, g in enumerate(groups) if g == group]
group_p_values = [p_values[i] for i in group_indices]
# Apply BH within group
bh = BatchBH(alpha=alpha)
group_decisions = bh.test_batch(group_p_values)
# Map back to full results
for i, decision in zip(group_indices, group_decisions):
all_decisions[i] = decision
print(f" Group {group}: {len(group_p_values)} tests, "
f"{sum(group_decisions)} discoveries")
return all_decisions
# Example usage
p_vals = [0.01, 0.3, 0.02, 0.8, 0.005, 0.4, 0.001, 0.9]
groups = ['A', 'A', 'B', 'B', 'A', 'B', 'A', 'B']
stratified_results = stratified_fdr_control(p_vals, groups, alpha=0.1)
print(f"\nOverall discoveries: {sum(stratified_results)}")
Integration with Statistical Tests¶
Example: Multiple t-tests with FDR Control¶
import numpy as np
from scipy import stats
from online_fdr.batching.bh import BatchBH
def multiple_t_tests_with_fdr(data_groups, control_group, alpha=0.05):
"""Perform multiple t-tests with FDR correction."""
# Calculate p-values for all comparisons
p_values = []
group_names = []
print("Performing t-tests against control:")
for group_name, group_data in data_groups.items():
statistic, p_value = stats.ttest_ind(group_data, control_group)
p_values.append(p_value)
group_names.append(group_name)
print(f" {group_name} vs Control: t={statistic:.3f}, p={p_value:.4f}")
# Apply FDR correction
bh = BatchBH(alpha=alpha)
decisions = bh.test_batch(p_values)
# Report results
print(f"\nFDR-corrected results (α={alpha}):")
significant_groups = []
for group_name, p_val, decision in zip(group_names, p_values, decisions):
status = "SIGNIFICANT" if decision else "NOT SIGNIFICANT"
print(f" {group_name}: {status} (p={p_val:.4f})")
if decision:
significant_groups.append(group_name)
return significant_groups, p_values, decisions
# Example with simulated data
np.random.seed(123)
# Control group
control = np.random.normal(0, 1, 100)
# Treatment groups with different effect sizes
groups = {
    'Treatment_A': np.random.normal(0.5, 1, 100),  # Medium effect (d = 0.5)
    'Treatment_B': np.random.normal(1.0, 1, 100),  # Large effect (d = 1.0)
    'Treatment_C': np.random.normal(0.1, 1, 100),  # Negligible effect (d = 0.1)
    'Treatment_D': np.random.normal(1.5, 1, 100),  # Very large effect (d = 1.5)
}
significant, p_vals, decisions = multiple_t_tests_with_fdr(
groups, control, alpha=0.05
)
print(f"\nSignificant treatments: {significant}")
References¶
- Benjamini, Y., and Y. Hochberg (1995). "Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing." Journal of the Royal Statistical Society, Series B, 57(1):289-300.
- Storey, J. D. (2002). "A Direct Approach to False Discovery Rates." Journal of the Royal Statistical Society, Series B, 64(3):479-498.
- Benjamini, Y., and D. Yekutieli (2001). "The Control of the False Discovery Rate in Multiple Testing under Dependency." Annals of Statistics, 29(4):1165-1188.
- Benjamini, Y., and D. Yekutieli (1999). "Resampling-Based False Discovery Rate Controlling Multiple Test Procedures for Correlated Test Statistics." Journal of Statistical Planning and Inference, 82(1-2):171-196.
Next Steps¶
- Compare with sequential methods for online scenarios
- See examples for detailed method comparisons
- Study theory for mathematical foundations