Utility Functions¶
The utils module provides essential supporting functions for online FDR control algorithms, including static procedures, gamma sequences, validation functions, and data generation utilities.
Static Procedures¶
Core implementations of classical multiple testing procedures:
online_fdr.utils.static.bh(p_vals, alpha)
¶
Static Benjamini-Hochberg (BH) procedure for FDR control.
The Benjamini-Hochberg procedure is the fundamental method for controlling the False Discovery Rate in multiple testing. It provides FDR control at level α under independence of p-values or positive dependence (PRDS).
The algorithm sorts p-values and finds the largest index i such that P(i) ≤ (i/n) × α, where P(i) is the i-th smallest p-value and n is the total number of tests.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
p_vals | [float] | List of p-values to test. Should be in [0,1]. | required |
alpha | float | Target FDR level. Must be in (0,1). | required |
Returns:
Type | Description |
---|---|
(int, float) | Tuple of (number_of_rejections, rejection_threshold). |
(int, float) |
|
(int, float) |
|
Examples:
>>> # Basic usage
>>> p_values = [0.01, 0.02, 0.5, 0.8]
>>> num_rej, threshold = bh(p_values, alpha=0.05)
>>> print(f"Rejected {num_rej} hypotheses at threshold {threshold:.4f}")
>>> # Check which p-values are rejected
>>> rejected = [p <= threshold for p in p_values]
>>> print(f"Rejected p-values: {[p for p, r in zip(p_values, rejected) if r]}")
References
Benjamini, Y., and Y. Hochberg (1995). "Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing." Journal of the Royal Statistical Society: Series B, 57(1):289-300.
Source code in online_fdr/utils/static.py
online_fdr.utils.static.storey_bh(p_vals, alpha, lambda_)
¶
Static Storey-Benjamini-Hochberg procedure with π₀ estimation.
The Storey-BH procedure extends the classical Benjamini-Hochberg method by estimating π₀ (the proportion of true null hypotheses) and incorporating this estimate into the rejection threshold. This typically provides higher power when π₀ < 1, which is common in genomic and other high-dimensional applications.
The algorithm estimates π₀ using Storey's method: π̂₀ = min(1, (1 + #{p > λ}) / (n × (1 - λ)))
Then applies the BH procedure with adjusted significance level α/π̂₀.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
p_vals | [float] | List of p-values to test. Should be in [0,1]. | required |
alpha | float | Target FDR level. Must be in (0,1). | required |
lambda_ | float | Threshold parameter for π₀ estimation. Must be in [0,1). Common choice: λ = 0.5. Higher values give more conservative π₀ estimates. | required |
Returns:
Type | Description |
---|---|
(int, float) | Tuple of (number_of_rejections, rejection_threshold). |
(int, float) |
|
(int, float) |
|
Raises:
Type | Description |
---|---|
ValueError | If lambda_ >= 1.0. |
Examples:
>>> # Standard usage with λ = 0.5
>>> p_values = [0.001, 0.02, 0.5, 0.8, 0.9]
>>> num_rej, threshold = storey_bh(p_values, alpha=0.05, lambda_=0.5)
>>> print(f"Storey-BH rejected {num_rej} at threshold {threshold:.4f}")
>>> # Compare with standard BH
>>> num_rej_bh, threshold_bh = bh(p_values, alpha=0.05)
>>> print(f"Standard BH rejected {num_rej_bh} at threshold {threshold_bh:.4f}")
>>> # Storey-BH typically has higher power when π₀ < 1
References
Storey, J. D. (2002). "A direct approach to false discovery rates." Journal of the Royal Statistical Society: Series B, 64(3):479-498.
Storey, J. D. (2003). "The positive false discovery rate: a Bayesian interpretation and the q-value." Annals of Statistics, 31(6):2013-2035.
Source code in online_fdr/utils/static.py
online_fdr.utils.static.by(p_vals, alpha)
¶
Static Benjamini-Yekutieli (BY) procedure for FDR control under dependence.
The Benjamini-Yekutieli procedure extends the Benjamini-Hochberg method to provide FDR control even under arbitrary dependence among p-values. It uses harmonic weights to maintain FDR control at the cost of reduced power compared to the standard BH procedure.
The algorithm finds the largest index i such that: P(i) ≤ (i/n) × α / H_n
where H_n = Σ(1/j) for j=1 to n is the n-th harmonic number.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
p_vals | [float] | List of p-values to test. Should be in [0,1]. | required |
alpha | float | Target FDR level. Must be in (0,1). | required |
Returns:
Type | Description |
---|---|
(int, float) | Tuple of (number_of_rejections, rejection_threshold). |
(int, float) |
|
(int, float) |
|
Examples:
>>> # Usage under arbitrary dependence
>>> p_values = [0.01, 0.02, 0.5, 0.8] # May be dependent
>>> num_rej, threshold = by(p_values, alpha=0.05)
>>> print(f"BY rejected {num_rej} at threshold {threshold:.4f}")
>>> # Compare with BH (more conservative under dependence)
>>> num_rej_bh, threshold_bh = bh(p_values, alpha=0.05)
>>> print(f"BH rejected {num_rej_bh} (may not control FDR under dependence)")
Notes
The BY procedure is recommended when: - P-values may be arbitrarily dependent - Conservative FDR control is required - The exact dependence structure is unknown
It maintains valid FDR control under any dependence structure but typically has lower power than BH under independence.
References
Benjamini, Y., and D. Yekutieli (2001). "The control of the false discovery rate in multiple testing under dependency." Annals of Statistics, 29(4):1165-1188.
Source code in online_fdr/utils/static.py
Gamma Sequences¶
Specialized sequences for alpha allocation in online procedures:
online_fdr.utils.sequence.DefaultSaffronGammaSequence
¶
Bases: AbstractGammaSequence
Default gamma sequence for SAFFRON (Spectral Approach for FDR with Online N-tests).
SAFFRON uses a power-law gamma sequence that provides excellent performance for adaptive online FDR control. The sequence can operate in two modes: with or without a normalization constant c.
The sequence is defined as: - With c: γⱼ = c / j^(gamma_exp)
- Without c: γⱼ = j^(gamma_exp)
Parameters:
Name | Type | Description | Default |
---|---|---|---|
gamma_exp | Exponent controlling the decay rate. Recommended: 1.6. | required | |
c | Normalization constant. If None, uses pure power law. | required |
Examples:
>>> # Standard SAFFRON sequence with recommended parameters
>>> gamma_seq = DefaultSaffronGammaSequence(gamma_exp=1.6, c=0.4374901658)
>>> gamma_values = [gamma_seq.calc_gamma(j) for j in range(1, 6)]
>>> print(f"SAFFRON γ values: {gamma_values}")
References
Ramdas, A., T. Zrnic, M. J. Wainwright, and M. I. Jordan (2018). "SAFFRON: an adaptive algorithm for online control of the false discovery rate." Proceedings of the 35th International Conference on Machine Learning (ICML), PMLR, 80:4286-4294.
Source code in online_fdr/utils/sequence.py
online_fdr.utils.sequence.DefaultLordGammaSequence
¶
Bases: AbstractGammaSequence
Default gamma sequence for LORD (Levels based On Recent Discovery) procedures.
This gamma sequence is optimized for LORD algorithms and provides the balance between early power and long-term FDR control. Unlike LOND sequences, LORD sequences don't require the alpha parameter since they work with alpha wealth dynamics.
The sequence is defined as: γⱼ = c × log(max(j,2)) / (j × exp(√log(j)))
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c | float | Normalization constant controlling sequence scale. | required |
Examples:
>>> # Standard LORD gamma sequence
>>> gamma_seq = DefaultLordGammaSequence(c=0.07720838)
>>> gamma_values = [gamma_seq.calc_gamma(j) for j in range(1, 6)]
>>> print(f"First 5 gamma values: {gamma_values}")
References
Javanmard, A., and A. Montanari (2018). "Online rules for control of false discovery rate and false discovery exceedance." Annals of Statistics, 46(2):526-554.
Source code in online_fdr/utils/sequence.py
Functions¶
calc_gamma(j, **kwargs)
¶
Calculate the gamma value for position j in the sequence.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
j | int | Position index (1-based). | required |
**kwargs | Additional arguments (unused for LORD sequences). | {} |
Returns:
Type | Description |
---|---|
The gamma value for position j. |
Source code in online_fdr/utils/sequence.py
online_fdr.utils.sequence.DefaultLondGammaSequence
¶
Bases: AbstractGammaSequence
Default gamma sequence for LOND (Levels based On Number of Discoveries).
This gamma sequence is specifically designed for LOND procedures and provides the recommended decay rate for optimal power-stability trade-offs. The sequence incorporates both logarithmic and exponential decay components to balance early power with long-term FDR control.
The sequence is defined as: γⱼ = c × α × log(max(j,2)) / (j × exp(√log(j)))
Parameters:
Name | Type | Description | Default |
---|---|---|---|
c | float | Normalization constant controlling the overall scale of the sequence. | required |
Attributes:
Name | Type | Description |
---|---|---|
c | float | None | The normalization constant. |
Examples:
>>> # Standard LOND gamma sequence
>>> gamma_seq = DefaultLondGammaSequence(c=0.07720838)
>>> gamma_1 = gamma_seq.calc_gamma(1, alpha=0.05)
>>> gamma_10 = gamma_seq.calc_gamma(10, alpha=0.05)
>>> print(f"γ₁ = {gamma_1:.6f}, γ₁₀ = {gamma_10:.6f}")
References
Javanmard, A., and A. Montanari (2018). "Online rules for control of false discovery rate and false discovery exceedance." Annals of Statistics, 46(2):526-554.
Source code in online_fdr/utils/sequence.py
Functions¶
calc_gamma(j, **kwargs)
¶
Calculate the gamma value for the j-th test.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
j | int | Test index (1-based). | required |
**kwargs | Must contain 'alpha' - the target FDR level. | {} |
Returns:
Type | Description |
---|---|
The gamma value for test j. |
Examples:
Source code in online_fdr/utils/sequence.py
Validation Functions¶
Input validation and error checking:
online_fdr.utils.validity.check_p_val(p_val)
¶
Validate that a p-value is in the valid range [0, 1].
Parameters:
Name | Type | Description | Default |
---|---|---|---|
p_val | float | The p-value to validate. | required |
Raises:
Type | Description |
---|---|
ValueError | If p_val is not in [0, 1]. |
Examples:
>>> check_p_val(0.05) # Valid - no exception
>>> check_p_val(1.5) # Raises ValueError
Traceback (most recent call last):
ValueError: Given p-value must be between [0,1].
Source code in online_fdr/utils/validity.py
online_fdr.utils.validity.check_alpha(p_val)
¶
Validate that an alpha value is in the valid range (0, 1).
Parameters:
Name | Type | Description | Default |
---|---|---|---|
p_val | float | The alpha/significance level to validate. | required |
Raises:
Type | Description |
---|---|
ValueError | If p_val is not in (0, 1). |
Examples:
>>> check_alpha(0.05) # Valid - no exception
>>> check_alpha(0.0) # Raises ValueError (not > 0)
>>> check_alpha(1.0) # Raises ValueError (not < 1)
Source code in online_fdr/utils/validity.py
online_fdr.utils.validity.check_initial_wealth(initial_wealth, alpha)
¶
Validate that initial wealth is properly bounded relative to alpha.
In alpha-wealth procedures (LORD, SAFFRON, etc.), the initial wealth must be positive but less than the target FDR level to ensure proper wealth dynamics and FDR control.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
initial_wealth | float | The initial alpha wealth. | required |
alpha | float | The target FDR level. | required |
Raises:
Type | Description |
---|---|
ValueError | If initial_wealth is not in (0, alpha). |
Examples:
>>> check_initial_wealth(0.025, 0.05) # Valid - no exception
>>> check_initial_wealth(0.1, 0.05) # Raises ValueError (> alpha)
>>> check_initial_wealth(0.0, 0.05) # Raises ValueError (not > 0)
Source code in online_fdr/utils/validity.py
online_fdr.utils.validity.check_candidate_threshold(lambda_)
¶
Validate that a candidate threshold λ is in the valid range (0, 1).
The candidate threshold λ is used in SAFFRON and ADDIS algorithms to determine which p-values are considered "candidates" for rejection. It must be strictly between 0 and 1.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
lambda_ | float | The candidate threshold to validate. | required |
Raises:
Type | Description |
---|---|
ValueError | If lambda_ is not in (0, 1). |
Examples:
>>> check_candidate_threshold(0.5) # Valid - no exception
>>> check_candidate_threshold(0.0) # Raises ValueError
>>> check_candidate_threshold(1.0) # Raises ValueError
Source code in online_fdr/utils/validity.py
online_fdr.utils.validity.check_wealth(wealth)
¶
Validate that alpha wealth is positive (not depleted).
In alpha-wealth procedures, when wealth reaches zero or below, no more tests can be performed while maintaining FDR control. This function checks for wealth depletion.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
wealth | float | Current alpha wealth level. | required |
Raises:
Type | Description |
---|---|
ValueError | If wealth <= 0. |
Examples:
>>> check_wealth(0.01) # Valid - no exception
>>> check_wealth(0.0) # Raises ValueError (wealth depleted)
>>> check_wealth(-0.1) # Raises ValueError (negative wealth)
Source code in online_fdr/utils/validity.py
online_fdr.utils.validity.check_decay_factor(decay_factor)
¶
Validate that a decay factor is in the valid range (0, 1).
Decay factors are used in various algorithms to control the rate at which parameters decrease over time. They must be strictly between 0 and 1 for proper convergence behavior.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
decay_factor | float | The decay factor to validate. | required |
Raises:
Type | Description |
---|---|
ValueError | If decay_factor is not in (0, 1). |
Examples:
>>> check_decay_factor(0.9) # Valid - no exception
>>> check_decay_factor(0.0) # Raises ValueError (not > 0)
>>> check_decay_factor(1.0) # Raises ValueError (not < 1)
>>> check_decay_factor(1.5) # Raises ValueError (not < 1)
Source code in online_fdr/utils/validity.py
Overview¶
Static Procedures¶
The static module contains implementations of classical FDR control procedures that form the building blocks for online and batch methods:
- Benjamini-Hochberg (BH): The foundational FDR procedure
- Storey-BH: Enhanced power through π₀ estimation
- Benjamini-Yekutieli (BY): FDR control under arbitrary dependence
These functions are used internally by batch testing methods and can also be used directly for offline multiple testing.
Gamma Sequences¶
Gamma sequences determine how alpha budget is allocated over time in online procedures. Different algorithms require different sequence properties:
- SAFFRON sequences: Power-law decay optimized for adaptive procedures
- LORD sequences: Logarithmic decay for wealth-based procedures
- LOND sequences: Complex decay balancing early/late power
Validation Framework¶
The validation module ensures input correctness and provides informative error messages:
- Range validation: P-values in [0,1], alpha in (0,1)
- Relationship validation: Initial wealth < alpha
- State validation: Wealth depletion detection
Usage Examples¶
Static Procedures¶
from online_fdr.utils.static import bh, storey_bh, by
# Sample p-values
p_values = [0.001, 0.02, 0.15, 0.8, 0.9]
alpha = 0.05
# Standard Benjamini-Hochberg
num_rej_bh, threshold_bh = bh(p_values, alpha)
print(f"BH: {num_rej_bh} rejections at threshold {threshold_bh:.4f}")
# Storey-BH with π₀ estimation
num_rej_storey, threshold_storey = storey_bh(p_values, alpha, lambda_=0.5)
print(f"Storey-BH: {num_rej_storey} rejections at threshold {threshold_storey:.4f}")
# Benjamini-Yekutieli (under dependence)
num_rej_by, threshold_by = by(p_values, alpha)
print(f"BY: {num_rej_by} rejections at threshold {threshold_by:.4f}")
Gamma Sequences¶
from online_fdr.utils.sequence import DefaultSaffronGammaSequence
# Create SAFFRON gamma sequence
gamma_seq = DefaultSaffronGammaSequence(gamma_exp=1.6, c=0.4374901658)
# Generate first 10 gamma values
gamma_values = [gamma_seq.calc_gamma(j) for j in range(1, 11)]
print("SAFFRON gamma sequence:", gamma_values)
# Show decay behavior
import matplotlib.pyplot as plt
j_values = range(1, 101)
gamma_values = [gamma_seq.calc_gamma(j) for j in j_values]
plt.loglog(j_values, gamma_values)
plt.xlabel("Position j")
plt.ylabel("Gamma value")
plt.title("SAFFRON Gamma Sequence Decay")
Validation Functions¶
from online_fdr.utils.validity import check_p_val, check_alpha, check_initial_wealth
# Validate inputs
try:
check_p_val(0.05) # Valid p-value
check_alpha(0.05) # Valid alpha
check_initial_wealth(0.025, 0.05) # Valid wealth < alpha
print("All validations passed")
except ValueError as e:
print(f"Validation error: {e}")
# Example of validation failure
try:
check_p_val(1.5) # Invalid p-value > 1
except ValueError as e:
print(f"P-value validation failed: {e}")
Algorithm Support¶
How Utilities Support Online Algorithms¶
- Static procedures provide the core multiple testing logic
- Gamma sequences control alpha allocation strategies
- Validation functions ensure algorithmic correctness
- Data utilities support testing and evaluation
Customization and Extension¶
The utility framework supports customization:
- Custom gamma sequences: Inherit from
AbstractGammaSequence
- Custom spending functions: Implement
AbstractSpendFunc
- Custom static procedures: Follow the standard interface
- Additional validation: Extend existing validation functions
Best Practices¶
Static Procedure Usage¶
- Choose appropriate procedure based on dependence assumptions
- Handle edge cases (empty input, all p-values = 1)
- Validate inputs before calling procedures
- Understand threshold interpretation for downstream usage
Gamma Sequence Selection¶
- Use algorithm-specific defaults when available
- Consider decay rate for your application timeline
- Validate convergence properties for custom sequences
- Test empirical performance versus theoretical optimal
Validation Integration¶
- Validate early and often in computational pipelines
- Provide informative error messages to users
- Handle validation gracefully in production code
- Document validation requirements for API users
Performance Considerations¶
Computational Complexity¶
- Static procedures: O(n log n) due to sorting
- Gamma sequences: O(1) per value computation
- Validation: O(1) per check
Memory Usage¶
- Static procedures: O(n) for input storage
- Gamma sequences: O(1) stateless computation
- Validation: Minimal memory footprint
Optimization Tips¶
- Pre-sort p-values when calling static procedures repeatedly
- Cache gamma values for frequently accessed positions
- Batch validation when checking many values
- Use numpy arrays for vectorized operations when possible
References¶
Static Procedures: - Benjamini, Y., and Y. Hochberg (1995). "Controlling the False Discovery Rate" - Storey, J. D. (2002). "A direct approach to false discovery rates"
- Benjamini, Y., and D. Yekutieli (2001). "Control under dependency"
Gamma Sequences: - Ramdas, A., et al. (2018). "SAFFRON: adaptive algorithm for online FDR control" - Javanmard, A., and A. Montanari (2018). "Online rules for FDR control"
Validation Theory: - Wassmer, G., and W. Brannath (2016). "Group Sequential and Confirmatory Adaptive Designs"