Skip to content

Utility Functions

The utils module provides essential supporting functions for online FDR control algorithms, including static procedures, gamma sequences, validation functions, and data generation utilities.

Static Procedures

Core implementations of classical multiple testing procedures:

online_fdr.utils.static.bh(p_vals, alpha)

Static Benjamini-Hochberg (BH) procedure for FDR control.

The Benjamini-Hochberg procedure is the fundamental method for controlling the False Discovery Rate in multiple testing. It provides FDR control at level α under independence of p-values or positive dependence (PRDS).

The algorithm sorts p-values and finds the largest index i such that P(i) ≤ (i/n) × α, where P(i) is the i-th smallest p-value and n is the total number of tests.

Parameters:

Name Type Description Default
p_vals [float]

List of p-values to test. Should be in [0,1].

required
alpha float

Target FDR level. Must be in (0,1).

required

Returns:

Type Description
(int, float)

Tuple of (number_of_rejections, rejection_threshold).

(int, float)
  • number_of_rejections: Number of hypotheses rejected
(int, float)
  • rejection_threshold: The p-value threshold used for rejection

Examples:

>>> # Basic usage
>>> p_values = [0.01, 0.02, 0.5, 0.8]
>>> num_rej, threshold = bh(p_values, alpha=0.05)
>>> print(f"Rejected {num_rej} hypotheses at threshold {threshold:.4f}")
>>> # Check which p-values are rejected
>>> rejected = [p <= threshold for p in p_values]
>>> print(f"Rejected p-values: {[p for p, r in zip(p_values, rejected) if r]}")
References

Benjamini, Y., and Y. Hochberg (1995). "Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing." Journal of the Royal Statistical Society: Series B, 57(1):289-300.

Source code in online_fdr/utils/static.py
def bh(p_vals: [float], alpha: float) -> (int, float):
    """Static Benjamini-Hochberg (BH) procedure for FDR control.

    The Benjamini-Hochberg procedure is the fundamental method for controlling
    the False Discovery Rate in multiple testing. It provides FDR control at
    level α under independence of p-values or positive dependence (PRDS).

    The algorithm sorts p-values and finds the largest index i such that
    P(i) ≤ (i/n) × α, where P(i) is the i-th smallest p-value and n is the
    total number of tests.

    Args:
        p_vals: List of p-values to test. Should be in [0,1].
        alpha: Target FDR level. Must be in (0,1).

    Returns:
        Tuple of (number_of_rejections, rejection_threshold).
        - number_of_rejections: Number of hypotheses rejected
        - rejection_threshold: The p-value threshold used for rejection

    Examples:
        >>> # Basic usage
        >>> p_values = [0.01, 0.02, 0.5, 0.8]
        >>> num_rej, threshold = bh(p_values, alpha=0.05)
        >>> print(f"Rejected {num_rej} hypotheses at threshold {threshold:.4f}")

        >>> # Check which p-values are rejected
        >>> rejected = [p <= threshold for p in p_values]
        >>> print(f"Rejected p-values: {[p for p, r in zip(p_values, rejected) if r]}")

    References:
        Benjamini, Y., and Y. Hochberg (1995). "Controlling the False Discovery Rate:
        A Practical and Powerful Approach to Multiple Testing." Journal of the Royal
        Statistical Society: Series B, 57(1):289-300.
    """
    n = len(p_vals)
    sorted_p_vals = sorted(p_vals)

    def condition(i):
        return sorted_p_vals[i] <= alpha * (i + 1) / n

    left, right = 0, n
    while left < right:
        mid = (left + right) // 2
        if condition(mid):
            left = mid + 1
        else:
            right = mid

    return left, alpha * left / n if left else 0

online_fdr.utils.static.storey_bh(p_vals, alpha, lambda_)

Static Storey-Benjamini-Hochberg procedure with π₀ estimation.

The Storey-BH procedure extends the classical Benjamini-Hochberg method by estimating π₀ (the proportion of true null hypotheses) and incorporating this estimate into the rejection threshold. This typically provides higher power when π₀ < 1, which is common in genomic and other high-dimensional applications.

The algorithm estimates π₀ using Storey's method: π̂₀ = min(1, (1 + #{p > λ}) / (n × (1 - λ)))

Then applies the BH procedure with adjusted significance level α/π̂₀.

Parameters:

Name Type Description Default
p_vals [float]

List of p-values to test. Should be in [0,1].

required
alpha float

Target FDR level. Must be in (0,1).

required
lambda_ float

Threshold parameter for π₀ estimation. Must be in [0,1). Common choice: λ = 0.5. Higher values give more conservative π₀ estimates.

required

Returns:

Type Description
(int, float)

Tuple of (number_of_rejections, rejection_threshold).

(int, float)
  • number_of_rejections: Number of hypotheses rejected
(int, float)
  • rejection_threshold: The p-value threshold used for rejection

Raises:

Type Description
ValueError

If lambda_ >= 1.0.

Examples:

>>> # Standard usage with λ = 0.5
>>> p_values = [0.001, 0.02, 0.5, 0.8, 0.9]
>>> num_rej, threshold = storey_bh(p_values, alpha=0.05, lambda_=0.5)
>>> print(f"Storey-BH rejected {num_rej} at threshold {threshold:.4f}")
>>> # Compare with standard BH
>>> num_rej_bh, threshold_bh = bh(p_values, alpha=0.05)
>>> print(f"Standard BH rejected {num_rej_bh} at threshold {threshold_bh:.4f}")
>>> # Storey-BH typically has higher power when π₀ < 1
References

Storey, J. D. (2002). "A direct approach to false discovery rates." Journal of the Royal Statistical Society: Series B, 64(3):479-498.

Storey, J. D. (2003). "The positive false discovery rate: a Bayesian interpretation and the q-value." Annals of Statistics, 31(6):2013-2035.

Source code in online_fdr/utils/static.py
def storey_bh(p_vals: [float], alpha: float, lambda_: float) -> (int, float):
    """Static Storey-Benjamini-Hochberg procedure with π₀ estimation.

    The Storey-BH procedure extends the classical Benjamini-Hochberg method by
    estimating π₀ (the proportion of true null hypotheses) and incorporating
    this estimate into the rejection threshold. This typically provides higher
    power when π₀ < 1, which is common in genomic and other high-dimensional
    applications.

    The algorithm estimates π₀ using Storey's method:
    π̂₀ = min(1, (1 + #{p > λ}) / (n × (1 - λ)))

    Then applies the BH procedure with adjusted significance level α/π̂₀.

    Args:
        p_vals: List of p-values to test. Should be in [0,1].
        alpha: Target FDR level. Must be in (0,1).
        lambda_: Threshold parameter for π₀ estimation. Must be in [0,1).
                Common choice: λ = 0.5. Higher values give more conservative
                π₀ estimates.

    Returns:
        Tuple of (number_of_rejections, rejection_threshold).
        - number_of_rejections: Number of hypotheses rejected
        - rejection_threshold: The p-value threshold used for rejection

    Raises:
        ValueError: If lambda_ >= 1.0.

    Examples:
        >>> # Standard usage with λ = 0.5
        >>> p_values = [0.001, 0.02, 0.5, 0.8, 0.9]
        >>> num_rej, threshold = storey_bh(p_values, alpha=0.05, lambda_=0.5)
        >>> print(f"Storey-BH rejected {num_rej} at threshold {threshold:.4f}")

        >>> # Compare with standard BH
        >>> num_rej_bh, threshold_bh = bh(p_values, alpha=0.05)
        >>> print(f"Standard BH rejected {num_rej_bh} at threshold {threshold_bh:.4f}")
        >>> # Storey-BH typically has higher power when π₀ < 1

    References:
        Storey, J. D. (2002). "A direct approach to false discovery rates."
        Journal of the Royal Statistical Society: Series B, 64(3):479-498.

        Storey, J. D. (2003). "The positive false discovery rate: a Bayesian
        interpretation and the q-value." Annals of Statistics, 31(6):2013-2035.
    """
    if not p_vals:
        return 0, 0.0

    n = len(p_vals)

    # Estimate π₀ using Storey's method
    # π₀ = (1 + #{p_i > λ}) / (n(1-λ))
    if lambda_ >= 1.0:
        raise ValueError("lambda_ must be less than 1.0")

    num_above_lambda = sum(1 for p in p_vals if p > lambda_)
    pi0 = min(1.0, (1 + num_above_lambda) / (n * (1 - lambda_)))

    # Apply BH procedure with adjusted alpha
    sorted_p_vals = sorted(p_vals)

    # Find the largest i such that P(i) ≤ (i/n) * (alpha/π₀)
    num_reject = 0
    threshold = 0.0

    for i in range(n):
        if sorted_p_vals[i] <= (i + 1) * alpha / (n * pi0):
            num_reject = i + 1
            threshold = sorted_p_vals[i]
        else:
            break

    return num_reject, threshold

online_fdr.utils.static.by(p_vals, alpha)

Static Benjamini-Yekutieli (BY) procedure for FDR control under dependence.

The Benjamini-Yekutieli procedure extends the Benjamini-Hochberg method to provide FDR control even under arbitrary dependence among p-values. It uses harmonic weights to maintain FDR control at the cost of reduced power compared to the standard BH procedure.

The algorithm finds the largest index i such that: P(i) ≤ (i/n) × α / H_n

where H_n = Σ(1/j) for j=1 to n is the n-th harmonic number.

Parameters:

Name Type Description Default
p_vals [float]

List of p-values to test. Should be in [0,1].

required
alpha float

Target FDR level. Must be in (0,1).

required

Returns:

Type Description
(int, float)

Tuple of (number_of_rejections, rejection_threshold).

(int, float)
  • number_of_rejections: Number of hypotheses rejected
(int, float)
  • rejection_threshold: The p-value threshold used for rejection

Examples:

>>> # Usage under arbitrary dependence
>>> p_values = [0.01, 0.02, 0.5, 0.8]  # May be dependent
>>> num_rej, threshold = by(p_values, alpha=0.05)
>>> print(f"BY rejected {num_rej} at threshold {threshold:.4f}")
>>> # Compare with BH (more conservative under dependence)
>>> num_rej_bh, threshold_bh = bh(p_values, alpha=0.05)
>>> print(f"BH rejected {num_rej_bh} (may not control FDR under dependence)")
Notes

The BY procedure is recommended when: - P-values may be arbitrarily dependent - Conservative FDR control is required - The exact dependence structure is unknown

It maintains valid FDR control under any dependence structure but typically has lower power than BH under independence.

References

Benjamini, Y., and D. Yekutieli (2001). "The control of the false discovery rate in multiple testing under dependency." Annals of Statistics, 29(4):1165-1188.

Source code in online_fdr/utils/static.py
def by(p_vals: [float], alpha: float) -> (int, float):
    """Static Benjamini-Yekutieli (BY) procedure for FDR control under dependence.

    The Benjamini-Yekutieli procedure extends the Benjamini-Hochberg method to
    provide FDR control even under arbitrary dependence among p-values. It uses
    harmonic weights to maintain FDR control at the cost of reduced power
    compared to the standard BH procedure.

    The algorithm finds the largest index i such that:
    P(i) ≤ (i/n) × α / H_n

    where H_n = Σ(1/j) for j=1 to n is the n-th harmonic number.

    Args:
        p_vals: List of p-values to test. Should be in [0,1].
        alpha: Target FDR level. Must be in (0,1).

    Returns:
        Tuple of (number_of_rejections, rejection_threshold).
        - number_of_rejections: Number of hypotheses rejected
        - rejection_threshold: The p-value threshold used for rejection

    Examples:
        >>> # Usage under arbitrary dependence
        >>> p_values = [0.01, 0.02, 0.5, 0.8]  # May be dependent
        >>> num_rej, threshold = by(p_values, alpha=0.05)
        >>> print(f"BY rejected {num_rej} at threshold {threshold:.4f}")

        >>> # Compare with BH (more conservative under dependence)
        >>> num_rej_bh, threshold_bh = bh(p_values, alpha=0.05)
        >>> print(f"BH rejected {num_rej_bh} (may not control FDR under dependence)")

    Notes:
        The BY procedure is recommended when:
        - P-values may be arbitrarily dependent
        - Conservative FDR control is required
        - The exact dependence structure is unknown

        It maintains valid FDR control under any dependence structure but
        typically has lower power than BH under independence.

    References:
        Benjamini, Y., and D. Yekutieli (2001). "The control of the false discovery
        rate in multiple testing under dependency." Annals of Statistics, 29(4):1165-1188.
    """
    n = len(p_vals)
    sorted_p_vals = sorted(p_vals)
    harmonic_sum = sum(1 / (i + 1) for i in range(n))

    def condition(i):
        return sorted_p_vals[i] <= alpha * (i + 1) / (n * harmonic_sum)

    left, right = 0, n
    while left < right:
        mid = (left + right) // 2
        if condition(mid):
            left = mid + 1
        else:
            right = mid

    return left, alpha * left / (n * harmonic_sum) if left else 0

Gamma Sequences

Specialized sequences for alpha allocation in online procedures:

online_fdr.utils.sequence.DefaultSaffronGammaSequence

Bases: AbstractGammaSequence

Default gamma sequence for SAFFRON (Spectral Approach for FDR with Online N-tests).

SAFFRON uses a power-law gamma sequence that provides excellent performance for adaptive online FDR control. The sequence can operate in two modes: with or without a normalization constant c.

The sequence is defined as: - With c: γⱼ = c / j^(gamma_exp)
- Without c: γⱼ = j^(gamma_exp)

Parameters:

Name Type Description Default
gamma_exp

Exponent controlling the decay rate. Recommended: 1.6.

required
c

Normalization constant. If None, uses pure power law.

required

Examples:

>>> # Standard SAFFRON sequence with recommended parameters
>>> gamma_seq = DefaultSaffronGammaSequence(gamma_exp=1.6, c=0.4374901658)
>>> gamma_values = [gamma_seq.calc_gamma(j) for j in range(1, 6)]
>>> print(f"SAFFRON γ values: {gamma_values}")
References

Ramdas, A., T. Zrnic, M. J. Wainwright, and M. I. Jordan (2018). "SAFFRON: an adaptive algorithm for online control of the false discovery rate." Proceedings of the 35th International Conference on Machine Learning (ICML), PMLR, 80:4286-4294.

Source code in online_fdr/utils/sequence.py
class DefaultSaffronGammaSequence(AbstractGammaSequence):
    """Default gamma sequence for SAFFRON (Spectral Approach for FDR with Online N-tests).

    SAFFRON uses a power-law gamma sequence that provides excellent performance
    for adaptive online FDR control. The sequence can operate in two modes:
    with or without a normalization constant c.

    The sequence is defined as:
    - With c: γⱼ = c / j^(gamma_exp)  
    - Without c: γⱼ = j^(gamma_exp)

    Args:
        gamma_exp: Exponent controlling the decay rate. Recommended: 1.6.
        c: Normalization constant. If None, uses pure power law.

    Examples:
        >>> # Standard SAFFRON sequence with recommended parameters
        >>> gamma_seq = DefaultSaffronGammaSequence(gamma_exp=1.6, c=0.4374901658)
        >>> gamma_values = [gamma_seq.calc_gamma(j) for j in range(1, 6)]
        >>> print(f"SAFFRON γ values: {gamma_values}")

    References:
        Ramdas, A., T. Zrnic, M. J. Wainwright, and M. I. Jordan (2018).
        "SAFFRON: an adaptive algorithm for online control of the false discovery rate."
        Proceedings of the 35th International Conference on Machine Learning (ICML),
        PMLR, 80:4286-4294.
    """

    def __init__(self, gamma_exp, c):
        """Initialize the SAFFRON gamma sequence.

        Args:
            gamma_exp: Decay exponent. Recommended value: 1.6.
            c: Normalization constant. Recommended value: 0.4374901658.
               If None, uses pure power law without normalization.
        """
        super().__init__(gamma_exp=gamma_exp, c=c)

    def calc_gamma(self, j: int, *args):
        """Calculate gamma value for position j.

        Args:
            j: Position index (1-based).
            *args: Additional arguments (unused).

        Returns:
            Gamma value for position j.
        """
        return j**self.gamma_exp if self.c is None \
            else self.c / j**self.gamma_exp  # fmt: skip

Functions

calc_gamma(j, *args)

Calculate gamma value for position j.

Parameters:

Name Type Description Default
j int

Position index (1-based).

required
*args

Additional arguments (unused).

()

Returns:

Type Description

Gamma value for position j.

Source code in online_fdr/utils/sequence.py
def calc_gamma(self, j: int, *args):
    """Calculate gamma value for position j.

    Args:
        j: Position index (1-based).
        *args: Additional arguments (unused).

    Returns:
        Gamma value for position j.
    """
    return j**self.gamma_exp if self.c is None \
        else self.c / j**self.gamma_exp  # fmt: skip

online_fdr.utils.sequence.DefaultLordGammaSequence

Bases: AbstractGammaSequence

Default gamma sequence for LORD (Levels based On Recent Discovery) procedures.

This gamma sequence is optimized for LORD algorithms and provides the balance between early power and long-term FDR control. Unlike LOND sequences, LORD sequences don't require the alpha parameter since they work with alpha wealth dynamics.

The sequence is defined as: γⱼ = c × log(max(j,2)) / (j × exp(√log(j)))

Parameters:

Name Type Description Default
c float

Normalization constant controlling sequence scale.

required

Examples:

>>> # Standard LORD gamma sequence  
>>> gamma_seq = DefaultLordGammaSequence(c=0.07720838)
>>> gamma_values = [gamma_seq.calc_gamma(j) for j in range(1, 6)]
>>> print(f"First 5 gamma values: {gamma_values}")
References

Javanmard, A., and A. Montanari (2018). "Online rules for control of false discovery rate and false discovery exceedance." Annals of Statistics, 46(2):526-554.

Source code in online_fdr/utils/sequence.py
class DefaultLordGammaSequence(AbstractGammaSequence):
    """Default gamma sequence for LORD (Levels based On Recent Discovery) procedures.

    This gamma sequence is optimized for LORD algorithms and provides the balance
    between early power and long-term FDR control. Unlike LOND sequences, LORD
    sequences don't require the alpha parameter since they work with alpha wealth
    dynamics.

    The sequence is defined as:
    γⱼ = c × log(max(j,2)) / (j × exp(√log(j)))

    Args:
        c: Normalization constant controlling sequence scale.

    Examples:
        >>> # Standard LORD gamma sequence  
        >>> gamma_seq = DefaultLordGammaSequence(c=0.07720838)
        >>> gamma_values = [gamma_seq.calc_gamma(j) for j in range(1, 6)]
        >>> print(f"First 5 gamma values: {gamma_values}")

    References:
        Javanmard, A., and A. Montanari (2018). "Online rules for control of false
        discovery rate and false discovery exceedance." Annals of Statistics,
        46(2):526-554.
    """

    def __init__(self, c: float):
        """Initialize the LORD gamma sequence.

        Args:
            c: Normalization constant. Recommended value: c ≈ 0.07720838.
        """
        super().__init__(c)

    def calc_gamma(self, j: int, **kwargs):
        """Calculate the gamma value for position j in the sequence.

        Args:
            j: Position index (1-based).
            **kwargs: Additional arguments (unused for LORD sequences).

        Returns:
            The gamma value for position j.
        """
        return (
            self.c  # fmt: skip
            * math.log(max(j, 2))
            / (j * math.exp(math.sqrt(math.log(j))))
        )

Functions

calc_gamma(j, **kwargs)

Calculate the gamma value for position j in the sequence.

Parameters:

Name Type Description Default
j int

Position index (1-based).

required
**kwargs

Additional arguments (unused for LORD sequences).

{}

Returns:

Type Description

The gamma value for position j.

Source code in online_fdr/utils/sequence.py
def calc_gamma(self, j: int, **kwargs):
    """Calculate the gamma value for position j in the sequence.

    Args:
        j: Position index (1-based).
        **kwargs: Additional arguments (unused for LORD sequences).

    Returns:
        The gamma value for position j.
    """
    return (
        self.c  # fmt: skip
        * math.log(max(j, 2))
        / (j * math.exp(math.sqrt(math.log(j))))
    )

online_fdr.utils.sequence.DefaultLondGammaSequence

Bases: AbstractGammaSequence

Default gamma sequence for LOND (Levels based On Number of Discoveries).

This gamma sequence is specifically designed for LOND procedures and provides the recommended decay rate for optimal power-stability trade-offs. The sequence incorporates both logarithmic and exponential decay components to balance early power with long-term FDR control.

The sequence is defined as: γⱼ = c × α × log(max(j,2)) / (j × exp(√log(j)))

Parameters:

Name Type Description Default
c float

Normalization constant controlling the overall scale of the sequence.

required

Attributes:

Name Type Description
c float | None

The normalization constant.

Examples:

>>> # Standard LOND gamma sequence
>>> gamma_seq = DefaultLondGammaSequence(c=0.07720838)
>>> gamma_1 = gamma_seq.calc_gamma(1, alpha=0.05)
>>> gamma_10 = gamma_seq.calc_gamma(10, alpha=0.05)
>>> print(f"γ₁ = {gamma_1:.6f}, γ₁₀ = {gamma_10:.6f}")
References

Javanmard, A., and A. Montanari (2018). "Online rules for control of false discovery rate and false discovery exceedance." Annals of Statistics, 46(2):526-554.

Source code in online_fdr/utils/sequence.py
class DefaultLondGammaSequence(AbstractGammaSequence):
    """Default gamma sequence for LOND (Levels based On Number of Discoveries).

    This gamma sequence is specifically designed for LOND procedures and provides
    the recommended decay rate for optimal power-stability trade-offs. The sequence
    incorporates both logarithmic and exponential decay components to balance
    early power with long-term FDR control.

    The sequence is defined as:
    γⱼ = c × α × log(max(j,2)) / (j × exp(√log(j)))

    Args:
        c: Normalization constant controlling the overall scale of the sequence.

    Attributes:
        c: The normalization constant.

    Examples:
        >>> # Standard LOND gamma sequence
        >>> gamma_seq = DefaultLondGammaSequence(c=0.07720838)
        >>> gamma_1 = gamma_seq.calc_gamma(1, alpha=0.05)
        >>> gamma_10 = gamma_seq.calc_gamma(10, alpha=0.05)
        >>> print(f"γ₁ = {gamma_1:.6f}, γ₁₀ = {gamma_10:.6f}")

    References:
        Javanmard, A., and A. Montanari (2018). "Online rules for control of false
        discovery rate and false discovery exceedance." Annals of Statistics,
        46(2):526-554.
    """

    def __init__(self, c: float):
        """Initialize the LOND gamma sequence.

        Args:
            c: Normalization constant. Recommended value: c ≈ 0.07720838.
        """
        super().__init__(c)

    def calc_gamma(self, j: int, **kwargs):
        """Calculate the gamma value for the j-th test.

        Args:
            j: Test index (1-based). 
            **kwargs: Must contain 'alpha' - the target FDR level.

        Returns:
            The gamma value for test j.

        Examples:
            >>> seq = DefaultLondGammaSequence(c=0.07720838)
            >>> gamma_5 = seq.calc_gamma(5, alpha=0.05)
        """
        alpha = kwargs.get("alpha")
        return (
            self.c
            * alpha
            * (math.log(max(j, 2)) / (j * math.exp(math.sqrt(math.log(j)))))
        )

Functions

calc_gamma(j, **kwargs)

Calculate the gamma value for the j-th test.

Parameters:

Name Type Description Default
j int

Test index (1-based).

required
**kwargs

Must contain 'alpha' - the target FDR level.

{}

Returns:

Type Description

The gamma value for test j.

Examples:

>>> seq = DefaultLondGammaSequence(c=0.07720838)
>>> gamma_5 = seq.calc_gamma(5, alpha=0.05)
Source code in online_fdr/utils/sequence.py
def calc_gamma(self, j: int, **kwargs):
    """Calculate the gamma value for the j-th test.

    Args:
        j: Test index (1-based). 
        **kwargs: Must contain 'alpha' - the target FDR level.

    Returns:
        The gamma value for test j.

    Examples:
        >>> seq = DefaultLondGammaSequence(c=0.07720838)
        >>> gamma_5 = seq.calc_gamma(5, alpha=0.05)
    """
    alpha = kwargs.get("alpha")
    return (
        self.c
        * alpha
        * (math.log(max(j, 2)) / (j * math.exp(math.sqrt(math.log(j)))))
    )

Validation Functions

Input validation and error checking:

online_fdr.utils.validity.check_p_val(p_val)

Validate that a p-value is in the valid range [0, 1].

Parameters:

Name Type Description Default
p_val float

The p-value to validate.

required

Raises:

Type Description
ValueError

If p_val is not in [0, 1].

Examples:

>>> check_p_val(0.05)  # Valid - no exception
>>> check_p_val(1.5)   # Raises ValueError
Traceback (most recent call last):
ValueError: Given p-value must be between [0,1].
Source code in online_fdr/utils/validity.py
def check_p_val(p_val: float) -> None:
    """Validate that a p-value is in the valid range [0, 1].

    Args:
        p_val: The p-value to validate.

    Raises:
        ValueError: If p_val is not in [0, 1].

    Examples:
        >>> check_p_val(0.05)  # Valid - no exception
        >>> check_p_val(1.5)   # Raises ValueError
        Traceback (most recent call last):
        ValueError: Given p-value must be between [0,1].
    """
    if not 0 <= p_val <= 1:
        raise ValueError(
            """
            Given p-value must be between [0,1].
            """
        )

online_fdr.utils.validity.check_alpha(p_val)

Validate that an alpha value is in the valid range (0, 1).

Parameters:

Name Type Description Default
p_val float

The alpha/significance level to validate.

required

Raises:

Type Description
ValueError

If p_val is not in (0, 1).

Examples:

>>> check_alpha(0.05)  # Valid - no exception
>>> check_alpha(0.0)   # Raises ValueError (not > 0)
>>> check_alpha(1.0)   # Raises ValueError (not < 1)
Source code in online_fdr/utils/validity.py
def check_alpha(p_val: float) -> None:
    """Validate that an alpha value is in the valid range (0, 1).

    Args:
        p_val: The alpha/significance level to validate.

    Raises:
        ValueError: If p_val is not in (0, 1).

    Examples:
        >>> check_alpha(0.05)  # Valid - no exception
        >>> check_alpha(0.0)   # Raises ValueError (not > 0)
        >>> check_alpha(1.0)   # Raises ValueError (not < 1)
    """
    if not 0 < p_val < 1:
        raise ValueError(
            """
            Given alpha value must be between (0,1).
            """
        )

online_fdr.utils.validity.check_initial_wealth(initial_wealth, alpha)

Validate that initial wealth is properly bounded relative to alpha.

In alpha-wealth procedures (LORD, SAFFRON, etc.), the initial wealth must be positive but less than the target FDR level to ensure proper wealth dynamics and FDR control.

Parameters:

Name Type Description Default
initial_wealth float

The initial alpha wealth.

required
alpha float

The target FDR level.

required

Raises:

Type Description
ValueError

If initial_wealth is not in (0, alpha).

Examples:

>>> check_initial_wealth(0.025, 0.05)  # Valid - no exception
>>> check_initial_wealth(0.1, 0.05)    # Raises ValueError (> alpha)
>>> check_initial_wealth(0.0, 0.05)    # Raises ValueError (not > 0)
Source code in online_fdr/utils/validity.py
def check_initial_wealth(initial_wealth: float, alpha: float) -> None:
    """Validate that initial wealth is properly bounded relative to alpha.

    In alpha-wealth procedures (LORD, SAFFRON, etc.), the initial wealth
    must be positive but less than the target FDR level to ensure proper
    wealth dynamics and FDR control.

    Args:
        initial_wealth: The initial alpha wealth.
        alpha: The target FDR level.

    Raises:
        ValueError: If initial_wealth is not in (0, alpha).

    Examples:
        >>> check_initial_wealth(0.025, 0.05)  # Valid - no exception
        >>> check_initial_wealth(0.1, 0.05)    # Raises ValueError (> alpha)
        >>> check_initial_wealth(0.0, 0.05)    # Raises ValueError (not > 0)
    """
    if not 0 < initial_wealth < alpha:
        raise ValueError(
            """
            The initial wealth should be between (0, alpha).
            """
        )

online_fdr.utils.validity.check_candidate_threshold(lambda_)

Validate that a candidate threshold λ is in the valid range (0, 1).

The candidate threshold λ is used in SAFFRON and ADDIS algorithms to determine which p-values are considered "candidates" for rejection. It must be strictly between 0 and 1.

Parameters:

Name Type Description Default
lambda_ float

The candidate threshold to validate.

required

Raises:

Type Description
ValueError

If lambda_ is not in (0, 1).

Examples:

>>> check_candidate_threshold(0.5)  # Valid - no exception  
>>> check_candidate_threshold(0.0)  # Raises ValueError
>>> check_candidate_threshold(1.0)  # Raises ValueError
Source code in online_fdr/utils/validity.py
def check_candidate_threshold(lambda_: float) -> None:
    """Validate that a candidate threshold λ is in the valid range (0, 1).

    The candidate threshold λ is used in SAFFRON and ADDIS algorithms to
    determine which p-values are considered "candidates" for rejection.
    It must be strictly between 0 and 1.

    Args:
        lambda_: The candidate threshold to validate.

    Raises:
        ValueError: If lambda_ is not in (0, 1).

    Examples:
        >>> check_candidate_threshold(0.5)  # Valid - no exception  
        >>> check_candidate_threshold(0.0)  # Raises ValueError
        >>> check_candidate_threshold(1.0)  # Raises ValueError
    """
    if not 0 < lambda_ < 1:
        raise ValueError(
            """
            The candidate threshold (lambda) should be between (0, 1).
            """
        )

online_fdr.utils.validity.check_wealth(wealth)

Validate that alpha wealth is positive (not depleted).

In alpha-wealth procedures, when wealth reaches zero or below, no more tests can be performed while maintaining FDR control. This function checks for wealth depletion.

Parameters:

Name Type Description Default
wealth float

Current alpha wealth level.

required

Raises:

Type Description
ValueError

If wealth <= 0.

Examples:

>>> check_wealth(0.01)  # Valid - no exception
>>> check_wealth(0.0)   # Raises ValueError (wealth depleted)
>>> check_wealth(-0.1)  # Raises ValueError (negative wealth)
Source code in online_fdr/utils/validity.py
def check_wealth(wealth: float) -> None:
    """Validate that alpha wealth is positive (not depleted).

    In alpha-wealth procedures, when wealth reaches zero or below,
    no more tests can be performed while maintaining FDR control.
    This function checks for wealth depletion.

    Args:
        wealth: Current alpha wealth level.

    Raises:
        ValueError: If wealth <= 0.

    Examples:
        >>> check_wealth(0.01)  # Valid - no exception
        >>> check_wealth(0.0)   # Raises ValueError (wealth depleted)
        >>> check_wealth(-0.1)  # Raises ValueError (negative wealth)
    """
    if wealth <= 0:
        raise ValueError(
            """
            Alpha wealth depleted. Test execution stopped.
            """
        )

online_fdr.utils.validity.check_decay_factor(decay_factor)

Validate that a decay factor is in the valid range (0, 1).

Decay factors are used in various algorithms to control the rate at which parameters decrease over time. They must be strictly between 0 and 1 for proper convergence behavior.

Parameters:

Name Type Description Default
decay_factor float

The decay factor to validate.

required

Raises:

Type Description
ValueError

If decay_factor is not in (0, 1).

Examples:

>>> check_decay_factor(0.9)   # Valid - no exception
>>> check_decay_factor(0.0)   # Raises ValueError (not > 0)
>>> check_decay_factor(1.0)   # Raises ValueError (not < 1)
>>> check_decay_factor(1.5)   # Raises ValueError (not < 1)
Source code in online_fdr/utils/validity.py
def check_decay_factor(decay_factor: float) -> None:
    """Validate that a decay factor is in the valid range (0, 1).

    Decay factors are used in various algorithms to control the rate
    at which parameters decrease over time. They must be strictly
    between 0 and 1 for proper convergence behavior.

    Args:
        decay_factor: The decay factor to validate.

    Raises:
        ValueError: If decay_factor is not in (0, 1).

    Examples:
        >>> check_decay_factor(0.9)   # Valid - no exception
        >>> check_decay_factor(0.0)   # Raises ValueError (not > 0)
        >>> check_decay_factor(1.0)   # Raises ValueError (not < 1)
        >>> check_decay_factor(1.5)   # Raises ValueError (not < 1)
    """
    if not 0 < decay_factor < 1:
        raise ValueError(
            """
            Decay factor must be between (0,1).
            """
        )

Overview

Static Procedures

The static module contains implementations of classical FDR control procedures that form the building blocks for online and batch methods:

  • Benjamini-Hochberg (BH): The foundational FDR procedure
  • Storey-BH: Enhanced power through π₀ estimation
  • Benjamini-Yekutieli (BY): FDR control under arbitrary dependence

These functions are used internally by batch testing methods and can also be used directly for offline multiple testing.

Gamma Sequences

Gamma sequences determine how alpha budget is allocated over time in online procedures. Different algorithms require different sequence properties:

  • SAFFRON sequences: Power-law decay optimized for adaptive procedures
  • LORD sequences: Logarithmic decay for wealth-based procedures
  • LOND sequences: Complex decay balancing early/late power

Validation Framework

The validation module ensures input correctness and provides informative error messages:

  • Range validation: P-values in [0,1], alpha in (0,1)
  • Relationship validation: Initial wealth < alpha
  • State validation: Wealth depletion detection

Usage Examples

Static Procedures

from online_fdr.utils.static import bh, storey_bh, by

# Sample p-values  
p_values = [0.001, 0.02, 0.15, 0.8, 0.9]
alpha = 0.05

# Standard Benjamini-Hochberg
num_rej_bh, threshold_bh = bh(p_values, alpha)
print(f"BH: {num_rej_bh} rejections at threshold {threshold_bh:.4f}")

# Storey-BH with π₀ estimation
num_rej_storey, threshold_storey = storey_bh(p_values, alpha, lambda_=0.5)
print(f"Storey-BH: {num_rej_storey} rejections at threshold {threshold_storey:.4f}")

# Benjamini-Yekutieli (under dependence)
num_rej_by, threshold_by = by(p_values, alpha)  
print(f"BY: {num_rej_by} rejections at threshold {threshold_by:.4f}")

Gamma Sequences

from online_fdr.utils.sequence import DefaultSaffronGammaSequence

# Create SAFFRON gamma sequence
gamma_seq = DefaultSaffronGammaSequence(gamma_exp=1.6, c=0.4374901658)

# Generate first 10 gamma values
gamma_values = [gamma_seq.calc_gamma(j) for j in range(1, 11)]
print("SAFFRON gamma sequence:", gamma_values)

# Show decay behavior
import matplotlib.pyplot as plt
j_values = range(1, 101)
gamma_values = [gamma_seq.calc_gamma(j) for j in j_values]
plt.loglog(j_values, gamma_values)
plt.xlabel("Position j")
plt.ylabel("Gamma value")
plt.title("SAFFRON Gamma Sequence Decay")

Validation Functions

from online_fdr.utils.validity import check_p_val, check_alpha, check_initial_wealth

# Validate inputs
try:
    check_p_val(0.05)           # Valid p-value
    check_alpha(0.05)           # Valid alpha  
    check_initial_wealth(0.025, 0.05)  # Valid wealth < alpha
    print("All validations passed")
except ValueError as e:
    print(f"Validation error: {e}")

# Example of validation failure
try:
    check_p_val(1.5)            # Invalid p-value > 1
except ValueError as e:
    print(f"P-value validation failed: {e}")

Algorithm Support

How Utilities Support Online Algorithms

  1. Static procedures provide the core multiple testing logic
  2. Gamma sequences control alpha allocation strategies
  3. Validation functions ensure algorithmic correctness
  4. Data utilities support testing and evaluation

Customization and Extension

The utility framework supports customization:

  • Custom gamma sequences: Inherit from AbstractGammaSequence
  • Custom spending functions: Implement AbstractSpendFunc
  • Custom static procedures: Follow the standard interface
  • Additional validation: Extend existing validation functions

Best Practices

Static Procedure Usage

  1. Choose appropriate procedure based on dependence assumptions
  2. Handle edge cases (empty input, all p-values = 1)
  3. Validate inputs before calling procedures
  4. Understand threshold interpretation for downstream usage

Gamma Sequence Selection

  1. Use algorithm-specific defaults when available
  2. Consider decay rate for your application timeline
  3. Validate convergence properties for custom sequences
  4. Test empirical performance versus theoretical optimal

Validation Integration

  1. Validate early and often in computational pipelines
  2. Provide informative error messages to users
  3. Handle validation gracefully in production code
  4. Document validation requirements for API users

Performance Considerations

Computational Complexity

  • Static procedures: O(n log n) due to sorting
  • Gamma sequences: O(1) per value computation
  • Validation: O(1) per check

Memory Usage

  • Static procedures: O(n) for input storage
  • Gamma sequences: O(1) stateless computation
  • Validation: Minimal memory footprint

Optimization Tips

  1. Pre-sort p-values when calling static procedures repeatedly
  2. Cache gamma values for frequently accessed positions
  3. Batch validation when checking many values
  4. Use numpy arrays for vectorized operations when possible

References

Static Procedures: - Benjamini, Y., and Y. Hochberg (1995). "Controlling the False Discovery Rate" - Storey, J. D. (2002). "A direct approach to false discovery rates"
- Benjamini, Y., and D. Yekutieli (2001). "Control under dependency"

Gamma Sequences: - Ramdas, A., et al. (2018). "SAFFRON: adaptive algorithm for online FDR control" - Javanmard, A., and A. Montanari (2018). "Online rules for FDR control"

Validation Theory: - Wassmer, G., and W. Brannath (2016). "Group Sequential and Confirmatory Adaptive Designs"