Exchangeability Martingales¶

Monitor streaming conformal p-values for evidence against exchangeability. Use these alarms as evidence signals, not as FDR-controlled anomaly decisions.

What This Feature Does¶

nonconform.martingales consumes sequential p-values and maintains:

Martingale evidence (M_n)
Restarted mixture e-process evidence for late-change sensitivity
CUSUM statistic (cumulative-sum change evidence)
Shiryaev-Roberts statistic (sequential evidence accumulator)
Optional alarm triggers from configurable thresholds

Implemented methods in this release:

PowerMartingale
SimpleMixtureMartingale
SimpleJumperMartingale

Why P-values (Not Raw Scores)¶

These martingales are conformal/exchangeability tests. Their validity relies on:

Under exchangeability, the sequential conformal p-values fed to the martingale are valid. In the classical construction with proper randomized tie-breaking, they are also i.i.d. Uniform(0, 1).

Raw anomaly scores do not satisfy this requirement directly. Use ConformalDetector to produce p-values first, then feed those p-values into a martingale. The martingale classes do not repair invalid p-values, temporal dependence, or detector retraining choices that break the conformal assumptions.
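
To make the tie-breaking requirement concrete, here is a minimal sketch of a smoothed conformal p-value. It is an illustration only, not the code path used by ConformalDetector. The function name `smoothed_p_value`, the `reference_scores` argument, and the convention that larger scores mean more anomalous are all assumptions made for this example.

```python
import numpy as np


def smoothed_p_value(reference_scores, test_score, rng):
    """Smoothed conformal p-value (hypothetical helper, larger score = more anomalous)."""
    reference_scores = np.asarray(reference_scores, dtype=float)
    greater = np.sum(reference_scores > test_score)
    # The test point always ties with itself, hence the +1.
    ties = np.sum(reference_scores == test_score) + 1
    theta = rng.uniform()  # randomized tie-breaking weight in [0, 1)
    return (greater + theta * ties) / (reference_scores.size + 1)


rng = np.random.default_rng(0)
reference = rng.standard_normal(200)
p_typical = smoothed_p_value(reference, 0.0, rng)
p_extreme = smoothed_p_value(reference, 10.0, rng)
```

In the classical online construction the reference scores are those of all previously seen observations, so the reference set grows with the stream.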

Do not mix up alarm types

ville_threshold and restarted_ville_threshold provide anytime false-alarm control for a single valid stream. They do not control FDR across many simultaneous hypotheses or many streams. For that, use the methods in FDR Control.

Basic Usage¶

import numpy as np
from sklearn.ensemble import IsolationForest

from nonconform import ConformalDetector, Split
from nonconform.martingales import AlarmConfig, SimpleJumperMartingale

rng = np.random.default_rng(42)
x_train = rng.standard_normal((300, 5))
x_stream = rng.standard_normal((100, 5))

detector = ConformalDetector(
    detector=IsolationForest(random_state=42),
    strategy=Split(n_calib=0.2),
    score_polarity="auto",
    seed=42,
)
detector.fit(x_train)

martingale = SimpleJumperMartingale(
    alarm_config=AlarmConfig(
        ville_threshold=100.0,
        restarted_ville_threshold=100.0,
    )
)

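# Score one observation at a time and feed its p-value to the martingale.
for x_t in x_stream:
    # Reshape the single row into a 2D batch of size 1 before scoring.
    p_t = detector.compute_p_values(x_t.reshape(1, -1))[0]
    state = martingale.update(p_t)
    # Stop at the first restarted-Ville alarm.
    if "restarted_ville" in state.triggered_alarms:
        print(
            "Restarted Ville alarm "
            f"at step={state.step}, M={state.restarted_martingale:.2f}"
        )
        break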

Minimal Example Notebook¶

A runnable notebook example is available at:

examples/exchangeability_martingale.ipynb

Open it with Jupyter:

jupyter notebook examples/exchangeability_martingale.ipynb

It uses:

oddball credit-card fraud data
IsolationForest for base anomaly scoring
ConformalDetector to produce streaming p-values
PowerMartingale, SimpleMixtureMartingale, and SimpleJumperMartingale for online evidence updates

The example trains on a subset and processes the remaining data in a streaming loop while logging p-values and evidence statistics step by step.

Available Martingales¶

PowerMartingale¶

Uses \(r_n = \epsilon \cdot p_n^{\epsilon - 1}\) for \(\epsilon \in (0, 1]\).

from nonconform.martingales import PowerMartingale

martingale = PowerMartingale(epsilon=0.5)
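
The betting function above can be traced by hand with a few lines of NumPy. This is a sketch of the formula only, computed in log space for numerical stability. The helper name `power_martingale_path` and the clipping constant are assumptions for illustration, not part of nonconform.

```python
import numpy as np


def power_martingale_path(p_values, epsilon):
    """Running product of epsilon * p**(epsilon - 1), starting from M_0 = 1."""
    # Clip to avoid log(0) when a p-value is exactly zero (illustrative guard).
    p = np.clip(np.asarray(p_values, dtype=float), 1e-12, 1.0)
    log_bets = np.log(epsilon) + (epsilon - 1.0) * np.log(p)
    return np.exp(np.cumsum(log_bets))


# Small p-values push the martingale up, large ones pull it down.
path = power_martingale_path([0.5, 0.01, 0.02], epsilon=0.5)
```

With epsilon = 1 every bet equals 1, so the martingale never moves. Smaller epsilon values bet more aggressively on small p-values.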

SimpleMixtureMartingale¶

Averaged (discrete) mixture over a grid of power martingales.

from nonconform.martingales import SimpleMixtureMartingale

martingale = SimpleMixtureMartingale(epsilons=[0.25, 0.5, 0.75, 1.0])
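
As a rough sketch of what an averaged mixture means, the snippet below runs one power martingale per grid value and takes their plain average at every step. Equal weights are an assumption here, and `mixture_martingale_path` is a hypothetical helper rather than the library's implementation.

```python
import numpy as np


def mixture_martingale_path(p_values, epsilons):
    """Equal-weight average of power martingale paths over a grid of epsilons."""
    p = np.clip(np.asarray(p_values, dtype=float), 1e-12, 1.0)
    eps = np.asarray(epsilons, dtype=float)[:, None]  # shape (k, 1)
    # One row of cumulative log-bets per epsilon, shape (k, n).
    log_paths = np.cumsum(np.log(eps) + (eps - 1.0) * np.log(p)[None, :], axis=1)
    return np.exp(log_paths).mean(axis=0)


mixture_path = mixture_martingale_path([0.5, 0.01, 0.02], [0.25, 0.5, 0.75, 1.0])
```

An average of martingales that each start at 1 is itself a martingale starting at 1, which is why the mixture keeps the same null behavior while hedging over the choice of epsilon.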

SimpleJumperMartingale¶

Implements the Simple Jumper update scheme from conformal martingale literature.

from nonconform.martingales import SimpleJumperMartingale

martingale = SimpleJumperMartingale(jump=0.01)
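
For orientation, the following sketch follows the Simple Jumper scheme as it is usually presented in the conformal test martingale literature: three betting strategies with slopes -1, 0, and 1 share the capital, and a fraction `jump` of the total is redistributed among them at each step. This is my reading of the published algorithm, not a copy of the nonconform code, and `simple_jumper_path` is a hypothetical name.

```python
def simple_jumper_path(p_values, jump=0.01):
    """Total capital of a Simple Jumper style martingale, starting from 1."""
    slopes = (-1.0, 0.0, 1.0)
    capital = {slope: 1.0 / 3.0 for slope in slopes}
    total = 1.0
    path = []
    for p in p_values:
        # Jump step: move a fraction of the total capital between strategies.
        for slope in slopes:
            capital[slope] = (1.0 - jump) * capital[slope] + (jump / 3.0) * total
        # Betting step: each strategy bets with f(p) = 1 + slope * (p - 0.5).
        for slope in slopes:
            capital[slope] *= 1.0 + slope * (p - 0.5)
        total = sum(capital.values())
        path.append(total)
    return path


jumper_path = simple_jumper_path([0.02] * 50)
```

Each betting function integrates to 1 over [0, 1], so the total capital is a martingale under uniform p-values. A run of small p-values shifts capital toward the slope -1 strategy and the total grows.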

Alarm Semantics¶

Alarms are disabled by default.

Set thresholds with AlarmConfig:

ville_threshold: threshold on the martingale M_n
restarted_ville_threshold: threshold on the restarted mixture e-process
cusum_threshold: threshold on the CUSUM/e-CUSUM evidence statistic
shiryaev_roberts_threshold: threshold on the Shiryaev-Roberts evidence statistic

MartingaleState.triggered_alarms is a tuple of alarm names (for example, ("ville", "restarted_ville")) indicating which thresholds are currently exceeded. It can be empty when no alarms are active.
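
The threshold semantics above can be mimicked with a small stand-alone helper. It is purely illustrative: `active_alarms` is a hypothetical function, the "cusum" key is an assumed alarm name, and treating a threshold as exceeded when the statistic is greater than or equal to it is also an assumption.

```python
def active_alarms(statistics, thresholds):
    """Return the names whose statistic has reached its (non-None) threshold."""
    return tuple(
        name
        for name, value in statistics.items()
        if thresholds.get(name) is not None and value >= thresholds[name]
    )


current = {"ville": 150.0, "restarted_ville": 120.0, "cusum": 3.0}
limits = {"ville": 100.0, "restarted_ville": 100.0, "cusum": None}
alarms = active_alarms(current, limits)
```

A threshold of None keeps the corresponding alarm disabled, which mirrors the default in which no alarm is ever reported.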

Interpreting `ville_threshold`¶

For a valid nonnegative martingale started at 1 under the null (exchangeability), Ville's inequality gives:

\[ \Pr\left(\sup_t M_t \ge \lambda\right) \le \frac{1}{\lambda}. \]

So choosing ville_threshold = lambda bounds the probability of ever crossing that threshold on a null stream by 1 / lambda.

Example mappings:

ville_threshold = 20 -> false alarm probability at most 0.05
ville_threshold = 100 -> false alarm probability at most 0.01
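
A quick simulation can make the bound tangible. The sketch below simulates null streams of uniform p-values, runs a power martingale with epsilon = 0.5 on each, and counts how often the threshold 20 is ever reached. The stream count, horizon, and seed are arbitrary choices for illustration, and a finite horizon can only undercount crossings.

```python
import numpy as np

rng = np.random.default_rng(0)
threshold = 20.0  # corresponds to a bound of 1 / 20 = 0.05
epsilon = 0.5
n_streams, n_steps = 2000, 500

# Null streams: i.i.d. Uniform(0, 1) p-values, clipped to avoid log(0).
p = np.clip(rng.uniform(size=(n_streams, n_steps)), 1e-12, 1.0)
log_m = np.cumsum(np.log(epsilon) + (epsilon - 1.0) * np.log(p), axis=1)

# A stream raises a false alarm if its martingale ever reaches the threshold.
crossed = log_m.max(axis=1) >= np.log(threshold)
false_alarm_rate = crossed.mean()
print(f"empirical rate: {false_alarm_rate:.3f}, Ville bound: {1 / threshold:.3f}")
```

The empirical rate typically sits below the bound because the martingale overshoots the threshold when it crosses, so Ville's inequality is conservative in discrete time.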

Interpreting `restarted_ville_threshold`¶

restarted_ville_threshold applies to a restarted mixture e-process. It uses a proper weighted sum over possible restart times rather than the raw CUSUM maximum. This gives CUSUM-like sensitivity to late changes while preserving the same Ville-style anytime false-alarm probability control as the product martingale.

Use the same threshold mapping:

alpha = 0.01
alarm_config = AlarmConfig(
    ville_threshold=1 / alpha,
    restarted_ville_threshold=1 / alpha,
    cusum_threshold=None,
)

The restarted mixture uses the harmonic restart prior pi_t = 1 / (t * (t + 1)) with tail mass 1 / (t + 1). The tail mass is part of the e-process accounting and keeps the process initialized at 1.
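
One way to read this accounting is sketched below. At step t the process adds prior mass pi_t for a restart at t, multiplies all started mass by the current betting factor, and keeps the unused tail mass 1 / (t + 1) as idle capital. This is an interpretation of the description above, not the library's code, and `restarted_mixture_path` and its `bets` argument (per-step betting factors) are hypothetical names.

```python
def restarted_mixture_path(bets):
    """Mixture over restart times with prior pi_t = 1 / (t * (t + 1))."""
    started = 0.0  # sum over restart times s <= t of pi_s * product of bets from s to t
    path = []
    for t, bet in enumerate(bets, start=1):
        prior = 1.0 / (t * (t + 1))
        started = (started + prior) * bet
        tail = 1.0 / (t + 1)  # prior mass of restarts that have not happened yet
        path.append(started + tail)
    return path


flat = restarted_mixture_path([1.0] * 10)   # neutral bets leave the value at 1
rising = restarted_mixture_path([2.0, 2.0])
```

Since pi_t + 1 / (t + 1) = 1 / t, the expected value after a fair bet equals the previous value, which is the martingale property that justifies applying Ville's inequality to this statistic.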

Interpreting CUSUM and Shiryaev-Roberts Thresholds¶

cusum_threshold applies to the CUSUM/e-CUSUM statistic. It is useful as a changepoint evidence statistic, but it is not a Ville threshold. Interpret it through ARL/FAR guarantees or empirical calibration unless a separate theorem is provided for the exact implementation.

shiryaev_roberts_threshold applies to the SR/e-SR statistic. Depending on the procedure, SR variants can have ARL/e-detector interpretations, but this threshold should not be documented as a probability-of-ever-crossing Ville control unless the implemented statistic is itself an e-process.
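
For reference, the textbook e-CUSUM and e-SR recursions over per-step betting factors look like the sketch below. Whether nonconform uses exactly these recursions is an assumption; the helper `cusum_and_sr_paths` and its `bets` argument are hypothetical.

```python
def cusum_and_sr_paths(bets):
    """Textbook e-CUSUM and e-SR recursions over per-step betting factors."""
    cusum, sr = 0.0, 0.0
    cusum_path, sr_path = [], []
    for bet in bets:
        # CUSUM restarts from 1 whenever the running evidence has dropped below 1.
        cusum = max(cusum, 1.0) * bet
        # SR adds one fresh unit of capital at every step before betting.
        sr = (sr + 1.0) * bet
        cusum_path.append(cusum)
        sr_path.append(sr)
    return cusum_path, sr_path


cusum_path, sr_path = cusum_and_sr_paths([0.5, 0.5, 4.0])
```

Neither statistic is a martingale started at 1: CUSUM discards losses through the maximum, and SR keeps adding capital. That is why their thresholds do not translate to a 1 / lambda crossing probability.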

Scope of the Ville guarantee:

ville_threshold and restarted_ville_threshold provide anytime false-alarm control per stream (single null) when the per-step betting factors (e-values) computed from the input p-values are conditionally valid.
FDR control across many simultaneous hypotheses or streams requires separate multiple-testing procedures; see FDR Control.

Practical Notes¶

Keep detector retraining logic outside the martingale classes in this release.
Interpret alarms as evidence signals, not automated retraining decisions.
Exact exchangeability-martingale validity relies on the sequential (online) conformal setup described in the conformal test martingale literature (see References). If you reuse a fixed calibration set to score a stream, treat alarms as monitoring signals unless you have separately justified the resulting p-value sequence.
If temporal dependence is strong, p-value validity can degrade; monitor model and data assumptions alongside evidence statistics.

References¶

Vovk, V., Petej, I., Nouretdinov, I., Ahlberg, E., Carlsson, L., & Gammerman, A. (2021). Retrain or not retrain: conformal test martingales for change-point detection. Proceedings of Machine Learning Research, 152, 191-210.
Ramdas, A., Grünwald, P., Vovk, V., & Shafer, G. (2023). Game-theoretic statistics and safe anytime-valid inference. Statistical Science, 38(4), 576-601.
Shafer, G., & Vovk, V. (2008). A Tutorial on Conformal Prediction. Journal of Machine Learning Research, 9, 371-421.