Exchangeability Martingales
Monitor streaming conformal p-values for evidence against exchangeability. Use these alarms as evidence signals, not as FDR-controlled anomaly decisions.
What This Feature Does
nonconform.martingales consumes sequential p-values and maintains:
- Martingale evidence (M_n)
- Restarted mixture e-process evidence for late-change sensitivity
- CUSUM statistic (cumulative-sum change evidence)
- Shiryaev-Roberts statistic (sequential evidence accumulator)
- Optional alarm triggers from configurable thresholds
Implemented methods in this release:
- PowerMartingale
- SimpleMixtureMartingale
- SimpleJumperMartingale
Why P-values (Not Raw Scores)
These martingales are conformal/exchangeability tests. Their validity relies on:
- Under exchangeability, the sequential conformal p-values used by the martingale are valid and, with proper randomized tie-breaking in the classical construction, i.i.d. Uniform(0, 1).
Raw anomaly scores do not satisfy this requirement directly. Use
ConformalDetector to produce p-values first, then feed those p-values into a
martingale. The martingale classes do not repair invalid p-values, temporal
dependence, or detector retraining choices that break the conformal assumptions.
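To make the uniformity requirement concrete, here is a minimal NumPy sketch of the classical sequential construction with randomized tie-breaking. It is independent of ConformalDetector: the helper name sequential_conformal_p_values is hypothetical, and the raw scores are assumed to act as nonconformity scores (higher means more anomalous).

```python
import numpy as np


def sequential_conformal_p_values(scores, rng):
    """Classical online conformal p-values with randomized tie-breaking.

    Hypothetical helper for illustration; higher score = more nonconforming.
    """
    scores = np.asarray(scores, dtype=float)
    p_values = np.empty(len(scores))
    for n in range(len(scores)):
        seen = scores[: n + 1]             # all scores so far, including the current one
        greater = np.sum(seen > scores[n])
        ties = np.sum(seen == scores[n])   # at least 1: the current point itself
        theta = rng.uniform()              # tie-breaking weight in [0, 1)
        p_values[n] = (greater + theta * ties) / (n + 1)
    return p_values


rng = np.random.default_rng(0)
scores = rng.standard_normal(2000)         # exchangeable (here i.i.d.) scores
p_values = sequential_conformal_p_values(scores, rng)
print(f"mean p-value: {p_values.mean():.3f}")  # should be near 0.5 under exchangeability
```

When the scores are exchangeable, a histogram of p_values should look flat. A drifting score distribution would skew it, and that skew is what the martingales accumulate as evidence.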
Do not mix up alarm types
ville_threshold and restarted_ville_threshold provide anytime
false-alarm control for a single valid stream. They do not control FDR
across many simultaneous hypotheses or many streams. For that, use the
methods in FDR Control.
Basic Usage
import numpy as np
from sklearn.ensemble import IsolationForest

from nonconform import ConformalDetector, Split
from nonconform.martingales import AlarmConfig, SimpleJumperMartingale

rng = np.random.default_rng(42)
x_train = rng.standard_normal((300, 5))
x_stream = rng.standard_normal((100, 5))

detector = ConformalDetector(
    detector=IsolationForest(random_state=42),
    strategy=Split(n_calib=0.2),
    score_polarity="auto",
    seed=42,
)
detector.fit(x_train)

martingale = SimpleJumperMartingale(
    alarm_config=AlarmConfig(
        ville_threshold=100.0,
        restarted_ville_threshold=100.0,
    )
)

for x_t in x_stream:
    p_t = detector.compute_p_values(x_t.reshape(1, -1))[0]
    state = martingale.update(p_t)
    if "restarted_ville" in state.triggered_alarms:
        print(
            "Restarted Ville alarm "
            f"at step={state.step}, M={state.restarted_martingale:.2f}"
        )
        break
Minimal Example Notebook
A runnable notebook example is available at:
examples/exchangeability_martingale.ipynb
Open it with Jupyter:
jupyter notebook examples/exchangeability_martingale.ipynb
It uses:
- oddball credit-card fraud data
- IsolationForest for base anomaly scoring
- ConformalDetector to produce streaming p-values
- PowerMartingale, SimpleMixtureMartingale, and SimpleJumperMartingale for online evidence updates
The example trains on a subset and processes the remaining data in a streaming loop while logging p-values and evidence statistics step by step.
Available Martingales
PowerMartingale
Uses \(r_n = \epsilon \cdot p_n^{\epsilon - 1}\) for \(\epsilon \in (0, 1]\).
from nonconform.martingales import PowerMartingale
martingale = PowerMartingale(epsilon=0.5)
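The update rule can be sketched directly in NumPy. This illustrates only the formula above and is not the library's implementation; the function name power_martingale_path and the clipping constant are assumptions for the example.

```python
import numpy as np


def power_martingale_path(p_values, epsilon=0.5):
    """Running product of r_n = epsilon * p_n**(epsilon - 1), computed in log space."""
    # Clip to avoid log(0); the 1e-12 floor is an arbitrary choice for this sketch.
    p = np.clip(np.asarray(p_values, dtype=float), 1e-12, 1.0)
    log_factors = np.log(epsilon) + (epsilon - 1.0) * np.log(p)
    return np.exp(np.cumsum(log_factors))


# One unremarkable p-value followed by two small ones.
path = power_martingale_path([0.5, 0.01, 0.02], epsilon=0.5)
print(path)
```

Small p-values multiply the evidence by a factor above 1, and p-values near 1 shrink it. With epsilon = 1 the factor is always 1 and the martingale never moves.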
SimpleMixtureMartingale
Averaged (discrete) mixture over a grid of power martingales.
from nonconform.martingales import SimpleMixtureMartingale
martingale = SimpleMixtureMartingale(epsilons=[0.25, 0.5, 0.75, 1.0])
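As a rough sketch of what averaging over a grid means, the following assumes equal weights on each epsilon and averages the individual power-martingale paths. The name simple_mixture_path is hypothetical, and the library may weight or compute the mixture differently.

```python
import numpy as np


def simple_mixture_path(p_values, epsilons=(0.25, 0.5, 0.75, 1.0)):
    """Equal-weight average of power martingales over a grid of epsilons."""
    p = np.clip(np.asarray(p_values, dtype=float), 1e-12, 1.0)  # guard against log(0)
    eps = np.asarray(epsilons, dtype=float)[:, None]            # shape (grid, 1)
    # One row of cumulative log-evidence per epsilon in the grid.
    log_paths = np.cumsum(np.log(eps) + (eps - 1.0) * np.log(p)[None, :], axis=1)
    return np.exp(log_paths).mean(axis=0)


mixture = simple_mixture_path([0.5, 0.01, 0.02])
print(mixture)
```

Averaging the paths, rather than the per-step factors, keeps the result a martingale under the null, because a convex combination of martingales started at 1 is again a martingale started at 1.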
SimpleJumperMartingale
Implements the Simple Jumper update scheme from the conformal martingale literature.
from nonconform.martingales import SimpleJumperMartingale
martingale = SimpleJumperMartingale(jump=0.01)
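For orientation, here is a compact sketch of the Simple Jumper scheme as it is commonly described in Vovk et al. (2021): capital is split across three betting functions f_eps(p) = 1 + eps * (p - 0.5) for eps in {-1, 0, 1}, and a fraction jump of the total is redistributed evenly before each bet. The function name simple_jumper_path is hypothetical, and the library's implementation may differ in details.

```python
def simple_jumper_path(p_values, jump=0.01):
    """Total capital of the Simple Jumper after each p-value (illustrative sketch)."""
    capital = {eps: 1.0 / 3.0 for eps in (-1, 0, 1)}  # starts with total capital 1
    path = []
    for p in p_values:
        total = sum(capital.values())
        # Jump step: keep (1 - jump) of each account, spread jump * total evenly.
        for eps in capital:
            capital[eps] = (1.0 - jump) * capital[eps] + (jump / 3.0) * total
        # Betting step: each account bets with its own linear betting function.
        for eps in capital:
            capital[eps] *= 1.0 + eps * (p - 0.5)
        path.append(sum(capital.values()))
    return path


path = simple_jumper_path([0.0, 0.0, 0.0])
print(path)
```

Because the three accounts start equal and the betting functions are symmetric, the first step always leaves the total at 1. Evidence grows only once the p-values lean consistently toward one side.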
Alarm Semantics
Alarms are disabled by default.
Set thresholds with AlarmConfig:
- ville_threshold: threshold on martingale M_n
- restarted_ville_threshold: threshold on the restarted mixture e-process
- cusum_threshold: threshold on the CUSUM/e-CUSUM evidence statistic
- shiryaev_roberts_threshold: threshold on the Shiryaev-Roberts evidence statistic
MartingaleState.triggered_alarms is a tuple of alarm names (for example,
("ville", "restarted_ville")) indicating which thresholds are currently
exceeded.
It can be empty when no alarms are active.
Interpreting ville_threshold
For a valid nonnegative martingale started at 1 under the null (exchangeability), Ville's inequality gives:
\[P\left(\sup_{n} M_n \ge \lambda\right) \le \frac{1}{\lambda} \quad \text{for any } \lambda > 0.\]
So choosing ville_threshold = lambda bounds the probability of ever crossing
that threshold on a null stream by 1 / lambda.
Example mappings:
- ville_threshold = 20 -> false alarm probability at most 0.05
- ville_threshold = 100 -> false alarm probability at most 0.01
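The bound can be sanity-checked by simulation. The sketch below does not use the library: it draws null streams of i.i.d. uniform p-values, runs a power martingale with epsilon = 0.5 on each, and counts how many streams ever reach the threshold within a finite horizon. The stream count, horizon, and seed are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_streams, horizon, threshold, epsilon = 2000, 500, 20.0, 0.5

# Null streams: p-values uniform on (0, 1]; "1 - u" avoids an exact zero.
p = 1.0 - rng.uniform(size=(n_streams, horizon))

# Log of the power martingale along each stream.
log_m = np.cumsum(np.log(epsilon) + (epsilon - 1.0) * np.log(p), axis=1)

# A stream raises a false alarm if its running maximum ever reaches the threshold.
ever_crossed = log_m.max(axis=1) >= np.log(threshold)
false_alarm_rate = ever_crossed.mean()
print(f"empirical rate: {false_alarm_rate:.3f}, Ville bound: {1 / threshold:.3f}")
```

The empirical rate should not exceed the bound beyond Monte Carlo noise. It is often noticeably below it, since Ville's inequality is conservative when the martingale overshoots the threshold in discrete jumps.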
Interpreting restarted_ville_threshold
restarted_ville_threshold applies to a restarted mixture e-process. It uses a
proper weighted sum over possible restart times rather than the raw CUSUM
maximum. This gives CUSUM-like sensitivity to late changes while preserving the
same Ville-style anytime false-alarm probability control as the product
martingale.
Use the same threshold mapping:
alpha = 0.01
alarm_config = AlarmConfig(
    ville_threshold=1 / alpha,
    restarted_ville_threshold=1 / alpha,
    cusum_threshold=None,
)
The restarted mixture uses the harmonic restart prior
pi_t = 1 / (t * (t + 1)) with tail mass 1 / (t + 1). The tail mass is part
of the e-process accounting and keeps the process initialized at 1.
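One way to read this description is as the recursion below. It is a sketch under assumptions rather than the library's code: bets stands for the per-step betting factors (for example epsilon * p**(epsilon - 1)), a restart at time t is assumed to mean betting from step t onward, and the name restarted_mixture_path is hypothetical.

```python
def restarted_mixture_path(bets):
    """Mixture over restart times with prior pi_t = 1 / (t * (t + 1)).

    E_t = sum_{s <= t} pi_s * prod_{i = s..t} b_i + 1 / (t + 1)
    """
    weighted = 0.0  # prior-weighted evidence of all restarts begun so far
    path = []
    for t, b in enumerate(bets, start=1):
        prior = 1.0 / (t * (t + 1))        # mass on "restart at step t"
        weighted = (weighted + prior) * b  # new restart joins, then everyone bets
        tail = 1.0 / (t + 1)               # mass on restarts after step t, still at value 1
        path.append(weighted + tail)
    return path


print(restarted_mixture_path([1.0, 1.0, 4.0, 4.0]))
```

Under this reading the process equals 1 before any data arrives (the whole prior mass sits in the tail). With fair bets its conditional expectation is unchanged from step to step because pi_{t+1} + 1 / (t + 2) = 1 / (t + 1), which is what allows a Ville-style threshold to be applied.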
Interpreting CUSUM and Shiryaev-Roberts Thresholds
cusum_threshold applies to the CUSUM/e-CUSUM statistic. It is useful as a
changepoint evidence statistic, but it is not a Ville threshold. Interpret it
through ARL/FAR guarantees or empirical calibration unless a separate theorem is
provided for the exact implementation.
shiryaev_roberts_threshold applies to the SR/e-SR statistic. Depending on the
procedure, SR variants can have ARL/e-detector interpretations, but this
threshold should not be documented as a probability-of-ever-crossing Ville
control unless the implemented statistic is itself an e-process.
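For reference, the textbook recursions for these two statistics, expressed in terms of per-step betting factors, can be sketched as follows. The function name cusum_and_sr_paths and the zero initialization are assumptions for the example; the library's exact statistics may differ.

```python
def cusum_and_sr_paths(bets):
    """Standard CUSUM and Shiryaev-Roberts recursions on betting factors b_n."""
    cusum, sr = 0.0, 0.0
    cusum_path, sr_path = [], []
    for b in bets:
        cusum = max(cusum, 1.0) * b  # best evidence over all past restart points
        sr = (sr + 1.0) * b          # summed evidence over all past restart points
        cusum_path.append(cusum)
        sr_path.append(sr)
    return cusum_path, sr_path


cusum_path, sr_path = cusum_and_sr_paths([2.0, 0.5, 3.0])
print(cusum_path, sr_path)
```

CUSUM takes a maximum over restart points and Shiryaev-Roberts takes an unweighted sum, so neither is a martingale started at 1. Their expected value grows over time even under the null, which is why a threshold on them does not inherit the 1 / lambda interpretation.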
Scope of this guarantee:
- ville_threshold and restarted_ville_threshold provide anytime false-alarm control per stream (single null) when the input e-values are conditionally valid.
- FDR control across many simultaneous hypotheses or streams requires separate multiple-testing procedures; see FDR Control.
Practical Notes
- Keep detector retraining logic outside the martingale classes in this release.
- Interpret alarms as evidence signals, not automated retraining decisions.
- Exact exchangeability-martingale validity follows the sequential conformal setup in Vovk et al. (2021). If you reuse a fixed calibration set to score a stream, treat alarms as monitoring signals unless you have separately justified the resulting p-value sequence.
- If temporal dependence is strong, p-value validity can degrade; monitor model and data assumptions alongside evidence statistics.
References
- Vovk, V., Petej, I., Nouretdinov, I., Ahlberg, E., Carlsson, L., & Gammerman, A. (2021). Retrain or not retrain: conformal test martingales for change-point detection. Proceedings of Machine Learning Research, 152, 191-210.
- Ramdas, A., Grünwald, P., Vovk, V., & Shafer, G. (2023). Game-theoretic statistics and safe anytime-valid inference. Statistical Science, 38(4), 576-601.
- Shafer, G., & Vovk, V. (2008). A Tutorial on Conformal Prediction. Journal of Machine Learning Research, 9, 371-421.