User Guide¶
Welcome to the ONAD User Guide! This comprehensive guide will help you master online anomaly detection with ONAD, from basic concepts to advanced deployment strategies.
What is Online Anomaly Detection?¶
Online anomaly detection is the process of identifying unusual patterns or outliers in streaming data as it arrives, without requiring the entire dataset to be available upfront. This approach is essential for:
- Real-time systems that need immediate responses to anomalies
- Large-scale data where storing everything is impractical
- Evolving patterns where the definition of "normal" changes over time
- Resource-constrained environments with limited memory and compute
ONAD Philosophy¶
ONAD is built around several core principles:
Streaming-First Design¶
Every component in ONAD is designed to process data one point at a time, maintaining constant memory usage regardless of stream length.
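For example, a detector can consume an unbounded generator without ever materializing the stream. A minimal sketch (the dict-of-floats point format and the random stand-in source are assumptions for illustration):
import random
from onad.model.iforest import OnlineIsolationForest

def unbounded_stream():
    # Stand-in for a real source: yields one feature dict at a time, forever
    while True:
        yield {"x": random.gauss(0, 1), "y": random.gauss(0, 1)}

model = OnlineIsolationForest()
for point in unbounded_stream():
    # Each call updates the model's internal state in place; the loop
    # never accumulates history, so memory stays constant
    model.learn_one(point)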
Composable Architecture¶
Models, transformers, and utilities can be combined in flexible pipelines to solve complex problems.
Production-Ready¶
Built-in logging, error handling, and memory management make ONAD suitable for production deployments.
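A minimal sketch of surfacing library logs with Python's standard logging module; that ONAD logs through this module, and the "onad" logger name, are assumptions to adjust against the library's own logging docs:
import logging

# Assumption: ONAD emits records via the standard logging module under a
# logger named "onad"; adjust if the library documents a different name
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)
logging.getLogger("onad").setLevel(logging.DEBUG)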
Type Safety¶
Comprehensive type hints ensure code reliability and better IDE support.
Getting Started¶
1. Choose Your Model¶
ONAD provides several categories of anomaly detection models (a short instantiation sketch follows this list):
- Forest-based Models: Tree ensemble methods like Isolation Forest
- SVM-based Models: Support Vector Machine approaches
- Statistical Models: Classical statistical methods
- Distance-based Models: K-NN and similarity search
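The models share the learn_one()/score_one() interface used throughout this guide, so swapping families is typically a one-line change. A sketch instantiating one model from three of the families, using the imports from the Quick Reference below (default constructor arguments are an assumption; consult each model's API reference for required parameters):
from onad.model.iforest import OnlineIsolationForest
from onad.model.svm import IncrementalOneClassSVMAdaptiveKernel
from onad.model.stat.multi import MovingMahalanobisDistance

# One detector per family; all share learn_one()/score_one()
forest_model = OnlineIsolationForest()               # forest-based
svm_model = IncrementalOneClassSVMAdaptiveKernel()   # SVM-based
stat_model = MovingMahalanobisDistance()             # statistical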
2. Prepare Your Data¶
Use ONAD's data transformation tools (a chaining sketch follows this list):
- Scaling: Normalize features with StandardScaler or MinMaxScaler
- Dimensionality Reduction: Reduce features with Incremental PCA
- Stream Processing: Efficiently load and process data streams
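Transformers follow the same per-point pattern as the models: learn from a point, then transform it. A sketch chaining a scaler into Incremental PCA (the learn-then-transform call order mirrors the pipeline example below; the feature-dict point format is an assumption):
from onad.transform.preprocessing.scaler import StandardScaler
from onad.transform.projection.incremental_pca import IncrementalPCA

scaler = StandardScaler()
pca = IncrementalPCA(n_components=10)

def preprocess_one(point):
    # Update each transformer with the point, then pass the result along
    scaler.learn_one(point)
    scaled = scaler.transform_one(point)
    pca.learn_one(scaled)
    return pca.transform_one(scaled)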
3. Build Your Pipeline¶
Combine components for sophisticated processing (a minimal error-handling sketch follows this list):
- Pipeline Construction: Learn to chain transformers and models
- Memory Management: Configure memory limits and monitoring
- Error Handling: Implement robust error recovery
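ONAD's own pipeline and memory-management utilities are covered under Pipeline Construction; as a starting point, error recovery can be as simple as wrapping the per-point calls. A library-agnostic sketch (the skip-and-log policy is an application choice, not an ONAD API):
import logging

logger = logging.getLogger(__name__)

def safe_score(model, point):
    # Skip-and-log recovery: one malformed point should not kill the stream
    try:
        model.learn_one(point)
        return model.score_one(point)
    except Exception:
        logger.exception("Failed to process point %r; skipping", point)
        return None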
Common Workflows¶
Basic Anomaly Detection¶
from onad.model.iforest import OnlineIsolationForest
# Initialize model
model = OnlineIsolationForest()
# Process streaming data (stream, threshold, and handle_anomaly are
# application-defined placeholders)
for data_point in stream:
    model.learn_one(data_point)
    score = model.score_one(data_point)
    if score > threshold:
        handle_anomaly(data_point, score)
Data Preprocessing Pipeline¶
from onad.transform.preprocessing.scaler import StandardScaler
from onad.model.svm import IncrementalOneClassSVMAdaptiveKernel
# Create pipeline components
scaler = StandardScaler()
detector = IncrementalOneClassSVMAdaptiveKernel()
# Process data through pipeline
def detect_anomaly(raw_data):
    scaler.learn_one(raw_data)
    normalized_data = scaler.transform_one(raw_data)
    detector.learn_one(normalized_data)
    return detector.score_one(normalized_data)
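Applying it is then one call per incoming point, mirroring the basic loop above (stream, threshold, and handle_anomaly are application-defined placeholders):
for raw_data in stream:
    score = detect_anomaly(raw_data)
    if score > threshold:
        handle_anomaly(raw_data, score)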
Batch Processing with Streaming Interface¶
from onad.stream import ParquetStreamer, Dataset
with ParquetStreamer(Dataset.FRAUD) as streamer:
    for features, label in streamer:
        # Process each data point (model is any ONAD detector,
        # e.g. OnlineIsolationForest())
        model.learn_one(features)
        score = model.score_one(features)
Topics Covered¶
Core Concepts¶
- Models Overview: Complete guide to all anomaly detection algorithms
- Data Transformers: Preprocessing and feature engineering
- Stream Processing: Efficient data loading and processing
Advanced Topics¶
- Pipeline Construction: Building complex processing workflows
- Model Evaluation: Testing and validating your models
- Best Practices: Production deployment guidelines
Quick Reference¶
Essential Imports¶
# Models
from onad.model.iforest import OnlineIsolationForest
from onad.model.svm import IncrementalOneClassSVMAdaptiveKernel
from onad.model.stat.multi import MovingMahalanobisDistance
# Transformers
from onad.transform.preprocessing.scaler import StandardScaler, MinMaxScaler
from onad.transform.projection.incremental_pca import IncrementalPCA
# Streaming
from onad.stream import ParquetStreamer, Dataset
Common Parameters¶
# Memory management
model = OnlineIsolationForest(window_size=1000)
# Performance tuning
model = OnlineIsolationForest(num_trees=100, max_leaf_samples=32)
# Data preprocessing
scaler = StandardScaler()
pca = IncrementalPCA(n_components=10)
Typical Workflow¶
- Initialize model and transformers
- Learn from each data point with learn_one()
- Score data points with score_one()
- Transform data with transform_one() (if using transformers)
- Monitor model state with logging and metrics (an end-to-end sketch of these steps follows)
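A hedged end-to-end sketch assembled from the snippets in this guide (the threshold value and the alert handling are application-defined placeholders):
import logging

from onad.model.iforest import OnlineIsolationForest
from onad.stream import ParquetStreamer, Dataset
from onad.transform.preprocessing.scaler import StandardScaler

logger = logging.getLogger(__name__)

# Initialize model and transformers
scaler = StandardScaler()
model = OnlineIsolationForest(window_size=1000)
THRESHOLD = 0.5  # placeholder; tune against your own data

with ParquetStreamer(Dataset.FRAUD) as streamer:
    for features, label in streamer:
        # Transform: learn the point, then scale it
        scaler.learn_one(features)
        scaled = scaler.transform_one(features)
        # Learn and score on the transformed point
        model.learn_one(scaled)
        score = model.score_one(scaled)
        # Monitor: log anomalies above the threshold
        if score > THRESHOLD:
            logger.warning("Anomaly: score=%.3f label=%s", score, label)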
Next Steps¶
- Start with the Models Overview to understand available algorithms
- Check out Examples for real-world use cases
- Review Best Practices for production deployment tips