User Guide¶
Welcome to the ONAD User Guide! This comprehensive guide will help you master online anomaly detection with ONAD, from basic concepts to advanced deployment strategies.
What is Online Anomaly Detection?¶
Online anomaly detection is the process of identifying unusual patterns or outliers in streaming data as it arrives, without requiring the entire dataset to be available upfront. This approach is essential for:
- Real-time systems that need immediate responses to anomalies
- Large-scale data where storing everything is impractical
- Evolving patterns where the definition of "normal" changes over time
- Resource-constrained environments with limited memory and compute
ONAD Philosophy¶
ONAD is built around several core principles:
Streaming-First Design¶
Every component in ONAD is designed to process data one point at a time, maintaining constant memory usage regardless of stream length.
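For example, a detector can consume an unbounded generator without ever materializing the stream. A minimal sketch (the dict-of-floats point format and the random stand-in source are assumptions for illustration):
import random
from onad.model.iforest import OnlineIsolationForest

def unbounded_stream():
    # Stand-in for a real source: yields one feature dict at a time, forever
    while True:
        yield {"x": random.gauss(0, 1), "y": random.gauss(0, 1)}

model = OnlineIsolationForest()
for point in unbounded_stream():
    # Each call updates the model's internal state in place; the loop
    # never accumulates history, so memory stays constant
    model.learn_one(point)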
Composable Architecture¶
Models, transformers, and utilities can be combined in flexible pipelines to solve complex problems.
Production-Ready¶
Built-in logging, error handling, and memory management make ONAD suitable for production deployments.
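A minimal sketch of surfacing library logs with Python's standard logging module; that ONAD logs through this module, and the "onad" logger name, are assumptions to adjust against the library's own logging docs:
import logging

# Assumption: ONAD emits records via the standard logging module under a
# logger named "onad"; adjust if the library documents a different name
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)
logging.getLogger("onad").setLevel(logging.DEBUG)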
Type Safety¶
Comprehensive type hints ensure code reliability and better IDE support.
Getting Started¶
1. Choose Your Model¶
ONAD provides several categories of anomaly detection models (a short instantiation sketch follows this list):
- Forest-based Models: Tree ensemble methods like Isolation Forest
- SVM-based Models: Support Vector Machine approaches
- Statistical Models: Classical statistical methods
- Distance-based Models: K-NN and similarity search
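The models share the learn_one()/score_one() interface used throughout this guide, so swapping families is typically a one-line change. A sketch instantiating one model from three of the families, using the imports from the Quick Reference below (default constructor arguments are an assumption; consult each model's API reference for required parameters):
from onad.model.iforest import OnlineIsolationForest
from onad.model.svm import IncrementalOneClassSVMAdaptiveKernel
from onad.model.stat.multi import MovingMahalanobisDistance

# One detector per family; all share learn_one()/score_one()
forest_model = OnlineIsolationForest()               # forest-based
svm_model = IncrementalOneClassSVMAdaptiveKernel()   # SVM-based
stat_model = MovingMahalanobisDistance()             # statistical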
2. Prepare Your Data¶
Use ONAD's data transformation tools (a chaining sketch follows this list):
- Scaling: Normalize features with StandardScaler or MinMaxScaler
- Dimensionality Reduction: Reduce features with Incremental PCA
- Stream Processing: Efficiently load and process data streams
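Transformers follow the same per-point pattern as the models: learn from a point, then transform it. A sketch chaining a scaler into Incremental PCA (the learn-then-transform call order mirrors the pipeline example below; the feature-dict point format is an assumption):
from onad.transform.preprocessing.scaler import StandardScaler
from onad.transform.projection.incremental_pca import IncrementalPCA

scaler = StandardScaler()
pca = IncrementalPCA(n_components=10)

def preprocess_one(point):
    # Update each transformer with the point, then pass the result along
    scaler.learn_one(point)
    scaled = scaler.transform_one(point)
    pca.learn_one(scaled)
    return pca.transform_one(scaled)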
3. Build Your Pipeline¶
Combine components for sophisticated processing (a minimal error-handling sketch follows this list):
- Pipeline Construction: Learn to chain transformers and models
- Memory Management: Configure memory limits and monitoring
- Error Handling: Implement robust error recovery
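ONAD's own pipeline and memory-management utilities are covered under Pipeline Construction; as a starting point, error recovery can be as simple as wrapping the per-point calls. A library-agnostic sketch (the skip-and-log policy is an application choice, not an ONAD API):
import logging

logger = logging.getLogger(__name__)

def safe_score(model, point):
    # Skip-and-log recovery: one malformed point should not kill the stream
    try:
        model.learn_one(point)
        return model.score_one(point)
    except Exception:
        logger.exception("Failed to process point %r; skipping", point)
        return None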
Common Workflows¶
Basic Anomaly Detection¶
from onad.model.iforest import OnlineIsolationForest
# Initialize model
model = OnlineIsolationForest()
# Process streaming data (stream, threshold, and handle_anomaly are
# application-defined placeholders)
for data_point in stream:
    model.learn_one(data_point)
    score = model.score_one(data_point)
    if score > threshold:
        handle_anomaly(data_point, score)
Data Preprocessing Pipeline¶
from onad.transform.preprocessing.scaler import StandardScaler
from onad.model.svm import IncrementalOneClassSVMAdaptiveKernel
# Create pipeline components
scaler = StandardScaler()
detector = IncrementalOneClassSVMAdaptiveKernel()
# Process data through pipeline
def detect_anomaly(raw_data):
    scaler.learn_one(raw_data)
    normalized_data = scaler.transform_one(raw_data)
    detector.learn_one(normalized_data)
    return detector.score_one(normalized_data)
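Applying it is then one call per incoming point, mirroring the basic loop above (stream, threshold, and handle_anomaly are application-defined placeholders):
for raw_data in stream:
    score = detect_anomaly(raw_data)
    if score > threshold:
        handle_anomaly(raw_data, score)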
Batch Processing with Streaming Interface¶
from onad.stream import ParquetStreamer, Dataset
with ParquetStreamer(Dataset.FRAUD) as streamer:
    for features, label in streamer:
        # Process each data point (model is any ONAD detector,
        # e.g. OnlineIsolationForest())
        model.learn_one(features)
        score = model.score_one(features)
Topics Covered¶
Core Concepts¶
- Models Overview: Complete guide to all anomaly detection algorithms
- Data Transformers: Preprocessing and feature engineering
- Stream Processing: Efficient data loading and processing
Advanced Topics¶
- Pipeline Construction: Building complex processing workflows
- Model Evaluation: Testing and validating your models
- Best Practices: Production deployment guidelines
Quick Reference¶
Essential Imports¶
# Models
from onad.model.iforest import OnlineIsolationForest
from onad.model.svm import IncrementalOneClassSVMAdaptiveKernel
from onad.model.stat.multi import MovingMahalanobisDistance
# Transformers
from onad.transform.preprocessing.scaler import StandardScaler, MinMaxScaler
from onad.transform.projection.incremental_pca import IncrementalPCA
# Streaming
from onad.stream import ParquetStreamer, Dataset
Common Parameters¶
# Memory management
model = OnlineIsolationForest(window_size=1000)
# Performance tuning
model = OnlineIsolationForest(num_trees=100, max_leaf_samples=32)
# Data preprocessing
scaler = StandardScaler()
pca = IncrementalPCA(n_components=10)
Typical Workflow¶
- Initialize model and transformers
- Learn from each data point with learn_one()
- Score data points with score_one()
- Transform data with transform_one() (if using transformers)
- Monitor model state with logging and metrics (an end-to-end sketch of these steps follows)
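A hedged end-to-end sketch assembled from the snippets in this guide (the threshold value and the alert handling are application-defined placeholders):
import logging

from onad.model.iforest import OnlineIsolationForest
from onad.stream import ParquetStreamer, Dataset
from onad.transform.preprocessing.scaler import StandardScaler

logger = logging.getLogger(__name__)

# Initialize model and transformers
scaler = StandardScaler()
model = OnlineIsolationForest(window_size=1000)
THRESHOLD = 0.5  # placeholder; tune against your own data

with ParquetStreamer(Dataset.FRAUD) as streamer:
    for features, label in streamer:
        # Transform: learn the point, then scale it
        scaler.learn_one(features)
        scaled = scaler.transform_one(features)
        # Learn and score on the transformed point
        model.learn_one(scaled)
        score = model.score_one(scaled)
        # Monitor: log anomalies above the threshold
        if score > THRESHOLD:
            logger.warning("Anomaly: score=%.3f label=%s", score, label)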
Next Steps¶
- Start with the Models Overview to understand available algorithms
- Check out Examples for real-world use cases
- Review Best Practices for production deployment tips