Examples¶
Welcome to the ONAD examples! This section provides practical, real-world examples of using ONAD for various anomaly detection scenarios. Each example includes complete code, explanations, and best practices.
Example Categories¶
Basic Usage¶
Perfect for getting started with ONAD:
- Basic Anomaly Detection: Simple streaming anomaly detection
- Data Preprocessing: Feature scaling and transformation
- Model Comparison: Compare different algorithms
Model-Specific Examples¶
Deep dives into specific anomaly detection algorithms:
- Forest-based Models: Isolation Forest and Mondrian Forest examples
- SVM-based Models: Support Vector Machine approaches
- Statistical Models: Classical statistical methods
Advanced Pipelines¶
Complex scenarios and production-ready examples:
- Custom Pipelines: Building sophisticated detection systems
- Multi-model Ensembles: Combining multiple algorithms
- Real-time Processing: High-throughput streaming
Quick Start Examples¶
30-Second Quick Start¶
```python
from onad.model.iforest import OnlineIsolationForest
from onad.stream import ParquetStreamer, Dataset

# Initialize model
model = OnlineIsolationForest()

# Process built-in dataset
with ParquetStreamer(Dataset.FRAUD) as streamer:
    for features, label in streamer:
        model.learn_one(features)
        score = model.score_one(features)
        if score > 0.8:  # High anomaly threshold
            print(f"Anomaly detected! Score: {score:.3f}")
```
2-Minute Data Pipeline¶
```python
from onad.transform.preprocessing.scaler import StandardScaler
from onad.model.iforest import OnlineIsolationForest

# Create preprocessing pipeline
scaler = StandardScaler()
detector = OnlineIsolationForest(num_trees=100)

# Process your data
def detect_anomalies(data_stream, threshold=0.7):
    anomalies = []
    for data_point in data_stream:
        # Preprocess
        scaler.learn_one(data_point)
        scaled_data = scaler.transform_one(data_point)

        # Detect
        detector.learn_one(scaled_data)
        score = detector.score_one(scaled_data)
        if score > threshold:
            anomalies.append((data_point, score))
    return anomalies

# Your data stream (replace with your data source)
data_stream = [
    {'temperature': 25.5, 'pressure': 1013.2, 'humidity': 60.0},
    {'temperature': 80.0, 'pressure': 950.0, 'humidity': 90.0},  # Anomaly
    # ... more data points
]

anomalies = detect_anomalies(data_stream)
print(f"Found {len(anomalies)} anomalies")
```
5-Minute Production Setup¶
```python
import logging

from onad.transform.preprocessing.scaler import StandardScaler
from onad.transform.projection.incremental_pca import IncrementalPCA
from onad.model.iforest import OnlineIsolationForest

# Configure logging
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)


class ProductionAnomalyDetector:
    def __init__(self):
        # Preprocessing pipeline
        self.scaler = StandardScaler()
        self.pca = IncrementalPCA(n_components=10)

        # Detection model
        self.detector = OnlineIsolationForest(
            num_trees=100,
            window_size=2000,
            max_leaf_samples=32,
        )

        # Monitoring
        self.processed_count = 0
        self.anomaly_count = 0

    def process_data_point(self, data_point, threshold=0.8):
        """Process a single data point."""
        try:
            # Preprocessing
            self.scaler.learn_one(data_point)
            scaled = self.scaler.transform_one(data_point)
            self.pca.learn_one(scaled)
            reduced = self.pca.transform_one(scaled)

            # Anomaly detection
            self.detector.learn_one(reduced)
            score = self.detector.score_one(reduced)

            # Update counters
            self.processed_count += 1
            is_anomaly = score > threshold
            if is_anomaly:
                self.anomaly_count += 1
                logger.warning(f"Anomaly detected: {score:.3f}")

            # Periodic reporting
            if self.processed_count % 1000 == 0:
                rate = self.anomaly_count / self.processed_count
                logger.info(
                    f"Processed {self.processed_count} points, {rate:.1%} anomalies"
                )

            return {
                'score': score,
                'is_anomaly': is_anomaly,
                'processed_count': self.processed_count,
            }
        except Exception as e:
            logger.error(f"Processing error: {e}")
            return {'error': str(e)}


# Usage
detector = ProductionAnomalyDetector()

# Process your stream (your_data_stream and handle_anomaly are placeholders
# for your own data source and alerting logic)
for data_point in your_data_stream:
    result = detector.process_data_point(data_point)
    if result.get('is_anomaly'):
        handle_anomaly(data_point, result['score'])
```
Use Case Examples¶
IoT Sensor Monitoring¶
Monitor industrial sensors for equipment failures:
```python
from onad.model.iforest import OnlineIsolationForest

# Sensor data example
sensor_data = {
    'temperature': 75.2,  # Celsius
    'pressure': 2.1,      # Bar
    'vibration': 0.05,    # G-force
    'current': 12.5,      # Amperes
    'voltage': 230.0,     # Volts
}

# Detect sensor anomalies
model = OnlineIsolationForest(num_trees=50, window_size=1000)
model.learn_one(sensor_data)
score = model.score_one(sensor_data)
if score > 0.8:
    print("Equipment maintenance required!")
```
Network Security¶
Detect network intrusions and unusual traffic:
```python
from onad.model.iforest import OnlineIsolationForest
# MovingMahalanobisDistance ships with ONAD's statistical models; check your
# installed version for the exact import path.

# Network connection data
connection_data = {
    'duration': 120,         # seconds
    'bytes_sent': 1024,      # bytes
    'bytes_received': 8192,  # bytes
    'packets_sent': 15,      # count
    'packets_received': 20,  # count
    'port': 80,              # destination port
}

# Multi-model approach for security
statistical_model = MovingMahalanobisDistance(window_size=500)
forest_model = OnlineIsolationForest(num_trees=100)

# Get scores from both models
statistical_model.learn_one(connection_data)
stat_score = statistical_model.score_one(connection_data)
forest_model.learn_one(connection_data)
forest_score = forest_model.score_one(connection_data)

# Combined decision (the 0.6/0.4 weights are illustrative; tune them on your data)
combined_score = 0.6 * stat_score + 0.4 * forest_score
if combined_score > 0.7:
    print("Potential security threat detected!")
```
Financial Fraud Detection¶
Monitor transactions for fraudulent activities:
```python
from onad.transform.preprocessing.scaler import StandardScaler
# IncrementalOneClassSVMAdaptiveKernel ships with ONAD's SVM models; check your
# installed version for the exact import path.

# Transaction data
transaction = {
    'amount': 1500.00,          # USD
    'merchant_category': 5814,  # MCC code
    'hour_of_day': 14,          # 0-23
    'day_of_week': 2,           # 0-6
    'account_age_days': 450,    # days
    'location_risk': 0.2,       # 0-1 risk score
}

# Feature scaling for mixed data types
scaler = StandardScaler()
fraud_detector = IncrementalOneClassSVMAdaptiveKernel(nu=0.05)

# Process transaction
scaler.learn_one(transaction)
scaled_transaction = scaler.transform_one(transaction)
fraud_detector.learn_one(scaled_transaction)
fraud_score = fraud_detector.score_one(scaled_transaction)

if fraud_score > 0.9:
    print("High-risk transaction - manual review required")
elif fraud_score > 0.7:
    print("Moderate risk - additional verification needed")
```
Code Organization¶
All examples follow a consistent structure:
- Problem Description: What we're trying to detect
- Data Setup: How to prepare and load data
- Model Configuration: Choosing and configuring algorithms
- Processing Pipeline: Step-by-step data processing
- Results Interpretation: Understanding and acting on results
- Extensions: Ideas for further development
Running the Examples¶
Prerequisites¶
Make sure you have ONAD installed along with the dependencies used by the examples.
Data Requirements¶
Most examples use:

- Built-in ONAD datasets (no additional data needed)
- Synthetic data generation (included in the examples; a minimal sketch follows this list)
- Public datasets with download instructions
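If you just want to exercise the pipelines without downloading anything, a synthetic stream is easy to build. This sketch is purely illustrative (the feature names and anomaly-injection scheme are assumptions, not part of ONAD) and plugs directly into `detect_anomalies` from the quick start:

```python
import random

def synthetic_stream(n_points=1000, anomaly_rate=0.02, seed=42):
    """Yield mostly-normal readings with occasional injected outliers."""
    rng = random.Random(seed)
    for _ in range(n_points):
        if rng.random() < anomaly_rate:
            # Injected anomaly: values far outside the normal operating range
            yield {
                'temperature': rng.uniform(70.0, 100.0),
                'pressure': rng.uniform(900.0, 960.0),
                'humidity': rng.uniform(85.0, 100.0),
            }
        else:
            # Normal operation: small Gaussian noise around typical values
            yield {
                'temperature': rng.gauss(25.0, 1.5),
                'pressure': rng.gauss(1013.0, 5.0),
                'humidity': rng.gauss(60.0, 5.0),
            }

# Example: anomalies = detect_anomalies(synthetic_stream())
```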
Example Scripts¶
Each example page includes:

- Complete, runnable Python scripts
- Jupyter notebook versions (where applicable)
- Docker containerized examples for complex setups
- Command-line interfaces for batch processing
Best Practices from Examples¶
Model Selection¶
- Start with `OnlineIsolationForest` for general-purpose detection
- Use statistical models for time-series data
- Combine multiple models for higher accuracy (a comparison sketch follows this list)
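A quick way to sanity-check a choice is to run candidates side by side on the same labeled stream and see how well their scores separate the labeled anomalies. The sketch below compares two configurations of the same forest model (swap in any ONAD model exposing `learn_one`/`score_one`); it assumes the streamer yields integer 0/1 labels, and mean score per class is a deliberately crude stand-in for a proper evaluation:

```python
from onad.model.iforest import OnlineIsolationForest
from onad.stream import ParquetStreamer, Dataset

models = {
    'iforest_small': OnlineIsolationForest(num_trees=50),
    'iforest_large': OnlineIsolationForest(num_trees=200),
}
# Running sum of anomaly scores per class (0 = normal, 1 = anomaly), per model
sums = {name: {0: 0.0, 1: 0.0} for name in models}
counts = {0: 0, 1: 0}

with ParquetStreamer(Dataset.FRAUD) as streamer:
    for features, label in streamer:
        counts[label] += 1
        for name, model in models.items():
            model.learn_one(features)
            sums[name][label] += model.score_one(features)

for name in models:
    normal = sums[name][0] / max(counts[0], 1)
    anomalous = sums[name][1] / max(counts[1], 1)
    print(f"{name}: mean score normal={normal:.3f}, anomalous={anomalous:.3f}")
```

A larger gap between the two class means suggests the model separates anomalies better on that stream.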
Data Preprocessing¶
- Always scale features with different ranges
- Use PCA for high-dimensional data
- Validate data quality before processing (see the sketch after this list)
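For the validation point, a simple guard that drops incomplete or non-finite points before they reach the pipeline is often enough. A minimal sketch, where the expected key set is an assumption about your schema:

```python
import math

EXPECTED_KEYS = {'temperature', 'pressure', 'humidity'}  # adjust to your schema

def is_valid(data_point):
    """Reject points with missing/extra keys or non-numeric, non-finite values."""
    if set(data_point) != EXPECTED_KEYS:
        return False
    return all(
        isinstance(v, (int, float)) and math.isfinite(v)
        for v in data_point.values()
    )

stream = [
    {'temperature': 25.5, 'pressure': 1013.2, 'humidity': 60.0},
    {'temperature': float('nan'), 'pressure': 1013.2, 'humidity': 60.0},  # dropped
]
clean_stream = [p for p in stream if is_valid(p)]
```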
Threshold Selection¶
- Start with 95th percentile of normal data scores
- Adjust based on business requirements (false positive vs false negative costs)
- Use adaptive thresholds for evolving data patterns (see the sketch after this list)
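Both the percentile rule and an adaptive threshold can be implemented by keeping a sliding window of recent scores and recomputing the percentile as the window moves. A minimal sketch (the window size and percentile are tuning knobs, not ONAD defaults):

```python
from collections import deque

class RollingPercentileThreshold:
    """Adaptive threshold: the q-th percentile of the last `window` scores."""

    def __init__(self, window=1000, q=0.95):
        self.scores = deque(maxlen=window)
        self.q = q

    def update(self, score):
        self.scores.append(score)

    def value(self):
        ordered = sorted(self.scores)
        if not ordered:
            return float('inf')  # no history yet: flag nothing
        idx = min(int(self.q * len(ordered)), len(ordered) - 1)
        return ordered[idx]

# Usage inside a scoring loop:
# threshold = RollingPercentileThreshold()
# threshold.update(score)
# is_anomaly = score > threshold.value()
```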
Performance Optimization¶
- Process data in batches when possible (see the sketch after this list)
- Monitor memory usage for long-running processes
- Use appropriate window sizes for your data velocity
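One way to act on the batching and monitoring advice is to pull the stream in fixed-size chunks and log throughput and peak memory per chunk. A standard-library sketch (`tracemalloc` adds overhead, so sample it periodically rather than leaving it on in production):

```python
import itertools
import time
import tracemalloc

def process_in_batches(stream, detector, batch_size=500):
    """Consume a stream in fixed-size chunks, logging throughput and memory."""
    tracemalloc.start()
    iterator = iter(stream)
    while True:
        batch = list(itertools.islice(iterator, batch_size))
        if not batch:
            break
        start = time.perf_counter()
        for point in batch:
            detector.learn_one(point)
            detector.score_one(point)
        elapsed = time.perf_counter() - start
        current, peak = tracemalloc.get_traced_memory()
        print(
            f"{len(batch)} points in {elapsed:.2f}s "
            f"({len(batch) / elapsed:.0f} pts/s), peak memory {peak / 1e6:.1f} MB"
        )
    tracemalloc.stop()

# Example: process_in_batches(synthetic_stream(), OnlineIsolationForest())
```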
Getting Started
- Start with Basic Usage for fundamental concepts
- Try Forest Models for general-purpose anomaly detection
- Explore Custom Pipelines for production scenarios
- Check Statistical Models for time-series data
Contributing Examples
Have an interesting use case? We welcome community contributions! See our Contributing Guide for how to share your examples.