AnomaLog

Reproducible log anomaly detection pipelines, from raw logs to deterministic, template-mapped sequences.

End-to-end preprocessing - Go from raw logs to parsed, templated, and sequence-ready artifacts in one reproducible pipeline.
Deterministic by design - Fingerprinted stages, caching, and stable artifact lineage make runs repeatable and easier to compare.
Modular pipeline stages - Swap parsers, template miners, and sequence builders without rewriting downstream workflows.
Explicit sequence construction - Build entity-based, fixed-length, or time-windowed sequences with clear split and leakage controls.
Benchmark and custom datasets - Use built-in dataset workflows or bring your own data through a consistent schema and labelling model.
Reusable intermediate artifacts - Reuse structured intermediate data across runs instead of rebuilding everything from raw logs.
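
To make the idea of fingerprinted stages concrete, here is a minimal sketch of how a stage fingerprint could be derived. It is not AnomaLog's implementation: the function name `stage_fingerprint`, the stage names, and the parameter dictionaries are all hypothetical and chosen for illustration. The sketch assumes a fingerprint is a hash over the stage name, its parameters, and the fingerprint of the upstream stage, so any upstream change propagates downstream.

```python
import hashlib
import json


def stage_fingerprint(stage_name: str, params: dict, upstream: str = "") -> str:
    """Hash a stage's name, parameters, and upstream fingerprint (hypothetical helper)."""
    # sort_keys=True gives a stable serialization regardless of dict insertion order.
    payload = json.dumps(
        {"stage": stage_name, "params": params, "upstream": upstream},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Illustrative stage names and parameters, not AnomaLog's real configuration.
parse_fp = stage_fingerprint("parse", {"format": "bgl"})
template_fp = stage_fingerprint("template", {"miner": "identity"}, upstream=parse_fp)

# Changing a parameter yields a different fingerprint, so a cache keyed on the
# fingerprint would not reuse the stale artifact.
changed_fp = stage_fingerprint("template", {"miner": "other"}, upstream=parse_fp)
```

Under this scheme, two runs with identical configuration produce identical fingerprints and can share cached artifacts, while any change to a stage or its inputs produces a new one.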

Why AnomaLog exists

Many log anomaly detection results are difficult to compare because the preprocessing pipeline is underspecified. Two experiments may claim to use the same dataset while differing in parsing rules, label alignment, template mining, grouping, or split behavior.

AnomaLog treats those preprocessing decisions as explicit pipeline stages rather than hidden scripts or fixed artifacts. The goal is not only convenience but also more defensible comparisons, ablations, and reruns.

How it works

AnomaLog structures preprocessing as an explicit, reproducible pipeline:

Define a dataset source
Parse and template logs
Group events into sequences
Represent sequences for modeling

from anomalog.parsers import IdentityTemplateParser
from anomalog.presets import bgl
from anomalog.representations import TemplateCountRepresentation

# Deterministic, composable preprocessing pipeline
samples = (
    bgl.template_with(IdentityTemplateParser)
    .build()
    .group_by_entity()
    .with_train_fraction(0.8)
    .represent_with(TemplateCountRepresentation())
)
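
The grouping step matters because the same parsed events yield different sequences under different strategies. The following sketch shows entity-based, fixed-length, and time-windowed grouping in plain Python. It does not use AnomaLog's API: the `Event` type, the helper functions, and the sample data are hypothetical and only illustrate the general techniques.

```python
from collections import defaultdict
from typing import NamedTuple


class Event(NamedTuple):
    """A parsed log event (hypothetical shape for illustration)."""
    timestamp: float
    entity: str
    template_id: int


# Made-up events, ordered by timestamp.
events = [
    Event(0.0, "node-a", 1),
    Event(1.0, "node-b", 2),
    Event(2.5, "node-a", 3),
    Event(6.0, "node-b", 1),
    Event(7.0, "node-a", 2),
]


def group_by_entity(events):
    """One sequence per entity, preserving event order."""
    groups = defaultdict(list)
    for e in events:
        groups[e.entity].append(e.template_id)
    return dict(groups)


def fixed_length(events, size):
    """Consecutive, non-overlapping chunks of at most `size` events."""
    ids = [e.template_id for e in events]
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def time_windows(events, width):
    """Non-overlapping windows of `width` seconds; empty windows are skipped."""
    windows = defaultdict(list)
    for e in events:
        windows[int(e.timestamp // width)].append(e.template_id)
    return [windows[k] for k in sorted(windows)]


by_entity = group_by_entity(events)    # {"node-a": [1, 3, 2], "node-b": [2, 1]}
by_length = fixed_length(events, 2)    # [[1, 2], [3, 1], [2]]
by_time = time_windows(events, 5.0)    # [[1, 2, 3], [1, 2]]
```

The three results differ even though the input is identical, which is why the grouping choice needs to be stated explicitly when comparing experiments.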

See Getting Started for the onboarding walkthrough and Pipeline Concepts for the full mental model.

Start here

Getting Started - Install AnomaLog and run your first pipeline
Pipeline Concepts - Understand stages, grouping, representations, and reproducibility
Experiments - Run config-driven detector experiments
API Reference - Browse interfaces, built-ins, and module docs
Development - Set up the repo and run checks locally