AnomaLog

Reproducible log anomaly detection pipelines, from raw logs to deterministic, template-mapped sequences.

  • End-to-end preprocessing: Go from raw logs to parsed, templated, and sequence-ready artifacts in one reproducible pipeline.

  • Deterministic by design: Fingerprinted stages, caching, and stable artifact lineage make runs repeatable and easier to compare.

  • Modular pipeline stages: Swap parsers, template miners, and sequence builders without rewriting downstream workflows.

  • Explicit sequence construction: Build entity-based, fixed-length, or time-windowed sequences with clear split and leakage controls.

  • Benchmark and custom datasets: Use built-in dataset workflows or bring your own data through a consistent schema and labelling model.

  • Reusable intermediate artifacts: Reuse structured intermediate data across runs instead of rebuilding everything from raw logs.
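
To make the idea of fingerprinted stages concrete, here is a minimal sketch of how a stage fingerprint could be derived. It is an illustration only, not AnomaLog's implementation: the function name `stage_fingerprint`, the stage names, and the config keys are all hypothetical. The assumption is that a stage is identified by its name, its configuration, and the fingerprint of the stage before it, so any upstream change yields a new cache key.

```python
import hashlib
import json


def stage_fingerprint(stage_name: str, config: dict, upstream: str = "") -> str:
    """Hash a stage's name, config, and upstream fingerprint into a stable key.

    Hypothetical helper for illustration; not part of the AnomaLog API.
    """
    # sort_keys makes the serialization independent of dict insertion order.
    payload = json.dumps(
        {"stage": stage_name, "config": config, "upstream": upstream},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Chain fingerprints so each stage's key depends on everything before it.
parse_fp = stage_fingerprint("parse", {"format": "bgl"})
group_fp = stage_fingerprint("group", {"by": "entity"}, upstream=parse_fp)

# Re-running with identical settings reproduces the same key (a cache hit).
same_fp = stage_fingerprint("group", {"by": "entity"}, upstream=parse_fp)

# Changing the grouping settings produces a different key (a cache miss).
changed_fp = stage_fingerprint(
    "group", {"by": "time_window", "seconds": 60}, upstream=parse_fp
)
```

Under this scheme, two runs share a cached artifact only when every upstream decision matches, which is what makes their outputs comparable.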

Why AnomaLog exists

Many log anomaly detection results are difficult to compare because the preprocessing pipeline is underspecified. Two experiments may claim to use the same dataset while differing in parsing rules, label alignment, template mining, grouping, or split behavior.

AnomaLog treats those preprocessing decisions as explicit pipeline stages rather than hidden scripts or fixed artifacts. The goal is not only convenience but also comparisons, ablations, and reruns that are easier to defend.

How it works

AnomaLog structures preprocessing as an explicit, reproducible pipeline:

  1. Define a dataset source
  2. Parse and template logs
  3. Group events into sequences
  4. Represent sequences for modeling

from anomalog.parsers import IdentityTemplateParser
from anomalog.presets import bgl
from anomalog.representations import TemplateCountRepresentation

# Deterministic, composable preprocessing pipeline
samples = (
    bgl.template_with(IdentityTemplateParser)
    .build()
    .group_by_entity()
    .with_train_fraction(0.8)
    .represent_with(TemplateCountRepresentation())
)
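
For intuition about what steps 3 and 4 do conceptually, the following plain-Python sketch groups events by entity, splits whole sequences into train and test sets, and counts templates per sequence. It does not use or reproduce AnomaLog's internals: the event tuples, the helper names, and the choice to split by sorted entity id are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical parsed events as (entity_id, template_id), already in log order.
events = [
    ("node-1", "T1"),
    ("node-2", "T1"),
    ("node-1", "T2"),
    ("node-1", "T1"),
    ("node-2", "T3"),
    ("node-3", "T2"),
]


def group_by_entity(events):
    """Collect each entity's template ids into one ordered sequence."""
    sequences = defaultdict(list)
    for entity, template in events:
        sequences[entity].append(template)
    return dict(sequences)


def split_train_test(sequences, train_fraction):
    """Split whole sequences by sorted entity id so no entity is in both sets."""
    keys = sorted(sequences)
    cut = int(len(keys) * train_fraction)
    train = {k: sequences[k] for k in keys[:cut]}
    test = {k: sequences[k] for k in keys[cut:]}
    return train, test


def template_counts(sequence):
    """Represent a sequence as a mapping from template id to occurrence count."""
    return dict(Counter(sequence))


sequences = group_by_entity(events)
train, test = split_train_test(sequences, train_fraction=0.8)
features = {entity: template_counts(seq) for entity, seq in train.items()}
```

Splitting at the sequence level, rather than at the event level, is one simple way to keep events from the same entity out of both sets; the Pipeline Concepts page describes the split and leakage controls AnomaLog actually provides.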

See Getting Started for the onboarding walkthrough and Pipeline Concepts for the full mental model.

Start here