About AnomaLog
Overview
AnomaLog is a framework for reproducible log anomaly detection pipelines, from raw logs to model-ready sequences.
It treats preprocessing as a first-class research artifact, rather than an implicit or one-off step.
Out of scope
It does not prescribe specific models; instead, it provides the foundation for evaluating them rigorously.
Why it exists
Most work in log anomaly detection focuses on modelling, while preprocessing is often:
- only partially described and not reproducible
- implemented as ad-hoc scripts
- replaced entirely by preprocessed datasets
The hidden problem
Two experiments using “the same dataset” may differ in preprocessing decisions that are rarely standardised or made explicit:
- parsing logic
- template mining configuration
- sequence construction rules
- train/test splits
- leakage controls
As a result, observed performance differences are often confounded with uncontrolled preprocessing variation, so they cannot be attributed to modelling improvements alone.
Pipeline
Raw logs → Parsing → Templates → Sequencing → Artifacts
Each stage produces a stable, reusable artifact. This makes preprocessing explicit and modular: an experiment can be defined by which stage changed, instead of by an implicit change to the entire pipeline.
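One way to make "which stage changed" concrete is to give each stage's artifact a key derived from its own configuration and from the key of the stage before it. The sketch below assumes a strictly linear pipeline; the names `Stage`, `stage_keys`, and `changed_stages`, and every configuration key shown, are hypothetical illustrations rather than AnomaLog's actual API or schema.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class Stage:
    """A pipeline stage plus the config that determines its output (hypothetical model)."""
    name: str
    config: dict = field(default_factory=dict)


def stage_keys(stages: list[Stage]) -> list[tuple[str, str]]:
    """Return (stage name, artifact key) pairs for a linear pipeline.

    Each key hashes the stage's name and config together with the key of the
    previous stage, so a change invalidates that stage and everything
    downstream of it, while upstream artifacts keep the same key.
    """
    keys = []
    parent_key = ""
    for stage in stages:
        # sort_keys=True makes the serialisation independent of dict insertion order.
        payload = json.dumps(
            {"parent": parent_key, "name": stage.name, "config": stage.config},
            sort_keys=True,
        )
        parent_key = hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]
        keys.append((stage.name, parent_key))
    return keys


def changed_stages(a: list[Stage], b: list[Stage]) -> list[str]:
    """Names of stages whose artifacts must be rebuilt when moving from pipeline a to b."""
    return [name for (name, ka), (_, kb) in zip(stage_keys(a), stage_keys(b)) if ka != kb]


# Hypothetical configurations; these keys are illustrative, not AnomaLog's schema.
baseline = [
    Stage("raw_logs", {"dataset": "example"}),
    Stage("parsing", {"timestamp_format": "%Y-%m-%d %H:%M:%S"}),
    Stage("templates", {"similarity_threshold": 0.5}),
    Stage("sequencing", {"window_size": 50}),
]
ablation = [
    Stage("raw_logs", {"dataset": "example"}),
    Stage("parsing", {"timestamp_format": "%Y-%m-%d %H:%M:%S"}),
    Stage("templates", {"similarity_threshold": 0.7}),
    Stage("sequencing", {"window_size": 50}),
]

print(changed_stages(baseline, ablation))  # ['templates', 'sequencing']
```

Changing only the template-mining threshold leaves the raw-log and parsing artifacts reusable, while templates and sequences are rebuilt. Chaining the upstream key into each hash is what makes this safe: an artifact is reused only if everything that fed into it is unchanged.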
What this enables
- Stage-level ablations: modify individual pipeline components (e.g. parsing or sequencing) and isolate their effect on model performance.
- Faithful comparisons: ensure performance differences reflect modelling choices rather than hidden preprocessing variation.
- Deterministic re-execution: reproduce experiments end-to-end from raw logs with consistent ordering, transformations, and splits.
- Artifact reuse: avoid recomputation by reusing persisted intermediate outputs across experiments.
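Deterministic re-execution depends on splits that do not change with input order or random state. A minimal sketch, assuming each sequence carries a start timestamp and a unique ID, is a chronological split over a total ordering. The `Sequence` type and `chronological_split` function are hypothetical and not part of AnomaLog; this is one common split strategy, not necessarily the one AnomaLog uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Sequence:
    """A hypothetical model-ready sequence: an ID, a start time, and its template IDs."""
    seq_id: str
    start_time: float
    template_ids: tuple[int, ...]


def chronological_split(sequences, train_fraction=0.8):
    """Split so that every training sequence starts no later than any test sequence.

    Sorting by (start_time, seq_id) gives a total order, so ties are broken the
    same way on every run and the result does not depend on input order.
    """
    ordered = sorted(sequences, key=lambda s: (s.start_time, s.seq_id))
    cut = int(len(ordered) * train_fraction)
    return ordered[:cut], ordered[cut:]


seqs = [
    Sequence("b", 20.0, (1, 2)),
    Sequence("a", 20.0, (3,)),
    Sequence("c", 5.0, (1,)),
    Sequence("d", 40.0, (2, 2)),
    Sequence("e", 30.0, (4,)),
]
train, test = chronological_split(seqs)
print([s.seq_id for s in train], [s.seq_id for s in test])  # ['c', 'a', 'b', 'e'] ['d']
```

Breaking ties on `seq_id` matters: sequences "a" and "b" share a start time, and without a secondary key their relative order (and therefore which side of the cut they land on) would depend on how the input happened to be ordered.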
Takeaway
AnomaLog extends log anomaly detection beyond model-centric experimentation to include pipeline-centric experimentation.
By making preprocessing explicit, versionable, and reproducible, it enables controlled comparisons and more reliable conclusions.