Top-level API

This page covers AnomaLog's top-level public API: the small set of names exported directly from the anomalog package.

Start here when you want the library-level names that most examples import directly, rather than the lower-level building blocks under the submodules.

>>> from anomalog import DatasetSpec, SplitLabel
>>> DatasetSpec("demo").dataset_name
'demo'
>>> SplitLabel.TRAIN.value
'train'

anomalog

Top-level public API for AnomaLog.

DatasetSpec dataclass

Immutable fluent builder for configuring a dataset pipeline.

The builder captures dataset preprocessing choices without executing any pipeline stage immediately. Each fluent method returns a new spec so callers can share partially configured specs safely and only trigger orchestration at build() time.
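The pattern described above can be sketched with a frozen dataclass and `dataclasses.replace`. This is a minimal illustration, not AnomaLog's implementation: `SpecSketch` is a hypothetical stand-in, its fields hold plain strings instead of real source and parser objects, and the use of `replace` is an assumption about how such a builder might be written.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


# Hypothetical stand-in for illustration only; not AnomaLog's DatasetSpec.
@dataclass(frozen=True)
class SpecSketch:
    dataset_name: str
    source: str | None = None
    structured_parser: str | None = None

    def from_source(self, source: str) -> SpecSketch:
        # replace() copies the frozen instance with one field changed.
        return replace(self, source=source)

    def parse_with(self, structured_parser: str) -> SpecSketch:
        return replace(self, structured_parser=structured_parser)


# A partially configured spec can be shared and extended independently.
base = SpecSketch("demo").from_source("raw-logs")
variant_a = base.parse_with("parser-a")
variant_b = base.parse_with("parser-b")
```

Because every method returns a new object, `base` still has no parser after both variants are derived from it, which is what makes sharing a partially configured spec safe.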

Attributes:

dataset_name (str): Stable dataset identifier used for cache roots and materialised outputs.

source (DatasetSource | None): Source that materialises or locates the raw dataset contents before parsing.

structured_parser (StructuredParser | None): Parser that turns raw log lines into structured records.

structured_sink (type[StructuredSink]): Sink implementation responsible for persisting structured rows and later grouped iteration.

cache_paths (CachePathsConfig): Data/cache roots used by the build.

anomaly_label_reader (AnomalyLabelReader | None): Optional anomaly label reader bound after structured parsing.

template_parser (type[TemplateParser]): Template parser type used to mine message templates from structured records.

build()

Build and return the templated dataset view.

Returns:

TemplatedDataset: Built dataset with structured rows, labels, and templates attached.

clear_cache()

Delete all local cached artifacts for this dataset.

Raises:

ValueError: If the dataset name is empty.
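The page does not say why an empty name is rejected. One plausible reason, offered here as an assumption, is that a cache directory derived from an empty name would resolve to the cache root itself, so clearing it would remove every dataset's artifacts. The sketch below shows such a guard; `clear_cache_sketch` and the directory layout are hypothetical and not taken from AnomaLog.

```python
import shutil
import tempfile
from pathlib import Path


def clear_cache_sketch(cache_root: Path, dataset_name: str) -> None:
    """Hypothetical helper: remove cache_root/dataset_name if it exists."""
    if not dataset_name:
        # cache_root / "" is cache_root itself, so refuse to proceed.
        raise ValueError("dataset_name must not be empty")
    shutil.rmtree(cache_root / dataset_name, ignore_errors=True)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "demo").mkdir()
    (root / "other").mkdir()

    # Only the named dataset's directory is removed.
    clear_cache_sketch(root, "demo")
    remaining = sorted(p.name for p in root.iterdir())

    # An empty name is rejected before anything is deleted.
    try:
        clear_cache_sketch(root, "")
        rejected = False
    except ValueError:
        rejected = True
```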

from_source(source)

Bind the raw dataset source for later materialisation.

Parameters:

source (DatasetSource, required): Source strategy that knows how to provide the raw logs for this dataset.

Returns:

DatasetSpec: New spec with the supplied source attached.

label_with(anomaly_label_reader)

Attach an anomaly label reader to enrich the built dataset.

Parameters:

anomaly_label_reader (AnomalyLabelReader, required): Reader used to resolve per-line or per-entity anomaly labels after parsing.

Returns:

DatasetSpec: New spec with the supplied label reader attached.

parse_with(structured_parser)

Bind the structured parser that defines log-line semantics.

Parameters:

structured_parser (StructuredParser, required): Parser used during build to convert raw lines into structured records.

Returns:

DatasetSpec: New spec with the supplied parser attached.

store_with(structured_sink)

Override the structured sink implementation for this dataset.

Parameters:

structured_sink (type[StructuredSink], required): Sink type that owns persistence and grouped access for structured rows.

Returns:

DatasetSpec: New spec with the supplied sink type attached.
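Note that the annotation is `type[StructuredSink]`, so the argument is a class rather than an instance. The sketch below illustrates one way a builder could hold a class and instantiate it only at build time. Everything in it is hypothetical: `ListSinkSketch`, `SinkSpecSketch`, the constructor signature, and the point at which instantiation happens are assumptions, not AnomaLog's actual behaviour.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


class ListSinkSketch:
    """Hypothetical sink; the real StructuredSink interface is not shown here."""

    def __init__(self, dataset_name: str) -> None:
        self.dataset_name = dataset_name
        self.rows: list[dict] = []


# Hypothetical stand-in for illustration only; not AnomaLog's DatasetSpec.
@dataclass(frozen=True)
class SinkSpecSketch:
    dataset_name: str
    structured_sink: type | None = None

    def store_with(self, structured_sink: type) -> SinkSpecSketch:
        # Store the class itself; no sink object exists yet.
        return replace(self, structured_sink=structured_sink)

    def build(self) -> object:
        if self.structured_sink is None:
            raise ValueError("no sink type configured")
        # Assumed here: the sink is instantiated only when build() runs.
        return self.structured_sink(self.dataset_name)


spec = SinkSpecSketch("demo").store_with(ListSinkSketch)  # class, not instance
sink = spec.build()
```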

template_with(template_parser)

Select the template parser implementation used during build.

Parameters:

template_parser (type[TemplateParser], required): Template parser type trained on the structured dataset before the templated view is returned.

Returns:

DatasetSpec: New spec with the supplied template parser type attached.

with_cache_paths(cache_paths)

Override the default data and cache roots for this dataset.

Parameters:

cache_paths (CachePathsConfig, required): Explicit roots to use for source materialisation and derived local artifacts.

Returns:

DatasetSpec: New spec with the supplied cache paths attached.

SplitLabel

Bases: str, Enum

Dataset split membership for a sequence.

Attributes:

TRAIN: Sequence belongs to the training split.

TEST: Sequence belongs to the evaluation/test split.

IGNORED: Sequence belongs to the fixed train pool but is not used for the current training prefix.
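Because the bases are `str` and `Enum`, members behave like plain strings in comparisons and can be looked up by value. The stand-in below demonstrates this with the standard library only. `SplitLabelSketch` is hypothetical, and only the `'train'` value is confirmed by this page; the values for TEST and IGNORED are assumed.

```python
from enum import Enum


# Stand-in mirroring the documented bases (str, Enum); not AnomaLog's class.
class SplitLabelSketch(str, Enum):
    TRAIN = "train"
    TEST = "test"        # assumed value
    IGNORED = "ignored"  # assumed value


# The str mixin makes members compare equal to plain strings.
is_train = SplitLabelSketch.TRAIN == "train"

# Members can be recovered from their string value.
label = SplitLabelSketch("test")

# Counting the sequences that belong to the active training split.
labels = [
    SplitLabelSketch.TRAIN,
    SplitLabelSketch.IGNORED,
    SplitLabelSketch.TEST,
    SplitLabelSketch.TRAIN,
]
train_count = sum(1 for item in labels if item is SplitLabelSketch.TRAIN)
```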