Top-level API

This page covers AnomaLog's top-level public API: the small set of names exported directly from the `anomalog` package.

Start here when you want the library-level names that most examples import directly, rather than the lower-level building blocks under the submodules.

>>> from anomalog import DatasetSpec, SplitLabel
>>> DatasetSpec("demo").dataset_name
'demo'
>>> SplitLabel.TRAIN.value
'train'

`anomalog`

Top-level public API for AnomaLog.

`DatasetSpec` `dataclass`

Immutable fluent builder for configuring a dataset pipeline.

The builder captures dataset preprocessing choices without executing any pipeline stage. Each fluent method returns a new spec, so callers can safely share partially configured specs; orchestration is triggered only when `build()` is called.
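The immutable fluent pattern described above can be sketched with a frozen dataclass and `dataclasses.replace`. This is a minimal illustration only: `SketchSpec` and its string-valued fields are hypothetical stand-ins, not AnomaLog's implementation, and the real `DatasetSpec` takes source and parser objects rather than strings.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchSpec:
    """Hypothetical stand-in for DatasetSpec (not AnomaLog's code)."""

    dataset_name: str
    source: str | None = None
    structured_parser: str | None = None

    def from_source(self, source: str) -> SketchSpec:
        # replace() copies the frozen instance with one field changed.
        return replace(self, source=source)

    def parse_with(self, structured_parser: str) -> SketchSpec:
        return replace(self, structured_parser=structured_parser)


# A partially configured spec can be shared and extended independently.
base = SketchSpec("demo").from_source("raw-logs")
variant_a = base.parse_with("parser-a")
variant_b = base.parse_with("parser-b")

print(base.structured_parser)       # None
print(variant_a.structured_parser)  # parser-a
print(variant_b.structured_parser)  # parser-b
```

Because every method returns a copy, deriving `variant_a` and `variant_b` leaves `base` untouched, which is what makes sharing a partially configured spec safe.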

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `dataset_name` | `str` | Stable dataset identifier used for cache roots and materialised outputs. |
| `source` | `DatasetSource \| None` | Source that materialises or locates the raw dataset contents before parsing. |
| `structured_parser` | `StructuredParser \| None` | Parser that turns raw log lines into structured records. |
| `structured_sink` | `type[StructuredSink]` | Sink implementation responsible for persisting structured rows and later grouped iteration. |
| `cache_paths` | `CachePathsConfig` | Data/cache roots used by the build. |
| `anomaly_label_reader` | `AnomalyLabelReader \| None` | Optional anomaly label reader bound after structured parsing. |
| `template_parser` | `type[TemplateParser]` | Template parser type used to mine message templates from structured records. |

`build()`

Build and return the templated dataset view.

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `TemplatedDataset` | `TemplatedDataset` | Built dataset with structured rows, labels, and templates attached. |
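To illustrate what deferring work until `build()` means in practice, here is a small sketch in which configuring a spec only records a choice and the stage runs when `build()` is called. `LazySpec`, `parse_stage`, the `raw_lines` field, and the error raised for a missing parser are all assumptions made for the illustration; the source does not describe how AnomaLog orchestrates its stages internally.

```python
from __future__ import annotations

from dataclasses import dataclass, replace
from typing import Callable, Optional

calls = []  # records when the hypothetical stage actually runs


def parse_stage(lines):
    """Hypothetical structured-parsing stage."""
    calls.append("parse")
    return [{"message": line} for line in lines]


@dataclass(frozen=True)
class LazySpec:
    """Hypothetical stand-in for DatasetSpec (not AnomaLog's code)."""

    raw_lines: tuple = ()
    parser: Optional[Callable] = None

    def parse_with(self, parser: Callable) -> LazySpec:
        # Only records the choice; the parser is not invoked here.
        return replace(self, parser=parser)

    def build(self) -> list:
        if self.parser is None:
            # Assumed behaviour for this sketch only.
            raise ValueError("no parser configured")
        return self.parser(list(self.raw_lines))


spec = LazySpec(raw_lines=("a", "b")).parse_with(parse_stage)
calls_before_build = list(calls)
rows = spec.build()

print(calls_before_build)  # []
print(calls)               # ['parse']
```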

`clear_cache()`

Delete all local cached artifacts for this dataset.

Raises:

| Type | Description |
| --- | --- |
| `ValueError` | If the dataset name is empty. |
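One plausible reason for such a guard, assumed here rather than stated by the source, is that an empty name joined onto a cache root resolves to the root itself. The sketch below uses a hypothetical free function and a `<cache_root>/<dataset_name>` layout that may not match AnomaLog's actual directory structure.

```python
import shutil
import tempfile
from pathlib import Path


def clear_cache(cache_root: Path, dataset_name: str) -> None:
    """Hypothetical helper: delete <cache_root>/<dataset_name> if present."""
    if not dataset_name:
        # cache_root / "" is cache_root itself, so an empty name
        # would remove every dataset under the root.
        raise ValueError("dataset name must not be empty")
    shutil.rmtree(cache_root / dataset_name, ignore_errors=True)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "demo").mkdir()
    (root / "other").mkdir()

    clear_cache(root, "demo")
    remaining = sorted(p.name for p in root.iterdir())
    print(remaining)  # ['other']

    try:
        clear_cache(root, "")
    except ValueError as exc:
        error = str(exc)
        print(error)  # dataset name must not be empty
```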

`from_source(source)`

Bind the raw dataset source for later materialisation.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `source` | `DatasetSource` | Source strategy that knows how to provide the raw logs for this dataset. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `DatasetSpec` | `DatasetSpec` | New spec with the supplied source attached. |

`label_with(anomaly_label_reader)`

Attach an anomaly label reader to enrich the built dataset.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `anomaly_label_reader` | `AnomalyLabelReader` | Reader used to resolve per-line or per-entity anomaly labels after parsing. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `DatasetSpec` | `DatasetSpec` | New spec with the supplied label reader attached. |

`parse_with(structured_parser)`

Bind the structured parser that defines log-line semantics.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `structured_parser` | `StructuredParser` | Parser used during build to convert raw lines into structured records. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `DatasetSpec` | `DatasetSpec` | New spec with the supplied parser attached. |

`store_with(structured_sink)`

Override the structured sink implementation for this dataset.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `structured_sink` | `type[StructuredSink]` | Sink type that owns persistence and grouped access for structured rows. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `DatasetSpec` | `DatasetSpec` | New spec with the supplied sink type attached. |
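Note that `store_with` takes a sink type (`type[StructuredSink]`), not an instance, whereas `parse_with` takes a parser instance. A rough sketch of why passing a class can be useful follows: the spec stays a lightweight description, and the sink object is created only at build time. `InMemorySink`, `OtherSink`, `SinkSpec`, and the sink's constructor arguments are hypothetical; the actual `StructuredSink` interface is not documented on this page.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


class InMemorySink:
    """Hypothetical sink; not AnomaLog's StructuredSink interface."""

    def __init__(self, dataset_name: str) -> None:
        self.dataset_name = dataset_name
        self.rows = []


class OtherSink(InMemorySink):
    """Hypothetical alternative sink implementation."""


@dataclass(frozen=True)
class SinkSpec:
    """Hypothetical stand-in for DatasetSpec (not AnomaLog's code)."""

    dataset_name: str
    structured_sink: type[InMemorySink] = InMemorySink  # assumed default

    def store_with(self, structured_sink: type[InMemorySink]) -> SinkSpec:
        # Stores the class itself; nothing is instantiated yet.
        return replace(self, structured_sink=structured_sink)

    def build(self) -> InMemorySink:
        # The class is instantiated only now, with build-time context.
        return self.structured_sink(self.dataset_name)


spec = SinkSpec("demo")
custom = spec.store_with(OtherSink)
sink = custom.build()

print(type(sink).__name__, sink.dataset_name)  # OtherSink demo
print(type(spec.build()).__name__)             # InMemorySink
```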

`template_with(template_parser)`

Select the template parser implementation used during build.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `template_parser` | `type[TemplateParser]` | Template parser type trained on the structured dataset before the templated view is returned. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `DatasetSpec` | `DatasetSpec` | New spec with the supplied template parser type attached. |

`with_cache_paths(cache_paths)`

Override the default data and cache roots for this dataset.

Parameters:

| Name | Type | Description | Default |
| --- | --- | --- | --- |
| `cache_paths` | `CachePathsConfig` | Explicit roots to use for source materialisation and derived local artifacts. | required |

Returns:

| Name | Type | Description |
| --- | --- | --- |
| `DatasetSpec` | `DatasetSpec` | New spec with the supplied cache paths attached. |

`SplitLabel`

Bases: `str`, `Enum`

Dataset split membership for a sequence.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `TRAIN` | | Sequence belongs to the training split. |
| `TEST` | | Sequence belongs to the evaluation/test split. |
| `IGNORED` | | Sequence belongs to the fixed train pool but is not used for the current training prefix. |
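Because `SplitLabel` derives from both `str` and `Enum`, its members compare equal to plain strings and serialise as strings. The stand-in below shows that behaviour with the standard library only; `SketchSplitLabel` is hypothetical, and of its values only `'train'` is confirmed by the example at the top of this page, while `'test'` and `'ignored'` are assumed.

```python
import json
from enum import Enum


class SketchSplitLabel(str, Enum):
    """Stand-in for SplitLabel (not AnomaLog's code)."""

    TRAIN = "train"
    TEST = "test"        # assumed value
    IGNORED = "ignored"  # assumed value


# Look a member up by its value, e.g. when reading labels from a file.
label = SketchSplitLabel("train")
encoded = json.dumps({"split": label})

print(label is SketchSplitLabel.TRAIN)  # True
print(label == "train")                 # True
print(encoded)                          # {"split": "train"}
```

The `str` base is what lets a member pass through `json.dumps` and string comparisons without an explicit `.value` lookup.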
