Top-level API
This page covers the top-level public API of AnomaLog, the smallest set of entry points the library exposes.
Start here when you want the library-level names that most examples import directly, rather than the lower-level building blocks under the submodules.
>>> from anomalog import DatasetSpec, SplitLabel
>>> DatasetSpec("demo").dataset_name
'demo'
>>> SplitLabel.TRAIN.value
'train'
anomalog
Top-level public API for AnomaLog.
DatasetSpec (dataclass)
Immutable fluent builder for configuring a dataset pipeline.
The builder captures dataset preprocessing choices without executing any
pipeline stage immediately. Each fluent method returns a new spec so callers
can share partially configured specs safely and only trigger orchestration at
build() time.
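The immutability described above can be illustrated with a small stand-in. This is a minimal sketch, not AnomaLog's implementation: `SketchSpec` is a hypothetical class, its fields hold plain strings instead of real source and parser objects, and it assumes the pattern is built on a frozen dataclass plus `dataclasses.replace`.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchSpec:
    """Hypothetical stand-in for an immutable fluent builder (not AnomaLog code)."""

    dataset_name: str
    source: str | None = None
    structured_parser: str | None = None

    def from_source(self, source: str) -> SketchSpec:
        # replace() copies the frozen instance with one field changed.
        return replace(self, source=source)

    def parse_with(self, structured_parser: str) -> SketchSpec:
        return replace(self, structured_parser=structured_parser)


# Two variants derived from one shared, partially configured base.
base = SketchSpec("demo").from_source("raw-logs")
variant_a = base.parse_with("parser-a")
variant_b = base.parse_with("parser-b")

# The shared base is untouched by either derivation.
print(base.structured_parser)       # None
print(variant_a.structured_parser)  # parser-a
print(variant_b.structured_parser)  # parser-b
```

Because every method returns a new object, a partially configured spec can be handed to several callers without any of them affecting the others.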
Attributes:
| Name | Type | Description |
|---|---|---|
| `dataset_name` | `str` | Stable dataset identifier used for cache roots and materialised outputs. |
| `source` | `DatasetSource \| None` | Source that materialises or locates the raw dataset contents before parsing. |
| `structured_parser` | `StructuredParser \| None` | Parser that turns raw log lines into structured records. |
| `structured_sink` | `type[StructuredSink]` | Sink implementation responsible for persisting structured rows and later grouped iteration. |
| `cache_paths` | `CachePathsConfig` | Data/cache roots used by the build. |
| `anomaly_label_reader` | `AnomalyLabelReader \| None` | Optional anomaly label reader bound after structured parsing. |
| `template_parser` | `type[TemplateParser]` | Template parser type used to mine message templates from structured records. |
build()
Build and return the templated dataset view.
Returns:
| Name | Type | Description |
|---|---|---|
| `TemplatedDataset` | `TemplatedDataset` | Built dataset with structured rows, labels, and templates attached. |
clear_cache()
Delete all local cached artifacts for this dataset.
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the dataset name is empty. |
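One plausible reason for rejecting an empty name is that joining a cache root with an empty string yields the root itself, so an unguarded delete would remove every dataset's cache. The sketch below illustrates that guard under stated assumptions: `clear_cache_dir` and the one-directory-per-dataset layout are hypothetical, not AnomaLog's actual code or cache structure.

```python
import shutil
import tempfile
from pathlib import Path


def clear_cache_dir(cache_root: Path, dataset_name: str) -> None:
    """Hypothetical helper: remove cache_root/dataset_name if it exists."""
    if not dataset_name:
        # cache_root / "" is cache_root itself, so an empty name would
        # target the whole cache root rather than a single dataset.
        raise ValueError("dataset name must not be empty")
    shutil.rmtree(cache_root / dataset_name, ignore_errors=True)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "demo").mkdir()
    (root / "other").mkdir()

    # Only the named dataset's directory is removed.
    clear_cache_dir(root, "demo")
    remaining = sorted(p.name for p in root.iterdir())
    print(remaining)  # ['other']

    # An empty name is refused before anything is deleted.
    try:
        clear_cache_dir(root, "")
    except ValueError as exc:
        print(f"refused: {exc}")
```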
from_source(source)
Bind the raw dataset source for later materialisation.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `source` | `DatasetSource` | Source strategy that knows how to provide the raw logs for this dataset. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `DatasetSpec` | `DatasetSpec` | New spec with the supplied source attached. |
label_with(anomaly_label_reader)
Attach an anomaly label reader to enrich the built dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `anomaly_label_reader` | `AnomalyLabelReader` | Reader used to resolve per-line or per-entity anomaly labels after parsing. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `DatasetSpec` | `DatasetSpec` | New spec with the supplied label reader attached. |
parse_with(structured_parser)
Bind the structured parser that defines log-line semantics.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `structured_parser` | `StructuredParser` | Parser used during build to convert raw lines into structured records. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `DatasetSpec` | `DatasetSpec` | New spec with the supplied parser attached. |
store_with(structured_sink)
Override the structured sink implementation for this dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `structured_sink` | `type[StructuredSink]` | Sink type that owns persistence and grouped access for structured rows. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `DatasetSpec` | `DatasetSpec` | New spec with the supplied sink type attached. |
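Note that `store_with` takes a sink type (`type[StructuredSink]`), not a sink instance. The sketch below shows one way a type-valued field can work, assuming the class is instantiated only at build time. `InMemorySink`, `CountingSink`, `SketchSpec`, and the constructor signature are all hypothetical stand-ins, not AnomaLog's classes.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


class InMemorySink:
    """Hypothetical sink, constructed once per build with the dataset name."""

    def __init__(self, dataset_name: str) -> None:
        self.dataset_name = dataset_name
        self.rows: list[dict[str, str]] = []

    def write(self, row: dict[str, str]) -> None:
        self.rows.append(row)


class CountingSink(InMemorySink):
    """Hypothetical alternative sink that also counts writes."""

    def __init__(self, dataset_name: str) -> None:
        super().__init__(dataset_name)
        self.writes = 0

    def write(self, row: dict[str, str]) -> None:
        self.writes += 1
        super().write(row)


@dataclass(frozen=True)
class SketchSpec:
    """Hypothetical spec holding a sink class rather than a sink object."""

    dataset_name: str
    structured_sink: type[InMemorySink] = InMemorySink

    def store_with(self, structured_sink: type[InMemorySink]) -> SketchSpec:
        return replace(self, structured_sink=structured_sink)

    def build(self) -> InMemorySink:
        # The sink is instantiated only here, so each build gets fresh state.
        return self.structured_sink(self.dataset_name)


spec = SketchSpec("demo").store_with(CountingSink)
first = spec.build()
second = spec.build()
first.write({"message": "hello"})

print(type(first).__name__)  # CountingSink
print(len(first.rows))       # 1
print(len(second.rows))      # 0
```

Holding a class keeps the spec free of mutable state: a shared spec cannot leak rows between builds, because no sink object exists until `build()` runs.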
template_with(template_parser)
Select the template parser implementation used during build.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `template_parser` | `type[TemplateParser]` | Template parser type trained on the structured dataset before the templated view is returned. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `DatasetSpec` | `DatasetSpec` | New spec with the supplied template parser type attached. |
with_cache_paths(cache_paths)
Override the default data and cache roots for this dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `cache_paths` | `CachePathsConfig` | Explicit roots to use for source materialisation and derived local artifacts. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `DatasetSpec` | `DatasetSpec` | New spec with the supplied cache paths attached. |