Labels

Labels connect anomaly annotations to the parsed dataset.

This page covers the built-in label readers and lookup helpers used to expose line-level or group-level anomaly labels during dataset building and sequence construction.

>>> from pathlib import Path
>>> from anomalog.labels import CSVReader
>>> reader = CSVReader(relative_path=Path("labels.csv"))
>>> reader.entity_column, reader.label_column
('entity_id', 'anomalous')
>>> reader.with_context(dataset_root=Path("."), sink=None).dataset_root
PosixPath('.')

anomalog.labels

Helpers for loading anomaly labels from different sources.

AnomalyLabelLookup dataclass

Normalised access to anomaly labels.

Both lookup functions return an integer label when one is available, or None when the source has no label for the requested row or group.

Attributes:

label_for_line (Callable[[int], int | None]): Returns the anomaly label for a structured row's stable line_order, or None when absent.

label_for_group (Callable[[str], int | None]): Returns the anomaly label for a grouped entity identifier, or None when absent.
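
The two-callable contract described above can be illustrated with a minimal sketch. LookupSketch is a hypothetical stand-in that only mirrors the two documented attributes; it is not the library's class, and the sample labels and identifiers are invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LookupSketch:
    """Hypothetical stand-in mirroring the two documented attributes."""

    label_for_line: Callable[[int], Optional[int]]
    label_for_group: Callable[[str], Optional[int]]


# Sparse label tables: only labelled rows and groups have entries.
line_labels = {3: 1, 7: 0}
group_labels = {"blk_42": 1}

lookup = LookupSketch(
    label_for_line=line_labels.get,    # dict.get returns None for missing keys
    label_for_group=group_labels.get,
)

labelled = lookup.label_for_line(3)            # row with a label
unlabelled = lookup.label_for_line(4)          # row the source says nothing about
group_label = lookup.label_for_group("blk_42")
```

Backing both callables with dict.get is one simple way to get the "integer or None" behaviour: callers must treat None as "no label available", not as "normal".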

AnomalyLabelReader

Bases: Protocol

Protocol for sources that provide anomaly labels.

Readers may be configured ahead of time, then bound to dataset-specific resources with with_context immediately before loading.

load()

Materialise a normalised label lookup for the current dataset.

Returns:

AnomalyLabelLookup: Callables that expose label lookup by stable line order and by grouped entity identifier.

with_context(*, dataset_root, sink)

Bind dataset-specific runtime context to the reader.

Parameters:

dataset_root (Path, required): Materialised dataset root for path-relative label sources.

sink (StructuredSink, required): Structured sink for readers that need direct access to parsed rows or sink-owned caches.

Returns:

AnomalyLabelReader: Reader instance ready to load labels for this concrete dataset build.
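
As a rough illustration of the configure-then-bind lifecycle, the sketch below defines a reader with the same two method names as the protocol. StaticGroupReader and LookupSketch are hypothetical names introduced here, and the sink is typed as Any because the StructuredSink interface is not described on this page.

```python
from dataclasses import dataclass, field, replace
from pathlib import Path
from typing import Any, Callable, Dict, Optional


@dataclass
class LookupSketch:
    """Hypothetical stand-in for the lookup returned by load()."""

    label_for_line: Callable[[int], Optional[int]]
    label_for_group: Callable[[str], Optional[int]]


@dataclass(frozen=True)
class StaticGroupReader:
    """Hypothetical reader that serves group labels from an in-memory mapping."""

    group_labels: Dict[str, int] = field(default_factory=dict)
    dataset_root: Optional[Path] = None

    def with_context(self, *, dataset_root: Path, sink: Any) -> "StaticGroupReader":
        # The sink is ignored in this sketch; only the dataset root is bound.
        return replace(self, dataset_root=dataset_root)

    def load(self) -> LookupSketch:
        return LookupSketch(
            label_for_line=lambda line_order: None,  # no line-level labels here
            label_for_group=self.group_labels.get,
        )


# Configure ahead of time, bind just before loading.
configured = StaticGroupReader(group_labels={"blk_1": 1})
bound = configured.with_context(dataset_root=Path("."), sink=None)
lookup = bound.load()
```

Returning a copy from with_context keeps the pre-configured reader reusable across several dataset builds; whether the built-in readers copy in exactly this way is an assumption of the sketch.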

CSVReader dataclass

Bases: AnomalyLabelReader

Reads anomaly labels from a CSV file (group/entity level only).

CSV labels are intentionally group-scoped: they annotate entities/blocks rather than individual structured rows. Invalid or non-integer label values are skipped so malformed rows do not abort the whole dataset build.

Attributes:

relative_path (Path): CSV path relative to the materialised dataset root.

dataset_root (Path | None): Bound dataset root used to resolve relative_path at runtime.

entity_column (str): CSV column containing the group/entity identifier.

label_column (str): CSV column containing the integer anomaly label.

normal_value (str): CSV value corresponding to the normal class.

anomalous_value (str): CSV value corresponding to the anomalous class.

load()

Load labels from the configured CSV file into lookup callables.

Returns:

AnomalyLabelLookup: Lookup functions backed by the configured CSV.

Raises:

ValueError: If dataset context is missing or the CSV schema is invalid.
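
The skip-on-malformed behaviour and the schema check can be sketched with the standard csv module. load_group_labels is a hypothetical helper, not part of anomalog; the column defaults are taken from the example at the top of this page, and the handling of normal_value and anomalous_value is deliberately left out because it is not specified here.

```python
import csv
import io
from typing import Dict, TextIO


def load_group_labels(
    handle: TextIO,
    entity_column: str = "entity_id",
    label_column: str = "anomalous",
) -> Dict[str, int]:
    """Hypothetical helper: read group labels, skipping malformed rows."""
    reader = csv.DictReader(handle)
    header = reader.fieldnames or []
    missing = {entity_column, label_column} - set(header)
    if missing:
        # Assumed reading of "the CSV schema is invalid".
        raise ValueError(f"CSV is missing columns: {sorted(missing)}")

    labels: Dict[str, int] = {}
    for row in reader:
        try:
            labels[row[entity_column]] = int(row[label_column])
        except (TypeError, ValueError):
            # Non-integer or absent label: skip the row, keep the build going.
            continue
    return labels


sample = io.StringIO("entity_id,anomalous\nblk_1,1\nblk_2,oops\nblk_3,0\n")
labels = load_group_labels(sample)
```

In this sketch the row for blk_2 is dropped silently, so a later label_for_group("blk_2") backed by labels.get would return None.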

with_context(*, dataset_root, sink)

Attach dataset context when missing and return a new reader.

Parameters:

dataset_root (Path, required): Dataset root used to resolve the CSV path.

sink (StructuredSink, required): Structured sink for the dataset. Unused for CSV-backed labels.

Returns:

CSVReader: Reader bound to the supplied dataset root.
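
One plausible reading of "attach dataset context when missing" is sketched below with dataclasses.replace. CsvReaderSketch and resolved_path are hypothetical names, and the rule that an already-bound root takes precedence is an assumption, not confirmed behaviour of CSVReader.

```python
from dataclasses import dataclass, replace
from pathlib import Path
from typing import Any, Optional


@dataclass(frozen=True)
class CsvReaderSketch:
    """Hypothetical stand-in for a path-relative label reader."""

    relative_path: Path
    dataset_root: Optional[Path] = None

    def with_context(self, *, dataset_root: Path, sink: Any = None) -> "CsvReaderSketch":
        # Assumption: an already-bound root wins; otherwise adopt the supplied one.
        root = self.dataset_root if self.dataset_root is not None else dataset_root
        return replace(self, dataset_root=root)

    def resolved_path(self) -> Path:
        # Hypothetical helper showing how relative_path could be resolved.
        if self.dataset_root is None:
            raise ValueError("dataset_root is not bound; call with_context first")
        return self.dataset_root / self.relative_path


unbound = CsvReaderSketch(relative_path=Path("labels.csv"))
bound = unbound.with_context(dataset_root=Path("data"), sink=None)
rebound = bound.with_context(dataset_root=Path("other"), sink=None)
```

Under these assumptions bound resolves to data/labels.csv, rebound keeps the first root, and unbound is left untouched.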

InlineReader dataclass

Bases: AnomalyLabelReader

Derives labels directly from the structured sink.

This reader exists for datasets whose parser already exposes anomaly labels inline. It delegates to the sink so sink implementations can use efficient projected scans instead of forcing full row materialisation.

Attributes:

sink (StructuredSink | None): Bound sink that can supply sparse inline label lookups.

load()

Collect inline labels from the sink and return lookup callables.

Returns:

AnomalyLabelLookup: Lookup functions backed by the structured sink.

Raises:

ValueError: If no structured sink has been attached.

RuntimeError: If the sink fails while loading inline labels.
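
The delegation and the two error cases can be sketched as follows. Everything here is hypothetical: FakeSink, its inline_labels method, and InlineReaderSketch are invented for illustration, since the StructuredSink interface is not documented on this page, and wrapping sink failures in RuntimeError is only one way the documented exception could arise.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


class FakeSink:
    """Hypothetical sink exposing sparse inline labels."""

    def __init__(self, line_labels: Dict[int, int], group_labels: Dict[str, int], fail: bool = False):
        self.line_labels = line_labels
        self.group_labels = group_labels
        self.fail = fail

    def inline_labels(self) -> Tuple[Dict[int, int], Dict[str, int]]:
        if self.fail:
            raise OSError("scan failed")  # simulated storage error
        return self.line_labels, self.group_labels


@dataclass(frozen=True)
class InlineReaderSketch:
    """Hypothetical reader that delegates label collection to its sink."""

    sink: Optional[FakeSink] = None

    def load(self) -> Tuple[Callable[[int], Optional[int]], Callable[[str], Optional[int]]]:
        if self.sink is None:
            raise ValueError("no structured sink attached")
        try:
            line_labels, group_labels = self.sink.inline_labels()
        except Exception as exc:
            # Assumed wrapping; the real reader may surface sink errors differently.
            raise RuntimeError("sink failed while loading inline labels") from exc
        return line_labels.get, group_labels.get


label_for_line, label_for_group = InlineReaderSketch(
    sink=FakeSink({5: 1}, {"blk_9": 1})
).load()
```

Because the sink hands back only the labelled entries, the reader never needs to materialise unlabelled rows; lookups for them fall through to None.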

with_context(*, dataset_root, sink)

Attach sink context when missing and return a new reader.

Parameters:

dataset_root (Path, required): Dataset root for the current build. Unused for inline labels.

sink (StructuredSink, required): Structured sink that provides inline labels.

Returns:

InlineReader: Reader bound to the supplied structured sink.