Labels¶
Labels connect anomaly annotations to the parsed dataset.
This page covers the built-in label readers and lookup helpers used to expose line-level or group-level anomaly labels during dataset building and sequence construction.
>>> from pathlib import Path
>>> from anomalog.labels import CSVReader
>>> reader = CSVReader(relative_path=Path("labels.csv"))
>>> reader.entity_column, reader.label_column
('entity_id', 'anomalous')
>>> reader.with_context(dataset_root=Path("."), sink=None).dataset_root
PosixPath('.')
anomalog.labels¶
Helpers for loading anomaly labels from different sources.
AnomalyLabelLookup dataclass¶
Normalised access to anomaly labels.
Both lookup functions return an integer label when one is available, or
None when the source has no label for the requested row or group.
Attributes:
| Name | Type | Description |
|---|---|---|
| label_for_line | Callable[[int], int \| None] | Returns the anomaly label for a structured row's stable line order, or None when the row has no label. |
| label_for_group | Callable[[str], int \| None] | Returns the anomaly label for a grouped entity identifier, or None when the group has no label. |
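As a minimal sketch of the shape described above, the stand-in below mirrors the two documented attributes. `LookupSketch` and the dictionaries backing it are hypothetical illustrations, not the library's own class or data.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass(frozen=True)
class LookupSketch:
    """Hypothetical stand-in mirroring the two documented attributes."""

    label_for_line: Callable[[int], Optional[int]]
    label_for_group: Callable[[str], Optional[int]]


# Illustrative data: only some rows and groups carry a label.
line_labels = {3: 1, 7: 0}
group_labels = {"blk_1": 1}

lookup = LookupSketch(
    label_for_line=line_labels.get,    # dict.get returns None for unknown keys
    label_for_group=group_labels.get,
)

print(lookup.label_for_line(3))         # 1 (labelled row)
print(lookup.label_for_line(4))         # None (no label for this row)
print(lookup.label_for_group("blk_1"))  # 1
```

Backing both callables with `dict.get` is one simple way to get the "integer or None" behaviour; any callable with the same signature would serve.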
AnomalyLabelReader¶
Bases: Protocol
Protocol for sources that provide anomaly labels.
Readers may be configured ahead of time, then bound to dataset-specific
resources with with_context immediately before loading.
load()¶
Materialise a normalised label lookup for the current dataset.
Returns:
| Name | Type | Description |
|---|---|---|
| AnomalyLabelLookup | AnomalyLabelLookup | Callables that expose label lookup by stable line order and by grouped entity identifier. |
with_context(*, dataset_root, sink)¶
Bind dataset-specific runtime context to the reader.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_root | Path | Materialised dataset root for path-relative label sources. | required |
| sink | StructuredSink | Structured sink for readers that need direct access to parsed rows or sink-owned caches. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| AnomalyLabelReader | AnomalyLabelReader | Reader instance ready to load labels for this concrete dataset build. |
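The configure-then-bind flow can be sketched with a toy reader. Everything here (`LookupSketch`, `ReaderSketch`, `StaticReader`) is a hypothetical stand-in written for illustration; it only assumes the two method signatures documented above and types `sink` loosely as `Any`.

```python
from dataclasses import dataclass, replace
from pathlib import Path
from typing import Any, Callable, Dict, Optional, Protocol


@dataclass(frozen=True)
class LookupSketch:
    label_for_line: Callable[[int], Optional[int]]
    label_for_group: Callable[[str], Optional[int]]


class ReaderSketch(Protocol):
    """Hypothetical mirror of the two documented protocol methods."""

    def load(self) -> LookupSketch: ...

    def with_context(self, *, dataset_root: Path, sink: Any) -> "ReaderSketch": ...


@dataclass(frozen=True)
class StaticReader:
    """Toy reader: labels are configured up front, context is bound later."""

    group_labels: Dict[str, int]
    dataset_root: Optional[Path] = None

    def with_context(self, *, dataset_root: Path, sink: Any) -> "StaticReader":
        # Return a bound copy rather than mutating the configured reader.
        return replace(self, dataset_root=dataset_root)

    def load(self) -> LookupSketch:
        return LookupSketch(
            label_for_line=lambda line: None,  # this toy source has no line labels
            label_for_group=self.group_labels.get,
        )


configured = StaticReader(group_labels={"blk_1": 1})  # configured ahead of time
bound: ReaderSketch = configured.with_context(dataset_root=Path("."), sink=None)
lookup = bound.load()
```

Because `StaticReader` matches the protocol structurally, it needs no explicit base class to be used wherever a `ReaderSketch` is expected.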
CSVReader dataclass¶
Bases: AnomalyLabelReader
Reads anomaly labels from a CSV file (group/entity level only).
CSV labels are intentionally group-scoped: they annotate entities/blocks rather than individual structured rows. Invalid or non-integer label values are skipped so malformed rows do not abort the whole dataset build.
Attributes:
| Name | Type | Description |
|---|---|---|
| relative_path | Path | CSV path relative to the materialised dataset root. |
| dataset_root | Path \| None | Bound dataset root used to resolve the CSV path. |
| entity_column | str | CSV column containing the group/entity identifier. |
| label_column | str | CSV column containing the integer anomaly label. |
| normal_value | str | CSV value corresponding to the normal class. |
| anomalous_value | str | CSV value corresponding to the anomalous class. |
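The skip-on-malformed behaviour described above can be illustrated with the standard `csv` module. This is a sketch under stated assumptions, not the library's parser: `read_group_labels` is a hypothetical helper, its column defaults are taken from the example at the top of this page, and it ignores `normal_value` and `anomalous_value`, whose exact role in parsing is not described here.

```python
import csv
import io


def read_group_labels(handle, entity_column="entity_id", label_column="anomalous"):
    """Collect {entity: int label}, skipping rows whose label is not an integer."""
    reader = csv.DictReader(handle)
    missing = {entity_column, label_column} - set(reader.fieldnames or [])
    if missing:
        # Assumption: schema problems are fatal, unlike bad cell values.
        raise ValueError(f"missing CSV columns: {sorted(missing)}")

    labels = {}
    for row in reader:
        try:
            labels[row[entity_column]] = int(row[label_column])
        except (TypeError, ValueError):
            continue  # malformed or empty label: skip this row, keep going
    return labels


sample = io.StringIO("entity_id,anomalous\nblk_1,1\nblk_2,oops\nblk_3,0\n")
labels = read_group_labels(sample)
print(labels)  # {'blk_1': 1, 'blk_3': 0}
```

The row for `blk_2` is dropped silently, so a lookup for that group would report no label rather than failing the build.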
load()¶
Load labels from the configured CSV file into lookup callables.
Returns:
| Name | Type | Description |
|---|---|---|
| AnomalyLabelLookup | AnomalyLabelLookup | Lookup functions backed by the configured CSV. |
Raises:
| Type | Description |
|---|---|
| ValueError | If dataset context is missing or the CSV schema is invalid. |
with_context(*, dataset_root, sink)¶
Attach dataset context when missing and return a new reader.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_root | Path | Dataset root used to resolve the CSV path. | required |
| sink | StructuredSink | Structured sink for the dataset. Unused for CSV-backed labels. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| CSVReader | CSVReader | Reader bound to the supplied dataset root. |
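One way to picture "attach when missing and return a new reader" is with `dataclasses.replace` on a frozen dataclass. `CsvReaderSketch` and its `resolved_path` helper are hypothetical; the sketch assumes that an already-bound root is kept rather than overwritten, which is one reading of "when missing" and may not match the library.

```python
from dataclasses import dataclass, replace
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class CsvReaderSketch:
    """Hypothetical stand-in; not the library's CSVReader."""

    relative_path: Path
    dataset_root: Optional[Path] = None

    def with_context(self, *, dataset_root: Path, sink: object) -> "CsvReaderSketch":
        # Assumption: an existing root wins; `sink` is ignored for CSV labels.
        root = self.dataset_root if self.dataset_root is not None else dataset_root
        return replace(self, dataset_root=root)

    def resolved_path(self) -> Path:
        if self.dataset_root is None:
            raise ValueError("dataset context is missing; call with_context first")
        return self.dataset_root / self.relative_path


unbound = CsvReaderSketch(relative_path=Path("labels.csv"))
bound = unbound.with_context(dataset_root=Path("data"), sink=None)
print(bound.resolved_path())
```

Returning a copy leaves `unbound` reusable across several dataset builds, each with its own root.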
InlineReader dataclass¶
Bases: AnomalyLabelReader
Derives labels directly from the structured sink.
This reader exists for datasets whose parser already exposes anomaly labels inline. It delegates to the sink so sink implementations can use efficient projected scans instead of forcing full row materialisation.
Attributes:
| Name | Type | Description |
|---|---|---|
| sink | StructuredSink \| None | Bound sink that can supply sparse inline label lookups. |
load()¶
Collect inline labels from the sink and return lookup callables.
Returns:
| Name | Type | Description |
|---|---|---|
| AnomalyLabelLookup | AnomalyLabelLookup | Lookup functions backed by the structured sink. |
Raises:
| Type | Description |
|---|---|
| ValueError | If no structured sink has been attached. |
| RuntimeError | If the sink fails while loading inline labels. |
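The delegation and the two error cases above might look roughly like the following. The sink interface is not documented on this page, so `FakeSink` and its `inline_labels` method are invented names for illustration, and `InlineReaderSketch` returns a plain pair of callables instead of the library's lookup type.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


class FakeSink:
    """Hypothetical sink; `inline_labels` is an invented method name."""

    def inline_labels(self) -> Tuple[Dict[int, int], Dict[str, int]]:
        # Sparse results: only labelled rows and groups are returned.
        return {3: 1}, {"blk_1": 1}


@dataclass(frozen=True)
class InlineReaderSketch:
    sink: Optional[FakeSink] = None

    def load(self):
        if self.sink is None:
            raise ValueError("no structured sink attached")
        try:
            line_labels, group_labels = self.sink.inline_labels()
        except Exception as exc:
            # Surface sink failures under one error type, keeping the cause.
            raise RuntimeError("sink failed while loading inline labels") from exc
        return line_labels.get, group_labels.get


label_for_line, label_for_group = InlineReaderSketch(sink=FakeSink()).load()
print(label_for_line(3), label_for_group("blk_1"))
```

Chaining with `from exc` keeps the original sink error available on `__cause__` for debugging.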
with_context(*, dataset_root, sink)¶
Attach sink context when missing and return a new reader.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| dataset_root | Path | Dataset root for the current build. Unused for inline labels. | required |
| sink | StructuredSink | Structured sink that provides inline labels. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| InlineReader | InlineReader | Reader bound to the supplied structured sink. |