Representations
Representations convert a model-agnostic TemplateSequence into the concrete
input shape expected by a detector.
A representation receives the full TemplateSequence, not just
sequence.templates. That means custom representations can use event timing
(dt_prev_ms), extracted parameters, entity IDs, or any other sequence
metadata when building model inputs.
>>> from anomalog.representations import (
...     SequentialRepresentation,
...     TemplateCountRepresentation,
...     TemplatePhraseRepresentation,
... )
>>> from anomalog.sequences import TemplateSequence
>>> sequence = TemplateSequence(
...     events=[
...         ("Error on node <*>", ["7"], None),
...         ("Error on node <*>", ["8"], 50),
...     ],
...     label=1,
...     entity_ids=["node-7"],
...     window_id=3,
... )
>>> SequentialRepresentation().represent(sequence)
['Error on node <*>', 'Error on node <*>']
>>> TemplateCountRepresentation().represent(sequence)
Counter({'Error on node <*>': 2})
>>> TemplatePhraseRepresentation(phrase_ngram_min=1, phrase_ngram_max=1).represent(sequence)
Counter({'error on node <*>': 2, 'error': 2, 'on': 2, 'node': 2})
Use:
- SequentialRepresentation for ordered template streams
- TemplateCountRepresentation for sparse template-count vectors
- TemplatePhraseRepresentation for phrase-count features derived from template text
The built-ins are intentionally template-centric, but that is a choice of those representations rather than a limit of the interface.
Represented outputs are wrapped in SequenceSample, which preserves
entity_ids, label, split_label, and window_id alongside the
representation payload.
You can also define your own representation by implementing
SequenceRepresentation[T] and passing it to represent_with(...).
>>> from dataclasses import dataclass
>>> @dataclass(frozen=True)
... class SequenceSummaryRepresentation:
...     name = "sequence_summary"
...
...     def represent(self, sequence: TemplateSequence) -> dict[str, int | list[str]]:
...         return {
...             "entity_count": len(sequence.entity_ids),
...             "timed_event_count": sum(
...                 dt_prev_ms is not None for _, _, dt_prev_ms in sequence.events
...             ),
...             "entity_ids": sequence.entity_ids,
...         }
>>> SequenceSummaryRepresentation().represent(sequence)
{'entity_count': 1, 'timed_event_count': 1, 'entity_ids': ['node-7']}
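The same interface also accommodates representations that read event timing and extracted parameters instead of template text. The sketch below is self-contained so it runs without anomalog: TemplateSequence is a minimal hypothetical stand-in that stores events as (template, parameters, dt_prev_ms) tuples, mirroring the constructor call in the doctest, and TimingSummaryRepresentation and its output keys are illustrative names only.

```python
from dataclasses import dataclass, field
from typing import ClassVar, Optional


@dataclass(frozen=True)
class TemplateSequence:
    """Hypothetical stand-in for anomalog's TemplateSequence (not the real class)."""

    # Each event is assumed to be (template, extracted_parameters, dt_prev_ms).
    events: list[tuple[str, list[str], Optional[int]]]
    label: int = 0
    entity_ids: list[str] = field(default_factory=list)
    window_id: int = 0


@dataclass(frozen=True)
class TimingSummaryRepresentation:
    """Illustrative custom representation built from timing and parameters."""

    name: ClassVar[str] = "timing_summary"  # hypothetical registry name

    def represent(self, sequence: TemplateSequence) -> dict[str, object]:
        # Inter-event gaps; an event with no predecessor has dt_prev_ms=None.
        gaps = [dt for _, _, dt in sequence.events if dt is not None]
        # Every extracted parameter, across all events.
        params = [p for _, event_params, _ in sequence.events for p in event_params]
        return {
            "event_count": len(sequence.events),
            "max_gap_ms": max(gaps) if gaps else None,
            "mean_gap_ms": sum(gaps) / len(gaps) if gaps else None,
            "distinct_params": sorted(set(params)),
        }


sequence = TemplateSequence(
    events=[
        ("Error on node <*>", ["7"], None),
        ("Error on node <*>", ["8"], 50),
    ],
    label=1,
    entity_ids=["node-7"],
    window_id=3,
)
print(TimingSummaryRepresentation().represent(sequence))
# {'event_count': 2, 'max_gap_ms': 50, 'mean_gap_ms': 50.0, 'distinct_params': ['7', '8']}
```

Because the payload type is up to the representation, a dict of mixed numbers and lists is as valid as the Counter and list outputs of the built-ins.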
anomalog.representations
Public sequence representation exports.
SequenceRepresentation
Bases: Protocol[TRepresentation]
Protocol for converting full grouped sequences into model inputs.
Implementations receive the complete TemplateSequence, including event
timings, extracted parameters, entity IDs, labels, and split metadata, and
may choose whichever fields are relevant for a detector.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | ClassVar[str] | Stable registry/config name for the representation. |
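Because name is described as a stable registry/config name, one plausible use is resolving a representation from a configuration string. The registry below is a hypothetical sketch, not anomalog's own lookup mechanism: the two stand-in classes, their names ("sequential", "template_count"), and representation_from_config are assumptions, and for brevity the stand-ins take a plain list of template strings instead of a TemplateSequence.

```python
from collections import Counter
from dataclasses import dataclass
from typing import ClassVar


@dataclass(frozen=True)
class SequentialStandIn:
    """Toy stand-in; takes plain template strings, not a TemplateSequence."""

    name: ClassVar[str] = "sequential"  # hypothetical registry name

    def represent(self, templates: list[str]) -> list[str]:
        return list(templates)


@dataclass(frozen=True)
class TemplateCountStandIn:
    """Toy stand-in producing a template-frequency Counter."""

    name: ClassVar[str] = "template_count"  # hypothetical registry name

    def represent(self, templates: list[str]) -> Counter:
        return Counter(templates)


# Hypothetical registry keyed by each class's stable `name` class variable.
REGISTRY = {cls.name: cls for cls in (SequentialStandIn, TemplateCountStandIn)}


def representation_from_config(config: dict) -> object:
    """Instantiate the representation named by config["representation"]."""
    key = config["representation"]
    if key not in REGISTRY:
        raise ValueError(f"unknown representation: {key!r}")
    return REGISTRY[key]()


rep = representation_from_config({"representation": "template_count"})
print(rep.represent(["A", "B", "A"]))  # Counter({'A': 2, 'B': 1})
```

Keying on a class-level constant rather than on the Python class name keeps configuration files valid if a class is later renamed.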
represent(sequence)
Convert one grouped sequence into a representation payload.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence | TemplateSequence | Full grouped sequence carrying events, labels, entity ids, and split metadata. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| TRepresentation | TRepresentation | Detector-specific representation of the sequence. |
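The protocol can be pictured as a generic typing.Protocol parameterised by the payload type. The following is a minimal sketch under that assumption, not anomalog's source: it declares the name class variable and represent method from the tables above, marks the protocol runtime_checkable (the real one may not be) so conformance can be checked with isinstance, and uses a hypothetical TemplateSequence stand-in and a hypothetical EntityCountRepresentation.

```python
from dataclasses import dataclass, field
from typing import ClassVar, Optional, Protocol, TypeVar, runtime_checkable

# Covariant because the payload type only appears as a return type.
TRepresentation = TypeVar("TRepresentation", covariant=True)


@dataclass(frozen=True)
class TemplateSequence:
    """Hypothetical stand-in; events are (template, parameters, dt_prev_ms)."""

    events: list[tuple[str, list[str], Optional[int]]]
    entity_ids: list[str] = field(default_factory=list)


@runtime_checkable
class SequenceRepresentation(Protocol[TRepresentation]):
    """Sketch of the documented protocol: a name plus a represent() method."""

    name: ClassVar[str]

    def represent(self, sequence: TemplateSequence) -> TRepresentation: ...


@dataclass(frozen=True)
class EntityCountRepresentation:
    # No base class needed: Protocol conformance is structural.
    name: ClassVar[str] = "entity_count"

    def represent(self, sequence: TemplateSequence) -> int:
        return len(sequence.entity_ids)


rep: SequenceRepresentation[int] = EntityCountRepresentation()
print(isinstance(rep, SequenceRepresentation))  # True
print(rep.represent(TemplateSequence(events=[], entity_ids=["a", "b"])))  # 2
```

Structural typing means a custom class only needs matching members; the built-in classes listed below also name the protocol as an explicit base, which is equally valid.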
SequenceRepresentationView
dataclass
Bases: Generic[TRepresentation]
Lazy iterable over represented sequence samples.
The representation stage is the point where a model decides which parts of
TemplateSequence matter; the full sequence object is passed through to the
representation implementation on each iteration.
Attributes:
| Name | Type | Description |
|---|---|---|
| sequences | SequenceBuilder | Underlying sequence builder producing TemplateSequence objects. |
| representation | SequenceRepresentation[TRepresentation] | Representation applied to each yielded sequence. |
__iter__()
Yield represented sequence samples.
Yields:
| Type | Description |
|---|---|
| SequenceSample[TRepresentation] | One represented sample per input template sequence. |
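A lazy view of this kind can be approximated with a dataclass whose __iter__ is a generator: nothing is represented until iteration starts, and each pass re-runs the representation. This is an assumption-laden sketch rather than the real class: sequences is a plain list of hypothetical TemplateSequence stand-ins instead of a SequenceBuilder, RecordingTemplateCounts is a toy representation, and the yielded (window_id, payload) pairs stand in for SequenceSample objects.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Generic, Iterable, Iterator, Protocol, TypeVar

T_co = TypeVar("T_co", covariant=True)


@dataclass(frozen=True)
class TemplateSequence:
    """Hypothetical stand-in carrying only what this sketch needs."""

    templates: list[str]
    window_id: int


class Representation(Protocol[T_co]):
    """Cut-down protocol: anything with a represent(sequence) method."""

    def represent(self, sequence: TemplateSequence) -> T_co: ...


@dataclass(frozen=True)
class RepresentationView(Generic[T_co]):
    """Lazy view: the representation runs only while the view is iterated."""

    sequences: Iterable[TemplateSequence]
    representation: Representation[T_co]

    def __iter__(self) -> Iterator[tuple[int, T_co]]:
        # Generator body: one sequence is represented per yielded item.
        for sequence in self.sequences:
            yield sequence.window_id, self.representation.represent(sequence)


class RecordingTemplateCounts:
    """Toy representation that records which windows it has processed."""

    def __init__(self) -> None:
        self.calls: list[int] = []

    def represent(self, sequence: TemplateSequence) -> Counter:
        self.calls.append(sequence.window_id)
        return Counter(sequence.templates)


representation = RecordingTemplateCounts()
view = RepresentationView(
    sequences=[TemplateSequence(["A", "B", "A"], 0), TemplateSequence(["C"], 1)],
    representation=representation,
)
print(representation.calls)  # [] (building the view does no work)
print(next(iter(view)))  # (0, Counter({'A': 2, 'B': 1}))
print(representation.calls)  # [0] (only the first window so far)
```

Because sequences is a list here, the view can be iterated repeatedly; a one-shot iterator would be exhausted after the first pass. Whether SequenceBuilder is re-iterable is not stated on this page.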
SequenceSample
dataclass
Bases: Generic[TRepresentation]
Model-ready data derived from a TemplateSequence.
TemplateSequence is the grouped log window; SequenceSample is the
representation-specific payload passed to a detector.
Attributes:
| Name | Type | Description |
|---|---|---|
| data | TRepresentation | Detector-ready representation payload. |
| label | int | Sequence-level anomaly label derived from the source window. |
| entity_ids | list[str] | Unique entity ids present in the source window. |
| split_label | SplitLabel | Train/test split assigned during sequence building. |
| window_id | int | Stable window identifier within the sequence builder. |
as_labeled_example()
from_sequence(sequence, *, data)
classmethod
Build a model-ready sample from one template sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence | TemplateSequence | Source grouped sequence carrying labels and metadata. | required |
| data | TRepresentation | Representation payload derived from the sequence. | required |
Returns:
| Type | Description |
|---|---|
| SequenceSample[TRepresentation] | Sample carrying the represented payload together with the original sequence metadata. |
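The relationship between the two classes can be sketched as a generic dataclass whose from_sequence classmethod copies the metadata fields listed above and attaches the payload. This is a hypothetical approximation, not anomalog's implementation: the TemplateSequence stand-in and the string-valued split_label are assumptions (the real SplitLabel type is not shown here), and as_labeled_example() is left out because its behaviour is not documented.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class TemplateSequence:
    """Hypothetical stand-in; split_label is a plain string here."""

    events: list[tuple[str, list[str], Optional[int]]]
    label: int = 0
    entity_ids: list[str] = field(default_factory=list)
    split_label: str = "train"
    window_id: int = 0


@dataclass(frozen=True)
class SequenceSample(Generic[T]):
    data: T
    label: int
    entity_ids: list[str]
    split_label: str
    window_id: int

    @classmethod
    def from_sequence(cls, sequence: TemplateSequence, *, data: T) -> "SequenceSample[T]":
        # Metadata is copied from the source window; only `data` is new.
        return cls(
            data=data,
            label=sequence.label,
            entity_ids=list(sequence.entity_ids),
            split_label=sequence.split_label,
            window_id=sequence.window_id,
        )


sequence = TemplateSequence(
    events=[("Error on node <*>", ["7"], None), ("Error on node <*>", ["8"], 50)],
    label=1,
    entity_ids=["node-7"],
    split_label="test",
    window_id=3,
)
sample = SequenceSample.from_sequence(
    sequence, data=Counter(template for template, _, _ in sequence.events)
)
print(sample.data, sample.label, sample.window_id)  # Counter({'Error on node <*>': 2}) 1 3
```

Making data keyword-only, as the documented signature does, keeps call sites explicit about which argument is the payload.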
SequentialRepresentation
dataclass
Bases: SequenceRepresentation[list[str]]
Ordered template-only representation for sequential models.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | ClassVar[str] | Registry/config name for the representation. |
represent(sequence)
Return the ordered template stream for one sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence | TemplateSequence | Sequence whose template order should be preserved exactly. | required |
Returns:
| Type | Description |
|---|---|
| list[str] | Ordered template stream for the sequence. |
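In essence this representation projects each event onto its template and keeps arrival order, so repeated templates remain separate entries. A minimal sketch of that behaviour, assuming events are (template, parameters, dt_prev_ms) tuples as in the module-level doctest; the function name ordered_templates and the sample templates are hypothetical.

```python
from typing import Optional

# Assumed event shape: (template, extracted_parameters, dt_prev_ms).
Event = tuple[str, list[str], Optional[int]]


def ordered_templates(events: list[Event]) -> list[str]:
    """Keep only the template text of each event, in arrival order."""
    # Parameters and timing are deliberately dropped; duplicates are kept.
    return [template for template, _params, _dt_prev_ms in events]


events = [
    ("Receiving block <*>", ["blk_1"], None),
    ("Error on node <*>", ["7"], 12),
    ("Receiving block <*>", ["blk_2"], 3),
]
print(ordered_templates(events))
# ['Receiving block <*>', 'Error on node <*>', 'Receiving block <*>']
```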
TemplateCountRepresentation
dataclass
Bases: SequenceRepresentation[Counter[str]]
Count-based representation that intentionally uses template text only.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | ClassVar[str] | Registry/config name for the representation. |
represent(sequence)
Return one template-count vector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence | TemplateSequence | Sequence whose template frequencies are being counted. | required |
Returns:
| Type | Description |
|---|---|
| Counter[str] | Template-frequency vector for the sequence. |
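The counting step maps directly onto collections.Counter, which already behaves like a sparse vector (absent templates read as zero). Detectors that need fixed-width inputs typically also assign each template a column index; build_vocabulary and to_dense below are hypothetical helpers illustrating that follow-on step, not part of the documented API, and they assume templates unseen during training are simply dropped.

```python
from collections import Counter


def template_counts(templates: list[str]) -> Counter:
    """Template-frequency vector, as a sparse mapping template -> count."""
    return Counter(templates)


def build_vocabulary(training_counts: list[Counter]) -> dict[str, int]:
    """Assign a stable column index to every template seen in training."""
    vocab: dict[str, int] = {}
    for counts in training_counts:
        for template in counts:
            vocab.setdefault(template, len(vocab))
    return vocab


def to_dense(counts: Counter, vocab: dict[str, int]) -> list[int]:
    """Project a Counter onto the vocabulary; unseen templates are ignored."""
    row = [0] * len(vocab)
    for template, count in counts.items():
        index = vocab.get(template)
        if index is not None:
            row[index] = count
    return row


train = [template_counts(["A", "B", "A"]), template_counts(["C"])]
vocab = build_vocabulary(train)
print(vocab)  # {'A': 0, 'B': 1, 'C': 2}
print(to_dense(template_counts(["B", "D", "B"]), vocab))  # [0, 2, 0]
```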
TemplatePhraseRepresentation
dataclass
Bases: SequenceRepresentation[Counter[str]]
Phrase-count representation derived from template text only.
This expands each template into a normalised full-template phrase and its token n-grams. The representation deliberately ignores parameters and timing, so phrase-based detectors react only to recurring message wording.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | ClassVar[str] | Registry/config name for the representation. |
| phrase_ngram_min | int | Smallest token n-gram size to emit. |
| phrase_ngram_max | int | Largest token n-gram size to emit. |
__post_init__()
Validate phrase extraction settings.
Raises:
| Type | Description |
|---|---|
| ValueError | If the configured n-gram bounds are invalid. |
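Validating in __post_init__ is the usual way to reject bad settings on a frozen dataclass at construction time. The exact rules anomalog enforces are not listed here; this sketch assumes two plausible ones, that the lower bound is at least 1 and that it does not exceed the upper bound. The class name PhraseSettings and its default values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhraseSettings:
    """Hypothetical stand-in for the n-gram fields of TemplatePhraseRepresentation."""

    phrase_ngram_min: int = 1  # assumed default
    phrase_ngram_max: int = 2  # assumed default

    def __post_init__(self) -> None:
        # Assumed rules: n-grams need at least one token, and min <= max.
        if self.phrase_ngram_min < 1:
            raise ValueError("phrase_ngram_min must be >= 1")
        if self.phrase_ngram_min > self.phrase_ngram_max:
            raise ValueError("phrase_ngram_min must not exceed phrase_ngram_max")


PhraseSettings(phrase_ngram_min=1, phrase_ngram_max=3)  # accepted
try:
    PhraseSettings(phrase_ngram_min=3, phrase_ngram_max=1)
except ValueError as exc:
    print(exc)  # phrase_ngram_min must not exceed phrase_ngram_max
```

Failing at construction means a misconfigured representation never reaches the iteration stage, where the error would surface far from its cause.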
represent(sequence)
Return one phrase-count vector.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence | TemplateSequence | Sequence whose template phrases should be counted. | required |
Returns:
| Type | Description |
|---|---|
| Counter[str] | Phrase-frequency vector for the sequence. |
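A phrase-count extractor consistent with the module-level doctest output can be sketched as follows. This is an approximation, not anomalog's code: the normalisation (lowercasing and collapsing whitespace), skipping the <*> placeholder when forming n-grams, and the function name phrase_counts are all assumptions chosen so that the sketch reproduces the documented example.

```python
from collections import Counter

PLACEHOLDER = "<*>"  # assumed wildcard token, as in the doctest templates


def phrase_counts(templates: list[str], ngram_min: int = 1, ngram_max: int = 1) -> Counter:
    """Count normalised full-template phrases plus token n-grams."""
    counts: Counter = Counter()
    for template in templates:
        # Assumed normalisation: lowercase and collapse runs of whitespace.
        phrase = " ".join(template.lower().split())
        counts[phrase] += 1
        # Placeholders carry no wording, so they are left out of the n-grams.
        tokens = [token for token in phrase.split() if token != PLACEHOLDER]
        for n in range(ngram_min, ngram_max + 1):
            for start in range(len(tokens) - n + 1):
                counts[" ".join(tokens[start:start + n])] += 1
    return counts


print(phrase_counts(["Error on node <*>", "Error on node <*>"]))
# Counter({'error on node <*>': 2, 'error': 2, 'on': 2, 'node': 2})
```

One consequence of this particular sketch: for a template with no placeholders, the n-gram spanning every token has the same key as the full phrase, so that key is counted twice per occurrence. Whether the real implementation behaves the same way is not documented.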