Representations

Representations convert a model-agnostic TemplateSequence into the concrete input shape expected by a detector.

A representation receives the full TemplateSequence, not just sequence.templates. That means custom representations can use event timing (dt_prev_ms), extracted parameters, entity IDs, or any other sequence metadata when building model inputs.

>>> from anomalog.representations import (
...     SequentialRepresentation,
...     TemplateCountRepresentation,
...     TemplatePhraseRepresentation,
... )
>>> from anomalog.sequences import TemplateSequence
>>> sequence = TemplateSequence(
...     events=[
...         ("Error on node <*>", ["7"], None),
...         ("Error on node <*>", ["8"], 50),
...     ],
...     label=1,
...     entity_ids=["node-7"],
...     window_id=3,
... )
>>> SequentialRepresentation().represent(sequence)
['Error on node <*>', 'Error on node <*>']
>>> TemplateCountRepresentation().represent(sequence)
Counter({'Error on node <*>': 2})
>>> TemplatePhraseRepresentation(phrase_ngram_min=1, phrase_ngram_max=1).represent(sequence)
Counter({'error on node <*>': 2, 'error': 2, 'on': 2, 'node': 2})

Use:

  • SequentialRepresentation for ordered template streams
  • TemplateCountRepresentation for sparse template-count vectors
  • TemplatePhraseRepresentation for phrase-count features derived from template text

The built-ins are intentionally template-centric, but that is a choice of those representations rather than a limit of the interface.

Represented outputs are wrapped in SequenceSample, which preserves entity_ids, label, split_label, and window_id alongside the representation payload.

You can also define your own representation by implementing SequenceRepresentation[T] and passing it to represent_with(...).

>>> from dataclasses import dataclass
>>> @dataclass(frozen=True)
... class SequenceSummaryRepresentation:
...     name = "sequence_summary"
...
...     def represent(self, sequence: TemplateSequence) -> dict[str, int | list[str]]:
...         return {
...             "entity_count": len(sequence.entity_ids),
...             "timed_event_count": sum(
...                 dt_prev_ms is not None for _, _, dt_prev_ms in sequence.events
...             ),
...             "entity_ids": sequence.entity_ids,
...         }
>>> SequenceSummaryRepresentation().represent(sequence)
{'entity_count': 1, 'timed_event_count': 1, 'entity_ids': ['node-7']}

anomalog.representations

Public sequence representation exports.

SequenceRepresentation

Bases: Protocol[TRepresentation]

Protocol for converting full grouped sequences into model inputs.

Implementations receive the complete TemplateSequence, including event timings, extracted parameters, entity IDs, labels, and split metadata, and may choose whichever fields are relevant for a detector.

Attributes:

  • name (ClassVar[str]): Stable registry/config name for the representation.

represent(sequence)

Convert one grouped sequence into a representation payload.

Parameters:

  • sequence (TemplateSequence, required): Full grouped sequence carrying events, labels, entity ids, and split metadata.

Returns:

  • TRepresentation: Detector-specific representation of the sequence.
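
The protocol shape described above can be sketched without the library installed. This is a minimal illustration, not anomalog's actual source: `StubSequence`, `RepresentationLike`, and `ParameterCountRepresentation` are hypothetical stand-ins, and the event layout `(template, params, dt_prev_ms)` is assumed from the example at the top of this page.

```python
from collections import Counter
from dataclasses import dataclass
from typing import ClassVar, NamedTuple, Optional, Protocol, TypeVar

T_co = TypeVar("T_co", covariant=True)


class StubSequence(NamedTuple):
    """Hypothetical stand-in for TemplateSequence (not the library class)."""

    # Each event is assumed to be (template, params, dt_prev_ms).
    events: list[tuple[str, list[str], Optional[int]]]
    label: int
    entity_ids: list[str]


class RepresentationLike(Protocol[T_co]):
    """Assumed protocol shape: a class-level name plus represent()."""

    name: ClassVar[str]

    def represent(self, sequence: StubSequence) -> T_co: ...


@dataclass(frozen=True)
class ParameterCountRepresentation:
    """Counts extracted parameter values instead of template text."""

    name: ClassVar[str] = "parameter_count"

    def represent(self, sequence: StubSequence) -> Counter[str]:
        return Counter(
            param for _, params, _ in sequence.events for param in params
        )


sequence = StubSequence(
    events=[("Error on node <*>", ["7"], None), ("Error on node <*>", ["8"], 50)],
    label=1,
    entity_ids=["node-7"],
)
# No inheritance is needed: the class satisfies the protocol structurally.
representation: RepresentationLike[Counter[str]] = ParameterCountRepresentation()
payload = representation.represent(sequence)
```

Because the match is structural, a type checker accepts any class with a `name` and a compatible `represent` method, which is why the built-ins being template-centric does not constrain custom implementations.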

SequenceRepresentationView dataclass

Bases: Generic[TRepresentation]

Lazy iterable over represented sequence samples.

The representation stage is the point where a model decides which parts of TemplateSequence matter; the full sequence object is passed through to the representation implementation on each iteration.

Attributes:

  • sequences (SequenceBuilder): Underlying sequence builder producing TemplateSequence objects lazily.
  • representation (SequenceRepresentation[TRepresentation]): Representation applied to each yielded sequence.

__iter__()

Yield represented sequence samples.

Yields:

  • SequenceSample[TRepresentation]: One represented sample per input template sequence.

iter_labeled_examples()

Yield (x, y) pairs only, intentionally dropping split metadata.

Yields:

  • tuple[TRepresentation, int]: Representation payload and label pairs.
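
To illustrate what "lazy" means for a view like this, here is a rough sketch under stated assumptions. `LazyView` and `StubSequence` are hypothetical names, and the real class wraps a SequenceBuilder rather than the plain callable used here; the point is only that no representation work happens until iteration, and that each pass recomputes it.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class StubSequence:
    """Hypothetical stand-in for TemplateSequence."""

    templates: list[str]
    label: int


@dataclass(frozen=True)
class LazyView(Generic[T]):
    """Hypothetical analogue of a lazy represented-sequence view."""

    sequences: Callable[[], Iterable[StubSequence]]  # re-invoked on every pass
    represent: Callable[[StubSequence], T]

    def __iter__(self) -> Iterator[tuple[T, int]]:
        # The representation runs per item, only when the caller asks for it.
        for sequence in self.sequences():
            yield self.represent(sequence), sequence.label


calls: list[int] = []


def tracked_length(sequence: StubSequence) -> int:
    """Toy representation that records when it is invoked."""
    calls.append(sequence.label)
    return len(sequence.templates)


def build_sequences() -> Iterator[StubSequence]:
    yield StubSequence(["A", "B"], 0)
    yield StubSequence(["A"], 1)


view = LazyView(build_sequences, tracked_length)
calls_after_construction = list(calls)  # still empty: nothing computed yet
pairs = list(view)                      # one full pass over the view
calls_after_one_pass = list(calls)
```

A consequence of this design is that iterating twice repeats the representation work, so callers that need several passes may prefer to materialise the view into a list once.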

SequenceSample dataclass

Bases: Generic[TRepresentation]

Model-ready data derived from a TemplateSequence.

TemplateSequence is the grouped log window; SequenceSample is the representation-specific payload passed to a detector.

Attributes:

  • data (TRepresentation): Detector-ready representation payload.
  • label (int): Sequence-level anomaly label derived from the source window.
  • entity_ids (list[str]): Unique entity ids present in the source window.
  • split_label (SplitLabel): Train/test split assigned during sequence building.
  • window_id (int): Stable window identifier within the sequence builder.

as_labeled_example()

Return a generic (x, y) example pair.

Returns:

  • tuple[TRepresentation, int]: Representation payload and label.

from_sequence(sequence, *, data) classmethod

Build a model-ready sample from one template sequence.

Parameters:

  • sequence (TemplateSequence, required): Source grouped sequence carrying labels and metadata.
  • data (TRepresentation, required): Representation payload derived from the sequence.

Returns:

  • SequenceSample[TRepresentation]: Sample carrying the represented payload together with the original sequence metadata.
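
The relationship between a grouped window and its sample can be sketched as follows. This is an assumption-laden illustration rather than the library's code: `StubSequence` and `StubSample` are hypothetical, and `split_label` is modelled as a plain string in place of SplitLabel.

```python
from dataclasses import dataclass
from typing import Generic, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class StubSequence:
    """Hypothetical stand-in for TemplateSequence."""

    templates: list[str]
    label: int
    entity_ids: list[str]
    split_label: str  # stand-in for SplitLabel
    window_id: int


@dataclass(frozen=True)
class StubSample(Generic[T]):
    """Hypothetical analogue of SequenceSample: payload plus metadata."""

    data: T
    label: int
    entity_ids: list[str]
    split_label: str
    window_id: int

    @classmethod
    def from_sequence(cls, sequence: StubSequence, *, data: T) -> "StubSample[T]":
        # Metadata is copied from the window; only the payload is new.
        return cls(
            data=data,
            label=sequence.label,
            entity_ids=sequence.entity_ids,
            split_label=sequence.split_label,
            window_id=sequence.window_id,
        )

    def as_labeled_example(self) -> tuple[T, int]:
        # Drops everything except the (x, y) pair a detector trains on.
        return self.data, self.label


window = StubSequence(
    templates=["Error on node <*>", "Error on node <*>"],
    label=1,
    entity_ids=["node-7"],
    split_label="train",
    window_id=3,
)
sample = StubSample.from_sequence(window, data=list(window.templates))
x, y = sample.as_labeled_example()
```

Making `data` keyword-only, as the documented signature does, keeps call sites readable when the payload is itself a sequence-like object.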

SequentialRepresentation dataclass

Bases: SequenceRepresentation[list[str]]

Ordered template-only representation for sequential models.

Attributes:

  • name (ClassVar[str]): Registry/config name for the representation.

represent(sequence)

Return the ordered template stream for one sequence.

Parameters:

  • sequence (TemplateSequence, required): Sequence whose template order should be preserved exactly.

Returns:

  • list[str]: Ordered template stream for the sequence.

TemplateCountRepresentation dataclass

Bases: SequenceRepresentation[Counter[str]]

Count-based representation that intentionally uses template text only.

Attributes:

  • name (ClassVar[str]): Registry/config name for the representation.

represent(sequence)

Return one template-count vector.

Parameters:

  • sequence (TemplateSequence, required): Sequence whose template frequencies are being counted.

Returns:

  • Counter[str]: Template-frequency vector for the sequence.
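
A Counter is a sparse vector keyed by template text. Detectors that expect fixed-width numeric input need a vocabulary to align the counts; the sketch below shows one way to do that. `build_vocabulary` and `to_dense` are hypothetical helpers, not part of anomalog, and dropping templates unseen at fit time is an assumption made for the illustration.

```python
from collections import Counter


def build_vocabulary(windows: list[Counter[str]]) -> list[str]:
    """Collect every template seen in the given windows, in sorted order."""
    return sorted(set().union(*windows))


def to_dense(counts: Counter[str], vocabulary: list[str]) -> list[int]:
    """Align one sparse count vector to the vocabulary; unseen templates are dropped."""
    return [counts.get(template, 0) for template in vocabulary]


train_windows = [
    Counter({"Error on node <*>": 2, "Node <*> recovered": 1}),
    Counter({"Heartbeat from <*>": 3}),
]
vocabulary = build_vocabulary(train_windows)

test_window = Counter({"Error on node <*>": 1, "Disk <*> full": 4})
dense = to_dense(test_window, vocabulary)
```

Here `vocabulary` holds the three training templates in sorted order, and "Disk <*> full" does not appear in `dense` because it was not seen when the vocabulary was built.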

TemplatePhraseRepresentation dataclass

Bases: SequenceRepresentation[Counter[str]]

Phrase-count representation derived from template text only.

This expands each template into normalised full-template phrases and token n-grams. The representation deliberately ignores parameters and timing so phrase-based detectors react only to recurring message wording.

Attributes:

  • name (ClassVar[str]): Registry/config name for the representation.
  • phrase_ngram_min (int): Smallest token n-gram size to emit.
  • phrase_ngram_max (int): Largest token n-gram size to emit.

__post_init__()

Validate phrase extraction settings.

Raises:

  • ValueError: If the configured n-gram bounds are invalid.

represent(sequence)

Return one phrase-count vector.

Parameters:

  • sequence (TemplateSequence, required): Sequence whose template phrases should be counted.

Returns:

  • Counter[str]: Phrase-frequency vector for the sequence.
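
For intuition about phrase counting, the following sketch reproduces the output shown in the example at the top of this page. It is a guess at the behaviour, not the library's implementation: `PhraseCounter` is a hypothetical class, and the normalisation rules (lowercasing, collapsing whitespace, keeping only alphanumeric tokens so that the `<*>` placeholder is skipped) and the exact bounds check are assumptions inferred from that single example.

```python
import re
from collections import Counter
from dataclasses import dataclass

# Assumed tokenisation: lowercase alphanumeric runs, so "<*>" yields no token.
_TOKEN = re.compile(r"[a-z0-9]+")


@dataclass(frozen=True)
class PhraseCounter:
    """Hypothetical phrase counter: full-template phrase plus token n-grams."""

    ngram_min: int = 1
    ngram_max: int = 2

    def __post_init__(self) -> None:
        # Assumed validation rule; the library only documents that it raises ValueError.
        if self.ngram_min < 1 or self.ngram_max < self.ngram_min:
            raise ValueError("expected 1 <= ngram_min <= ngram_max")

    def count(self, templates: list[str]) -> Counter[str]:
        counts: Counter[str] = Counter()
        for template in templates:
            normalised = " ".join(template.lower().split())
            counts[normalised] += 1  # the whole template as one phrase
            tokens = _TOKEN.findall(normalised)
            for n in range(self.ngram_min, self.ngram_max + 1):
                for start in range(len(tokens) - n + 1):
                    counts[" ".join(tokens[start:start + n])] += 1
        return counts


unigrams = PhraseCounter(ngram_min=1, ngram_max=1).count(
    ["Error on node <*>", "Error on node <*>"]
)
bigrams = PhraseCounter(ngram_min=2, ngram_max=2).count(["Error on node <*>"])
```

One quirk of this sketch is that a single-word template is counted twice, once as the full phrase and once as a unigram; whether the library behaves the same way is not stated in this page.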