Representations

Representations convert a model-agnostic TemplateSequence into the concrete input shape expected by a detector.

A representation receives the full TemplateSequence, not just sequence.templates. That means custom representations can use event timing (dt_prev_ms), extracted parameters, entity IDs, or any other sequence metadata when building model inputs.

>>> from anomalog.representations import (
...     SequentialRepresentation,
...     TemplateCountRepresentation,
...     TemplatePhraseRepresentation,
... )
>>> from anomalog.sequences import TemplateSequence
>>> sequence = TemplateSequence(
...     events=[
...         ("Error on node <*>", ["7"], None),
...         ("Error on node <*>", ["8"], 50),
...     ],
...     label=1,
...     entity_ids=["node-7"],
...     window_id=3,
... )
>>> SequentialRepresentation().represent(sequence)
['Error on node <*>', 'Error on node <*>']
>>> TemplateCountRepresentation().represent(sequence)
Counter({'Error on node <*>': 2})
>>> TemplatePhraseRepresentation(phrase_ngram_min=1, phrase_ngram_max=1).represent(sequence)
Counter({'error on node <*>': 2, 'error': 2, 'on': 2, 'node': 2})

Use:

  • SequentialRepresentation for ordered template streams
  • TemplateCountRepresentation for sparse template-count vectors
  • TemplatePhraseRepresentation for phrase-count features derived from template text

The built-ins are intentionally template-centric, but that is a choice of those representations rather than a limit of the interface.

Represented outputs are wrapped in SequenceSample, which preserves entity_ids, label, split_label, and window_id alongside the representation payload.

You can also define your own representation by implementing SequenceRepresentation[T] and passing it to represent_with(...).

>>> from dataclasses import dataclass
>>> @dataclass(frozen=True)
... class SequenceSummaryRepresentation:
...     name = "sequence_summary"
...
...     def represent(self, sequence: TemplateSequence) -> dict[str, int | list[str]]:
...         return {
...             "entity_count": len(sequence.entity_ids),
...             "timed_event_count": sum(
...                 dt_prev_ms is not None for _, _, dt_prev_ms in sequence.events
...             ),
...             "entity_ids": sequence.entity_ids,
...         }
>>> SequenceSummaryRepresentation().represent(sequence)
{'entity_count': 1, 'timed_event_count': 1, 'entity_ids': ['node-7']}
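Because every event carries dt_prev_ms, a custom representation can also be built from timing alone. The sketch below is a hypothetical example: StubSequence stands in for TemplateSequence so the block runs without anomalog installed, and GapSummaryRepresentation is an illustrative name, not part of the library. It assumes events are (template, parameters, dt_prev_ms) tuples, as in the example above.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class StubSequence:
    """Stand-in for TemplateSequence, reduced to the one field used here."""

    # Each event is (template, parameters, dt_prev_ms).
    events: list[tuple[str, list[str], Optional[int]]]


@dataclass(frozen=True)
class GapSummaryRepresentation:
    """Hypothetical representation that summarises inter-event gaps."""

    name = "gap_summary"

    def represent(self, sequence: StubSequence) -> dict[str, float]:
        # Events with no known gap carry dt_prev_ms=None and are skipped.
        gaps = [dt for _, _, dt in sequence.events if dt is not None]
        if not gaps:
            return {"max_gap_ms": 0.0, "mean_gap_ms": 0.0}
        return {"max_gap_ms": float(max(gaps)), "mean_gap_ms": float(mean(gaps))}


stub = StubSequence(
    events=[
        ("Error on node <*>", ["7"], None),
        ("Error on node <*>", ["8"], 50),
    ]
)
summary = GapSummaryRepresentation().represent(stub)
```

Returning zeros for a sequence with no timed events is one possible convention; a detector that treats "no timing" differently from "zero gap" would want a separate flag instead.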

anomalog.representations

Public sequence representation exports.

SequenceRepresentation

Bases: Protocol[TRepresentation]

Protocol for converting full grouped sequences into model inputs.

Implementations receive the complete TemplateSequence, including event timings, extracted parameters, entity IDs, labels, and split metadata, and may choose whichever fields are relevant for a detector.

represent(sequence)

Convert one grouped sequence into a representation payload.

SequenceRepresentationView dataclass

Bases: Generic[TRepresentation]

Lazy iterable over represented sequence samples.

The representation stage is the point where a model decides which parts of TemplateSequence matter; the full sequence object is passed through to the representation implementation on each iteration.

__iter__()

Yield represented sequence samples.

Yields:

  • SequenceSample[TRepresentation]: One represented sample per input template sequence.

iter_labeled_examples()

Yield (x, y) pairs only, intentionally dropping split metadata.

Yields:

  • tuple[TRepresentation, int]: Representation payload and label pairs.
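To illustrate what "lazy" means here, the following is a minimal sketch of a view that applies the representation only while it is being iterated. LazyView and StubSequence are hypothetical stand-ins written for this example; the real SequenceRepresentationView may store its inputs differently.

```python
from collections.abc import Callable, Iterable, Iterator
from dataclasses import dataclass


@dataclass(frozen=True)
class StubSequence:
    """Stand-in for TemplateSequence with only the fields used below."""

    templates: list[str]
    label: int


@dataclass(frozen=True)
class LazyView:
    """Hypothetical lazy view: nothing is represented until iteration."""

    sequences: Callable[[], Iterable[StubSequence]]
    represent: Callable[[StubSequence], list[str]]

    def __iter__(self) -> Iterator[tuple[StubSequence, list[str]]]:
        # The full sequence is handed to the representation on each step.
        for sequence in self.sequences():
            yield sequence, self.represent(sequence)

    def iter_labeled_examples(self) -> Iterator[tuple[list[str], int]]:
        # Keep only (x, y); all other metadata is dropped on purpose.
        for sequence, payload in self:
            yield payload, sequence.label


calls: list[int] = []


def represent_templates(sequence: StubSequence) -> list[str]:
    calls.append(sequence.label)  # record each call to make laziness visible
    return list(sequence.templates)


def load_sequences() -> Iterator[StubSequence]:
    yield StubSequence(["A", "B"], 0)
    yield StubSequence(["A", "C"], 1)


view = LazyView(sequences=load_sequences, represent=represent_templates)
calls_before_iteration = len(calls)  # still 0: building the view does no work
examples = list(view.iter_labeled_examples())
```

Passing a factory (load_sequences) rather than a one-shot generator is a design choice of this sketch: it lets the view be iterated more than once.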

SequenceSample dataclass

Bases: Generic[TRepresentation]

Model-ready data derived from a TemplateSequence.

TemplateSequence is the grouped log window; SequenceSample is the representation-specific payload passed to a detector.

as_labeled_example()

Return a generic (x, y) example pair.

Returns:

  • tuple[TRepresentation, int]: Representation payload and label.

from_sequence(sequence, *, data) classmethod

Build a model-ready sample from one template sequence.

Parameters:

  • sequence (TemplateSequence, required): Source grouped sequence carrying labels and metadata.
  • data (TRepresentation, required): Representation payload derived from the sequence.

Returns:

  • SequenceSample[TRepresentation]: Sample carrying the represented payload together with the original sequence metadata.
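The relationship between a sequence and its sample can be sketched as follows. StubSequence and StubSample are hypothetical stand-ins, not the library classes: the field names entity_ids, label, split_label, and window_id come from the description above, while the payload field name data (taken from the from_sequence keyword) and the Optional[str] type of split_label are assumptions.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Generic, Optional, TypeVar

T = TypeVar("T")


@dataclass(frozen=True)
class StubSequence:
    """Stand-in for TemplateSequence with the metadata a sample preserves."""

    templates: list[str]
    label: int
    entity_ids: list[str]
    window_id: int
    split_label: Optional[str] = None  # assumed type


@dataclass(frozen=True)
class StubSample(Generic[T]):
    """Hypothetical stand-in for SequenceSample."""

    data: T
    label: int
    entity_ids: list[str]
    window_id: int
    split_label: Optional[str]

    @classmethod
    def from_sequence(cls, sequence: StubSequence, *, data: T) -> "StubSample[T]":
        # Copy the metadata across unchanged; only the payload is new.
        return cls(
            data=data,
            label=sequence.label,
            entity_ids=sequence.entity_ids,
            window_id=sequence.window_id,
            split_label=sequence.split_label,
        )

    def as_labeled_example(self) -> tuple[T, int]:
        return self.data, self.label


sequence = StubSequence(
    templates=["Error on node <*>", "Error on node <*>"],
    label=1,
    entity_ids=["node-7"],
    window_id=3,
)
sample = StubSample.from_sequence(sequence, data=Counter(sequence.templates))
x, y = sample.as_labeled_example()
```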

SequentialRepresentation dataclass

Bases: SequenceRepresentation[list[str]]

Ordered template-only representation for sequential models.

represent(sequence)

Return the ordered template stream for one sequence.

TemplateCountRepresentation dataclass

Bases: SequenceRepresentation[Counter[str]]

Count-based representation that intentionally uses template text only.

represent(sequence)

Return one template-count vector.
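A Counter payload is a sparse mapping, so a detector that expects fixed-width rows needs a vocabulary to assign columns. The helpers below, build_vocabulary and to_dense, are hypothetical and not part of anomalog; they show one way this conversion could look, assuming the vocabulary is fitted on training payloads and unseen templates are dropped.

```python
from collections import Counter


def build_vocabulary(counters: list[Counter]) -> dict[str, int]:
    """Assign a stable column index to each template, in sorted order."""
    templates = sorted({template for counter in counters for template in counter})
    return {template: index for index, template in enumerate(templates)}


def to_dense(counter: Counter, vocabulary: dict[str, int]) -> list[int]:
    """Expand one sparse count payload into a fixed-width row."""
    row = [0] * len(vocabulary)
    for template, count in counter.items():
        index = vocabulary.get(template)
        if index is not None:  # templates outside the vocabulary are dropped
            row[index] = count
    return row


train = [
    Counter({"Error on node <*>": 2}),
    Counter({"Error on node <*>": 1, "Link up <*>": 3}),
]
vocabulary = build_vocabulary(train)
matrix = [to_dense(counter, vocabulary) for counter in train]
```

Dropping unseen templates silently is a simplification; an anomaly detector may prefer to route them into a dedicated "unknown" column, since a never-before-seen template can itself be a signal.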

TemplatePhraseRepresentation dataclass

Bases: SequenceRepresentation[Counter[str]]

Phrase-count representation derived from template text only.

__post_init__()

Validate phrase extraction settings.

Raises:

Type Description
ValueError

If the configured n-gram bounds are invalid.

represent(sequence)

Return one phrase-count vector.
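The sketch below shows how bounded n-gram phrase counting with up-front validation could work. PhraseCounter is a hypothetical class, and both its validity rule (minimum at least 1, maximum not below the minimum) and its tokenization (lowercase the template, count the whole template once, then count n-grams over alphabetic words) are assumptions chosen to reproduce the example output at the top of this page. The library's actual rules may differ.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PhraseCounter:
    """Hypothetical phrase counter; not the library implementation."""

    ngram_min: int = 1
    ngram_max: int = 2

    def __post_init__(self) -> None:
        # Assumed validity rule: 1 <= ngram_min <= ngram_max.
        if self.ngram_min < 1 or self.ngram_max < self.ngram_min:
            raise ValueError(
                f"invalid n-gram bounds: min={self.ngram_min}, max={self.ngram_max}"
            )

    def count(self, templates: list[str]) -> Counter:
        counts: Counter = Counter()
        for template in templates:
            text = template.lower()
            counts[text] += 1  # the whole template is one feature
            words = re.findall(r"[a-z]+", text)  # placeholders like <*> are skipped
            for n in range(self.ngram_min, self.ngram_max + 1):
                for start in range(len(words) - n + 1):
                    counts[" ".join(words[start : start + n])] += 1
        return counts


phrase_counts = PhraseCounter(ngram_min=1, ngram_max=1).count(
    ["Error on node <*>", "Error on node <*>"]
)
```

Validating in __post_init__ means a misconfigured instance fails at construction time rather than partway through a dataset.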