Sequences

Sequences are the point where a templated dataset becomes a modeling dataset.

This page covers the grouping builders, the TemplateSequence shape, and the split semantics that determine how sequences are assigned to train and test.

>>> from anomalog.sequences import SplitLabel, TemplateSequence
>>> sequence = TemplateSequence(
...     events=[("template <*>", ["x"], None), ("template <*>", ["y"], 10)],
...     label=0,
...     entity_ids=["node-1"],
...     window_id=7,
...     split_label=SplitLabel.TRAIN,
... )
>>> sequence.sole_entity_id
'node-1'
>>> sequence.templates
['template <*>', 'template <*>']
>>> sequence.split_label.value
'train'

anomalog.sequences

Utilities for building template sequences from structured log lines.

The module groups parsed log lines into windows (entity, fixed-size, or time-based) and decorates them with inferred templates and anomaly labels.
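The three grouping strategies can be illustrated without the library. The sketch below is not the module's implementation: the tuple layout, the helper names, and the non-overlapping window policy are all assumptions made for illustration.

```python
from collections import defaultdict

# Hypothetical parsed lines: (entity_id, timestamp_seconds, template).
LINES = [
    ("node-1", 0, "boot <*>"),
    ("node-2", 5, "boot <*>"),
    ("node-1", 12, "error <*>"),
    ("node-2", 61, "halt <*>"),
]


def group_by_entity(rows):
    """Collect templates per entity, preserving line order."""
    groups = defaultdict(list)
    for entity_id, _timestamp, template in rows:
        groups[entity_id].append(template)
    return dict(groups)


def group_fixed(rows, size):
    """Cut the stream into consecutive, non-overlapping windows of `size` lines."""
    return [
        [template for _entity, _timestamp, template in rows[start:start + size]]
        for start in range(0, len(rows), size)
    ]


def group_by_time(rows, span_seconds):
    """Bucket lines by the `span_seconds`-wide interval their timestamp falls in."""
    groups = defaultdict(list)
    for _entity, timestamp, template in rows:
        groups[timestamp // span_seconds].append(template)
    return dict(groups)
```

Entity grouping keeps each entity's history together, while fixed-size and time windows can mix entities in one sequence, which is why sole_entity_id may be None for those modes.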

EntitySequenceBuilder dataclass

Bases: SequenceBuilder

Sequence builder for per-entity grouping.

mode property

Return the grouping strategy for this builder.

__iter__()

Iterate over template sequences yielded by the configured grouping.

Yields:

Type              Description
TemplateSequence  One grouped and template-enriched sequence.

Raises:

Type        Description
ValueError  If the requested train split is impossible for the configured grouping and constraints.

build_split_summary(*, sequence_count, train_sequence_count, train_label_counts, test_label_counts)

Describe requested versus effective split semantics for one run.

Parameters:

Name                  Type            Description                             Default
sequence_count        int             Total number of generated sequences.    required
train_sequence_count  int             Number of sequences assigned to train.  required
train_label_counts    dict[int, int]  Label counts in the train split.        required
test_label_counts     dict[int, int]  Label counts in the test split.         required

Returns:

Type                  Description
SequenceSplitSummary  Requested and effective split metrics.

eligible_train_sequence_count(*, sequence_count, train_label_counts, test_label_counts)

Return the sequences eligible for train-fraction accounting.

Parameters:

Name                Type            Description                                  Default
sequence_count      int             Total number of generated entity sequences.  required
train_label_counts  dict[int, int]  Label counts assigned to the train split.    required
test_label_counts   dict[int, int]  Label counts assigned to the test split.     required

Returns:

Type  Description
int   Eligible entity-sequence count under the current policy.

from_dataset(td) classmethod

Create an entity-grouped builder from a templated dataset.

Parameters:

Name  Type              Description                                  Default
td    TemplatedDataset  Templated dataset to bind into the builder.  required

Returns:

Type  Description
Self  Builder bound to the templated dataset.

represent_with(representation)

Return a lazy builder that applies a representation per sequence.

Parameters:

Name            Type                                     Description                                                      Default
representation  SequenceRepresentation[TRepresentation]  Sequence representation to apply lazily to each built sequence.  required

Returns:

Type                                         Description
SequenceRepresentationView[TRepresentation]  Lazy represented view of the generated sequences.
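"Lazy" here means the representation is computed only when the view is iterated. The following is a minimal sketch of that idea, not the library's SequenceRepresentationView: the class name SketchRepresentationView and the count_events callable are hypothetical stand-ins.

```python
from collections.abc import Callable, Iterable, Iterator
from typing import Generic, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class SketchRepresentationView(Generic[T, R]):
    """Hypothetical lazy view: applies `represent` only while iterating."""

    def __init__(self, sequences: Iterable[T], represent: Callable[[T], R]) -> None:
        self._sequences = sequences
        self._represent = represent

    def __iter__(self) -> Iterator[R]:
        for sequence in self._sequences:
            yield self._represent(sequence)


calls = []


def count_events(templates):
    """Toy representation: record the call and return the sequence length."""
    calls.append(templates)
    return len(templates)


view = SketchRepresentationView([["a", "b"], ["c"]], count_events)
calls_before_iteration = len(calls)  # nothing has been represented yet
represented = list(view)             # iteration triggers one call per sequence
```

Deferring the work this way lets a large dataset be streamed through a representation without materializing every represented sequence up front.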

with_train_fraction(train_frac)

Return a copy with an updated train/test split fraction.

Parameters:

Name        Type   Description                                                             Default
train_frac  float  Requested fraction of eligible sequences to assign to the train split.  required

Returns:

Type  Description
Self  Copy with updated train/test split fraction.
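Because the method returns a copy, the original builder is left untouched. A minimal sketch of this copy-on-update pattern follows, using a hypothetical frozen dataclass (SketchBuilder and its 0.8 default are assumptions, not the library's class or default).

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchBuilder:
    """Hypothetical stand-in for a builder with one split setting."""

    train_frac: float = 0.8  # assumed default, for illustration only

    def with_train_fraction(self, train_frac: float) -> "SketchBuilder":
        # dataclasses.replace builds a new instance and leaves `self` unchanged.
        return replace(self, train_frac=train_frac)


base = SketchBuilder()
half = base.with_train_fraction(0.5)
```

This style makes it safe to derive several differently configured builders from one shared starting point.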

with_train_on_normal_entities_only(*, enabled=True)

Limit training sequences to entities without anomalies.

Parameters:

Name     Type  Description                                                   Default
enabled  bool  Whether to restrict train sequences to normal entities only.  True

Returns:

Type  Description
Self  Copy with updated normal-only training behavior.
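One way to picture the normal-only policy is that anomalous entities are removed from the pool the train fraction is computed over. The sketch below assumes that label 0 means normal, that the first eligible entities go to train, and that the count is rounded down. None of these details are confirmed by this page, and split_entities is a hypothetical helper.

```python
def split_entities(labels_by_entity, train_frac, normal_only=True):
    """Return (train_ids, test_ids) for a dict of entity_id -> 0/1 label.

    Assumption: label 0 marks a normal entity, 1 an anomalous one.
    """
    entity_ids = list(labels_by_entity)
    if normal_only:
        # Only entities without anomalies may be assigned to train.
        eligible = [e for e in entity_ids if labels_by_entity[e] == 0]
    else:
        eligible = entity_ids
    train_count = int(train_frac * len(eligible))  # assumed rounding: floor
    train_ids = eligible[:train_count]
    train_set = set(train_ids)
    test_ids = [e for e in entity_ids if e not in train_set]
    return train_ids, test_ids


labels = {"a": 0, "b": 1, "c": 0, "d": 0, "e": 0}
train_ids, test_ids = split_entities(labels, train_frac=0.5)
mixed_train_ids, mixed_test_ids = split_entities(labels, train_frac=0.5, normal_only=False)
```

Under these assumptions the anomalous entity "b" can only ever land in test when the restriction is enabled, so the effective train fraction over all sequences is lower than the requested one.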

FixedSequenceBuilder dataclass

Bases: SequenceBuilder

Sequence builder for fixed-size window grouping.

mode property

Return the grouping strategy for this builder.

__iter__()

Iterate over template sequences yielded by the configured grouping.

Yields:

Type              Description
TemplateSequence  One grouped and template-enriched sequence.

Raises:

Type        Description
ValueError  If the requested train split is impossible for the configured grouping and constraints.

build_split_summary(*, sequence_count, train_sequence_count, train_label_counts, test_label_counts)

Describe requested versus effective split semantics for one run.

Parameters:

Name                  Type            Description                             Default
sequence_count        int             Total number of generated sequences.    required
train_sequence_count  int             Number of sequences assigned to train.  required
train_label_counts    dict[int, int]  Label counts in the train split.        required
test_label_counts     dict[int, int]  Label counts in the test split.         required

Returns:

Type                  Description
SequenceSplitSummary  Requested and effective split metrics.

eligible_train_sequence_count(*, sequence_count, train_label_counts, test_label_counts)

Return the sequences eligible for train-fraction accounting.

Parameters:

Name                Type            Description                                Default
sequence_count      int             Total number of generated sequences.       required
train_label_counts  dict[int, int]  Label counts assigned to the train split.  required
test_label_counts   dict[int, int]  Label counts assigned to the test split.   required

Returns:

Type  Description
int   Count of sequences considered eligible for train-fraction calculations for this grouping strategy.

represent_with(representation)

Return a lazy builder that applies a representation per sequence.

Parameters:

Name            Type                                     Description                                                      Default
representation  SequenceRepresentation[TRepresentation]  Sequence representation to apply lazily to each built sequence.  required

Returns:

Type                                         Description
SequenceRepresentationView[TRepresentation]  Lazy represented view of the generated sequences.

with_train_fraction(train_frac)

Return a copy with an updated train/test split fraction.

Parameters:

Name        Type   Description                                                             Default
train_frac  float  Requested fraction of eligible sequences to assign to the train split.  required

Returns:

Type  Description
Self  Copy with updated train/test split fraction.

GroupingMode

Bases: str, Enum

Strategy for grouping structured lines into sequences.

SequenceBuilder dataclass

Common sequence-building behavior shared across grouping strategies.

mode property

Return the grouping strategy for this builder.

__iter__()

Iterate over template sequences yielded by the configured grouping.

Yields:

Type              Description
TemplateSequence  One grouped and template-enriched sequence.

Raises:

Type        Description
ValueError  If the requested train split is impossible for the configured grouping and constraints.

build_split_summary(*, sequence_count, train_sequence_count, train_label_counts, test_label_counts)

Describe requested versus effective split semantics for one run.

Parameters:

Name                  Type            Description                             Default
sequence_count        int             Total number of generated sequences.    required
train_sequence_count  int             Number of sequences assigned to train.  required
train_label_counts    dict[int, int]  Label counts in the train split.        required
test_label_counts     dict[int, int]  Label counts in the test split.         required

Returns:

Type                  Description
SequenceSplitSummary  Requested and effective split metrics.

eligible_train_sequence_count(*, sequence_count, train_label_counts, test_label_counts)

Return the sequences eligible for train-fraction accounting.

Parameters:

Name                Type            Description                                Default
sequence_count      int             Total number of generated sequences.       required
train_label_counts  dict[int, int]  Label counts assigned to the train split.  required
test_label_counts   dict[int, int]  Label counts assigned to the test split.   required

Returns:

Type  Description
int   Count of sequences considered eligible for train-fraction calculations for this grouping strategy.

represent_with(representation)

Return a lazy builder that applies a representation per sequence.

Parameters:

Name            Type                                     Description                                                      Default
representation  SequenceRepresentation[TRepresentation]  Sequence representation to apply lazily to each built sequence.  required

Returns:

Type                                         Description
SequenceRepresentationView[TRepresentation]  Lazy represented view of the generated sequences.

with_train_fraction(train_frac)

Return a copy with an updated train/test split fraction.

Parameters:

Name        Type   Description                                                             Default
train_frac  float  Requested fraction of eligible sequences to assign to the train split.  required

Returns:

Type  Description
Self  Copy with updated train/test split fraction.

SequenceSplitSummary dataclass

Serializable summary of requested versus effective split behavior.

as_dict()

Return a stable JSON-friendly representation.

Returns:

Type                                 Description
dict[str, int | float | bool | str]  Serialized split summary.
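The requested fraction and the fraction actually achieved can differ, for example when some sequences are not eligible for train. The sketch below shows how such a summary might be computed and serialized. The field names, the SketchSplitSummary class, and the summarize helper are hypothetical; the real SequenceSplitSummary fields are not listed on this page.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SketchSplitSummary:
    """Hypothetical summary; field names are illustrative assumptions."""

    grouping_mode: str
    requested_train_fraction: float
    effective_train_fraction: float
    sequence_count: int
    train_sequence_count: int
    train_on_normal_only: bool

    def as_dict(self):
        # Every field is a str, int, float, or bool, so the dict is JSON-friendly.
        return asdict(self)


def summarize(mode, requested, sequence_count, train_sequence_count, normal_only):
    """Compute the effective fraction from the observed counts."""
    effective = train_sequence_count / sequence_count if sequence_count else 0.0
    return SketchSplitSummary(
        grouping_mode=mode,
        requested_train_fraction=requested,
        effective_train_fraction=effective,
        sequence_count=sequence_count,
        train_sequence_count=train_sequence_count,
        train_on_normal_only=normal_only,
    )


summary = summarize("entity", 0.75, 10, 7, True)
payload = json.dumps(summary.as_dict(), sort_keys=True)
```

Sorting keys when dumping keeps the serialized form stable across runs, which is convenient when summaries are stored alongside experiment results.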

SplitLabel

Bases: str, Enum

Dataset split membership for a sequence.
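Deriving from both str and Enum means each member is also a plain string, so it compares equal to its value and serializes to JSON directly. The stand-in below mirrors that base-class combination. Only the TRAIN member and its 'train' value appear in the example at the top of this page; the TEST member is an assumption.

```python
import json
from enum import Enum


class SketchSplitLabel(str, Enum):
    """Hypothetical stand-in with the same bases as SplitLabel."""

    TRAIN = "train"
    TEST = "test"  # assumed member, not confirmed by this page


# Members compare equal to their string values.
equals_plain_string = SketchSplitLabel.TRAIN == "train"

# A member can be looked up from its serialized value.
round_tripped = SketchSplitLabel("train")

# json.dumps treats the member as an ordinary string.
encoded = json.dumps({"split": SketchSplitLabel.TRAIN})
```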

TemplateSequence dataclass

Grouped log window before any model-specific representation is applied.

This keeps sequence semantics such as event ordering, labels, and entity membership. Model inputs derived from it live in SequenceSample.

sole_entity_id property

Return the entity id when the sequence belongs to exactly one entity.

If multiple entities appear in the window, None is returned to avoid implying a single owning entity.

templates property

Return the ordered template strings for this sequence.
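The two properties can be approximated with a small stand-in class. This is a sketch under assumptions, not the library's TemplateSequence: SketchSequence is hypothetical, it assumes the template is the first element of each event tuple (as the example at the top of this page suggests), and it returns None for an empty entity list, which this page does not specify.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SketchSequence:
    """Hypothetical stand-in holding only the fields the properties need."""

    events: list
    entity_ids: list

    @property
    def sole_entity_id(self):
        # Only report an owner when exactly one entity is present.
        return self.entity_ids[0] if len(self.entity_ids) == 1 else None

    @property
    def templates(self):
        # Assumes each event is a tuple whose first element is the template.
        return [event[0] for event in self.events]


single = SketchSequence(
    events=[("template <*>", ["x"], None), ("template <*>", ["y"], 10)],
    entity_ids=["node-1"],
)
mixed = SketchSequence(events=[("boot <*>", [], None)], entity_ids=["node-1", "node-2"])
```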

TimeSequenceBuilder dataclass

Bases: SequenceBuilder

Sequence builder for time-window grouping.

mode property

Return the grouping strategy for this builder.

__iter__()

Iterate over template sequences yielded by the configured grouping.

Yields:

Type              Description
TemplateSequence  One grouped and template-enriched sequence.

Raises:

Type        Description
ValueError  If the requested train split is impossible for the configured grouping and constraints.

build_split_summary(*, sequence_count, train_sequence_count, train_label_counts, test_label_counts)

Describe requested versus effective split semantics for one run.

Parameters:

Name                  Type            Description                             Default
sequence_count        int             Total number of generated sequences.    required
train_sequence_count  int             Number of sequences assigned to train.  required
train_label_counts    dict[int, int]  Label counts in the train split.        required
test_label_counts     dict[int, int]  Label counts in the test split.         required

Returns:

Type                  Description
SequenceSplitSummary  Requested and effective split metrics.

eligible_train_sequence_count(*, sequence_count, train_label_counts, test_label_counts)

Return the sequences eligible for train-fraction accounting.

Parameters:

Name                Type            Description                                Default
sequence_count      int             Total number of generated sequences.       required
train_label_counts  dict[int, int]  Label counts assigned to the train split.  required
test_label_counts   dict[int, int]  Label counts assigned to the test split.   required

Returns:

Type  Description
int   Count of sequences considered eligible for train-fraction calculations for this grouping strategy.

represent_with(representation)

Return a lazy builder that applies a representation per sequence.

Parameters:

Name            Type                                     Description                                                      Default
representation  SequenceRepresentation[TRepresentation]  Sequence representation to apply lazily to each built sequence.  required

Returns:

Type                                         Description
SequenceRepresentationView[TRepresentation]  Lazy represented view of the generated sequences.

with_train_fraction(train_frac)

Return a copy with an updated train/test split fraction.

Parameters:

Name        Type   Description                                                             Default
train_frac  float  Requested fraction of eligible sequences to assign to the train split.  required

Returns:

Type  Description
Self  Copy with updated train/test split fraction.