Sequences
Sequences are the point where a templated dataset becomes a modeling dataset.
This page covers the grouping builders, the TemplateSequence shape, and the
split semantics that determine how sequences are assigned to train and test.
>>> from anomalog.sequences import SplitLabel, TemplateSequence
>>> sequence = TemplateSequence(
... events=[("template <*>", ["x"], None), ("template <*>", ["y"], 10)],
... label=0,
... entity_ids=["node-1"],
... window_id=7,
... split_label=SplitLabel.TRAIN,
... )
>>> sequence.sole_entity_id
'node-1'
>>> sequence.templates
['template <*>', 'template <*>']
>>> sequence.split_label.value
'train'
anomalog.sequences
Utilities for building template sequences from structured log lines.
The module groups parsed log lines into windows (entity, fixed-size, or time-based) and decorates them with inferred templates and anomaly labels.
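The three grouping strategies named above can be sketched in plain Python. This is a minimal illustration and not the library's implementation: the `LogLine` tuple layout, the function names, and the choice of non-overlapping windows are all assumptions made for the example.

```python
from collections import defaultdict
from typing import Iterable

# Hypothetical stand-in for a parsed log line: (entity_id, timestamp_seconds, template).
LogLine = tuple[str, int, str]


def group_by_entity(lines: Iterable[LogLine]) -> dict[str, list[str]]:
    """One sequence per entity, preserving line order within each entity."""
    groups: dict[str, list[str]] = defaultdict(list)
    for entity_id, _timestamp, template in lines:
        groups[entity_id].append(template)
    return dict(groups)


def group_fixed(lines: Iterable[LogLine], window_size: int) -> list[list[str]]:
    """Consecutive, non-overlapping windows; the last window may be shorter."""
    templates = [template for _entity_id, _timestamp, template in lines]
    return [templates[i:i + window_size] for i in range(0, len(templates), window_size)]


def group_by_time(lines: Iterable[LogLine], window_seconds: int) -> dict[int, list[str]]:
    """Buckets keyed by timestamp // window_seconds."""
    buckets: dict[int, list[str]] = defaultdict(list)
    for _entity_id, timestamp, template in lines:
        buckets[timestamp // window_seconds].append(template)
    return dict(buckets)


lines = [
    ("node-1", 0, "boot <*>"),
    ("node-2", 5, "boot <*>"),
    ("node-1", 12, "error <*>"),
    ("node-2", 61, "halt <*>"),
]
by_entity = group_by_entity(lines)
by_count = group_fixed(lines, window_size=3)
by_time = group_by_time(lines, window_seconds=60)
```

The same four lines produce two entity sequences, two fixed-size windows, and two time buckets, which is why the choice of builder changes what a "sequence" means downstream.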
EntitySequenceBuilder
dataclass
Bases: SequenceBuilder
Sequence builder for per-entity grouping.
mode
property
Return the grouping strategy for this builder.
__iter__()
Iterate over template sequences yielded by the configured grouping.
Yields:
| Name | Type | Description |
|---|---|---|
| `TemplateSequence` | `TemplateSequence` | One grouped and template-enriched sequence. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the requested train split is impossible for the configured grouping and constraints. |
build_split_summary(*, sequence_count, train_sequence_count, train_label_counts, test_label_counts)
Describe requested versus effective split semantics for one run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sequence_count` | `int` | Total number of generated sequences. | required |
| `train_sequence_count` | `int` | Number of sequences assigned to train. | required |
| `train_label_counts` | `dict[int, int]` | Label counts in the train split. | required |
| `test_label_counts` | `dict[int, int]` | Label counts in the test split. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `SequenceSplitSummary` | `SequenceSplitSummary` | Requested and effective split metrics. |
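To make "requested versus effective" concrete, here is a small sketch. The fields of the real `SequenceSplitSummary` are not listed on this page, so `SplitSummarySketch` and `summarize_split` are hypothetical stand-ins, and treating the effective fraction as train sequences over all sequences is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitSummarySketch:
    """Hypothetical stand-in; not the library's SequenceSplitSummary."""

    requested_train_fraction: float
    effective_train_fraction: float
    train_label_counts: dict[int, int]
    test_label_counts: dict[int, int]


def summarize_split(
    *,
    requested_train_fraction: float,
    sequence_count: int,
    train_sequence_count: int,
    train_label_counts: dict[int, int],
    test_label_counts: dict[int, int],
) -> SplitSummarySketch:
    # Assumption: the effective fraction is measured against all sequences.
    effective = train_sequence_count / sequence_count if sequence_count else 0.0
    return SplitSummarySketch(
        requested_train_fraction=requested_train_fraction,
        effective_train_fraction=effective,
        train_label_counts=dict(train_label_counts),
        test_label_counts=dict(test_label_counts),
    )


summary = summarize_split(
    requested_train_fraction=0.8,
    sequence_count=10,
    train_sequence_count=7,
    train_label_counts={0: 7},
    test_label_counts={0: 1, 1: 2},
)
```

A gap between the two fractions, as in this example, is the kind of discrepancy such a summary is meant to surface, for instance when constraints keep some sequences out of train.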
eligible_train_sequence_count(*, sequence_count, train_label_counts, test_label_counts)
Return the number of sequences eligible for train-fraction accounting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sequence_count` | `int` | Total number of generated entity sequences. | required |
| `train_label_counts` | `dict[int, int]` | Label counts assigned to the train split. | required |
| `test_label_counts` | `dict[int, int]` | Label counts assigned to the test split. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `int` | `int` | Eligible entity-sequence count under the current policy. |
from_dataset(td)
classmethod
Create an entity-grouped builder from a templated dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `td` | `TemplatedDataset` | Templated dataset to bind into the builder. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `Self` | `Self` | Builder bound to the templated dataset. |
represent_with(representation)
Return a lazy builder that applies a representation per sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `representation` | `SequenceRepresentation[TRepresentation]` | Sequence representation to apply lazily to each built sequence. | required |
Returns:
| Type | Description |
|---|---|
| `SequenceRepresentationView[TRepresentation]` | Lazy represented view of the generated sequences. |
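"Lazy" here means the representation runs only when the view is iterated. The sketch below shows that pattern with a generic wrapper. `LazyViewSketch` and `count_templates` are hypothetical names for illustration and do not reproduce `SequenceRepresentationView` or any real `SequenceRepresentation`.

```python
from typing import Callable, Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class LazyViewSketch(Generic[T, R]):
    """Hypothetical lazy view: applies `represent` to each item on iteration."""

    def __init__(self, source: Iterable[T], represent: Callable[[T], R]) -> None:
        self._source = source
        self._represent = represent

    def __iter__(self) -> Iterator[R]:
        for item in self._source:
            yield self._represent(item)


calls: list[list[str]] = []  # records each invocation so laziness is observable


def count_templates(templates: list[str]) -> dict[str, int]:
    """Toy representation: template occurrence counts for one sequence."""
    calls.append(templates)
    counts: dict[str, int] = {}
    for template in templates:
        counts[template] = counts.get(template, 0) + 1
    return counts


sequences = [["a", "b", "a"], ["b"]]
view = LazyViewSketch(sequences, count_templates)
calls_before_iteration = len(calls)  # nothing has been represented yet
represented = list(view)  # the representation runs here, once per sequence
```

Deferring the work this way means constructing the view is cheap, and sequences are represented one at a time as a consumer pulls them.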
with_train_fraction(train_frac)
Return a copy with an updated train/test split fraction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `train_frac` | `float` | Requested fraction of eligible sequences to assign to the train split. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `Self` | `Self` | Copy with updated train/test split fraction. |
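Because the method returns a copy, the original builder is left unchanged. One common way to get that behavior with dataclasses is `dataclasses.replace`, sketched below. `BuilderSketch`, its default of 0.8, and the range check are assumptions for illustration; the page does not state how the library validates `train_frac`.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BuilderSketch:
    """Hypothetical stand-in for a builder; the default value is illustrative only."""

    train_frac: float = 0.8

    def with_train_fraction(self, train_frac: float) -> "BuilderSketch":
        # Assumed validation: a fraction outside [0, 1] cannot describe a split.
        if not 0.0 <= train_frac <= 1.0:
            raise ValueError(f"train_frac must be in [0, 1], got {train_frac}")
        return replace(self, train_frac=train_frac)


base = BuilderSketch()
updated = base.with_train_fraction(0.5)
```

Returning copies lets calls be chained and keeps a shared base configuration safe to reuse across experiments.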
with_train_on_normal_entities_only(*, enabled=True)
Limit training sequences to entities without anomalies.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `enabled` | `bool` | Whether to restrict train sequences to normal entities only. | `True` |
Returns:
| Name | Type | Description |
|---|---|---|
| `Self` | `Self` | Copy with updated normal-only training behavior. |
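The effect of restricting train to normal entities can be illustrated with a toy split. Everything here is an assumption made for the sketch: the function name `split_entities`, label 0 meaning normal and 1 meaning anomalous, taking the first eligible entities in order, and rounding the train count down.

```python
def split_entities(
    labels_by_entity: dict[str, int],
    train_frac: float,
    normal_only: bool,
) -> tuple[list[str], list[str]]:
    """Toy split: the train fraction applies to eligible entities only."""
    entity_ids = list(labels_by_entity)
    if normal_only:
        # Assumption: label 0 marks an entity with no anomalies.
        eligible = [e for e in entity_ids if labels_by_entity[e] == 0]
    else:
        eligible = entity_ids
    train_count = int(len(eligible) * train_frac)  # rounds down
    train = eligible[:train_count]
    train_set = set(train)
    test = [e for e in entity_ids if e not in train_set]
    return train, test


labels = {"node-1": 0, "node-2": 1, "node-3": 0, "node-4": 0, "node-5": 1}
train_normal, test_normal = split_entities(labels, train_frac=0.5, normal_only=True)
train_any, test_any = split_entities(labels, train_frac=0.5, normal_only=False)
```

With the restriction on, anomalous entities can only land in test, and the requested fraction is taken over three eligible entities instead of five. This is one way a requested fraction and the resulting share of all sequences can diverge.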
FixedSequenceBuilder
dataclass
Bases: SequenceBuilder
Sequence builder for fixed-size window grouping.
mode
property
Return the grouping strategy for this builder.
__iter__()
Iterate over template sequences yielded by the configured grouping.
Yields:
| Name | Type | Description |
|---|---|---|
| `TemplateSequence` | `TemplateSequence` | One grouped and template-enriched sequence. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the requested train split is impossible for the configured grouping and constraints. |
build_split_summary(*, sequence_count, train_sequence_count, train_label_counts, test_label_counts)
Describe requested versus effective split semantics for one run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sequence_count` | `int` | Total number of generated sequences. | required |
| `train_sequence_count` | `int` | Number of sequences assigned to train. | required |
| `train_label_counts` | `dict[int, int]` | Label counts in the train split. | required |
| `test_label_counts` | `dict[int, int]` | Label counts in the test split. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `SequenceSplitSummary` | `SequenceSplitSummary` | Requested and effective split metrics. |
eligible_train_sequence_count(*, sequence_count, train_label_counts, test_label_counts)
Return the number of sequences eligible for train-fraction accounting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sequence_count` | `int` | Total number of generated sequences. | required |
| `train_label_counts` | `dict[int, int]` | Label counts assigned to the train split. | required |
| `test_label_counts` | `dict[int, int]` | Label counts assigned to the test split. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `int` | `int` | Count of sequences considered eligible for train-fraction calculations for this grouping strategy. |
represent_with(representation)
Return a lazy builder that applies a representation per sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `representation` | `SequenceRepresentation[TRepresentation]` | Sequence representation to apply lazily to each built sequence. | required |
Returns:
| Type | Description |
|---|---|
| `SequenceRepresentationView[TRepresentation]` | Lazy represented view of the generated sequences. |
with_train_fraction(train_frac)
Return a copy with an updated train/test split fraction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `train_frac` | `float` | Requested fraction of eligible sequences to assign to the train split. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `Self` | `Self` | Copy with updated train/test split fraction. |
SequenceBuilder
dataclass
Common sequence-building behavior shared across grouping strategies.
mode
property
Return the grouping strategy for this builder.
__iter__()
Iterate over template sequences yielded by the configured grouping.
Yields:
| Name | Type | Description |
|---|---|---|
| `TemplateSequence` | `TemplateSequence` | One grouped and template-enriched sequence. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the requested train split is impossible for the configured grouping and constraints. |
build_split_summary(*, sequence_count, train_sequence_count, train_label_counts, test_label_counts)
Describe requested versus effective split semantics for one run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sequence_count` | `int` | Total number of generated sequences. | required |
| `train_sequence_count` | `int` | Number of sequences assigned to train. | required |
| `train_label_counts` | `dict[int, int]` | Label counts in the train split. | required |
| `test_label_counts` | `dict[int, int]` | Label counts in the test split. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `SequenceSplitSummary` | `SequenceSplitSummary` | Requested and effective split metrics. |
eligible_train_sequence_count(*, sequence_count, train_label_counts, test_label_counts)
Return the number of sequences eligible for train-fraction accounting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sequence_count` | `int` | Total number of generated sequences. | required |
| `train_label_counts` | `dict[int, int]` | Label counts assigned to the train split. | required |
| `test_label_counts` | `dict[int, int]` | Label counts assigned to the test split. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `int` | `int` | Count of sequences considered eligible for train-fraction calculations for this grouping strategy. |
represent_with(representation)
Return a lazy builder that applies a representation per sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `representation` | `SequenceRepresentation[TRepresentation]` | Sequence representation to apply lazily to each built sequence. | required |
Returns:
| Type | Description |
|---|---|
| `SequenceRepresentationView[TRepresentation]` | Lazy represented view of the generated sequences. |
with_train_fraction(train_frac)
Return a copy with an updated train/test split fraction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `train_frac` | `float` | Requested fraction of eligible sequences to assign to the train split. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `Self` | `Self` | Copy with updated train/test split fraction. |
SequenceSplitSummary
dataclass
TemplateSequence
dataclass
Grouped log window before any model-specific representation is applied.
This keeps sequence semantics such as event ordering, labels, and entity
membership. Model inputs derived from it live in SequenceSample.
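The relationship between the stored events and derived accessors such as `templates` and `sole_entity_id` can be sketched with a small stand-in dataclass. `TemplateSequenceSketch` is hypothetical: the event layout is inferred from the example at the top of this page (a template, its parameters, and an optional integer whose meaning is not stated here), and raising `ValueError` when there is not exactly one entity is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed event layout: (template, parameters, optional integer).
Event = tuple[str, list[str], Optional[int]]


@dataclass(frozen=True)
class TemplateSequenceSketch:
    """Hypothetical stand-in; not the library's TemplateSequence."""

    events: list[Event]
    label: int
    entity_ids: list[str]

    @property
    def templates(self) -> list[str]:
        # Event order is preserved, so the template list keeps sequence order.
        return [template for template, _params, _extra in self.events]

    @property
    def sole_entity_id(self) -> str:
        # Assumed behavior: only meaningful when exactly one entity is present.
        if len(self.entity_ids) != 1:
            raise ValueError("expected exactly one entity id")
        return self.entity_ids[0]


sequence = TemplateSequenceSketch(
    events=[("template <*>", ["x"], None), ("template <*>", ["y"], 10)],
    label=0,
    entity_ids=["node-1"],
)
```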
TimeSequenceBuilder
dataclass
Bases: SequenceBuilder
Sequence builder for time-window grouping.
mode
property
Return the grouping strategy for this builder.
__iter__()
Iterate over template sequences yielded by the configured grouping.
Yields:
| Name | Type | Description |
|---|---|---|
| `TemplateSequence` | `TemplateSequence` | One grouped and template-enriched sequence. |
Raises:
| Type | Description |
|---|---|
| `ValueError` | If the requested train split is impossible for the configured grouping and constraints. |
build_split_summary(*, sequence_count, train_sequence_count, train_label_counts, test_label_counts)
Describe requested versus effective split semantics for one run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sequence_count` | `int` | Total number of generated sequences. | required |
| `train_sequence_count` | `int` | Number of sequences assigned to train. | required |
| `train_label_counts` | `dict[int, int]` | Label counts in the train split. | required |
| `test_label_counts` | `dict[int, int]` | Label counts in the test split. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `SequenceSplitSummary` | `SequenceSplitSummary` | Requested and effective split metrics. |
eligible_train_sequence_count(*, sequence_count, train_label_counts, test_label_counts)
Return the number of sequences eligible for train-fraction accounting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sequence_count` | `int` | Total number of generated sequences. | required |
| `train_label_counts` | `dict[int, int]` | Label counts assigned to the train split. | required |
| `test_label_counts` | `dict[int, int]` | Label counts assigned to the test split. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `int` | `int` | Count of sequences considered eligible for train-fraction calculations for this grouping strategy. |
represent_with(representation)
Return a lazy builder that applies a representation per sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `representation` | `SequenceRepresentation[TRepresentation]` | Sequence representation to apply lazily to each built sequence. | required |
Returns:
| Type | Description |
|---|---|
| `SequenceRepresentationView[TRepresentation]` | Lazy represented view of the generated sequences. |
with_train_fraction(train_frac)
Return a copy with an updated train/test split fraction.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `train_frac` | `float` | Requested fraction of eligible sequences to assign to the train split. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| `Self` | `Self` | Copy with updated train/test split fraction. |