Sequences

Sequences are the point where a templated dataset becomes a modeling dataset.

This page covers the grouping builders, the TemplateSequence shape, and the split semantics that determine how sequences are assigned to train and test.

>>> from anomalog.sequences import SplitLabel, TemplateSequence
>>> sequence = TemplateSequence(
...     events=[("template <*>", ["x"], None), ("template <*>", ["y"], 10)],
...     label=0,
...     entity_ids=["node-1"],
...     window_id=7,
...     split_label=SplitLabel.TRAIN,
... )
>>> sequence.sole_entity_id
'node-1'
>>> sequence.templates
['template <*>', 'template <*>']
>>> sequence.split_label.value
'train'

anomalog.sequences

Utilities for building template sequences from structured log lines.

The module groups parsed log lines into windows (entity, fixed-size, time-based, or chronological stream chunks) and decorates them with inferred templates and anomaly labels.

ChronologicalStreamSequenceBuilder dataclass

Bases: NonEntitySequenceBuilder

Sequence builder for chronological raw-entry stream chunks.

Attributes:

Name Type Description
chunk_size int

Maximum number of raw entries per emitted chunk.

continuous_context bool

Whether adjacent chunks should carry model state across sequence boundaries.

__iter__()

Iterate over chronological stream chunks with optional raw splits.

Yields:

Name Type Description
TemplateSequence TemplateSequence

One preserved chronological chunk per emitted sequence. When a raw-entry split is active, per-event training eligibility is attached through training_event_mask instead of fragmenting the chunk.
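The idea of keeping a chunk whole and marking eligibility per event can be illustrated with a minimal, self-contained sketch. It does not use anomalog; `Chunk`, `chunk_stream`, and the `cutoff` parameter are hypothetical names, and the rule that entries before the cutoff are training-eligible is an assumption made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical stand-in for one emitted stream chunk."""

    entry_indices: tuple[int, ...]
    training_event_mask: tuple[bool, ...]


def chunk_stream(entry_count: int, chunk_size: int, cutoff: int) -> list[Chunk]:
    """Cut entries 0..entry_count-1 into chunks of at most chunk_size.

    Assumption: entries before `cutoff` are training-eligible. A chunk that
    crosses the cutoff is kept whole; only its mask reflects the boundary.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks = []
    for start in range(0, entry_count, chunk_size):
        indices = tuple(range(start, min(start + chunk_size, entry_count)))
        mask = tuple(index < cutoff for index in indices)
        chunks.append(Chunk(indices, mask))
    return chunks


chunks = chunk_stream(entry_count=7, chunk_size=3, cutoff=4)
```

Here the middle chunk covers entries 3, 4, and 5 and straddles the cutoff, so its mask marks only the first event as a training target while the chunk itself stays intact.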

build_raw_entry_split_summary()

Return diagnostics for a configured raw-entry split, if any.

Returns:

Type Description
RawEntrySplitSummary | None

RawEntrySplitSummary | None: Raw-entry split diagnostics when a before-grouping split is configured, otherwise None.

build_split_summary(*, sequence_summary)

Describe requested versus effective split semantics for one run.

Parameters:

Name Type Description Default
sequence_summary SequenceSummary

Aggregate split and label counts.

required

Returns:

Name Type Description
SequenceSplitSummary SequenceSplitSummary

Requested and effective split metrics.

count_windows()

Return the number of chronological stream chunks.

Returns:

Name Type Description
int int

Count of chronological stream chunks implied by the sink.

iter_grouped_rows()

Return rows grouped into deterministic chronological chunks.

Returns:

Type Description
Iterator[Collection[StructuredLine]]

Iterator[Collection[StructuredLine]]: Deterministic chronological chunks of structured rows.

iter_test_sequences()

Yield only the test suffix used for detector scoring.

Yields:

Name Type Description
TemplateSequence TemplateSequence

Sequences assigned to the test split.

iter_training_sequences()

Yield the training slice used by model fitting.

Yields:

Name Type Description
TemplateSequence TemplateSequence

Sequences assigned to the training split.

represent_with(representation)

Return a lazy builder that applies a representation per sequence.

Parameters:

Name Type Description Default
representation SequenceRepresentation[TRepresentation]

Sequence representation to apply lazily to each built sequence.

required

Returns:

Type Description
SequenceRepresentationView[TRepresentation]

SequenceRepresentationView[TRepresentation]: Lazy represented view of the generated sequences.

split_count_hint()

Return the exact split-count summary for non-entity grouping.

Returns:

Name Type Description
SequenceSplitCounts SequenceSplitCounts

Exact split counts for the grouping strategy.

split_summary_train_on_normal_entities_only()

Return split-summary metadata for entity-only normal training.

Returns:

Type Description
bool | None

bool | None: Whether train was restricted to normal entities only, or None when that concept does not apply to this builder.

train_fraction_eligible_sequence_count(*, sequence_summary)

Return the denominator for effective train-fraction accounting.

Parameters:

Name Type Description Default
sequence_summary SequenceSummary

Aggregate split and label counts.

required

Returns:

Name Type Description
int int

Count of sequences considered eligible when reporting the realised train fraction for this grouping strategy. This is not necessarily the number of sequences that were actually assigned to train.

train_sequence_count_unit_hint()

Return the unit label for stream chunks.

Returns:

Name Type Description
str str

Human-readable unit label for stream progress.

with_continuous_context(*, enabled=True)

Treat consecutive stream chunks as one continuous stream.

Parameters:

Name Type Description Default
enabled bool

Whether to carry model state across chunk boundaries.

True

Returns:

Name Type Description
Self Self

Copy with updated continuity behaviour.

with_split_fractions(train_frac, test_frac)

Return a copy with both split fractions updated together.

Parameters:

Name Type Description Default
train_frac float

Requested fraction of the total population to assign to the train prefix.

required
test_frac float

Requested fraction reserved for the fixed test suffix.

required

Returns:

Name Type Description
Self Self

Copy with updated split fractions.

EntitySequenceBuilder dataclass

Bases: SequenceBuilder

Sequence builder for per-entity grouping.

Attributes:

Name Type Description
train_on_normal_entities_only bool

Whether anomalous entities are excluded from the training split budget.

continuous_context bool

Whether adjacent entity windows should carry state across sequence boundaries.

__iter__()

Iterate over template sequences yielded by the configured grouping.

Yields:

Name Type Description
TemplateSequence TemplateSequence

One grouped and template-enriched sequence.

__post_init__()

Validate the requested split fractions and raw-entry split inputs.

Raises:

Type Description
ValueError

If the requested split settings are inconsistent.

build_raw_entry_split_summary()

Return diagnostics for a configured raw-entry split, if any.

Returns:

Type Description
RawEntrySplitSummary | None

RawEntrySplitSummary | None: Raw-entry split diagnostics when a before-grouping split is configured, otherwise None.

build_split_summary(*, sequence_summary)

Describe requested versus effective split semantics for one run.

Parameters:

Name Type Description Default
sequence_summary SequenceSummary

Aggregate split and label counts.

required

Returns:

Name Type Description
SequenceSplitSummary SequenceSplitSummary

Requested and effective split metrics.

from_dataset(td) classmethod

Create an entity-grouped builder from a templated dataset.

Parameters:

Name Type Description Default
td TemplatedDataset

Templated dataset to bind into the builder.

required

Returns:

Name Type Description
Self Self

Builder bound to the templated dataset.

iter_grouped_rows()

Return rows grouped by entity.

Returns:

Type Description
Iterator[Collection[StructuredLine]]

Iterator[Collection[StructuredLine]]: Entity-grouped structured rows.

iter_test_sequences()

Yield only the test suffix used for detector scoring.

Yields:

Name Type Description
TemplateSequence TemplateSequence

Sequences assigned to the test split.

iter_training_sequences()

Yield only the train split used for detector fitting.

For HDFS-style raw-entry prefix protocols, fitting only needs the train population implied by the selected policy. We therefore avoid replaying the full suffix and instead materialise just the selected train entities or raw-prefix rows, depending on the straddler policy.

Yields:

Name Type Description
TemplateSequence TemplateSequence

Training-split sequences yielded by the configured grouping strategy.

Raises:

Type Description
ValueError

If the requested raw-entry split metadata is missing for a configured before-grouping prefix protocol.

represent_with(representation)

Return a lazy builder that applies a representation per sequence.

Parameters:

Name Type Description Default
representation SequenceRepresentation[TRepresentation]

Sequence representation to apply lazily to each built sequence.

required

Returns:

Type Description
SequenceRepresentationView[TRepresentation]

SequenceRepresentationView[TRepresentation]: Lazy represented view of the generated sequences.

split_count_hint()

Return the exact split-count summary for entity grouping.

Returns:

Name Type Description
SequenceSplitCounts SequenceSplitCounts

Exact split counts for the entity builder.

split_summary_train_on_normal_entities_only()

Return entity split-summary metadata for normal-only training.

Returns:

Name Type Description
bool bool

Whether train was restricted to normal entities only.

train_fraction_eligible_sequence_count(*, sequence_summary)

Return the denominator for effective train-fraction accounting.

Parameters:

Name Type Description Default
sequence_summary SequenceSummary

Aggregate split and label counts.

required

Returns:

Name Type Description
int int

Eligible entity-sequence count under the current policy. When normal-only training is enabled, this counts only normal entities in the fixed train pool before the hold-out suffix.

train_sequence_count_unit_hint()

Return the unit label for entity-grouped train progress.

Returns:

Name Type Description
str str

Unit label for entity-grouped train progress.

with_continuous_context(*, enabled=True)

Treat consecutive entity windows as one continuous stream.

Parameters:

Name Type Description Default
enabled bool

Whether to carry model state across entity boundaries.

True

Returns:

Name Type Description
Self Self

Copy with updated continuity behaviour.

with_split_fractions(train_frac, test_frac)

Return a copy with both split fractions updated together.

Parameters:

Name Type Description Default
train_frac float

Requested fraction of the total population to assign to the train prefix.

required
test_frac float

Requested fraction reserved for the fixed test suffix.

required

Returns:

Name Type Description
Self Self

Copy with updated split fractions.

with_train_on_normal_entities_only(*, enabled=True)

Limit training sequences to entities without anomalies.

Parameters:

Name Type Description Default
enabled bool

Whether to restrict train sequences to normal entities only.

True

Returns:

Name Type Description
Self Self

Copy with updated normal-only training behaviour.

FixedSequenceBuilder dataclass

Bases: NonEntitySequenceBuilder

Sequence builder for fixed-size window grouping.

Attributes:

Name Type Description
window_size int

Number of rows per emitted window.

step int | None

Row advance between windows. None means non-overlapping windows.

window_basis FixedWindowBasis

Whether windows are built over the compacted structured rows or over the raw line positions.

window_alignment_offset int

Raw-position offset before the first full raw-position window.
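How window_size and step interact can be shown with a small standalone sketch. It is not the library's implementation: `fixed_windows` is a hypothetical helper, and dropping a trailing partial window is an assumption of this sketch rather than documented behaviour.

```python
from __future__ import annotations


def fixed_windows(
    rows: list[str],
    window_size: int,
    step: int | None = None,
) -> list[list[str]]:
    """Group rows into fixed-size windows.

    step=None advances by window_size, giving non-overlapping windows.
    Assumption: only full windows are emitted; a trailing partial window
    is dropped in this sketch.
    """
    advance = window_size if step is None else step
    if window_size <= 0 or advance <= 0:
        raise ValueError("window_size and step must be positive")
    last_start = len(rows) - window_size
    return [rows[start:start + window_size] for start in range(0, last_start + 1, advance)]


rows = ["a", "b", "c", "d", "e"]
non_overlapping = fixed_windows(rows, window_size=2)
overlapping = fixed_windows(rows, window_size=3, step=1)
```

With five rows, a window size of 2 and no step gives two windows, while a window size of 3 with a step of 1 gives three overlapping windows.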

__iter__()

Yield fixed-size windows with the configured basis.

Yields:

Name Type Description
TemplateSequence TemplateSequence

One grouped template sequence per fixed window.

__post_init__()

Validate the requested split fractions and raw-entry split inputs.

Raises:

Type Description
ValueError

If the requested split settings are inconsistent.

build_raw_entry_split_summary()

Return diagnostics for a configured raw-entry split, if any.

Returns:

Type Description
RawEntrySplitSummary | None

RawEntrySplitSummary | None: Raw-entry split diagnostics when a before-grouping split is configured, otherwise None.

build_split_summary(*, sequence_summary)

Describe requested versus effective split semantics for one run.

Parameters:

Name Type Description Default
sequence_summary SequenceSummary

Aggregate split and label counts.

required

Returns:

Name Type Description
SequenceSplitSummary SequenceSplitSummary

Requested and effective split metrics.

count_windows()

Return the number of fixed-size windows.

Returns:

Name Type Description
int int

Count of fixed-size windows implied by the sink and config.

iter_grouped_rows()

Return rows grouped by fixed-size windows.

Returns:

Type Description
Iterator[Collection[StructuredLine]]

Iterator[Collection[StructuredLine]]: Fixed-size row windows.

iter_test_sequences()

Yield only the test suffix used for detector scoring.

Yields:

Name Type Description
TemplateSequence TemplateSequence

Sequences assigned to the test split.

iter_training_sequences()

Yield the training slice used by model fitting.

Yields:

Name Type Description
TemplateSequence TemplateSequence

Sequences assigned to the training split.

represent_with(representation)

Return a lazy builder that applies a representation per sequence.

Parameters:

Name Type Description Default
representation SequenceRepresentation[TRepresentation]

Sequence representation to apply lazily to each built sequence.

required

Returns:

Type Description
SequenceRepresentationView[TRepresentation]

SequenceRepresentationView[TRepresentation]: Lazy represented view of the generated sequences.

split_count_hint()

Return the exact split-count summary for non-entity grouping.

Returns:

Name Type Description
SequenceSplitCounts SequenceSplitCounts

Exact split counts for the grouping strategy.

split_summary_train_on_normal_entities_only()

Return split-summary metadata for entity-only normal training.

Returns:

Type Description
bool | None

bool | None: Whether train was restricted to normal entities only, or None when that concept does not apply to this builder.

train_fraction_eligible_sequence_count(*, sequence_summary)

Return the denominator for effective train-fraction accounting.

Parameters:

Name Type Description Default
sequence_summary SequenceSummary

Aggregate split and label counts.

required

Returns:

Name Type Description
int int

Count of sequences considered eligible when reporting the realised train fraction for this grouping strategy. This is not necessarily the number of sequences that were actually assigned to train.

train_sequence_count_unit_hint()

Return the unit label for fixed-window train progress.

Returns:

Name Type Description
str str

Unit label for fixed-window train progress.

with_split_fractions(train_frac, test_frac)

Return a copy with both split fractions updated together.

Parameters:

Name Type Description Default
train_frac float

Requested fraction of the total population to assign to the train prefix.

required
test_frac float

Requested fraction reserved for the fixed test suffix.

required

Returns:

Name Type Description
Self Self

Copy with updated split fractions.

FixedWindowBasis

Bases: str, Enum

What positional basis to use for fixed-size windows.

Attributes:

Name Type Description
COMPACTED_ROWS

Build windows over compacted structured rows.

RAW_POSITIONS

Build windows over raw line positions.

NonEntitySequenceBuilder dataclass

Bases: SequenceBuilder

Sequence builder for non-entity grouping strategies.

This marker subclass signals that normal-entity logic does not apply, as with fixed-size or time-based windowing.

__iter__()

Iterate over template sequences yielded by the configured grouping.

Yields:

Name Type Description
TemplateSequence TemplateSequence

One grouped and template-enriched sequence.

__post_init__()

Validate the requested split fractions and raw-entry split inputs.

Raises:

Type Description
ValueError

If the requested split settings are inconsistent.

build_raw_entry_split_summary()

Return diagnostics for a configured raw-entry split, if any.

Returns:

Type Description
RawEntrySplitSummary | None

RawEntrySplitSummary | None: Raw-entry split diagnostics when a before-grouping split is configured, otherwise None.

build_split_summary(*, sequence_summary)

Describe requested versus effective split semantics for one run.

Parameters:

Name Type Description Default
sequence_summary SequenceSummary

Aggregate split and label counts.

required

Returns:

Name Type Description
SequenceSplitSummary SequenceSplitSummary

Requested and effective split metrics.

count_windows() abstractmethod

Return the total number of windows implied by the sink and config.

Returns:

Name Type Description
int int

Count of windows implied by the sink and current builder config.

iter_grouped_rows() abstractmethod

Return grouped rows for the configured strategy.

Returns:

Type Description
Iterator[Collection[StructuredLine]]

Iterator[Collection[StructuredLine]]: Iterator over grouped windows of structured rows.

iter_test_sequences()

Yield only the test suffix used for detector scoring.

Yields:

Name Type Description
TemplateSequence TemplateSequence

Sequences assigned to the test split.

iter_training_sequences()

Yield the training slice used by model fitting.

Yields:

Name Type Description
TemplateSequence TemplateSequence

Sequences assigned to the training split.

represent_with(representation)

Return a lazy builder that applies a representation per sequence.

Parameters:

Name Type Description Default
representation SequenceRepresentation[TRepresentation]

Sequence representation to apply lazily to each built sequence.

required

Returns:

Type Description
SequenceRepresentationView[TRepresentation]

SequenceRepresentationView[TRepresentation]: Lazy represented view of the generated sequences.

split_count_hint()

Return the exact split-count summary for non-entity grouping.

Returns:

Name Type Description
SequenceSplitCounts SequenceSplitCounts

Exact split counts for the grouping strategy.

split_summary_train_on_normal_entities_only()

Return split-summary metadata for entity-only normal training.

Returns:

Type Description
bool | None

bool | None: Whether train was restricted to normal entities only, or None when that concept does not apply to this builder.

train_fraction_eligible_sequence_count(*, sequence_summary)

Return the denominator for effective train-fraction accounting.

Parameters:

Name Type Description Default
sequence_summary SequenceSummary

Aggregate split and label counts.

required

Returns:

Name Type Description
int int

Count of sequences considered eligible when reporting the realised train fraction for this grouping strategy. This is not necessarily the number of sequences that were actually assigned to train.

train_sequence_count_unit_hint()

Return a human-readable unit label for train-count progress.

This is intended for progress reporting only. Builders should return a unit when it clarifies what the bounded train count represents, such as "entities" for entity grouping.

Returns:

Type Description
str | None

str | None: Unit label for train-count progress when useful, otherwise None.

with_split_fractions(train_frac, test_frac)

Return a copy with both split fractions updated together.

Parameters:

Name Type Description Default
train_frac float

Requested fraction of the total population to assign to the train prefix.

required
test_frac float

Requested fraction reserved for the fixed test suffix.

required

Returns:

Name Type Description
Self Self

Copy with updated split fractions.

RawEntrySplitMode

Bases: str, Enum

Chronological raw-entry split modes supported by sequence builders.

Attributes:

Name Type Description
PREFIX_COUNT

Split by the first N raw entries.

PREFIX_FRACTION

Split by the first fraction of raw entries.

PREFIX_NORMAL_FRACTION

Split by the first fraction of normal entries.
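The three modes can be read as three ways of choosing a zero-based cutoff index where the test suffix begins. The sketch below is a standalone illustration under stated assumptions: the function names are hypothetical, fractions are rounded down, and the normal-fraction cutoff is placed immediately after the entry that completes the target number of normal entries. The library's exact rounding and tie-breaking are not documented here.

```python
import math


def prefix_count_cutoff(entry_count: int, train_entry_count: int) -> int:
    """PREFIX_COUNT: the first N raw entries form the train prefix."""
    return min(train_entry_count, entry_count)


def prefix_fraction_cutoff(entry_count: int, fraction: float) -> int:
    """PREFIX_FRACTION: the first fraction of raw entries (rounded down)."""
    return math.floor(entry_count * fraction)


def prefix_normal_fraction_cutoff(is_anomalous: list[bool], fraction: float) -> int:
    """PREFIX_NORMAL_FRACTION: stop once the prefix holds the target
    number of normal entries (assumption: target is rounded down)."""
    normal_total = sum(1 for flag in is_anomalous if not flag)
    target = math.floor(normal_total * fraction)
    seen_normal = 0
    for index, flag in enumerate(is_anomalous):
        if seen_normal == target:
            return index
        if not flag:
            seen_normal += 1
    return len(is_anomalous)


# Six raw entries; True marks an anomalous entry.
flags = [False, True, False, False, True, False]
count_cutoff = prefix_count_cutoff(len(flags), 4)
fraction_cutoff = prefix_fraction_cutoff(len(flags), 0.5)
normal_cutoff = prefix_normal_fraction_cutoff(flags, 0.5)
```

In this example the normal-fraction cutoff is 3: the prefix of entries 0 to 2 contains two of the four normal entries.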

RawEntrySplitSummary dataclass

Audit summary for a chronological raw-entry split.

Attributes:

Name Type Description
split_mode str

Configured raw-entry split mode.

application_order str

Whether the split was applied before or after grouping.

cutoff_entry_index int

Zero-based raw-entry cutoff where the test suffix begins.

train_raw_entry_count int

Raw entries assigned to train.

train_normal_entry_count int

Normal raw entries assigned to train.

train_anomalous_entry_count int

Anomalous raw entries assigned to train.

test_raw_entry_count int

Raw entries assigned to test.

test_normal_entry_count int

Normal raw entries assigned to test.

test_anomalous_entry_count int

Anomalous raw entries assigned to test.

ignored_raw_entry_count int

Raw entries withheld from both train and test.

ignored_normal_entry_count int

Normal raw entries withheld.

ignored_anomalous_entry_count int

Anomalous raw entries withheld.

straddling_group_count int

Number of grouped windows that crossed the split boundary.

straddling_group_policy str | None

Policy applied to straddling groups.

as_dict()

Return a JSON-friendly representation.

Returns:

Type Description
dict[str, int | str | None]

dict[str, int | str | None]: Serialisable split summary payload.

SequenceBuilder dataclass

Bases: ABC, Iterable[TemplateSequence]

Common sequence-building behaviour shared across grouping strategies.

Sequence builders stay lazy so expensive grouping, template inference, and label resolution only happen when a caller iterates. The shared base also centralises split assignment so experiment manifests can describe train/test semantics consistently across grouping modes.

Attributes:

Name Type Description
sink StructuredSink

Structured sink supplying grouped rows.

infer_template Callable[[str], tuple[LogTemplate, ExtractedParameters]]

Template inference function for row message text.

label_for_group Callable[[str], int | None]

Group-level anomaly label lookup by entity id.

template_parser TemplateParser | None

Optional parser object kept alongside infer_template so optimisation paths can inspect parser capabilities without peeking at bound-method metadata.

raw_replay_state _BeforeGroupingRawReplayState

Mutable cache reused by before-grouping raw-entry split paths.

split_mode RawEntrySplitMode | None

Raw-entry split mode used for special reproduction protocols. None preserves the legacy sequence-fraction split behaviour.

split_application_order SplitApplicationOrder

Whether the split is applied before or after grouping.

straddling_group_policy StraddlingGroupPolicy

Policy for grouped rows that cross a raw-entry split boundary.

train_entry_count int | None

Requested raw-entry prefix length when split_mode = PREFIX_COUNT.

train_entry_fraction float | None

Requested raw-entry prefix fraction when split_mode = PREFIX_FRACTION.

train_normal_entry_fraction float | None

Requested normal-entry prefix fraction when split_mode = PREFIX_NORMAL_FRACTION.

stream_chunk_size int | None

Optional chunk size used by stream grouping strategies.

train_frac float

Requested training fraction for the builder.

test_frac float

Fixed test suffix fraction.

__iter__() abstractmethod

Iterate over template sequences yielded by the configured grouping.

Returns:

Type Description
Iterator[TemplateSequence]

Iterator[TemplateSequence]: Iterator yielding grouped and template-enriched sequences.

__post_init__()

Validate the requested split fractions and raw-entry split inputs.

Raises:

Type Description
ValueError

If the requested split settings are inconsistent.

build_raw_entry_split_summary()

Return diagnostics for a configured raw-entry split, if any.

Returns:

Type Description
RawEntrySplitSummary | None

RawEntrySplitSummary | None: Raw-entry split diagnostics when a before-grouping split is configured, otherwise None.

build_split_summary(*, sequence_summary)

Describe requested versus effective split semantics for one run.

Parameters:

Name Type Description Default
sequence_summary SequenceSummary

Aggregate split and label counts.

required

Returns:

Name Type Description
SequenceSplitSummary SequenceSplitSummary

Requested and effective split metrics.

iter_grouped_rows() abstractmethod

Return grouped rows for the configured strategy.

Returns:

Type Description
Iterator[Collection[StructuredLine]]

Iterator[Collection[StructuredLine]]: Iterator over grouped windows of structured rows.

iter_test_sequences()

Yield the test slice used by model scoring.

Yields:

Name Type Description
TemplateSequence TemplateSequence

Sequences assigned to the test split.

iter_training_sequences()

Yield the training slice used by model fitting.

Yields:

Name Type Description
TemplateSequence TemplateSequence

Sequences assigned to the training split.

represent_with(representation)

Return a lazy builder that applies a representation per sequence.

Parameters:

Name Type Description Default
representation SequenceRepresentation[TRepresentation]

Sequence representation to apply lazily to each built sequence.

required

Returns:

Type Description
SequenceRepresentationView[TRepresentation]

SequenceRepresentationView[TRepresentation]: Lazy represented view of the generated sequences.
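The lazy behaviour described for represent_with can be pictured with a small generic wrapper. This is only a sketch of the pattern, not the library's SequenceRepresentationView: `LazyView` and `template_count` are hypothetical names, and the representation is modelled as a plain callable.

```python
from typing import Callable, Generic, Iterable, Iterator, TypeVar

T = TypeVar("T")
R = TypeVar("R")


class LazyView(Generic[T, R]):
    """Apply a representation to each item only when iterated."""

    def __init__(self, source: Iterable[T], represent: Callable[[T], R]) -> None:
        self._source = source
        self._represent = represent

    def __iter__(self) -> Iterator[R]:
        # No work happens until a caller starts iterating.
        for item in self._source:
            yield self._represent(item)


calls: list[list[str]] = []


def template_count(templates: list[str]) -> int:
    """Hypothetical representation: the number of templates in a sequence."""
    calls.append(templates)
    return len(templates)


view = LazyView([["open <*>", "close <*>"], ["open <*>"]], template_count)
calls_before_iteration = len(calls)
represented = list(view)
```

Building the view performs no work; the representation runs once per sequence only when the view is consumed.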

split_count_hint()

Return a cheap exact split-count summary when the builder knows it.

Returns:

Type Description
SequenceSplitCounts | None

SequenceSplitCounts | None: Exact split counts when cheaply available, otherwise None.

split_summary_train_on_normal_entities_only()

Return split-summary metadata for entity-only normal training.

Returns:

Type Description
bool | None

bool | None: Whether train was restricted to normal entities only, or None when that concept does not apply to this builder.

train_fraction_eligible_sequence_count(*, sequence_summary)

Return the denominator for effective train-fraction accounting.

Parameters:

Name Type Description Default
sequence_summary SequenceSummary

Aggregate split and label counts.

required

Returns:

Name Type Description
int int

Count of sequences considered eligible when reporting the realised train fraction for this grouping strategy. This is not necessarily the number of sequences that were actually assigned to train.

train_sequence_count_unit_hint()

Return a human-readable unit label for train-count progress.

This is intended for progress reporting only. Builders should return a unit when it clarifies what the bounded train count represents, such as "entities" for entity grouping.

Returns:

Type Description
str | None

str | None: Unit label for train-count progress when useful, otherwise None.

with_split_fractions(train_frac, test_frac)

Return a copy with both split fractions updated together.

Parameters:

Name Type Description Default
train_frac float

Requested fraction of the total population to assign to the train prefix.

required
test_frac float

Requested fraction reserved for the fixed test suffix.

required

Returns:

Name Type Description
Self Self

Copy with updated split fractions.

SequenceSplitCounts dataclass

Exact split counts for a concrete sequence builder.

Attributes:

Name Type Description
total_count int

Total emitted sequence count.

train_count int

Count assigned to the current train prefix.

ignored_count int

Count withheld between train and test.

test_count int

Count assigned to the fixed test suffix.
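The relationship between the four counts can be sketched as a train prefix, an ignored middle band, and a fixed test suffix over a chronological population. The helper below is hypothetical and self-contained; rounding both fractions down and rejecting fractions that sum to more than 1 are assumptions of this sketch, not documented library rules.

```python
import math


def split_counts(total_count: int, train_frac: float, test_frac: float) -> dict[str, int]:
    """Return train/ignored/test counts for a chronological population.

    Assumption: both fractions are rounded down, and whatever lies between
    the train prefix and the test suffix is counted as ignored.
    """
    if not 0.0 <= train_frac <= 1.0 or not 0.0 <= test_frac <= 1.0:
        raise ValueError("fractions must lie in [0, 1]")
    if train_frac + test_frac > 1.0:
        raise ValueError("train_frac + test_frac must not exceed 1")
    train_count = math.floor(total_count * train_frac)
    test_count = math.floor(total_count * test_frac)
    return {
        "total_count": total_count,
        "train_count": train_count,
        "ignored_count": total_count - train_count - test_count,
        "test_count": test_count,
    }


counts = split_counts(total_count=8, train_frac=0.5, test_frac=0.25)
```

For eight sequences with a train fraction of 0.5 and a test fraction of 0.25, the first four are train, the next two are ignored, and the last two are test.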

SequenceSplitSummary dataclass

Serialisable summary of requested versus effective split behaviour.

The requested train fraction may not equal the effective one after grouping-specific eligibility rules are applied. Persisting both protects downstream experiment manifests from silently overstating how much data was actually available for training.

Attributes:

Name Type Description
requested_train_fraction float

Requested fraction provided by the caller.

requested_test_fraction float

Requested test suffix fraction provided by the caller.

train_on_normal_entities_only bool | None

Whether training was restricted to normal entities only. Only applicable to entity grouping; None otherwise.

train_pool_sequence_count int

Number of sequences in the chronological train candidate window before detector-specific filtering is applied.

ineligible_train_pool_count int

Number of sequences in the train pool that were ineligible for training under the current policy.

realised_train_sequence_count int

Number of sequences actually used for training after any detector-specific filtering.

excluded_from_train_count int

Number of sequences withheld from the train pool before scoring, including the ignored middle band and any detector-ineligible prefix items.

eligible_train_sequence_count int

Number of sequences in the denominator for the effective train-fraction calculation. In entity-grouped mode this is the fixed chronological train pool, or the normal-only subset of that pool when normal-only training is enabled.

ignored_sequence_count int

Number of sequences withheld from the train pool because they fell outside the requested train prefix or were ineligible under the current filtering policy.

effective_train_fraction_of_eligible float

Realised train fraction over the eligible set.

effective_train_fraction_overall float

Realised train fraction over the full generated sequence population.
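The two effective fractions differ only in their denominator. A minimal sketch of that arithmetic follows; the function name is hypothetical, and returning 0.0 for an empty denominator is an assumption of this sketch.

```python
def effective_train_fractions(
    realised_train_sequence_count: int,
    eligible_train_sequence_count: int,
    total_sequence_count: int,
) -> tuple[float, float]:
    """Return (fraction of eligible, fraction overall).

    Assumption: an empty denominator yields 0.0 instead of raising.
    """
    of_eligible = (
        realised_train_sequence_count / eligible_train_sequence_count
        if eligible_train_sequence_count
        else 0.0
    )
    overall = (
        realised_train_sequence_count / total_sequence_count
        if total_sequence_count
        else 0.0
    )
    return of_eligible, overall


# 100 sequences overall, 60 eligible for training, 30 actually used.
of_eligible, overall = effective_train_fractions(30, 60, 100)
```

With normal-only training, the eligible count shrinks to the normal entities in the train pool, so the fraction of eligible (0.5 here) can be much larger than the overall fraction (0.3 here). Recording both makes that gap visible.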

as_dict()

Return a stable JSON-friendly representation.

Returns:

Type Description
dict[str, int | float | bool | str]

dict[str, int | float | bool | str]: Serialised split summary.

SplitApplicationOrder

Bases: str, Enum

When to apply a configured split relative to grouping.

Attributes:

Name Type Description
AFTER_GROUPING

Apply the split after grouping has produced sequences.

BEFORE_GROUPING

Apply the split on raw entries before grouping.

SplitLabel

Bases: str, Enum

Dataset split membership for a sequence.

Attributes:

Name Type Description
TRAIN

Sequence belongs to the training split.

TEST

Sequence belongs to the evaluation/test split.

IGNORED

Sequence belongs to the fixed train pool but is not used for the current training prefix.

StraddlingGroupPolicy

Bases: str, Enum

How to handle grouped rows that cross a raw-entry split boundary.

Attributes:

Name Type Description
SPLIT_PARTIAL_SEQUENCES

Emit one sequence per contiguous segment.

ASSIGN_BY_FIRST_EVENT

Assign the whole group to the side of the split that contains its first event.

ASSIGN_BY_LAST_EVENT

Assign the whole group to the side of the split that contains its last event.

DROP_STRADDLERS

Drop groups that span both sides of the split.
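The four policies can be compared on a single group that crosses the cutoff. The sketch below is a standalone illustration: `Policy` mirrors the member names above but its string values are hypothetical, `assign_group` is not a library function, and entry indices are assumed to be sorted in chronological order.

```python
from enum import Enum


class Policy(str, Enum):
    """Hypothetical mirror of the policy names; values are illustrative."""

    SPLIT_PARTIAL_SEQUENCES = "split_partial_sequences"
    ASSIGN_BY_FIRST_EVENT = "assign_by_first_event"
    ASSIGN_BY_LAST_EVENT = "assign_by_last_event"
    DROP_STRADDLERS = "drop_straddlers"


def assign_group(
    entry_indices: list[int], cutoff: int, policy: Policy
) -> list[tuple[str, list[int]]]:
    """Return (split, entry_indices) pairs emitted for one group.

    Entries with index < cutoff fall on the train side of the boundary.
    """
    train = [index for index in entry_indices if index < cutoff]
    test = [index for index in entry_indices if index >= cutoff]
    if not test:
        return [("train", train)]
    if not train:
        return [("test", test)]
    # From here on the group straddles the boundary.
    if policy is Policy.SPLIT_PARTIAL_SEQUENCES:
        return [("train", train), ("test", test)]
    if policy is Policy.ASSIGN_BY_FIRST_EVENT:
        side = "train" if entry_indices[0] < cutoff else "test"
        return [(side, list(entry_indices))]
    if policy is Policy.ASSIGN_BY_LAST_EVENT:
        side = "train" if entry_indices[-1] < cutoff else "test"
        return [(side, list(entry_indices))]
    return []  # DROP_STRADDLERS: the group is emitted on neither side.


group = [2, 5, 9]  # raw-entry positions of one group's events
results = {policy.name: assign_group(group, cutoff=6, policy=policy) for policy in Policy}
```

Groups that sit entirely on one side of the cutoff are unaffected by the policy; only straddlers are treated differently.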

TemplateSequence dataclass

Grouped log window before any model-specific representation is applied.

This keeps sequence semantics such as event ordering, labels, and entity membership. Model inputs derived from it live in SequenceSample.

Attributes:

Name Type Description
events list[tuple[str, list[str], int | None]]

Ordered sequence events as (template, parameters, dt_prev_ms) tuples.

label int

Sequence-level anomaly label derived from rows and group labels.

entity_ids list[str]

Unique entity ids present in the window in first-seen order.

window_id int

Stable window identifier assigned by the builder.

split_label SplitLabel

Dataset split assigned to the sequence.

event_labels tuple[int | None, ...] | None

Optional per-event anomaly labels aligned positionally with events. When present, each entry may be None if that event has no direct label.

training_event_mask tuple[bool, ...] | None

Optional per-event eligibility mask for training-target selection. This is used when a preserved chronological chunk must stay intact even though only a subset of its events are valid training targets.

evaluation_event_mask tuple[bool, ...] | None

Optional per-event eligibility mask for scoring targets. This is used when a preserved chronological chunk must stay intact even though only a subset of its events belong to the evaluation split.

continuous_context bool

Whether adjacent sequences should be treated as a single chronological stream for model state carryover.

sole_entity_id property

Return the entity id when the sequence belongs to exactly one entity.

If multiple entities appear in the window, None is returned to avoid implying a single owning entity.

templates property

Return the ordered template strings for this sequence.

__post_init__()

Validate that any event labels stay aligned with the events.

Raises:

Type Description
ValueError

If event_labels is provided with a different length from events.
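The alignment check and the two properties can be sketched with a reduced stand-in dataclass. `SketchSequence` is hypothetical and carries only a subset of the documented fields; the real TemplateSequence may validate more than this.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SketchSequence:
    """Reduced, hypothetical stand-in for TemplateSequence."""

    events: list[tuple[str, list[str], int | None]]
    entity_ids: list[str]
    event_labels: tuple[int | None, ...] | None = None

    def __post_init__(self) -> None:
        # Per-event labels must line up positionally with events.
        if self.event_labels is not None and len(self.event_labels) != len(self.events):
            raise ValueError("event_labels must have the same length as events")

    @property
    def sole_entity_id(self) -> str | None:
        # Only report an owner when exactly one entity is present.
        return self.entity_ids[0] if len(self.entity_ids) == 1 else None

    @property
    def templates(self) -> list[str]:
        return [template for template, _parameters, _dt_prev_ms in self.events]


mixed = SketchSequence(
    events=[("template <*>", ["x"], None), ("template <*>", ["y"], 10)],
    entity_ids=["node-1", "node-2"],
    event_labels=(0, None),
)
```

Because `mixed` contains two entities, its `sole_entity_id` is None, while `templates` still lists one template string per event.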

TimeSequenceBuilder dataclass

Bases: NonEntitySequenceBuilder

Sequence builder for time-window grouping.

Attributes:

Name Type Description
time_span_ms int

Duration of each emitted window in milliseconds.

step int | None

Window advance in milliseconds. None means non-overlapping windows.
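
To illustrate how time_span_ms and step interact, here is a minimal sketch of time-window grouping over bare timestamps. The function sliding_windows is hypothetical. Anchoring the first window at the first timestamp, using half-open [start, start + time_span_ms) intervals, and treating step=None as an advance of exactly time_span_ms are all assumptions made for this example, so the builder's real boundaries may differ.

```python
from typing import Optional


def sliding_windows(
    timestamps_ms: list[int],
    time_span_ms: int,
    step: Optional[int] = None,
) -> list[list[int]]:
    """Group sorted timestamps into [start, start + time_span_ms) windows."""
    if time_span_ms <= 0:
        raise ValueError("time_span_ms must be positive")
    # Assumption: step=None advances by the full span (non-overlapping).
    advance = time_span_ms if step is None else step
    if advance <= 0:
        raise ValueError("step must be positive")
    if not timestamps_ms:
        return []
    windows = []
    start = timestamps_ms[0]
    while start <= timestamps_ms[-1]:
        end = start + time_span_ms
        windows.append([t for t in timestamps_ms if start <= t < end])
        start += advance
    return windows


timestamps = [0, 400, 900, 1500, 2100]
tumbling = sliding_windows(timestamps, time_span_ms=1000)
overlapping = sliding_windows(timestamps, time_span_ms=1000, step=500)
```

With a step smaller than time_span_ms, an event such as the one at 900 ms lands in more than one window. That is the main reason to choose an explicit step over the default.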

__iter__()

Iterate over time windows with optional raw-entry split semantics.

Yields:

Name Type Description
TemplateSequence TemplateSequence

One grouped time window, optionally segmented according to a raw-entry split applied before grouping.

__post_init__()

Validate the requested split fractions and raw-entry split inputs.

Raises:

Type Description
ValueError

If the requested split settings are inconsistent.

build_raw_entry_split_summary()

Return diagnostics for a configured raw-entry split, if any.

Returns:

Type Description
RawEntrySplitSummary | None

Raw-entry split diagnostics when a before-grouping split is configured, otherwise None.

build_split_summary(*, sequence_summary)

Describe requested versus effective split semantics for one run.

Parameters:

Name Type Description Default
sequence_summary SequenceSummary

Aggregate split and label counts.

required

Returns:

Name Type Description
SequenceSplitSummary SequenceSplitSummary

Requested and effective split metrics.

count_windows()

Return the number of time windows.

Returns:

Name Type Description
int int

Count of time windows implied by the sink timestamps and config.

iter_grouped_rows()

Return rows grouped by time windows.

Returns:

Type Description
Iterator[Collection[StructuredLine]]

Time-based row windows.

iter_test_sequences()

Yield only the test suffix used for detector scoring.

Yields:

Name Type Description
TemplateSequence TemplateSequence

Sequences assigned to the test split.

iter_training_sequences()

Yield the training slice used by model fitting.

Yields:

Name Type Description
TemplateSequence TemplateSequence

Sequences assigned to the training split.
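
The train-prefix and test-suffix wording can be read as the following sketch, which labels chronologically ordered windows by position. The function assign_prefix_suffix is hypothetical. Flooring the counts and leaving the windows between the prefix and the suffix unassigned (None) are assumptions, since the rounding rule and the treatment of the middle region are not stated in this section.

```python
import math
from typing import Optional


def assign_prefix_suffix(
    n_windows: int,
    train_frac: float,
    test_frac: float,
) -> list[Optional[str]]:
    """Label window positions as a train prefix and a test suffix."""
    if not (0.0 <= train_frac <= 1.0 and 0.0 <= test_frac <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    if train_frac + test_frac > 1.0 + 1e-9:
        raise ValueError("train_frac + test_frac must not exceed 1")
    # Assumption: both counts are rounded down.
    n_train = math.floor(n_windows * train_frac)
    n_test = math.floor(n_windows * test_frac)
    labels: list[Optional[str]] = [None] * n_windows
    for i in range(n_train):
        labels[i] = "train"  # earliest windows
    for i in range(n_windows - n_test, n_windows):
        labels[i] = "test"  # latest windows
    return labels


labels = assign_prefix_suffix(n_windows=8, train_frac=0.5, test_frac=0.25)
```

Keeping the test suffix fixed while the train prefix varies lets runs with different train fractions be scored on the same held-out windows.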

represent_with(representation)

Return a lazy builder that applies a representation per sequence.

Parameters:

Name Type Description Default
representation SequenceRepresentation[TRepresentation]

Sequence representation to apply lazily to each built sequence.

required

Returns:

Type Description
SequenceRepresentationView[TRepresentation]

Lazy represented view of the generated sequences.

split_count_hint()

Return the exact split-count summary for non-entity grouping.

Returns:

Name Type Description
SequenceSplitCounts SequenceSplitCounts

Exact split counts for the grouping strategy.

split_summary_train_on_normal_entities_only()

Return split-summary metadata for entity-only normal training.

Returns:

Type Description
bool | None

Whether train was restricted to normal entities only, or None when that concept does not apply to this builder.

train_fraction_eligible_sequence_count(*, sequence_summary)

Return the denominator for effective train-fraction accounting.

Parameters:

Name Type Description Default
sequence_summary SequenceSummary

Aggregate split and label counts.

required

Returns:

Name Type Description
int int

Count of sequences considered eligible when reporting the realised train fraction for this grouping strategy. This is not necessarily the number of sequences that were actually assigned to train.
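
As a small worked example of how such a denominator might be used, the sketch below divides the number of train sequences by the eligible count. The function realised_train_fraction is hypothetical, and returning 0.0 when nothing is eligible is an assumption chosen to avoid a division by zero.

```python
def realised_train_fraction(train_count: int, eligible_count: int) -> float:
    """Share of eligible sequences that ended up in the train split."""
    if eligible_count <= 0:
        # Assumption: report 0.0 rather than raising when nothing is eligible.
        return 0.0
    return train_count / eligible_count


# 30 train sequences out of 120 eligible ones.
fraction = realised_train_fraction(train_count=30, eligible_count=120)
```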

train_sequence_count_unit_hint()

Return the unit label for time-window train progress.

Returns:

Name Type Description
str str

Unit label for time-window train progress.

with_split_fractions(train_frac, test_frac)

Return a copy with both split fractions updated together.

Parameters:

Name Type Description Default
train_frac float

Requested fraction of the total population to assign to the train prefix.

required
test_frac float

Requested fraction reserved for the fixed test suffix.

required

Returns:

Name Type Description
Self Self

Copy with updated split fractions.
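
A copy-with-update method of this kind is commonly built on dataclasses.replace. The sketch below shows that pattern on a hypothetical SplitConfig class. The default fractions and the validation rule are assumptions made for the example, not the builder's actual settings.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SplitConfig:
    """Hypothetical stand-in holding only the two split fractions."""

    train_frac: float = 0.8
    test_frac: float = 0.2

    def __post_init__(self) -> None:
        # Assumed rule: each fraction in [0, 1] and their sum at most 1.
        for name in ("train_frac", "test_frac"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must lie in [0, 1]")
        if self.train_frac + self.test_frac > 1.0 + 1e-9:
            raise ValueError("train_frac + test_frac must not exceed 1")

    def with_split_fractions(
        self, train_frac: float, test_frac: float
    ) -> "SplitConfig":
        # replace() builds a new instance, so __post_init__ re-validates
        # the pair as a whole and the original stays untouched.
        return replace(self, train_frac=train_frac, test_frac=test_frac)


base = SplitConfig()
updated = base.with_split_fractions(0.5, 0.25)
```

Updating both fractions in one call avoids an intermediate copy whose fractions are jointly invalid, which could happen if each were set separately.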