Sequences¶

Sequences are the point where a templated dataset becomes a modeling dataset.

This page covers the grouping builders, the TemplateSequence shape, and the split semantics that determine how sequences are assigned to train and test.

>>> from anomalog.sequences import SplitLabel, TemplateSequence
>>> sequence = TemplateSequence(
...     events=[("template <*>", ["x"], None), ("template <*>", ["y"], 10)],
...     label=0,
...     entity_ids=["node-1"],
...     window_id=7,
...     split_label=SplitLabel.TRAIN,
... )
>>> sequence.sole_entity_id
'node-1'
>>> sequence.templates
['template <*>', 'template <*>']
>>> sequence.split_label.value
'train'

`anomalog.sequences`¶

Utilities for building template sequences from structured log lines.

The module groups parsed log lines into windows (entity, fixed-size, time-based, or chronological stream chunks) and decorates them with inferred templates and anomaly labels.

`ChronologicalStreamSequenceBuilder` `dataclass` ¶

Bases: NonEntitySequenceBuilder

Sequence builder for chronological raw-entry stream chunks.

Attributes:

Name	Type	Description
`chunk_size`	`int`	Maximum number of raw entries per emitted chunk.
`continuous_context`	`bool`	Whether adjacent chunks should carry model state across sequence boundaries.

`iter()` ¶

Iterate over chronological stream chunks with optional raw splits.

Yields:

Name	Type	Description
`TemplateSequence`	`TemplateSequence`	One preserved chronological chunk per emitted sequence. When a raw-entry split is active, per-event training eligibility is attached through `training_event_mask` instead of fragmenting the chunk.
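
The masking idea described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `chunk_with_training_mask` and `cutoff_index` are hypothetical names, and the sketch assumes that every raw entry before the cutoff is a valid training target.

```python
from collections.abc import Iterator


def chunk_with_training_mask(
    entries: list[str],
    chunk_size: int,
    cutoff_index: int,
) -> Iterator[tuple[list[str], tuple[bool, ...]]]:
    """Yield (chunk, training_event_mask) pairs in chronological order."""
    for start in range(0, len(entries), chunk_size):
        chunk = entries[start : start + chunk_size]
        # Assumption: an event is a training target only when its raw
        # position precedes the cutoff. The chunk itself is never split.
        mask = tuple(start + offset < cutoff_index for offset in range(len(chunk)))
        yield chunk, mask


chunks = list(
    chunk_with_training_mask(["a", "b", "c", "d", "e"], chunk_size=2, cutoff_index=3)
)
```

The chunk that crosses the cutoff stays whole; only its mask records which events fall on the training side.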

`build_raw_entry_split_summary()` ¶

Return diagnostics for a configured raw-entry split, if any.

Returns:

Type	Description
`RawEntrySplitSummary \| None`	RawEntrySplitSummary \| None: Raw-entry split diagnostics when a before-grouping split is configured, otherwise `None`.

`build_split_summary(*, sequence_summary)` ¶

Describe requested versus effective split semantics for one run.

Parameters:

Name	Type	Description	Default
`sequence_summary`	`SequenceSummary`	Aggregate split and label counts.	required

Returns:

Name	Type	Description
`SequenceSplitSummary`	`SequenceSplitSummary`	Requested and effective split metrics.

`count_windows()` ¶

Return the number of chronological stream chunks.

Returns:

Name	Type	Description
`int`	`int`	Count of chronological stream chunks implied by the sink.

`iter_grouped_rows()` ¶

Return rows grouped into deterministic chronological chunks.

Returns:

Type	Description
`Iterator[Collection[StructuredLine]]`	Iterator[Collection[StructuredLine]]: Deterministic chronological chunks of structured rows.

`iter_test_sequences()` ¶

Yield only the test suffix used for detector scoring.

Yields:

Name	Type	Description
`TemplateSequence`	`TemplateSequence`	Sequences assigned to the test split.

`iter_training_sequences()` ¶

Yield the training slice used by model fitting.

Yields:

Name	Type	Description
`TemplateSequence`	`TemplateSequence`	Sequences assigned to the training split.

`represent_with(representation)` ¶

Return a lazy builder that applies a representation per sequence.

Parameters:

Name	Type	Description	Default
`representation`	`SequenceRepresentation[TRepresentation]`	Sequence representation to apply lazily to each built sequence.	required

Returns:

Type	Description
`SequenceRepresentationView[TRepresentation]`	SequenceRepresentationView[TRepresentation]: Lazy represented view of the generated sequences.

`split_count_hint()` ¶

Return the exact split-count summary for non-entity grouping.

Returns:

Name	Type	Description
`SequenceSplitCounts`	`SequenceSplitCounts`	Exact split counts for the grouping strategy.

`split_summary_train_on_normal_entities_only()` ¶

Return split-summary metadata for normal-only entity training.

Returns:

Type	Description
`bool \| None`	bool \| None: Whether train was restricted to normal entities only, or `None` when that concept does not apply to this builder.

`train_fraction_eligible_sequence_count(*, sequence_summary)` ¶

Return the denominator for effective train-fraction accounting.

Parameters:

Name	Type	Description	Default
`sequence_summary`	`SequenceSummary`	Aggregate split and label counts.	required

Returns:

Name	Type	Description
`int`	`int`	Count of sequences considered eligible when reporting the realised train fraction for this grouping strategy. This is not necessarily the number of sequences that were actually assigned to train.

`train_sequence_count_unit_hint()` ¶

Return the unit label for stream chunks.

Returns:

Name	Type	Description
`str`	`str`	Human-readable unit label for stream progress.

`with_continuous_context(*, enabled=True)` ¶

Treat consecutive stream chunks as one continuous stream.

Parameters:

Name	Type	Description	Default
`enabled`	`bool`	Whether to carry model state across chunk boundaries.	`True`

Returns:

Name	Type	Description
`Self`	`Self`	Copy with updated continuity behaviour.

`with_split_fractions(train_frac, test_frac)` ¶

Return a copy with both split fractions updated together.

Parameters:

Name	Type	Description	Default
`train_frac`	`float`	Requested fraction of the total population to assign to the train prefix.	required
`test_frac`	`float`	Requested fraction reserved for the fixed test suffix.	required

Returns:

Name	Type	Description
`Self`	`Self`	Copy with updated split fractions.

`EntitySequenceBuilder` `dataclass` ¶

Bases: SequenceBuilder

Sequence builder for per-entity grouping.

Attributes:

Name	Type	Description
`train_on_normal_entities_only`	`bool`	Whether anomalous entities are excluded from the training split budget.
`continuous_context`	`bool`	Whether adjacent entity windows should carry state across sequence boundaries.

`iter()` ¶

Iterate over template sequences yielded by the configured grouping.

Yields:

Name	Type	Description
`TemplateSequence`	`TemplateSequence`	One grouped and template-enriched sequence.

`__post_init__()` ¶

Validate the requested split fractions and raw-entry split inputs.

Raises:

Type	Description
`ValueError`	If the requested split settings are inconsistent.

`build_raw_entry_split_summary()` ¶

Return diagnostics for a configured raw-entry split, if any.

Returns:

Type	Description
`RawEntrySplitSummary \| None`	RawEntrySplitSummary \| None: Raw-entry split diagnostics when a before-grouping split is configured, otherwise `None`.

`build_split_summary(*, sequence_summary)` ¶

Describe requested versus effective split semantics for one run.

Parameters:

Name	Type	Description	Default
`sequence_summary`	`SequenceSummary`	Aggregate split and label counts.	required

Returns:

Name	Type	Description
`SequenceSplitSummary`	`SequenceSplitSummary`	Requested and effective split metrics.

`from_dataset(td)` `classmethod` ¶

Create an entity-grouped builder from a templated dataset.

Parameters:

Name	Type	Description	Default
`td`	`TemplatedDataset`	Templated dataset to bind into the builder.	required

Returns:

Name	Type	Description
`Self`	`Self`	Builder bound to the templated dataset.

`iter_grouped_rows()` ¶

Return rows grouped by entity.

Returns:

Type	Description
`Iterator[Collection[StructuredLine]]`	Iterator[Collection[StructuredLine]]: Entity-grouped structured rows.

`iter_test_sequences()` ¶

Yield only the test suffix used for detector scoring.

Yields:

Name	Type	Description
`TemplateSequence`	`TemplateSequence`	Sequences assigned to the test split.

`iter_training_sequences()` ¶

Yield only the train split used for detector fitting.

For HDFS-style raw-entry prefix protocols, fitting only needs the train population implied by the selected policy. We therefore avoid replaying the full suffix and instead materialise just the selected train entities or raw-prefix rows, depending on the straddler policy.

Yields:

Name	Type	Description
`TemplateSequence`	`TemplateSequence`	Training-split sequences yielded by the configured grouping strategy.

Raises:

Type	Description
`ValueError`	If the requested raw-entry split metadata is missing for a configured before-grouping prefix protocol.

`represent_with(representation)` ¶

Return a lazy builder that applies a representation per sequence.

Parameters:

Name	Type	Description	Default
`representation`	`SequenceRepresentation[TRepresentation]`	Sequence representation to apply lazily to each built sequence.	required

Returns:

Type	Description
`SequenceRepresentationView[TRepresentation]`	SequenceRepresentationView[TRepresentation]: Lazy represented view of the generated sequences.

`split_count_hint()` ¶

Return the exact split-count summary for entity grouping.

Returns:

Name	Type	Description
`SequenceSplitCounts`	`SequenceSplitCounts`	Exact split counts for the entity builder.

`split_summary_train_on_normal_entities_only()` ¶

Return entity split-summary metadata for normal-only training.

Returns:

Name	Type	Description
`bool`	`bool`	Whether train was restricted to normal entities only.

`train_fraction_eligible_sequence_count(*, sequence_summary)` ¶

Return the denominator for effective train-fraction accounting.

Parameters:

Name	Type	Description	Default
`sequence_summary`	`SequenceSummary`	Aggregate split and label counts.	required

Returns:

Name	Type	Description
`int`	`int`	Eligible entity-sequence count under the current policy. When normal-only training is enabled, this counts only normal entities in the fixed train pool before the hold-out suffix.

`train_sequence_count_unit_hint()` ¶

Return the unit label for entity-grouped train progress.

Returns:

Name	Type	Description
`str`	`str`	Unit label for entity-grouped train progress.

`with_continuous_context(*, enabled=True)` ¶

Treat consecutive entity windows as one continuous stream.

Parameters:

Name	Type	Description	Default
`enabled`	`bool`	Whether to carry model state across entity boundaries.	`True`

Returns:

Name	Type	Description
`Self`	`Self`	Copy with updated continuity behaviour.

`with_split_fractions(train_frac, test_frac)` ¶

Return a copy with both split fractions updated together.

Parameters:

Name	Type	Description	Default
`train_frac`	`float`	Requested fraction of the total population to assign to the train prefix.	required
`test_frac`	`float`	Requested fraction reserved for the fixed test suffix.	required

Returns:

Name	Type	Description
`Self`	`Self`	Copy with updated split fractions.

`with_train_on_normal_entities_only(*, enabled=True)` ¶

Limit training sequences to entities without anomalies.

Parameters:

Name	Type	Description	Default
`enabled`	`bool`	Whether to restrict train sequences to normal entities only.	`True`

Returns:

Name	Type	Description
`Self`	`Self`	Copy with updated normal-only training behaviour.
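
As a rough illustration of normal-only training, the sketch below filters a chronological train pool down to entities labelled normal. `select_train_entities` and its parameters are hypothetical, and the convention that label `0` means normal is an assumption.

```python
def select_train_entities(
    entity_labels: dict[str, int],
    train_pool_size: int,
    normal_only: bool,
) -> list[str]:
    """Pick training entities from the chronological prefix of the population."""
    # Dicts preserve insertion order, which stands in for first-seen order here.
    pool = list(entity_labels)[:train_pool_size]
    if not normal_only:
        return pool
    # Assumption: label 0 marks a normal entity, anything else is anomalous.
    return [entity for entity in pool if entity_labels[entity] == 0]


labels = {"blk_1": 0, "blk_2": 1, "blk_3": 0, "blk_4": 0}
all_train = select_train_entities(labels, train_pool_size=3, normal_only=False)
normal_train = select_train_entities(labels, train_pool_size=3, normal_only=True)
```

In this sketch the pool size stays fixed and anomalous entities are simply skipped, so the realised train count can be smaller than the requested one.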

`FixedSequenceBuilder` `dataclass` ¶

Bases: NonEntitySequenceBuilder

Sequence builder for fixed-size window grouping.

Attributes:

Name	Type	Description
`window_size`	`int`	Number of rows per emitted window.
`step`	`int \| None`	Row advance between windows. `None` means non-overlapping windows.
`window_basis`	`FixedWindowBasis`	Whether windows are built over the compacted structured rows or over the raw line positions.
`window_alignment_offset`	`int`	Raw-position offset before the first full raw-position window.
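
The interaction between `window_size` and `step` can be sketched as below. `fixed_windows` is a hypothetical helper, and dropping a trailing partial window is an assumption of this sketch rather than documented behaviour.

```python
from typing import Optional


def fixed_windows(
    rows: list[int],
    window_size: int,
    step: Optional[int] = None,
) -> list[list[int]]:
    """Group rows into fixed-size windows; step=None means non-overlapping."""
    advance = window_size if step is None else step
    if window_size <= 0 or advance <= 0:
        raise ValueError("window_size and step must be positive")
    windows = []
    for start in range(0, len(rows), advance):
        window = rows[start : start + window_size]
        if len(window) < window_size:
            break  # Assumption: a trailing partial window is not emitted.
        windows.append(window)
    return windows


non_overlapping = fixed_windows(list(range(6)), window_size=2)
overlapping = fixed_windows(list(range(6)), window_size=4, step=2)
```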

`iter()` ¶

Yield fixed-size windows with the configured basis.

Yields:

Name	Type	Description
`TemplateSequence`	`TemplateSequence`	One grouped template sequence per fixed window.

`__post_init__()` ¶

Validate the requested split fractions and raw-entry split inputs.

Raises:

Type	Description
`ValueError`	If the requested split settings are inconsistent.

`build_raw_entry_split_summary()` ¶

Return diagnostics for a configured raw-entry split, if any.

Returns:

Type	Description
`RawEntrySplitSummary \| None`	RawEntrySplitSummary \| None: Raw-entry split diagnostics when a before-grouping split is configured, otherwise `None`.

`build_split_summary(*, sequence_summary)` ¶

Describe requested versus effective split semantics for one run.

Parameters:

Name	Type	Description	Default
`sequence_summary`	`SequenceSummary`	Aggregate split and label counts.	required

Returns:

Name	Type	Description
`SequenceSplitSummary`	`SequenceSplitSummary`	Requested and effective split metrics.

`count_windows()` ¶

Return the number of fixed-size windows.

Returns:

Name	Type	Description
`int`	`int`	Count of fixed-size windows implied by the sink and config.

`iter_grouped_rows()` ¶

Return rows grouped by fixed-size windows.

Returns:

Type	Description
`Iterator[Collection[StructuredLine]]`	Iterator[Collection[StructuredLine]]: Fixed-size row windows.

`iter_test_sequences()` ¶

Yield only the test suffix used for detector scoring.

Yields:

Name	Type	Description
`TemplateSequence`	`TemplateSequence`	Sequences assigned to the test split.

`iter_training_sequences()` ¶

Yield the training slice used by model fitting.

Yields:

Name	Type	Description
`TemplateSequence`	`TemplateSequence`	Sequences assigned to the training split.

`represent_with(representation)` ¶

Return a lazy builder that applies a representation per sequence.

Parameters:

Name	Type	Description	Default
`representation`	`SequenceRepresentation[TRepresentation]`	Sequence representation to apply lazily to each built sequence.	required

Returns:

Type	Description
`SequenceRepresentationView[TRepresentation]`	SequenceRepresentationView[TRepresentation]: Lazy represented view of the generated sequences.

`split_count_hint()` ¶

Return the exact split-count summary for non-entity grouping.

Returns:

Name	Type	Description
`SequenceSplitCounts`	`SequenceSplitCounts`	Exact split counts for the grouping strategy.

`split_summary_train_on_normal_entities_only()` ¶

Return split-summary metadata for normal-only entity training.

Returns:

Type	Description
`bool \| None`	bool \| None: Whether train was restricted to normal entities only, or `None` when that concept does not apply to this builder.

`train_fraction_eligible_sequence_count(*, sequence_summary)` ¶

Return the denominator for effective train-fraction accounting.

Parameters:

Name	Type	Description	Default
`sequence_summary`	`SequenceSummary`	Aggregate split and label counts.	required

Returns:

Name	Type	Description
`int`	`int`	Count of sequences considered eligible when reporting the realised train fraction for this grouping strategy. This is not necessarily the number of sequences that were actually assigned to train.

`train_sequence_count_unit_hint()` ¶

Return the unit label for fixed-window train progress.

Returns:

Name	Type	Description
`str`	`str`	Unit label for fixed-window train progress.

`with_split_fractions(train_frac, test_frac)` ¶

Return a copy with both split fractions updated together.

Parameters:

Name	Type	Description	Default
`train_frac`	`float`	Requested fraction of the total population to assign to the train prefix.	required
`test_frac`	`float`	Requested fraction reserved for the fixed test suffix.	required

Returns:

Name	Type	Description
`Self`	`Self`	Copy with updated split fractions.

`FixedWindowBasis` ¶

Bases: str, Enum

What positional basis to use for fixed-size windows.

Attributes:

Name	Type	Description
`COMPACTED_ROWS`		Build windows over compacted structured rows.
`RAW_POSITIONS`		Build windows over raw line positions.

`NonEntitySequenceBuilder` `dataclass` ¶

Bases: SequenceBuilder

Sequence builder for non-entity grouping strategies.

This is a marker subclass that signals when normal-only entity training logic does not apply, such as for fixed-size or time-based windowing.

`iter()` ¶

Iterate over template sequences yielded by the configured grouping.

Yields:

Name	Type	Description
`TemplateSequence`	`TemplateSequence`	One grouped and template-enriched sequence.

`__post_init__()` ¶

Validate the requested split fractions and raw-entry split inputs.

Raises:

Type	Description
`ValueError`	If the requested split settings are inconsistent.

`build_raw_entry_split_summary()` ¶

Return diagnostics for a configured raw-entry split, if any.

Returns:

Type	Description
`RawEntrySplitSummary \| None`	RawEntrySplitSummary \| None: Raw-entry split diagnostics when a before-grouping split is configured, otherwise `None`.

`build_split_summary(*, sequence_summary)` ¶

Describe requested versus effective split semantics for one run.

Parameters:

Name	Type	Description	Default
`sequence_summary`	`SequenceSummary`	Aggregate split and label counts.	required

Returns:

Name	Type	Description
`SequenceSplitSummary`	`SequenceSplitSummary`	Requested and effective split metrics.

`count_windows()` `abstractmethod` ¶

Return the total number of windows implied by the sink and config.

Returns:

Name	Type	Description
`int`	`int`	Count of windows implied by the sink and current builder config.

`iter_grouped_rows()` `abstractmethod` ¶

Return grouped rows for the configured strategy.

Returns:

Type	Description
`Iterator[Collection[StructuredLine]]`	Iterator[Collection[StructuredLine]]: Iterator over grouped windows of structured rows.

`iter_test_sequences()` ¶

Yield only the test suffix used for detector scoring.

Yields:

Name	Type	Description
`TemplateSequence`	`TemplateSequence`	Sequences assigned to the test split.

`iter_training_sequences()` ¶

Yield the training slice used by model fitting.

Yields:

Name	Type	Description
`TemplateSequence`	`TemplateSequence`	Sequences assigned to the training split.

`represent_with(representation)` ¶

Return a lazy builder that applies a representation per sequence.

Parameters:

Name	Type	Description	Default
`representation`	`SequenceRepresentation[TRepresentation]`	Sequence representation to apply lazily to each built sequence.	required

Returns:

Type	Description
`SequenceRepresentationView[TRepresentation]`	SequenceRepresentationView[TRepresentation]: Lazy represented view of the generated sequences.

`split_count_hint()` ¶

Return the exact split-count summary for non-entity grouping.

Returns:

Name	Type	Description
`SequenceSplitCounts`	`SequenceSplitCounts`	Exact split counts for the grouping strategy.

`split_summary_train_on_normal_entities_only()` ¶

Return split-summary metadata for normal-only entity training.

Returns:

Type	Description
`bool \| None`	bool \| None: Whether train was restricted to normal entities only, or `None` when that concept does not apply to this builder.

`train_fraction_eligible_sequence_count(*, sequence_summary)` ¶

Return the denominator for effective train-fraction accounting.

Parameters:

Name	Type	Description	Default
`sequence_summary`	`SequenceSummary`	Aggregate split and label counts.	required

Returns:

Name	Type	Description
`int`	`int`	Count of sequences considered eligible when reporting the realised train fraction for this grouping strategy. This is not necessarily the number of sequences that were actually assigned to train.

`train_sequence_count_unit_hint()` ¶

Return a human-readable unit label for train-count progress.

This is intended for progress reporting only. Builders should return a unit when it clarifies what the bounded train count represents, such as "entities" for entity grouping.

Returns:

Type	Description
`str \| None`	str \| None: Unit label for train-count progress when useful, otherwise `None`.

`with_split_fractions(train_frac, test_frac)` ¶

Return a copy with both split fractions updated together.

Parameters:

Name	Type	Description	Default
`train_frac`	`float`	Requested fraction of the total population to assign to the train prefix.	required
`test_frac`	`float`	Requested fraction reserved for the fixed test suffix.	required

Returns:

Name	Type	Description
`Self`	`Self`	Copy with updated split fractions.

`RawEntrySplitMode` ¶

Bases: str, Enum

Chronological raw-entry split modes supported by sequence builders.

Attributes:

Name	Type	Description
`PREFIX_COUNT`		Split by the first N raw entries.
`PREFIX_FRACTION`		Split by the first fraction of raw entries.
`PREFIX_NORMAL_FRACTION`		Split by the first fraction of normal entries.
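
One way to read the three modes is as different rules for locating the cutoff index where the test suffix begins. The sketch below is illustrative only: `Mode` is a hypothetical stand-in for `RawEntrySplitMode` (its string values are invented), `cutoff_index` is a hypothetical helper, flooring is an assumed rounding rule, and label `0` is assumed to mean normal.

```python
from enum import Enum
from typing import Optional


class Mode(str, Enum):
    """Hypothetical stand-in for RawEntrySplitMode; values are invented."""

    PREFIX_COUNT = "prefix_count"
    PREFIX_FRACTION = "prefix_fraction"
    PREFIX_NORMAL_FRACTION = "prefix_normal_fraction"


def cutoff_index(
    labels: list[int],
    mode: Mode,
    *,
    count: Optional[int] = None,
    fraction: Optional[float] = None,
) -> int:
    """Return the zero-based raw-entry index where the test suffix begins."""
    if mode is Mode.PREFIX_COUNT:
        return min(count, len(labels))
    if mode is Mode.PREFIX_FRACTION:
        return int(len(labels) * fraction)  # Assumption: floor the fraction.
    # PREFIX_NORMAL_FRACTION: stop once the prefix holds the requested
    # share of all normal entries (label 0 is assumed to mean normal).
    target = int(sum(1 for label in labels if label == 0) * fraction)
    seen_normal = 0
    for index, label in enumerate(labels):
        if seen_normal == target:
            return index
        if label == 0:
            seen_normal += 1
    return len(labels)


entry_labels = [0, 1, 0, 0, 1, 0, 0, 0]
by_count = cutoff_index(entry_labels, Mode.PREFIX_COUNT, count=3)
by_fraction = cutoff_index(entry_labels, Mode.PREFIX_FRACTION, fraction=0.5)
by_normal = cutoff_index(entry_labels, Mode.PREFIX_NORMAL_FRACTION, fraction=0.5)
```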

`RawEntrySplitSummary` `dataclass` ¶

Audit summary for a chronological raw-entry split.

Attributes:

Name	Type	Description
`split_mode`	`str`	Configured raw-entry split mode.
`application_order`	`str`	Whether the split was applied before or after grouping.
`cutoff_entry_index`	`int`	Zero-based raw-entry cutoff where the test suffix begins.
`train_raw_entry_count`	`int`	Raw entries assigned to train.
`train_normal_entry_count`	`int`	Normal raw entries assigned to train.
`train_anomalous_entry_count`	`int`	Anomalous raw entries assigned to train.
`test_raw_entry_count`	`int`	Raw entries assigned to test.
`test_normal_entry_count`	`int`	Normal raw entries assigned to test.
`test_anomalous_entry_count`	`int`	Anomalous raw entries assigned to test.
`ignored_raw_entry_count`	`int`	Raw entries withheld from both train and test.
`ignored_normal_entry_count`	`int`	Normal raw entries withheld.
`ignored_anomalous_entry_count`	`int`	Anomalous raw entries withheld.
`straddling_group_count`	`int`	Number of grouped windows that crossed the split boundary.
`straddling_group_policy`	`str \| None`	Policy applied to straddling groups.

`as_dict()` ¶

Return a JSON-friendly representation.

Returns:

Type	Description
`dict[str, int \| str \| None]`	dict[str, int \| str \| None]: Serialisable split summary payload.

`SequenceBuilder` `dataclass` ¶

Bases: ABC, Iterable[TemplateSequence]

Common sequence-building behaviour shared across grouping strategies.

Sequence builders stay lazy so expensive grouping, template inference, and label resolution only happen when a caller iterates. The shared base also centralises split assignment so experiment manifests can describe train/test semantics consistently across grouping modes.

Attributes:

Name	Type	Description
`sink`	`StructuredSink`	Structured sink supplying grouped rows.
`infer_template`	`Callable[[str], tuple[LogTemplate, ExtractedParameters]]`	Template inference function for row message text.
`label_for_group`	`Callable[[str], int \| None]`	Group-level anomaly label lookup by entity id.
`template_parser`	`TemplateParser \| None`	Optional parser object kept alongside `infer_template` so optimisation paths can inspect parser capabilities without peeking at bound-method metadata.
`raw_replay_state`	`_BeforeGroupingRawReplayState`	Mutable cache reused by before-grouping raw-entry split paths.
`split_mode`	`RawEntrySplitMode \| None`	Raw-entry split mode used for special reproduction protocols. `None` preserves the legacy sequence-fraction split behaviour.
`split_application_order`	`SplitApplicationOrder`	Whether the split is applied before or after grouping.
`straddling_group_policy`	`StraddlingGroupPolicy`	Policy for grouped rows that cross a raw-entry split boundary.
`train_entry_count`	`int \| None`	Requested raw-entry prefix length when `split_mode = PREFIX_COUNT`.
`train_entry_fraction`	`float \| None`	Requested raw-entry prefix fraction when `split_mode = PREFIX_FRACTION`.
`train_normal_entry_fraction`	`float \| None`	Requested normal-entry prefix fraction when `split_mode = PREFIX_NORMAL_FRACTION`.
`stream_chunk_size`	`int \| None`	Optional chunk size used by stream grouping strategies.
`train_frac`	`float`	Requested training fraction for the builder.
`test_frac`	`float`	Fixed test suffix fraction.

`iter()` `abstractmethod` ¶

Iterate over template sequences yielded by the configured grouping.

Returns:

Type	Description
`Iterator[TemplateSequence]`	Iterator[TemplateSequence]: Iterator yielding grouped and template-enriched sequences.

`__post_init__()` ¶

Validate the requested split fractions and raw-entry split inputs.

Raises:

Type	Description
`ValueError`	If the requested split settings are inconsistent.

`build_raw_entry_split_summary()` ¶

Return diagnostics for a configured raw-entry split, if any.

Returns:

Type	Description
`RawEntrySplitSummary \| None`	RawEntrySplitSummary \| None: Raw-entry split diagnostics when a before-grouping split is configured, otherwise `None`.

`build_split_summary(*, sequence_summary)` ¶

Describe requested versus effective split semantics for one run.

Parameters:

Name	Type	Description	Default
`sequence_summary`	`SequenceSummary`	Aggregate split and label counts.	required

Returns:

Name	Type	Description
`SequenceSplitSummary`	`SequenceSplitSummary`	Requested and effective split metrics.

`iter_grouped_rows()` `abstractmethod` ¶

Return grouped rows for the configured strategy.

Returns:

Type	Description
`Iterator[Collection[StructuredLine]]`	Iterator[Collection[StructuredLine]]: Iterator over grouped windows of structured rows.

`iter_test_sequences()` ¶

Yield the test slice used by model scoring.

Yields:

Name	Type	Description
`TemplateSequence`	`TemplateSequence`	Sequences assigned to the test split.

`iter_training_sequences()` ¶

Yield the training slice used by model fitting.

Yields:

Name	Type	Description
`TemplateSequence`	`TemplateSequence`	Sequences assigned to the training split.

`represent_with(representation)` ¶

Return a lazy builder that applies a representation per sequence.

Parameters:

Name	Type	Description	Default
`representation`	`SequenceRepresentation[TRepresentation]`	Sequence representation to apply lazily to each built sequence.	required

Returns:

Type	Description
`SequenceRepresentationView[TRepresentation]`	SequenceRepresentationView[TRepresentation]: Lazy represented view of the generated sequences.
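
The lazy behaviour can be pictured with a small stand-in. `LazyRepresentationView` and `template_counts` below are hypothetical and do not reproduce the library's `SequenceRepresentationView`; they only show that the representation runs when the view is iterated, not when it is created.

```python
from collections import Counter
from collections.abc import Callable, Iterable, Iterator


class LazyRepresentationView:
    """Hypothetical view that applies a representation on iteration only."""

    def __init__(
        self,
        sequences: Iterable[list[str]],
        representation: Callable[[list[str]], Counter],
    ) -> None:
        self._sequences = sequences
        self._representation = representation

    def __iter__(self) -> Iterator[Counter]:
        for sequence in self._sequences:
            yield self._representation(sequence)


calls: list[int] = []


def template_counts(templates: list[str]) -> Counter:
    """Toy representation: count how often each template occurs."""
    calls.append(len(templates))
    return Counter(templates)


view = LazyRepresentationView([["a", "b", "a"], ["c"]], template_counts)
calls_before_iteration = len(calls)  # Nothing has been computed yet.
represented = list(view)
```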

`split_count_hint()` ¶

Return a cheap exact split-count summary when the builder knows it.

Returns:

Type	Description
`SequenceSplitCounts \| None`	SequenceSplitCounts \| None: Exact split counts when cheaply available, otherwise `None`.

`split_summary_train_on_normal_entities_only()` ¶

Return split-summary metadata for normal-only entity training.

Returns:

Type	Description
`bool \| None`	bool \| None: Whether train was restricted to normal entities only, or `None` when that concept does not apply to this builder.

`train_fraction_eligible_sequence_count(*, sequence_summary)` ¶

Return the denominator for effective train-fraction accounting.

Parameters:

Name	Type	Description	Default
`sequence_summary`	`SequenceSummary`	Aggregate split and label counts.	required

Returns:

Name	Type	Description
`int`	`int`	Count of sequences considered eligible when reporting the realised train fraction for this grouping strategy. This is not necessarily the number of sequences that were actually assigned to train.

`train_sequence_count_unit_hint()` ¶

Return a human-readable unit label for train-count progress.

This is intended for progress reporting only. Builders should return a unit when it clarifies what the bounded train count represents, such as "entities" for entity grouping.

Returns:

Type	Description
`str \| None`	str \| None: Unit label for train-count progress when useful, otherwise `None`.

`with_split_fractions(train_frac, test_frac)` ¶

Return a copy with both split fractions updated together.

Parameters:

Name	Type	Description	Default
`train_frac`	`float`	Requested fraction of the total population to assign to the train prefix.	required
`test_frac`	`float`	Requested fraction reserved for the fixed test suffix.	required

Returns:

Name	Type	Description
`Self`	`Self`	Copy with updated split fractions.

`SequenceSplitCounts` `dataclass` ¶

Exact split counts for a concrete sequence builder.

Attributes:

Name	Type	Description
`total_count`	`int`	Total emitted sequence count.
`train_count`	`int`	Count assigned to the current train prefix.
`ignored_count`	`int`	Count withheld between train and test.
`test_count`	`int`	Count assigned to the fixed test suffix.
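
The four counts describe a train prefix, an ignored middle band, and a fixed test suffix. The following sketch derives them from two fractions. `SplitCounts`, `split_counts`, and `label_for_position` are hypothetical names, and flooring both fractions against the total is an assumed rounding rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitCounts:
    """Hypothetical stand-in mirroring the SequenceSplitCounts fields."""

    total_count: int
    train_count: int
    ignored_count: int
    test_count: int


def split_counts(total: int, train_frac: float, test_frac: float) -> SplitCounts:
    if train_frac < 0 or test_frac < 0 or train_frac + test_frac > 1:
        raise ValueError("split fractions must be non-negative and sum to at most 1")
    # Assumption: both fractions are floored against the total population.
    test_count = int(total * test_frac)
    train_count = int(total * train_frac)
    return SplitCounts(total, train_count, total - train_count - test_count, test_count)


def label_for_position(index: int, counts: SplitCounts) -> str:
    """Map a chronological position to train, ignored, or test."""
    if index < counts.train_count:
        return "train"
    if index >= counts.total_count - counts.test_count:
        return "test"
    return "ignored"


counts = split_counts(total=20, train_frac=0.25, test_frac=0.5)
labels = [label_for_position(i, counts) for i in range(counts.total_count)]
```

Because the test suffix is anchored at the end, changing `train_frac` in this sketch only moves the boundary between the train prefix and the ignored band.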

`SequenceSplitSummary` `dataclass` ¶

Serialisable summary of requested versus effective split behaviour.

The requested train fraction may not equal the effective one after grouping-specific eligibility rules are applied. Persisting both protects downstream experiment manifests from silently overstating how much data was actually available for training.

Attributes:

Name	Type	Description
`requested_train_fraction`	`float`	Requested fraction provided by the caller.
`requested_test_fraction`	`float`	Requested test suffix fraction provided by the caller.
`train_on_normal_entities_only`	`bool \| None`	Whether training was restricted to normal entities only. Only applicable to entity grouping; `None` otherwise.
`train_pool_sequence_count`	`int`	Number of sequences in the chronological train candidate window before detector-specific filtering is applied.
`ineligible_train_pool_count`	`int`	Number of sequences in the train pool that were ineligible for training under the current policy.
`realised_train_sequence_count`	`int`	Number of sequences actually used for training after any detector-specific filtering.
`excluded_from_train_count`	`int`	Number of sequences withheld from the train pool before scoring, including the ignored middle band and any detector-ineligible prefix items.
`eligible_train_sequence_count`	`int`	Number of sequences in the denominator for the effective train-fraction calculation. In entity-grouped mode this is the fixed chronological train pool, or the normal-only subset of that pool when normal-only training is enabled.
`ignored_sequence_count`	`int`	Number of sequences withheld from the train pool because they fell outside the requested train prefix or were ineligible under the current filtering policy.
`effective_train_fraction_of_eligible`	`float`	Realised train fraction over the eligible set.
`effective_train_fraction_overall`	`float`	Realised train fraction over the full generated sequence population.
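
The two effective fractions differ only in their denominator. A small worked example, assuming they are plain ratios of the counts above (`effective_train_fractions` is a hypothetical helper, and returning `0.0` for an empty denominator is an assumption):

```python
def effective_train_fractions(
    realised_train_sequence_count: int,
    eligible_train_sequence_count: int,
    total_sequence_count: int,
) -> tuple[float, float]:
    """Return (fraction of eligible, fraction overall) for the realised train set."""
    of_eligible = (
        realised_train_sequence_count / eligible_train_sequence_count
        if eligible_train_sequence_count
        else 0.0  # Assumption: an empty eligible set reports 0.0.
    )
    overall = (
        realised_train_sequence_count / total_sequence_count
        if total_sequence_count
        else 0.0
    )
    return of_eligible, overall


# 100 sequences generated, 40 eligible for training, 30 actually used.
of_eligible, overall = effective_train_fractions(30, 40, 100)
```

Here 30 of 40 eligible sequences gives 0.75, while 30 of 100 overall gives 0.3, which is why persisting both values avoids overstating the available training data.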

`as_dict()` ¶

Return a stable JSON-friendly representation.

Returns:

Type	Description
`dict[str, int \| float \| bool \| str]`	dict[str, int \| float \| bool \| str]: Serialised split summary.

`SplitApplicationOrder` ¶

Bases: str, Enum

When to apply a configured split relative to grouping.

Attributes:

Name	Type	Description
`AFTER_GROUPING`		Apply the split after grouping has produced sequences.
`BEFORE_GROUPING`		Apply the split on raw entries before grouping.

`SplitLabel` ¶

Bases: str, Enum

Dataset split membership for a sequence.

Attributes:

Name	Type	Description
`TRAIN`		Sequence belongs to the training split.
`TEST`		Sequence belongs to the evaluation/test split.
`IGNORED`		Sequence belongs to the fixed train pool but is not used for the current training prefix.

`StraddlingGroupPolicy` ¶

Bases: str, Enum

How to handle grouped rows that cross a raw-entry split boundary.

Attributes:

Name	Type	Description
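`SPLIT_PARTIAL_SEQUENCES`		Split the group at the boundary and emit one sequence per contiguous segment.
`ASSIGN_BY_FIRST_EVENT`		Assign the whole group to the side of the split that contains its first event.
`ASSIGN_BY_LAST_EVENT`		Assign the whole group to the side of the split that contains its last event.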
`DROP_STRADDLERS`		Drop groups that span both sides of the split.
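
The four policies can be compared on a single group that crosses the cutoff. This is a sketch under stated assumptions: `Policy` is a hypothetical stand-in for `StraddlingGroupPolicy` (its string values are invented), `assign_group` is a hypothetical helper, and positions below the cutoff are assumed to belong to train.

```python
from enum import Enum


class Policy(str, Enum):
    """Hypothetical stand-in for StraddlingGroupPolicy; values are invented."""

    SPLIT_PARTIAL_SEQUENCES = "split_partial_sequences"
    ASSIGN_BY_FIRST_EVENT = "assign_by_first_event"
    ASSIGN_BY_LAST_EVENT = "assign_by_last_event"
    DROP_STRADDLERS = "drop_straddlers"


def side_of(position: int, cutoff: int) -> str:
    return "train" if position < cutoff else "test"


def assign_group(
    positions: list[int], cutoff: int, policy: Policy
) -> list[tuple[str, list[int]]]:
    """Return (split, positions) pairs for one group of sorted raw positions."""
    if not positions:
        return []
    train = [p for p in positions if p < cutoff]
    test = [p for p in positions if p >= cutoff]
    if not train or not test:
        # The group sits entirely on one side, so no policy is needed.
        return [(side_of(positions[0], cutoff), positions)]
    if policy is Policy.SPLIT_PARTIAL_SEQUENCES:
        return [("train", train), ("test", test)]
    if policy is Policy.ASSIGN_BY_FIRST_EVENT:
        return [(side_of(positions[0], cutoff), positions)]
    if policy is Policy.ASSIGN_BY_LAST_EVENT:
        return [(side_of(positions[-1], cutoff), positions)]
    return []  # DROP_STRADDLERS: the group is emitted on neither side.


group = [3, 4, 5, 6]
results = {policy.name: assign_group(group, 5, policy) for policy in Policy}
```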

`TemplateSequence` `dataclass` ¶

Grouped log window before any model-specific representation is applied.

This keeps sequence semantics such as event ordering, labels, and entity membership. Model inputs derived from it live in SequenceSample.

Attributes:

Name	Type	Description
`events`	`list[tuple[str, list[str], int \| None]]`	Ordered sequence events as `(template, parameters, dt_prev_ms)` tuples.
`label`	`int`	Sequence-level anomaly label derived from rows and group labels.
`entity_ids`	`list[str]`	Unique entity ids present in the window in first-seen order.
`window_id`	`int`	Stable window identifier assigned by the builder.
`split_label`	`SplitLabel`	Dataset split assigned to the sequence.
`event_labels`	`tuple[int \| None, ...] \| None`	Optional per-event anomaly labels aligned positionally with `events`. When present, each entry may be `None` if that event has no direct label.
`training_event_mask`	`tuple[bool, ...] \| None`	Optional per-event eligibility mask for training-target selection. This is used when a preserved chronological chunk must stay intact even though only a subset of its events are valid training targets.
`evaluation_event_mask`	`tuple[bool, ...] \| None`	Optional per-event eligibility mask for scoring targets. This is used when a preserved chronological chunk must stay intact even though only a subset of its events belong to the evaluation split.
`continuous_context`	`bool`	Whether adjacent sequences should be treated as a single chronological stream for model state carryover.
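
As a rough illustration of how the two per-event masks might be consumed, the sketch below filters an intact chunk down to its eligible targets. `select_targets` is a hypothetical helper, not part of the library, and treating a missing mask as "every event is eligible" is an assumption.

```python
def select_targets(events, mask):
    """Return the events eligible as targets.

    A mask of None is assumed to mean that every event is eligible.
    """
    if mask is None:
        return list(events)
    if len(mask) != len(events):
        raise ValueError("mask must align positionally with events")
    return [event for event, keep in zip(events, mask) if keep]


# One preserved chronological chunk whose events fall on both sides of a split.
events = [
    ("login <*>", ["alice"], None),
    ("read <*>", ["f1"], 5),
    ("logout <*>", ["alice"], 12),
]
training_event_mask = (True, True, False)
evaluation_event_mask = (False, False, True)

train_targets = select_targets(events, training_event_mask)
eval_targets = select_targets(events, evaluation_event_mask)
```

The chunk itself is never cut: all three events remain available as context, and only the choice of targets changes.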

`sole_entity_id` `property` ¶

Return the entity id when the sequence belongs to exactly one entity.

If multiple entities appear in the window, None is returned to avoid implying a single owning entity.

`templates` `property` ¶

Return the ordered template strings for this sequence.

`__post_init__()` ¶

Validate that any event labels stay aligned with the events.

Raises:

Type	Description
`ValueError`	If `event_labels` is provided with a different length from `events`.
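
The two properties and the alignment check can be approximated with a minimal stand-in. `MiniSequence` is hypothetical and mirrors only three of the documented fields; the real `TemplateSequence` may implement these differently.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MiniSequence:
    """Hypothetical stand-in mirroring a few TemplateSequence fields."""

    events: list
    entity_ids: list
    event_labels: Optional[tuple] = None

    def __post_init__(self):
        # Per-event labels, when given, must align positionally with events.
        if self.event_labels is not None and len(self.event_labels) != len(self.events):
            raise ValueError("event_labels must have the same length as events")

    @property
    def sole_entity_id(self):
        # Only report an owner when exactly one entity appears in the window.
        return self.entity_ids[0] if len(self.entity_ids) == 1 else None

    @property
    def templates(self):
        # Events are (template, parameters, dt_prev_ms) tuples.
        return [template for template, _params, _dt in self.events]


single = MiniSequence(events=[("a <*>", ["x"], None)], entity_ids=["node-1"])
mixed = MiniSequence(
    events=[("a <*>", ["x"], None), ("b <*>", ["y"], 10)],
    entity_ids=["node-1", "node-2"],
    event_labels=(0, None),
)
```

Here `single.sole_entity_id` is `'node-1'`, while `mixed.sole_entity_id` is `None` because two entities share the window.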

`TimeSequenceBuilder` `dataclass` ¶

Bases: NonEntitySequenceBuilder

Sequence builder for time-window grouping.

Attributes:

Name	Type	Description
`time_span_ms`	`int`	Duration of each emitted window in milliseconds.
`step`	`int \| None`	Window advance in milliseconds. `None` means non-overlapping windows.
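
To make the interaction of `time_span_ms` and `step` concrete, here is a minimal windowing sketch. It is not the library's implementation: `time_windows` is a hypothetical helper, and anchoring the first window at the earliest timestamp with half-open intervals is an assumption.

```python
def time_windows(timestamps_ms, time_span_ms, step=None):
    """Group sorted timestamps into [start, start + time_span_ms) windows."""
    if time_span_ms <= 0:
        raise ValueError("time_span_ms must be positive")
    # step=None means the window advances by its own length (no overlap).
    advance = time_span_ms if step is None else step
    if advance <= 0:
        raise ValueError("step must be positive")
    if not timestamps_ms:
        return []
    windows = []
    start, last = timestamps_ms[0], timestamps_ms[-1]
    while start <= last:
        end = start + time_span_ms
        windows.append([t for t in timestamps_ms if start <= t < end])
        start += advance
    return windows


stamps = [0, 400, 1000, 1500, 2100]
tumbling = time_windows(stamps, time_span_ms=1000)
sliding = time_windows(stamps, time_span_ms=1000, step=500)
```

With a `step` smaller than `time_span_ms`, the same event can appear in more than one window, which is what makes the windows overlap.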

`__iter__()` ¶

Iterate over time windows with optional raw-entry split semantics.

Yields:

Name	Type	Description
`TemplateSequence`	`TemplateSequence`	One grouped time window, optionally segmented according to a raw-entry split applied before grouping.

`__post_init__()` ¶

Validate the requested split fractions and raw-entry split inputs.

Raises:

Type	Description
`ValueError`	If the requested split settings are inconsistent.

`build_raw_entry_split_summary()` ¶

Return diagnostics for a configured raw-entry split, if any.

Returns:

Type	Description
`RawEntrySplitSummary \| None`	Raw-entry split diagnostics when a before-grouping split is configured, otherwise `None`.

`build_split_summary(*, sequence_summary)` ¶

Describe requested versus effective split semantics for one run.

Parameters:

Name	Type	Description	Default
`sequence_summary`	`SequenceSummary`	Aggregate split and label counts.	required

Returns:

Name	Type	Description
`SequenceSplitSummary`	`SequenceSplitSummary`	Requested and effective split metrics.

`count_windows()` ¶

Return the number of time windows.

Returns:

Name	Type	Description
`int`	`int`	Count of time windows implied by the sink timestamps and config.

`iter_grouped_rows()` ¶

Return rows grouped by time windows.

Returns:

Type	Description
`Iterator[Collection[StructuredLine]]`	Time-based row windows.

`iter_test_sequences()` ¶

Yield only the test suffix used for detector scoring.

Yields:

Name	Type	Description
`TemplateSequence`	`TemplateSequence`	Sequences assigned to the test split.

`iter_training_sequences()` ¶

Yield the training slice used by model fitting.

Yields:

Name	Type	Description
`TemplateSequence`	`TemplateSequence`	Sequences assigned to the training split.

`represent_with(representation)` ¶

Return a lazy builder that applies a representation per sequence.

Parameters:

Name	Type	Description	Default
`representation`	`SequenceRepresentation[TRepresentation]`	Sequence representation to apply lazily to each built sequence.	required

Returns:

Type	Description
`SequenceRepresentationView[TRepresentation]`	Lazy represented view of the generated sequences.

`split_count_hint()` ¶

Return the exact split-count summary for non-entity grouping.

Returns:

Name	Type	Description
`SequenceSplitCounts`	`SequenceSplitCounts`	Exact split counts for the grouping strategy.

`split_summary_train_on_normal_entities_only()` ¶

Return split-summary metadata for entity-only normal training.

Returns:

Type	Description
`bool \| None`	Whether train was restricted to normal entities only, or `None` when that concept does not apply to this builder.

`train_fraction_eligible_sequence_count(*, sequence_summary)` ¶

Return the denominator for effective train-fraction accounting.

Parameters:

Name	Type	Description	Default
`sequence_summary`	`SequenceSummary`	Aggregate split and label counts.	required

Returns:

Name	Type	Description
`int`	`int`	Count of sequences considered eligible when reporting the realised train fraction for this grouping strategy. This is not necessarily the number of sequences that were actually assigned to train.
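
Since this method supplies a denominator, the realised train fraction would presumably be the train count divided by it. The sketch below shows only that arithmetic; `realised_train_fraction` is a hypothetical helper, and which sequences count as eligible is builder-specific and not modelled here.

```python
def realised_train_fraction(train_count, eligible_count):
    """Divide the train count by the eligible denominator, guarding against zero."""
    if eligible_count <= 0:
        return 0.0
    return train_count / eligible_count


# Illustrative numbers only: 60 sequences landed in train, 80 were eligible.
fraction = realised_train_fraction(train_count=60, eligible_count=80)
```

With these numbers the realised fraction is 0.75, even though 60 out of a larger total population would give a smaller figure.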

`train_sequence_count_unit_hint()` ¶

Return the unit label for time-window train progress.

Returns:

Name	Type	Description
`str`	`str`	Unit label for time-window train progress.

`with_split_fractions(train_frac, test_frac)` ¶

Return a copy with both split fractions updated together.

Parameters:

Name	Type	Description	Default
`train_frac`	`float`	Requested fraction of the total population to assign to the train prefix.	required
`test_frac`	`float`	Requested fraction reserved for the fixed test suffix.	required

Returns:

Name	Type	Description
`Self`	`Self`	Copy with updated split fractions.
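
A rough sketch of the prefix/suffix semantics described above, using stand-ins rather than the library's classes. `Label` mirrors `SplitLabel` (only the `"train"` value is confirmed by the docs; the others are assumed), `SplitConfig` is a hypothetical holder for the two fractions, and the rounding and validation rules are assumptions.

```python
from dataclasses import dataclass, replace
from enum import Enum


class Label(str, Enum):
    """Stand-in for SplitLabel; values other than "train" are assumed."""

    TRAIN = "train"
    TEST = "test"
    IGNORED = "ignored"


@dataclass(frozen=True)
class SplitConfig:
    """Hypothetical holder for the requested split fractions."""

    train_frac: float = 0.8
    test_frac: float = 0.2

    def with_split_fractions(self, train_frac, test_frac):
        # Return a modified copy; the original config is left untouched.
        if train_frac < 0 or test_frac < 0 or train_frac + test_frac > 1.0:
            raise ValueError("fractions must be non-negative and sum to at most 1")
        return replace(self, train_frac=train_frac, test_frac=test_frac)


def assign_labels(config, n_sequences):
    """Label an ordered population: train prefix, ignored middle, test suffix."""
    n_test = round(n_sequences * config.test_frac)
    n_train = min(round(n_sequences * config.train_frac), n_sequences - n_test)
    n_ignored = n_sequences - n_train - n_test
    return [Label.TRAIN] * n_train + [Label.IGNORED] * n_ignored + [Label.TEST] * n_test


base = SplitConfig()
smaller = base.with_split_fractions(0.5, 0.2)
labels = assign_labels(smaller, 10)
```

Shrinking `train_frac` while keeping `test_frac` fixed leaves the test suffix unchanged and moves the freed sequences into `IGNORED`, which keeps evaluation comparable across training sizes.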
