Sequences
Sequences are the point where a templated dataset becomes a modeling dataset.
This page covers the grouping builders, the TemplateSequence shape, and the
split semantics that determine how sequences are assigned to train and test.
```python
>>> from anomalog.sequences import SplitLabel, TemplateSequence
>>> sequence = TemplateSequence(
...     events=[("template <*>", ["x"], None), ("template <*>", ["y"], 10)],
...     label=0,
...     entity_ids=["node-1"],
...     window_id=7,
...     split_label=SplitLabel.TRAIN,
... )
>>> sequence.sole_entity_id
'node-1'
>>> sequence.templates
['template <*>', 'template <*>']
>>> sequence.split_label.value
'train'
```
anomalog.sequences
Utilities for building template sequences from structured log lines.
The module groups parsed log lines into windows (entity, fixed-size, or time-based) and decorates them with inferred templates and anomaly labels.
ChronologicalStreamSequenceBuilder (dataclass)
Bases: NonEntitySequenceBuilder
Sequence builder for chronological raw-entry stream chunks.
Attributes:
| Name | Type | Description |
|---|---|---|
| chunk_size | int | Maximum number of raw entries per emitted chunk. |
| continuous_context | bool | Whether adjacent chunks should carry model state across sequence boundaries. |
__iter__()
Iterate over chronological stream chunks with optional raw-entry splits.
Yields:
| Type | Description |
|---|---|
| TemplateSequence | One preserved chronological chunk per emitted sequence. When a raw-entry split is active, per-event training eligibility is attached through the sequence's training_event_mask instead of fragmenting the chunk. |
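The masking behaviour described above can be illustrated with a minimal plain-Python sketch. It is not the anomalog implementation: the helper `chunk_with_training_mask`, the integer `cutoff` standing in for a raw-entry split boundary, and the use of plain lists for entries are all hypothetical. It shows how fixed-size chunks can stay intact while a per-event mask records which events fall in the train prefix.

```python
from typing import Sequence, TypeVar

T = TypeVar("T")


def chunk_with_training_mask(
    entries: Sequence[T], chunk_size: int, cutoff: int
) -> list[tuple[list[T], tuple[bool, ...]]]:
    """Split entries into consecutive chunks with a per-event train mask.

    Hypothetical helper: an entry whose global index is below ``cutoff``
    counts as train-eligible. Chunks are never split at the cutoff; the
    boundary is recorded only in the mask.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    chunks: list[tuple[list[T], tuple[bool, ...]]] = []
    for start in range(0, len(entries), chunk_size):
        chunk = list(entries[start : start + chunk_size])
        # One flag per event, computed from its position in the whole stream.
        mask = tuple(start + offset < cutoff for offset in range(len(chunk)))
        chunks.append((chunk, mask))
    return chunks


for events, mask in chunk_with_training_mask(["a", "b", "c", "d", "e"], 2, cutoff=3):
    print(events, mask)
# ['a', 'b'] (True, True)
# ['c', 'd'] (True, False)
# ['e'] (False,)
```

Under these assumptions the number of chunks is ceil(n / chunk_size), which is one plausible reading of what count_windows() reports for this builder; the real builder may treat edge cases differently.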
build_raw_entry_split_summary()
Return diagnostics for a configured raw-entry split, if any.
Returns:
| Type | Description |
|---|---|
| RawEntrySplitSummary \| None | Raw-entry split diagnostics when a before-grouping split is configured, otherwise None. |
build_split_summary(*, sequence_summary)
Describe requested versus effective split semantics for one run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence_summary | SequenceSummary | Aggregate split and label counts. | required |
Returns:
| Type | Description |
|---|---|
| SequenceSplitSummary | Requested and effective split metrics. |
count_windows()
Return the number of chronological stream chunks.
Returns:
| Type | Description |
|---|---|
| int | Count of chronological stream chunks implied by the sink. |
iter_grouped_rows()
Return rows grouped into deterministic chronological chunks.
Returns:
| Type | Description |
|---|---|
| Iterator[Collection[StructuredLine]] | Deterministic chronological chunks of structured rows. |
iter_test_sequences()
Yield only the test suffix used for detector scoring.
Yields:
| Type | Description |
|---|---|
| TemplateSequence | Sequences assigned to the test split. |
iter_training_sequences()
Yield the training slice used by model fitting.
Yields:
| Type | Description |
|---|---|
| TemplateSequence | Sequences assigned to the training split. |
represent_with(representation)
Return a lazy view that applies a representation to each sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| representation | SequenceRepresentation[TRepresentation] | Sequence representation to apply lazily to each built sequence. | required |
Returns:
| Type | Description |
|---|---|
| SequenceRepresentationView[TRepresentation] | Lazy represented view of the generated sequences. |
split_count_hint()
Return the exact split-count summary for non-entity grouping.
Returns:
| Type | Description |
|---|---|
| SequenceSplitCounts | Exact split counts for the grouping strategy. |
split_summary_train_on_normal_entities_only()
train_fraction_eligible_sequence_count(*, sequence_summary)
Return the denominator for effective train-fraction accounting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence_summary | SequenceSummary | Aggregate split and label counts. | required |
Returns:
| Type | Description |
|---|---|
| int | Count of sequences considered eligible when reporting the realised train fraction for this grouping strategy. This is not necessarily the number of sequences that were actually assigned to train. |
train_sequence_count_unit_hint()
Return the unit label for stream chunks.
Returns:
| Type | Description |
|---|---|
| str | Human-readable unit label for stream progress. |
with_continuous_context(*, enabled=True)
Treat consecutive stream chunks as one continuous stream.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| enabled | bool | Whether to carry model state across chunk boundaries. | True |
Returns:
| Type | Description |
|---|---|
| Self | Copy with updated continuity behaviour. |
with_split_fractions(train_frac, test_frac)
Return a copy with both split fractions updated together.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| train_frac | float | Requested fraction of the total population to assign to the train prefix. | required |
| test_frac | float | Requested fraction reserved for the fixed test suffix. | required |
Returns:
| Type | Description |
|---|---|
| Self | Copy with updated split fractions. |
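The train-prefix, ignored-middle, and test-suffix layout implied by train_frac, test_frac, and the SequenceSplitCounts fields can be sketched as follows. This is a hypothetical illustration: `split_counts` and `SplitCountsSketch` are not part of anomalog, and the floor rounding and validation rules are assumptions that may differ from what the builders actually do.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SplitCountsSketch:
    """Hypothetical stand-in mirroring the SequenceSplitCounts fields."""

    total_count: int
    train_count: int
    ignored_count: int
    test_count: int


def split_counts(total: int, train_frac: float, test_frac: float) -> SplitCountsSketch:
    """Partition a chronological population into train, ignored, and test."""
    if not (0.0 <= train_frac <= 1.0 and 0.0 <= test_frac <= 1.0):
        raise ValueError("fractions must lie in [0, 1]")
    if train_frac + test_frac > 1.0:
        raise ValueError("train_frac + test_frac must not exceed 1")
    # Fixed test suffix first, then the train prefix (floor rounding assumed).
    test_count = int(total * test_frac)
    train_count = min(int(total * train_frac), total - test_count)
    # Whatever lies between the prefix and the suffix is withheld.
    ignored_count = total - train_count - test_count
    return SplitCountsSketch(total, train_count, ignored_count, test_count)


print(split_counts(100, train_frac=0.5, test_frac=0.25))
# SplitCountsSketch(total_count=100, train_count=50, ignored_count=25, test_count=25)
```

Keeping the test suffix fixed and computing it first means that shrinking train_frac only widens the ignored band, which matches the "fixed test suffix" wording above.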
EntitySequenceBuilder (dataclass)
Bases: SequenceBuilder
Sequence builder for per-entity grouping.
Attributes:
| Name | Type | Description |
|---|---|---|
| train_on_normal_entities_only | bool | Whether anomalous entities are excluded from the training split budget. |
| continuous_context | bool | Whether adjacent entity windows should carry state across sequence boundaries. |
__iter__()
Iterate over template sequences yielded by the configured grouping.
Yields:
| Type | Description |
|---|---|
| TemplateSequence | One grouped and template-enriched sequence. |
__post_init__()
Validate the requested split fractions and raw-entry split inputs.
Raises:
| Type | Description |
|---|---|
| ValueError | If the requested split settings are inconsistent. |
build_raw_entry_split_summary()
Return diagnostics for a configured raw-entry split, if any.
Returns:
| Type | Description |
|---|---|
| RawEntrySplitSummary \| None | Raw-entry split diagnostics when a before-grouping split is configured, otherwise None. |
build_split_summary(*, sequence_summary)
Describe requested versus effective split semantics for one run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence_summary | SequenceSummary | Aggregate split and label counts. | required |
Returns:
| Type | Description |
|---|---|
| SequenceSplitSummary | Requested and effective split metrics. |
from_dataset(td) (classmethod)
Create an entity-grouped builder from a templated dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| td | TemplatedDataset | Templated dataset to bind into the builder. | required |
Returns:
| Type | Description |
|---|---|
| Self | Builder bound to the templated dataset. |
iter_grouped_rows()
Return rows grouped by entity.
Returns:
| Type | Description |
|---|---|
| Iterator[Collection[StructuredLine]] | Entity-grouped structured rows. |
iter_test_sequences()
Yield only the test suffix used for detector scoring.
Yields:
| Type | Description |
|---|---|
| TemplateSequence | Sequences assigned to the test split. |
iter_training_sequences()
Yield only the train split used for detector fitting.
For HDFS-style raw-entry prefix protocols, fitting only needs the train population implied by the selected policy. We therefore avoid replaying the full suffix and instead materialise just the selected train entities or raw-prefix rows, depending on the straddling-group policy.
Yields:
| Type | Description |
|---|---|
| TemplateSequence | Training-split sequences yielded by the configured grouping strategy. |
Raises:
| Type | Description |
|---|---|
| ValueError | If the requested raw-entry split metadata is missing for a configured before-grouping prefix protocol. |
represent_with(representation)
Return a lazy view that applies a representation to each sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| representation | SequenceRepresentation[TRepresentation] | Sequence representation to apply lazily to each built sequence. | required |
Returns:
| Type | Description |
|---|---|
| SequenceRepresentationView[TRepresentation] | Lazy represented view of the generated sequences. |
split_count_hint()
Return the exact split-count summary for entity grouping.
Returns:
| Type | Description |
|---|---|
| SequenceSplitCounts | Exact split counts for the entity builder. |
split_summary_train_on_normal_entities_only()
Return entity split-summary metadata for normal-only training.
Returns:
| Type | Description |
|---|---|
| bool | Whether train was restricted to normal entities only. |
train_fraction_eligible_sequence_count(*, sequence_summary)
Return the denominator for effective train-fraction accounting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence_summary | SequenceSummary | Aggregate split and label counts. | required |
Returns:
| Type | Description |
|---|---|
| int | Eligible entity-sequence count under the current policy. When normal-only training is enabled, this counts only normal entities in the fixed train pool before the hold-out suffix. |
train_sequence_count_unit_hint()
Return the unit label for entity-grouped train progress.
Returns:
| Type | Description |
|---|---|
| str | Unit label for entity-grouped train progress. |
with_continuous_context(*, enabled=True)
Treat consecutive entity windows as one continuous stream.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| enabled | bool | Whether to carry model state across entity boundaries. | True |
Returns:
| Type | Description |
|---|---|
| Self | Copy with updated continuity behaviour. |
with_split_fractions(train_frac, test_frac)
Return a copy with both split fractions updated together.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| train_frac | float | Requested fraction of the total population to assign to the train prefix. | required |
| test_frac | float | Requested fraction reserved for the fixed test suffix. | required |
Returns:
| Type | Description |
|---|---|
| Self | Copy with updated split fractions. |
with_train_on_normal_entities_only(*, enabled=True)
Limit training sequences to entities without anomalies.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| enabled | bool | Whether to restrict train sequences to normal entities only. | True |
Returns:
| Type | Description |
|---|---|
| Self | Copy with updated normal-only training behavior. |
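The normal-only policy can be sketched over an already fixed chronological train pool. This assumes binary sequence labels (0 for normal) and uses a hypothetical `EntityWindow` stand-in rather than TemplateSequence; it only illustrates the filtering and the eligible denominator described for train_fraction_eligible_sequence_count, not how anomalog builds the pool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityWindow:
    """Hypothetical stand-in for one entity-grouped sequence."""

    entity_id: str
    label: int  # assumed: 0 = normal, 1 = anomalous


def eligible_train_pool(
    pool: list[EntityWindow], *, normal_only: bool
) -> tuple[list[EntityWindow], int]:
    """Return the train-eligible windows and the eligible denominator.

    With normal-only training, anomalous entities are dropped and the
    denominator counts only the normal entities in the pool.
    """
    eligible = [w for w in pool if w.label == 0] if normal_only else list(pool)
    return eligible, len(eligible)


pool = [EntityWindow("blk_1", 0), EntityWindow("blk_2", 1), EntityWindow("blk_3", 0)]
windows, denominator = eligible_train_pool(pool, normal_only=True)
print([w.entity_id for w in windows], denominator)  # ['blk_1', 'blk_3'] 2
```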
FixedSequenceBuilder (dataclass)
Bases: NonEntitySequenceBuilder
Sequence builder for fixed-size window grouping.
Attributes:
| Name | Type | Description |
|---|---|---|
| window_size | int | Number of rows per emitted window. |
| step | int \| None | Row advance between windows. |
| window_basis | FixedWindowBasis | Whether windows are built over the compacted structured rows or over the raw line positions. |
| window_alignment_offset | int | Raw-position offset before the first full raw-position window. |
__iter__()
Yield fixed-size windows with the configured basis.
Yields:
| Type | Description |
|---|---|
| TemplateSequence | One grouped template sequence per fixed window. |
__post_init__()
Validate the requested split fractions and raw-entry split inputs.
Raises:
| Type | Description |
|---|---|
| ValueError | If the requested split settings are inconsistent. |
build_raw_entry_split_summary()
Return diagnostics for a configured raw-entry split, if any.
Returns:
| Type | Description |
|---|---|
| RawEntrySplitSummary \| None | Raw-entry split diagnostics when a before-grouping split is configured, otherwise None. |
build_split_summary(*, sequence_summary)
Describe requested versus effective split semantics for one run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence_summary | SequenceSummary | Aggregate split and label counts. | required |
Returns:
| Type | Description |
|---|---|
| SequenceSplitSummary | Requested and effective split metrics. |
count_windows()
Return the number of fixed-size windows.
Returns:
| Type | Description |
|---|---|
| int | Count of fixed-size windows implied by the sink and config. |
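Window placement over compacted rows can be sketched as follows. It assumes that step=None means non-overlapping windows (step equal to window_size) and that a single trailing partial window is emitted once the end of the input is reached; both are assumptions, and the sketch ignores window_basis and window_alignment_offset. `fixed_windows` and `count_fixed_windows` are hypothetical helpers.

```python
from __future__ import annotations


def _ceil_div(a: int, b: int) -> int:
    """Integer ceiling division for non-negative a and positive b."""
    return -(-a // b)


def fixed_windows(rows: list, window_size: int, step: int | None = None) -> list[list]:
    """Return windows of up to window_size rows starting every step rows."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    stride = window_size if step is None else step
    if stride <= 0:
        raise ValueError("step must be positive")
    windows = []
    for start in range(0, len(rows), stride):
        windows.append(rows[start : start + window_size])
        # Stop once this window reaches the end, so at most one is partial.
        if start + window_size >= len(rows):
            break
    return windows


def count_fixed_windows(n_rows: int, window_size: int, step: int | None = None) -> int:
    """Closed-form count matching fixed_windows for the same inputs."""
    stride = window_size if step is None else step
    if n_rows == 0:
        return 0
    # Either the loop breaks at the first window that reaches the end, or,
    # when step > window_size, the start positions run out first.
    return min(_ceil_div(n_rows, stride), _ceil_div(max(n_rows - window_size, 0), stride) + 1)


print(fixed_windows(list(range(5)), window_size=3, step=1))
# [[0, 1, 2], [1, 2, 3], [2, 3, 4]]
```

A closed-form count like this is what makes an exact, cheap split_count_hint() possible without materialising the windows, although the real builder's formula depends on its own edge-case rules.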
iter_grouped_rows()
Return rows grouped by fixed-size windows.
Returns:
| Type | Description |
|---|---|
| Iterator[Collection[StructuredLine]] | Fixed-size row windows. |
iter_test_sequences()
Yield only the test suffix used for detector scoring.
Yields:
| Type | Description |
|---|---|
| TemplateSequence | Sequences assigned to the test split. |
iter_training_sequences()
Yield the training slice used by model fitting.
Yields:
| Type | Description |
|---|---|
| TemplateSequence | Sequences assigned to the training split. |
represent_with(representation)
Return a lazy view that applies a representation to each sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| representation | SequenceRepresentation[TRepresentation] | Sequence representation to apply lazily to each built sequence. | required |
Returns:
| Type | Description |
|---|---|
| SequenceRepresentationView[TRepresentation] | Lazy represented view of the generated sequences. |
split_count_hint()
Return the exact split-count summary for non-entity grouping.
Returns:
| Type | Description |
|---|---|
| SequenceSplitCounts | Exact split counts for the grouping strategy. |
split_summary_train_on_normal_entities_only()
train_fraction_eligible_sequence_count(*, sequence_summary)
Return the denominator for effective train-fraction accounting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence_summary | SequenceSummary | Aggregate split and label counts. | required |
Returns:
| Type | Description |
|---|---|
| int | Count of sequences considered eligible when reporting the realised train fraction for this grouping strategy. This is not necessarily the number of sequences that were actually assigned to train. |
train_sequence_count_unit_hint()
Return the unit label for fixed-window train progress.
Returns:
| Type | Description |
|---|---|
| str | Unit label for fixed-window train progress. |
with_split_fractions(train_frac, test_frac)
Return a copy with both split fractions updated together.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| train_frac | float | Requested fraction of the total population to assign to the train prefix. | required |
| test_frac | float | Requested fraction reserved for the fixed test suffix. | required |
Returns:
| Type | Description |
|---|---|
| Self | Copy with updated split fractions. |
FixedWindowBasis
NonEntitySequenceBuilder (dataclass)
Bases: SequenceBuilder
Sequence builder for non-entity grouping strategies.
This is a marker subclass that makes explicit where normal-entity logic does not apply, such as in fixed-size or time-based windowing.
__iter__()
Iterate over template sequences yielded by the configured grouping.
Yields:
| Type | Description |
|---|---|
| TemplateSequence | One grouped and template-enriched sequence. |
__post_init__()
Validate the requested split fractions and raw-entry split inputs.
Raises:
| Type | Description |
|---|---|
| ValueError | If the requested split settings are inconsistent. |
build_raw_entry_split_summary()
Return diagnostics for a configured raw-entry split, if any.
Returns:
| Type | Description |
|---|---|
| RawEntrySplitSummary \| None | Raw-entry split diagnostics when a before-grouping split is configured, otherwise None. |
build_split_summary(*, sequence_summary)
Describe requested versus effective split semantics for one run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence_summary | SequenceSummary | Aggregate split and label counts. | required |
Returns:
| Type | Description |
|---|---|
| SequenceSplitSummary | Requested and effective split metrics. |
count_windows() (abstractmethod)
Return the total number of windows implied by the sink and config.
Returns:
| Type | Description |
|---|---|
| int | Count of windows implied by the sink and current builder config. |
iter_grouped_rows() (abstractmethod)
Return grouped rows for the configured strategy.
Returns:
| Type | Description |
|---|---|
| Iterator[Collection[StructuredLine]] | Iterator over grouped windows of structured rows. |
iter_test_sequences()
Yield only the test suffix used for detector scoring.
Yields:
| Type | Description |
|---|---|
| TemplateSequence | Sequences assigned to the test split. |
iter_training_sequences()
Yield the training slice used by model fitting.
Yields:
| Type | Description |
|---|---|
| TemplateSequence | Sequences assigned to the training split. |
represent_with(representation)
Return a lazy view that applies a representation to each sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| representation | SequenceRepresentation[TRepresentation] | Sequence representation to apply lazily to each built sequence. | required |
Returns:
| Type | Description |
|---|---|
| SequenceRepresentationView[TRepresentation] | Lazy represented view of the generated sequences. |
split_count_hint()
Return the exact split-count summary for non-entity grouping.
Returns:
| Type | Description |
|---|---|
| SequenceSplitCounts | Exact split counts for the grouping strategy. |
split_summary_train_on_normal_entities_only()
train_fraction_eligible_sequence_count(*, sequence_summary)
Return the denominator for effective train-fraction accounting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence_summary | SequenceSummary | Aggregate split and label counts. | required |
Returns:
| Type | Description |
|---|---|
| int | Count of sequences considered eligible when reporting the realised train fraction for this grouping strategy. This is not necessarily the number of sequences that were actually assigned to train. |
train_sequence_count_unit_hint()
Return a human-readable unit label for train-count progress.
This is intended for progress reporting only. Builders should return a unit when it clarifies what the bounded train count represents, such as "entities" for entity grouping.
Returns:
| Type | Description |
|---|---|
| str \| None | Unit label for train-count progress when useful, otherwise None. |
with_split_fractions(train_frac, test_frac)
Return a copy with both split fractions updated together.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| train_frac | float | Requested fraction of the total population to assign to the train prefix. | required |
| test_frac | float | Requested fraction reserved for the fixed test suffix. | required |
Returns:
| Type | Description |
|---|---|
| Self | Copy with updated split fractions. |
RawEntrySplitMode
RawEntrySplitSummary (dataclass)
Audit summary for a chronological raw-entry split.
Attributes:
| Name | Type | Description |
|---|---|---|
| split_mode | str | Configured raw-entry split mode. |
| application_order | str | Whether the split was applied before or after grouping. |
| cutoff_entry_index | int | Zero-based raw-entry cutoff where the test suffix begins. |
| train_raw_entry_count | int | Raw entries assigned to train. |
| train_normal_entry_count | int | Normal raw entries assigned to train. |
| train_anomalous_entry_count | int | Anomalous raw entries assigned to train. |
| test_raw_entry_count | int | Raw entries assigned to test. |
| test_normal_entry_count | int | Normal raw entries assigned to test. |
| test_anomalous_entry_count | int | Anomalous raw entries assigned to test. |
| ignored_raw_entry_count | int | Raw entries withheld from both train and test. |
| ignored_normal_entry_count | int | Normal raw entries withheld. |
| ignored_anomalous_entry_count | int | Anomalous raw entries withheld. |
| straddling_group_count | int | Number of grouped windows that crossed the split boundary. |
| straddling_group_policy | str \| None | Policy applied to straddling groups. |
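The cutoff and per-label counts reported by RawEntrySplitSummary can be illustrated with a minimal sketch. It assumes chronologically ordered binary entry labels (0 normal, 1 anomalous), floor rounding of a hypothetical train_entry_fraction, and no ignored band, so the ignored_* fields would all be zero; `raw_entry_split_counts` is hypothetical and only mirrors a subset of the summary's field names.

```python
from collections import Counter


def raw_entry_split_counts(labels: list[int], train_entry_fraction: float) -> dict[str, int]:
    """Compute a cutoff and per-label train/test counts for raw entries."""
    if not 0.0 <= train_entry_fraction <= 1.0:
        raise ValueError("train_entry_fraction must lie in [0, 1]")
    # Zero-based index of the first test entry (floor rounding assumed).
    cutoff = int(len(labels) * train_entry_fraction)
    train, test = Counter(labels[:cutoff]), Counter(labels[cutoff:])
    return {
        "cutoff_entry_index": cutoff,
        "train_raw_entry_count": cutoff,
        "train_normal_entry_count": train[0],
        "train_anomalous_entry_count": train[1],
        "test_raw_entry_count": len(labels) - cutoff,
        "test_normal_entry_count": test[0],
        "test_anomalous_entry_count": test[1],
    }


summary = raw_entry_split_counts([0, 0, 1, 0, 0, 1, 0, 0, 0, 1], 0.5)
print(summary["cutoff_entry_index"], summary["test_anomalous_entry_count"])  # 5 2
```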
SequenceBuilder (dataclass)
Bases: ABC, Iterable[TemplateSequence]
Common sequence-building behavior shared across grouping strategies.
Sequence builders stay lazy so expensive grouping, template inference, and label resolution only happen when a caller iterates. The shared base also centralises split assignment so experiment manifests can describe train/test semantics consistently across grouping modes.
Attributes:
| Name | Type | Description |
|---|---|---|
| sink | StructuredSink | Structured sink supplying grouped rows. |
| infer_template | Callable[[str], tuple[LogTemplate, ExtractedParameters]] | Template inference function for row message text. |
| label_for_group | Callable[[str], int \| None] | Group-level anomaly label lookup by entity id. |
| template_parser | TemplateParser \| None | Optional parser object kept alongside |
| raw_replay_state | _BeforeGroupingRawReplayState | Mutable cache reused by before-grouping raw-entry split paths. |
| split_mode | RawEntrySplitMode \| None | Raw-entry split mode used for special reproduction protocols. |
| split_application_order | SplitApplicationOrder | Whether the split is applied before or after grouping. |
| straddling_group_policy | StraddlingGroupPolicy | Policy for grouped rows that cross a raw-entry split boundary. |
| train_entry_count | int \| None | Requested raw-entry prefix length when |
| train_entry_fraction | float \| None | Requested raw-entry prefix fraction when |
| train_normal_entry_fraction | float \| None | Requested normal-entry prefix fraction when |
| stream_chunk_size | int \| None | Optional chunk size used by stream grouping strategies. |
| train_frac | float | Requested training fraction for the builder. |
| test_frac | float | Fixed test suffix fraction. |
__iter__() (abstractmethod)
Iterate over template sequences yielded by the configured grouping.
Returns:
| Type | Description |
|---|---|
| Iterator[TemplateSequence] | Iterator yielding grouped and template-enriched sequences. |
__post_init__()
Validate the requested split fractions and raw-entry split inputs.
Raises:
| Type | Description |
|---|---|
| ValueError | If the requested split settings are inconsistent. |
build_raw_entry_split_summary()
Return diagnostics for a configured raw-entry split, if any.
Returns:
| Type | Description |
|---|---|
| RawEntrySplitSummary \| None | Raw-entry split diagnostics when a before-grouping split is configured, otherwise None. |
build_split_summary(*, sequence_summary)
Describe requested versus effective split semantics for one run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence_summary | SequenceSummary | Aggregate split and label counts. | required |
Returns:
| Type | Description |
|---|---|
| SequenceSplitSummary | Requested and effective split metrics. |
iter_grouped_rows() (abstractmethod)
Return grouped rows for the configured strategy.
Returns:
| Type | Description |
|---|---|
| Iterator[Collection[StructuredLine]] | Iterator over grouped windows of structured rows. |
iter_test_sequences()
Yield the test slice used by model scoring.
Yields:
| Type | Description |
|---|---|
| TemplateSequence | Sequences assigned to the test split. |
iter_training_sequences()
Yield the training slice used by model fitting.
Yields:
| Type | Description |
|---|---|
| TemplateSequence | Sequences assigned to the training split. |
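The split-filtered iterators can be pictured as generators over a chronological split assignment, which keeps them lazy in the same spirit as the builders. This is a hypothetical sketch: the `Split` enum (including its IGNORED member), `assign_splits`, and `iter_split` are illustrative names, and the real builders may compute train and test slices differently, for example by not replaying the full suffix when only training data is needed.

```python
from enum import Enum
from typing import Iterable, Iterator, TypeVar

T = TypeVar("T")


class Split(Enum):
    """Hypothetical split labels for the sketch."""

    TRAIN = "train"
    IGNORED = "ignored"
    TEST = "test"


def assign_splits(n: int, train_count: int, test_count: int) -> Iterator[Split]:
    """Label n chronological items as train prefix, ignored middle, test suffix."""
    for index in range(n):
        if index < train_count:
            yield Split.TRAIN
        elif index >= n - test_count:
            yield Split.TEST
        else:
            yield Split.IGNORED


def iter_split(items: Iterable[T], labels: Iterable[Split], wanted: Split) -> Iterator[T]:
    """Lazily yield only the items whose label matches the wanted split."""
    for item, label in zip(items, labels):
        if label is wanted:
            yield item


windows = [f"w{i}" for i in range(6)]
print(list(iter_split(windows, assign_splits(6, 3, 2), Split.TRAIN)))  # ['w0', 'w1', 'w2']
print(list(iter_split(windows, assign_splits(6, 3, 2), Split.TEST)))  # ['w4', 'w5']
```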
represent_with(representation)
Return a lazy view that applies a representation to each sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| representation | SequenceRepresentation[TRepresentation] | Sequence representation to apply lazily to each built sequence. | required |
Returns:
| Type | Description |
|---|---|
| SequenceRepresentationView[TRepresentation] | Lazy represented view of the generated sequences. |
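The lazy behaviour of represent_with can be sketched with a small view that applies a callable per item only when iterated. This assumes a representation can be treated as a plain callable from sequence to model input; `LazyRepresentedView` is a hypothetical stand-in for SequenceRepresentationView, whose actual interface is not shown here.

```python
from dataclasses import dataclass
from typing import Callable, Generic, Iterable, Iterator, TypeVar

S = TypeVar("S")
R = TypeVar("R")


@dataclass(frozen=True)
class LazyRepresentedView(Generic[S, R]):
    """View that maps a representation over a source; re-iterable if the source is."""

    source: Iterable[S]
    represent: Callable[[S], R]

    def __iter__(self) -> Iterator[R]:
        # Nothing is computed until a caller iterates the view.
        for item in self.source:
            yield self.represent(item)


calls: list[str] = []


def count_tokens(text: str) -> int:
    calls.append(text)  # record each invocation to show laziness
    return len(text.split())


view = LazyRepresentedView(["a b", "c d e"], count_tokens)
assert calls == []  # building the view did not run the representation
print(list(view))  # [2, 3]
```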
split_count_hint()
Return a cheap exact split-count summary when the builder knows it.
Returns:
| Type | Description |
|---|---|
| SequenceSplitCounts \| None | Exact split counts when cheaply available, otherwise None. |
split_summary_train_on_normal_entities_only()
train_fraction_eligible_sequence_count(*, sequence_summary)
Return the denominator for effective train-fraction accounting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence_summary | SequenceSummary | Aggregate split and label counts. | required |
Returns:
| Type | Description |
|---|---|
| int | Count of sequences considered eligible when reporting the realised train fraction for this grouping strategy. This is not necessarily the number of sequences that were actually assigned to train. |
train_sequence_count_unit_hint()
Return a human-readable unit label for train-count progress.
This is intended for progress reporting only. Builders should return a unit when it clarifies what the bounded train count represents, such as "entities" for entity grouping.
Returns:
| Type | Description |
|---|---|
| str \| None | Unit label for train-count progress when useful, otherwise None. |
with_split_fractions(train_frac, test_frac)
Return a copy with both split fractions updated together.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| train_frac | float | Requested fraction of the total population to assign to the train prefix. | required |
| test_frac | float | Requested fraction reserved for the fixed test suffix. | required |
Returns:
| Type | Description |
|---|---|
| Self | Copy with updated split fractions. |
SequenceSplitCounts (dataclass)
Exact split counts for a concrete sequence builder.
Attributes:
| Name | Type | Description |
|---|---|---|
| total_count | int | Total emitted sequence count. |
| train_count | int | Count assigned to the current train prefix. |
| ignored_count | int | Count withheld between train and test. |
| test_count | int | Count assigned to the fixed test suffix. |
SequenceSplitSummary (dataclass)
Serialisable summary of requested versus effective split behavior.
The requested train fraction may not equal the effective one after grouping-specific eligibility rules are applied. Persisting both protects downstream experiment manifests from silently overstating how much data was actually available for training.
Attributes:
| Name | Type | Description |
|---|---|---|
| requested_train_fraction | float | Requested fraction provided by the caller. |
| requested_test_fraction | float | Requested test suffix fraction provided by the caller. |
| train_on_normal_entities_only | bool \| None | Whether training was restricted to normal entities only. Only applicable to entity grouping; None otherwise. |
| train_pool_sequence_count | int | Number of sequences in the chronological train candidate window before detector-specific filtering is applied. |
| ineligible_train_pool_count | int | Number of sequences in the train pool that were ineligible for training under the current policy. |
| realised_train_sequence_count | int | Number of sequences actually used for training after any detector-specific filtering. |
| excluded_from_train_count | int | Number of sequences withheld from the train pool before scoring, including the ignored middle band and any detector-ineligible prefix items. |
| eligible_train_sequence_count | int | Number of sequences in the denominator for the effective train-fraction calculation. In entity-grouped mode this is the fixed chronological train pool, or the normal-only subset of that pool when normal-only training is enabled. |
| ignored_sequence_count | int | Number of sequences withheld from the train pool because they fell outside the requested train prefix or were ineligible under the current filtering policy. |
| effective_train_fraction_of_eligible | float | Realised train fraction over the eligible set. |
| effective_train_fraction_overall | float | Realised train fraction over the full generated sequence population. |
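The two effective fractions differ only in their denominators. A minimal sketch, using a hypothetical helper name and assuming that an empty denominator should report 0.0 (the library may handle that case differently):

```python
def effective_train_fractions(
    realised_train: int, eligible: int, total: int
) -> tuple[float, float]:
    """Return (fraction of eligible, fraction overall) for realised training data."""
    of_eligible = realised_train / eligible if eligible else 0.0
    overall = realised_train / total if total else 0.0
    return of_eligible, overall


# 40 sequences trained out of 80 eligible, from 200 generated in total.
print(effective_train_fractions(40, 80, 200))  # (0.5, 0.2)
```

Reporting both numbers makes it visible when, for example, normal-only filtering shrinks the eligible set so that a high fraction of eligible still corresponds to a small share of the whole population.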
SplitApplicationOrder
SplitLabel
StraddlingGroupPolicy
How to handle grouped rows that cross a raw-entry split boundary.
Attributes:
| Name | Description |
|---|---|
| SPLIT_PARTIAL_SEQUENCES | Emit one sequence per contiguous segment. |
| ASSIGN_BY_FIRST_EVENT | Assign the whole group by the first segment. |
| ASSIGN_BY_LAST_EVENT | Assign the whole group by the last segment. |
| DROP_STRADDLERS | Drop groups that span both sides of the split. |
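The four policies can be illustrated on a single group whose rows are identified by chronologically sorted raw-entry indices and a cutoff index where the test suffix begins. This is a hypothetical sketch: `Policy` mirrors the documented member names but not their values, and `resolve_straddler` is not an anomalog function.

```python
from enum import Enum, auto


class Policy(Enum):
    """Hypothetical mirror of the StraddlingGroupPolicy member names."""

    SPLIT_PARTIAL_SEQUENCES = auto()
    ASSIGN_BY_FIRST_EVENT = auto()
    ASSIGN_BY_LAST_EVENT = auto()
    DROP_STRADDLERS = auto()


def resolve_straddler(
    indices: list[int], cutoff: int, policy: Policy
) -> list[tuple[str, list[int]]]:
    """Return (split, indices) pieces for one group of sorted raw-entry indices."""
    train = [i for i in indices if i < cutoff]
    test = [i for i in indices if i >= cutoff]
    if not train or not test:
        # The group lies entirely on one side, so no policy is needed.
        return [("train" if train else "test", list(indices))]
    if policy is Policy.SPLIT_PARTIAL_SEQUENCES:
        return [("train", train), ("test", test)]
    if policy is Policy.ASSIGN_BY_FIRST_EVENT:
        return [("train" if indices[0] < cutoff else "test", list(indices))]
    if policy is Policy.ASSIGN_BY_LAST_EVENT:
        return [("train" if indices[-1] < cutoff else "test", list(indices))]
    return []  # DROP_STRADDLERS: the whole group is discarded


for policy in Policy:
    print(policy.name, resolve_straddler([3, 4, 5, 6], cutoff=5, policy=policy))
```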
TemplateSequence (dataclass)
Grouped log window before any model-specific representation is applied.
This keeps sequence semantics such as event ordering, labels, and entity membership. Model inputs derived from it live in SequenceSample.
Attributes:
| Name | Type | Description |
|---|---|---|
| events | list[tuple[str, list[str], int \| None]] | Ordered sequence events as |
| label | int | Sequence-level anomaly label derived from rows and group labels. |
| entity_ids | list[str] | Unique entity ids present in the window in first-seen order. |
| window_id | int | Stable window identifier assigned by the builder. |
| split_label | SplitLabel | Dataset split assigned to the sequence. |
| event_labels | tuple[int \| None, ...] \| None | Optional per-event anomaly labels aligned positionally with events. |
| training_event_mask | tuple[bool, ...] \| None | Optional per-event eligibility mask for training-target selection. This is used when a preserved chronological chunk must stay intact even though only a subset of its events are valid training targets. |
| evaluation_event_mask | tuple[bool, ...] \| None | Optional per-event eligibility mask for scoring targets. This is used when a preserved chronological chunk must stay intact even though only a subset of its events belong to the evaluation split. |
| continuous_context | bool | Whether adjacent sequences should be treated as a single chronological stream for model state carryover. |
sole_entity_id (property)
Return the entity id when the sequence belongs to exactly one entity.
If multiple entities appear in the window, None is returned to avoid implying a single owning entity.
templates (property)
Return the ordered template strings for this sequence.
__post_init__()
Validate that any event labels stay aligned with the events.
Raises:
| Type | Description |
|---|---|
| ValueError | If event_labels is provided but is not aligned with events. |
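The property and validation behaviour described for TemplateSequence can be mirrored in a small stand-alone dataclass. `MiniSequence` is hypothetical and keeps only the fields these members need; it treats alignment as equal length, which is an assumption, and ignores the meaning of the third event-tuple element, which the source does not spell out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class MiniSequence:
    """Hypothetical subset of TemplateSequence for illustration."""

    events: list[tuple[str, list[str], int | None]]
    entity_ids: list[str]
    event_labels: tuple[int | None, ...] | None = None

    def __post_init__(self) -> None:
        # Assumed alignment rule: exactly one label per event.
        if self.event_labels is not None and len(self.event_labels) != len(self.events):
            raise ValueError("event_labels must align with events")

    @property
    def sole_entity_id(self) -> str | None:
        # None when the window spans zero or several entities.
        return self.entity_ids[0] if len(self.entity_ids) == 1 else None

    @property
    def templates(self) -> list[str]:
        return [template for template, _params, _extra in self.events]


seq = MiniSequence(events=[("t <*>", ["x"], None), ("t <*>", ["y"], 10)], entity_ids=["node-1"])
print(seq.sole_entity_id, seq.templates)  # node-1 ['t <*>', 't <*>']
```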
TimeSequenceBuilder (dataclass)
Bases: NonEntitySequenceBuilder
Sequence builder for time-window grouping.
Attributes:
| Name | Type | Description |
|---|---|---|
| time_span_ms | int | Duration of each emitted window in milliseconds. |
| step | int \| None | Window advance in milliseconds. |
__iter__()
Iterate over time windows with optional raw-entry split semantics.
Yields:
| Type | Description |
|---|---|
| TemplateSequence | One grouped time window, optionally segmented according to a raw-entry split applied before grouping. |
__post_init__()
Validate the requested split fractions and raw-entry split inputs.
Raises:
| Type | Description |
|---|---|
| ValueError | If the requested split settings are inconsistent. |
build_raw_entry_split_summary()
Return diagnostics for a configured raw-entry split, if any.
Returns:
| Type | Description |
|---|---|
| RawEntrySplitSummary \| None | Raw-entry split diagnostics when a before-grouping split is configured, otherwise None. |
build_split_summary(*, sequence_summary)
Describe requested versus effective split semantics for one run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence_summary | SequenceSummary | Aggregate split and label counts. | required |
Returns:
| Type | Description |
|---|---|
| SequenceSplitSummary | Requested and effective split metrics. |
count_windows()
Return the number of time windows.
Returns:
| Type | Description |
|---|---|
| int | Count of time windows implied by the sink timestamps and config. |
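Time-based grouping can be sketched with sorted millisecond timestamps. The sketch assumes windows are anchored at the earliest timestamp, that step=None means tumbling windows (step equal to time_span_ms), and that empty windows are skipped; the real builder's anchoring and empty-window handling, and therefore what count_windows() reports, may differ. `time_windows` is a hypothetical helper.

```python
from __future__ import annotations

from bisect import bisect_left


def time_windows(
    timestamps_ms: list[int], time_span_ms: int, step_ms: int | None = None
) -> list[list[int]]:
    """Group timestamps into half-open [start, start + span) windows."""
    if time_span_ms <= 0:
        raise ValueError("time_span_ms must be positive")
    stride = time_span_ms if step_ms is None else step_ms
    if stride <= 0:
        raise ValueError("step must be positive")
    ordered = sorted(timestamps_ms)
    windows: list[list[int]] = []
    if not ordered:
        return windows
    start = ordered[0]
    while start <= ordered[-1]:
        # Binary search for the slice of timestamps inside this window.
        lo = bisect_left(ordered, start)
        hi = bisect_left(ordered, start + time_span_ms)
        if hi > lo:
            windows.append(ordered[lo:hi])
        start += stride
    return windows


print(time_windows([0, 400, 1000, 1500, 3100], time_span_ms=1000))
# [[0, 400], [1000, 1500], [3100]]
```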
iter_grouped_rows()
Return rows grouped by time windows.
Returns:
| Type | Description |
|---|---|
| Iterator[Collection[StructuredLine]] | Time-based row windows. |
iter_test_sequences()
Yield only the test suffix used for detector scoring.
Yields:
| Type | Description |
|---|---|
| TemplateSequence | Sequences assigned to the test split. |
iter_training_sequences()
Yield the training slice used by model fitting.
Yields:
| Type | Description |
|---|---|
| TemplateSequence | Sequences assigned to the training split. |
represent_with(representation)
Return a lazy view that applies a representation to each sequence.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| representation | SequenceRepresentation[TRepresentation] | Sequence representation to apply lazily to each built sequence. | required |
Returns:
| Type | Description |
|---|---|
| SequenceRepresentationView[TRepresentation] | Lazy represented view of the generated sequences. |
split_count_hint()
Return the exact split-count summary for non-entity grouping.
Returns:
| Type | Description |
|---|---|
| SequenceSplitCounts | Exact split counts for the grouping strategy. |
split_summary_train_on_normal_entities_only()
train_fraction_eligible_sequence_count(*, sequence_summary)
Return the denominator for effective train-fraction accounting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence_summary | SequenceSummary | Aggregate split and label counts. | required |
Returns:
| Type | Description |
|---|---|
| int | Count of sequences considered eligible when reporting the realised train fraction for this grouping strategy. This is not necessarily the number of sequences that were actually assigned to train. |
train_sequence_count_unit_hint()
Return the unit label for time-window train progress.
Returns:
| Type | Description |
|---|---|
| str | Unit label for time-window train progress. |
with_split_fractions(train_frac, test_frac)
Return a copy with both split fractions updated together.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| train_frac | float | Requested fraction of the total population to assign to the train prefix. | required |
| test_frac | float | Requested fraction reserved for the fixed test suffix. | required |
Returns:
| Type | Description |
|---|---|
| Self | Copy with updated split fractions. |