Experiments¶

The experiments package houses the repository-local configuration, model, and result helpers that sit on top of AnomaLog preprocessing.

Use this page when you want the module surface for the experiment layer rather than the workflow overview in Experiments.

>>> from experiments import ConfigError
>>> from experiments.config import load_experiment_bundles
>>> from experiments.models import model_names
>>> "naive_bayes" in model_names()
True

`experiments`¶

Config-driven experiment tooling for AnomaLog.

`ConfigError` ¶

Bases: ValueError

Raised when experiment configuration is invalid.

`experiments.config`¶

Public experiment config API.

`CSVLabelReaderConfig` ¶

Bases: LabelReaderConfig

Read anomaly labels from a CSV file.

Attributes:

Name	Type	Description
`relative_path`	`Path`	CSV path relative to the materialised dataset root.
`entity_column`	`str`	CSV column containing the entity/group id.
`label_column`	`str`	CSV column containing the integer anomaly label.

`build()` ¶

Build a CSV-backed anomaly label reader.

Returns:

Name	Type	Description
`CSVReader`	`CSVReader`	Runtime CSV-backed label reader.

`CachePathsConfigModel` ¶

Bases: Struct

Cache/data root paths for dataset materialisation.

The configuration supports either an explicit `data_root`/`cache_root` pair or a shorthand `namespace`. The shorthand expands to `data/<namespace>` and `.cache/<namespace>` relative to the repository root, which keeps the common case short while still allowing manual overrides when needed.

Attributes:

Name	Type	Description
`namespace`	`str \| None`	Optional shared suffix used for both roots.
`data_root`	`Path \| None`	Root for materialised raw datasets.
`cache_root`	`Path \| None`	Root for derived artifacts and cached outputs.

`__post_init__()` ¶

Validate the shorthand and explicit cache-path forms.

Raises:

Type	Description
`ConfigError`	If the configuration mixes shorthand and explicit roots or omits one half of the explicit root pair.

`resolve(*, repo_root)` ¶

Resolve cache/data roots relative to the repository root.

Parameters:

Name	Type	Description	Default
`repo_root`	`Path`	Repository root used to resolve relative cache paths.	required

Returns:

Name	Type	Description
`CachePathsConfig`	`CachePathsConfig`	Concrete cache paths resolved against the repo root.

Raises:

Type	Description
`ConfigError`	If the config is invalid or incomplete.
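The shorthand and explicit forms can be illustrated with a minimal, self-contained sketch. `CachePathsSketch` and the local `ConfigError` are hypothetical stand-ins rather than the real `CachePathsConfigModel`, and returning a plain tuple instead of `CachePathsConfig` is an assumption made to keep the example short.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path


class ConfigError(ValueError):
    """Local stand-in for experiments.ConfigError."""


@dataclass(frozen=True)
class CachePathsSketch:
    """Hypothetical stand-in for CachePathsConfigModel (not the real class)."""

    namespace: str | None = None
    data_root: Path | None = None
    cache_root: Path | None = None

    def __post_init__(self) -> None:
        explicit = (self.data_root, self.cache_root)
        # Reject a mix of the shorthand and explicit roots.
        if self.namespace is not None and any(p is not None for p in explicit):
            raise ConfigError("use either namespace or explicit roots, not both")
        # Reject an explicit pair with only one half present.
        if (self.data_root is None) != (self.cache_root is None):
            raise ConfigError("data_root and cache_root must be given together")

    def resolve(self, *, repo_root: Path) -> tuple[Path, Path]:
        """Return (data_root, cache_root) resolved against repo_root."""
        if self.namespace is not None:
            return (
                repo_root / "data" / self.namespace,
                repo_root / ".cache" / self.namespace,
            )
        if self.data_root is None or self.cache_root is None:
            raise ConfigError("cache paths are incomplete")
        # Joining with an absolute path keeps the absolute path unchanged.
        return repo_root / self.data_root, repo_root / self.cache_root
```

With this sketch, `CachePathsSketch(namespace="bgl").resolve(repo_root=Path("/repo"))` yields the `data/bgl` and `.cache/bgl` directories under `/repo`.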

`ChronologicalStreamSequenceConfig` ¶

Bases: SequenceConfigBase

Chronological raw-entry stream grouping configuration.

Attributes:

Name	Type	Description
`chunk_size`	`int`	Maximum number of raw entries per emitted chunk.
`continuous_context`	`bool`	Whether adjacent chunks should carry model state across sequence boundaries.

`__post_init__()` ¶

Validate the chunk size and shared split settings.

Raises:

Type	Description
`ConfigError`	If the chunk size is not positive or the shared split settings are invalid.

`apply(templated)` ¶

Build a configured sequence view from a templated dataset.

Parameters:

Name	Type	Description	Default
`templated`	`TemplatedDataset`	Built templated dataset to group into sequences.	required

Returns:

Name	Type	Description
`SequenceBuilder`	`SequenceBuilder`	Sequence builder with grouping and split settings applied.
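As a rough illustration of what `chunk_size` means, the following sketch cuts a chronological stream into chunks of at most `chunk_size` entries. `chunk_entries` is a hypothetical helper written for this page, not part of the package, and it ignores `continuous_context` entirely.

```python
from collections.abc import Iterable, Iterator
from itertools import islice


def chunk_entries(entries: Iterable[str], chunk_size: int) -> Iterator[list[str]]:
    """Yield consecutive chunks holding at most chunk_size entries each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    iterator = iter(entries)
    # islice consumes the shared iterator, so each pass takes the next chunk.
    while chunk := list(islice(iterator, chunk_size)):
        yield chunk
```

Only the final chunk can be shorter than `chunk_size`, which is why the attribute is described as a maximum.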

`DatasetSourceConfig` ¶

Bases: Struct

Tagged config base for materialising a dataset source.

`build(*, repo_root)` ¶

Build the runtime dataset source.

Parameters:

Name	Type	Description	Default
`repo_root`	`Path`	Repository root used to resolve relative paths.	required

Raises:

Type	Description
`NotImplementedError`	Always, until implemented by a concrete source config.

`manifest_entry(*, repo_root)` ¶

Return a stable source manifest entry.

Parameters:

Name	Type	Description	Default
`repo_root`	`Path`	Repository root used to resolve relative paths.	required

Raises:

Type	Description
`NotImplementedError`	Always, until implemented by a concrete source config.

`DatasetVariantConfig` ¶

Bases: Struct

Dataset preprocessing and sequence-generation configuration.

Attributes:

Name	Type	Description
`name`	`str`	Human-readable dataset variant name.
`dataset_name`	`str`	Dataset identifier used for runtime caches/artifacts.
`preset`	`str \| None`	Optional built-in dataset preset name.
`source`	`DatasetSourceConfig \| None`	Source config for custom datasets.
`structured_parser`	`str \| None`	Structured parser name for custom datasets.
`template_parser`	`str`	Template parser name.
`label_reader`	`LabelReaderConfig \| None`	Optional anomaly label reader config.
`cache_paths`	`CachePathsConfigModel \| None`	Optional cache/data root override.
`evaluation_unit`	`EvaluationUnit \| None`	Optional primary evaluation abstraction for the run's headline metrics.
`sequence`	`SequenceConfigBase`	Sequence grouping and split config.
`description`	`str \| None`	Optional free-text dataset description.

`__post_init__()` ¶

Validate the minimum dataset config required to build a spec.

Raises:

Type	Description
`ConfigError`	If the dataset config omits required source or parser data.

`custom_dataset_components()` ¶

Return the validated source/parser pair for non-preset datasets.

Returns:

Type	Description
`tuple[DatasetSourceConfig, str]`	Source config and structured parser name.

Raises:

Type	Description
`ConfigError`	If the config is not a valid custom dataset definition.

`source_summary(*, repo_root)` ¶

Return a stable source summary for manifests.

Parameters:

Name	Type	Description	Default
`repo_root`	`Path`	Repository root used to resolve relative source paths.	required

Returns:

Type	Description
`dict[str, object]`	Stable JSON-serialisable source summary.

`EntitySequenceConfig` ¶

Bases: SequenceConfigBase

Entity-based sequence configuration.

Attributes:

Name	Type	Description
`train_on_normal_entities_only`	`bool`	Whether anomalous entities are excluded from the training split budget.
`continuous_context`	`bool`	Whether adjacent entity windows should carry state across sequence boundaries.

`__post_init__()` ¶

Validate cross-field split constraints.

Raises:

Type	Description
`ConfigError`	If the requested test suffix is invalid or leaves no room for the train prefix.

`apply(templated)` ¶

Build a configured entity-grouped sequence view.

Parameters:

Name	Type	Description	Default
`templated`	`TemplatedDataset`	Built templated dataset to group by entity.	required

Returns:

Name	Type	Description
`SequenceBuilder`	`SequenceBuilder`	Entity-grouped builder with split settings applied.
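Entity grouping can be pictured with a small sketch that assumes events arrive as `(entity_id, message)` pairs in chronological order. `group_by_entity` is a hypothetical helper, and the real builder may represent events and sequences differently.

```python
def group_by_entity(events: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Collect messages per entity, keeping first-seen entity order."""
    groups: dict[str, list[str]] = {}
    for entity_id, message in events:
        # dicts preserve insertion order, so entities stay in first-seen order.
        groups.setdefault(entity_id, []).append(message)
    return groups
```

Each value in the returned mapping corresponds to one entity sequence in this simplified picture.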

`ExperimentBundle` ¶

Bases: Struct

Resolved concrete run config derived from a sweep or inline scenario.

Attributes:

Name	Type	Description
`experiments_root`	`Path`	Root directory containing experiment configs.
`repo_root`	`Path`	Repository root used for path resolution.
`sweep_path`	`Path`	Resolved sweep config path.
`dataset_path`	`Path`	Resolved dataset config path.
`model_path`	`Path`	Resolved model config path.
`sweep`	`ExperimentRunConfig`	Decoded sweep or inline scenario config.
`dataset`	`DatasetVariantConfig`	Decoded dataset config.
`model`	`ExperimentModelConfig`	Decoded model config.
`concrete_name`	`str`	Deterministic label for the concrete run within the sweep.
`run_group`	`str`	Scheduling group used to batch compatible model runs together inside one manifest.
`applied_overrides`	`dict[str, Any]`	Fixed and axis overrides applied to derive the concrete run.
`experiment_name`	`str \| None`	Registry experiment name when the bundle was resolved from the named registry.
`experiment_groups`	`tuple[str, ...]`	Registry groups attached to the bundle when it came from the named registry.

`normalized_config()` ¶

Return a JSON-like normalised config payload for manifests.

Returns:

Type	Description
`dict[str, object]`	Normalised config payload for hashing and manifests.

Raises:

Type	Description
`TypeError`	If msgspec returns a non-dict payload unexpectedly.

`with_experiment_metadata(*, experiment_name, experiment_groups)` ¶

Return a copy annotated with registry metadata.

Parameters:

Name	Type	Description	Default
`experiment_name`	`str`	Registry entry name for the selected run.	required
`experiment_groups`	`tuple[str, ...]`	Registry groups attached to the selected run.	required

Returns:

Name	Type	Description
`ExperimentBundle`	`ExperimentBundle`	Bundle annotated with registry provenance.

`ExperimentRegistry` `dataclass` ¶

Validated registry of logical experiments and reusable model sets.

Attributes:

Name	Type	Description
`model_sets`	`tuple[ModelSetDefinition, ...]`	Loaded model-set definitions.
`experiments`	`tuple[RegisteredExperiment, ...]`	Loaded logical experiments.

`__post_init__()` ¶

Build quick lookup tables for validated registry objects.

`model_set(name)` ¶

Return one named model set.

Parameters:

Name	Type	Description	Default
`name`	`str`	Model-set name to resolve.	required

Returns:

Name	Type	Description
`ModelSetDefinition`	`ModelSetDefinition`	Loaded model-set definition.

Raises:

Type	Description
`ConfigError`	If the model set name is not present in the registry.

`names()` ¶

Return experiment names in registry order.

Returns:

Type	Description
`tuple[str, ...]`	Experiment names in registry order.

`require(name)` ¶

Return one named logical experiment.

Parameters:

Name	Type	Description	Default
`name`	`str`	Registry experiment name to resolve.	required

Returns:

Name	Type	Description
`RegisteredExperiment`	`RegisteredExperiment`	Loaded logical experiment definition.

Raises:

Type	Description
`ConfigError`	If the experiment name is not present in the registry.

`resolve_experiment(name, *, registry_path, repo_root)` ¶

Resolve one logical experiment into concrete bundles.

Parameters:

Name	Type	Description	Default
`name`	`str`	Registry experiment name to resolve.	required
`registry_path`	`Path`	Path to the registry TOML file.	required
`repo_root`	`Path`	Repository root used to resolve relative paths.	required

Returns:

Name	Type	Description
`ResolvedRegistryExperiment`	`ResolvedRegistryExperiment`	Logical registry entry plus concrete bundle expansion.

`select(*, names=(), groups=())` ¶

Select named experiments and/or group-filtered experiments.

Parameters:

Name	Type	Description	Default
`names`	`tuple[str, ...]`	Explicit registry experiment names to include.	`()`
`groups`	`tuple[str, ...]`	Registry groups to include.	`()`

Returns:

Type	Description
`tuple[RegisteredExperiment, ...]`	Registry experiments in their original order.

Raises:

Type	Description
`ConfigError`	If a requested name or group is unknown.
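The selection contract (unknown names or groups are rejected, and results keep registry order) can be sketched as follows. `Entry` and `select_entries` are hypothetical stand-ins; treating `names` and `groups` as a union, and returning nothing when both are empty, are assumptions rather than documented behaviour.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Hypothetical stand-in for RegisteredExperiment."""

    name: str
    groups: tuple[str, ...] = ()


def select_entries(
    entries: tuple[Entry, ...],
    *,
    names: tuple[str, ...] = (),
    groups: tuple[str, ...] = (),
) -> tuple[Entry, ...]:
    """Return entries matching any requested name or group, in registry order."""
    known_names = {entry.name for entry in entries}
    known_groups = {group for entry in entries for group in entry.groups}
    unknown = [n for n in names if n not in known_names]
    unknown += [g for g in groups if g not in known_groups]
    if unknown:
        # The real method raises ConfigError; ValueError is its base class.
        raise ValueError(f"unknown names or groups: {unknown}")
    wanted_groups = set(groups)
    return tuple(
        entry
        for entry in entries
        if entry.name in names or wanted_groups.intersection(entry.groups)
    )
```

Iterating over `entries` rather than over the requested names is what preserves the original registry order.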

`ExperimentRunConfig` ¶

Bases: Protocol

Shared runtime contract for dataset-owned experiment matrices.

Attributes:

Name	Type	Description
`name`	`str`	Human-readable run name.
`dataset`	`Any`	Decoded dataset config.
`models`	`list[Any]`	Concrete model run entries embedded in the file.
`results_root`	`Path`	Root directory for run outputs.
`description`	`str \| None`	Optional free-text run description.
`max_workers`	`WorkerCount`	Maximum concurrent concrete runs.

`FixedSequenceConfig` ¶

Bases: SequenceConfigBase

Fixed-window sequence configuration.

Attributes:

Name	Type	Description
`window_size`	`int`	Number of rows per fixed window.
`window_basis`	`FixedWindowBasis`	Whether fixed windows operate on the compacted structured rows or the raw line positions.
`window_alignment_offset`	`int`	Raw-position offset before the first full fixed window.

`__post_init__()` ¶

Validate cross-field split constraints.

Raises:

Type	Description
`ConfigError`	If the requested test suffix is invalid or leaves no room for the train prefix.

`apply(templated)` ¶

Build a configured sequence view from a templated dataset.

Parameters:

Name	Type	Description	Default
`templated`	`TemplatedDataset`	Built templated dataset to group into sequences.	required

Returns:

Name	Type	Description
`SequenceBuilder`	`SequenceBuilder`	Sequence builder with grouping and split settings applied.
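To make `window_size` and the inherited `step` concrete, here is a minimal sketch of fixed-window grouping. `fixed_windows` is a hypothetical helper; defaulting `step` to `window_size` (non-overlapping windows) and keeping trailing partial windows are assumptions, and `window_basis` and `window_alignment_offset` are not modelled.

```python
from __future__ import annotations


def fixed_windows(rows: list[int], window_size: int, step: int | None = None) -> list[list[int]]:
    """Slice rows into windows of window_size, advancing by step each time."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    stride = window_size if step is None else step
    if stride <= 0:
        raise ValueError("step must be positive")
    # Slicing past the end simply yields a shorter trailing window.
    return [rows[start : start + window_size] for start in range(0, len(rows), stride)]
```

A `step` smaller than `window_size` produces overlapping windows in this sketch, while equal values tile the rows without overlap.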

`LabelReaderConfig` ¶

Bases: Struct

Tagged config base for anomaly-label readers.

`build()` ¶

Build the runtime anomaly-label reader.

Raises:

Type	Description
`NotImplementedError`	Always, until implemented by a concrete label-reader config.

`LocalDirSourceConfig` ¶

Bases: DatasetSourceConfig

Use an existing local directory as the dataset root.

Attributes:

Name	Type	Description
`path`	`Path`	Source directory, relative to the repo when not absolute.
`raw_logs_relpath`	`Path \| None`	Optional raw-log path relative to the source directory.

`build(*, repo_root)` ¶

Build a local-directory dataset source.

Parameters:

Name	Type	Description	Default
`repo_root`	`Path`	Repository root used to resolve relative source paths.	required

Returns:

Name	Type	Description
`LocalDirSource`	`LocalDirSource`	Runtime local-directory source.

`manifest_entry(*, repo_root)` ¶

Return a stable source manifest entry.

Parameters:

Name	Type	Description	Default
`repo_root`	`Path`	Repository root used to resolve relative source paths.	required

Returns:

Type	Description
`dict[str, str \| None]`	Manifest entry for the local directory source.

`LocalZipSourceConfig` ¶

Bases: DatasetSourceConfig

Use a local zip archive as the dataset source.

Attributes:

Name	Type	Description
`zip_path`	`Path`	Archive path, relative to the repo when not absolute.
`raw_logs_relpath`	`Path \| None`	Optional raw-log path relative to the extracted dataset root.
`md5_checksum`	`str \| None`	Optional checksum used to verify the archive.

`build(*, repo_root)` ¶

Build a local-zip dataset source.

Parameters:

Name	Type	Description	Default
`repo_root`	`Path`	Repository root used to resolve relative source paths.	required

Returns:

Name	Type	Description
`LocalZipSource`	`LocalZipSource`	Runtime local-zip source.

`manifest_entry(*, repo_root)` ¶

Return a stable source manifest entry.

Parameters:

Name	Type	Description	Default
`repo_root`	`Path`	Repository root used to resolve relative source paths.	required

Returns:

Type	Description
`dict[str, str \| None]`	Manifest entry for the local zip source.

`RawEntryPrefixCountSplitConfig` ¶

Bases: RawEntrySplitConfigBase

Split by the first N raw entries in chronological order.

Attributes:

Name	Type	Description
`train_entry_count`	`int`	Number of raw entries to keep in the train prefix.

`RawEntryPrefixFractionSplitConfig` ¶

Bases: RawEntrySplitConfigBase

Split by the first p fraction of raw entries in chronological order.

Attributes:

Name	Type	Description
`train_entry_fraction`	`Annotated[float, Meta(gt=0.0, le=1.0)]`	Fraction of raw entries to keep in the train prefix.

`RawEntryPrefixNormalFractionSplitConfig` ¶

Bases: RawEntrySplitConfigBase

Split by the first p fraction of normal raw entries in chronological order.

Attributes:

Name	Type	Description
`train_normal_entry_fraction`	`Annotated[float, Meta(gt=0.0, le=1.0)]`	Fraction of normal raw entries to keep in the train prefix.
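The three raw-entry prefix modes differ only in how the train/test boundary index is chosen. The sketch below assumes labels are `0` for normal and `1` for anomalous entries and that fractions round down; both the helper names and the rounding rule are assumptions, not the package's implementation.

```python
def boundary_by_count(total: int, train_entry_count: int) -> int:
    """Train prefix is the first train_entry_count entries (capped at total)."""
    return min(train_entry_count, total)


def boundary_by_fraction(total: int, train_entry_fraction: float) -> int:
    """Train prefix is the first fraction of all entries, rounded down."""
    return int(total * train_entry_fraction)


def boundary_by_normal_fraction(labels: list[int], train_normal_entry_fraction: float) -> int:
    """Train prefix ends right after the requested share of normal entries."""
    normal_total = sum(1 for label in labels if label == 0)
    target = int(normal_total * train_normal_entry_fraction)
    seen = 0
    for index, label in enumerate(labels):
        if seen == target:
            return index
        if label == 0:
            seen += 1
    return len(labels)
```

In each case entries before the returned index form the train prefix and the remaining entries fall after the split.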

`RegisteredExperiment` `dataclass` ¶

Logical experiment entry resolved from the registry.

Attributes:

Name	Type	Description
`name`	`str`	Registry experiment name.
`dataset`	`str`	Dataset manifest name resolved by the registry.
`models`	`tuple[str, ...]`	Inline model references defined directly on the experiment.
`model_sets`	`tuple[str, ...]`	Shared model sets used by the experiment.
`groups`	`tuple[str, ...]`	Derived reporting and scheduling groups.
`overrides`	`dict[str, dict[str, object]]`	Experiment-specific model overrides keyed by model or model-set name.
`description`	`str \| None`	Optional human-readable description.

`RemoteZipSourceConfig` ¶

Bases: DatasetSourceConfig

Download a remote zip archive for the dataset.

Attributes:

Name	Type	Description
`url`	`str`	Absolute URL of the dataset archive.
`md5_checksum`	`str \| None`	Optional checksum for the archive.
`raw_logs_relpath`	`Path \| None`	Optional raw-log path relative to the extracted dataset root.

`build(*, repo_root)` ¶

Build a remote-zip dataset source.

Parameters:

Name	Type	Description	Default
`repo_root`	`Path`	Repository root. Unused for remote zip sources.	required

Returns:

Name	Type	Description
`RemoteZipSource`	`RemoteZipSource`	Runtime remote-zip source.

`manifest_entry(*, repo_root)` ¶

Return a stable source manifest entry.

Parameters:

Name	Type	Description	Default
`repo_root`	`Path`	Repository root. Unused for remote zip sources.	required

Returns:

Type	Description
`dict[str, str \| None]`	Manifest entry for the remote zip source.

`ResolvedRegistryExperiment` `dataclass` ¶

Resolved registry experiment and its concrete bundles.

Attributes:

Name	Type	Description
`experiment`	`RegisteredExperiment`	Logical registry entry.
`bundles`	`tuple[ExperimentBundle, ...]`	Concrete bundle expansion.

`bundle` `property` ¶

Return the only bundle when a logical experiment expands to one.

Raises:

Type	Description
`ConfigError`	If the logical experiment expands to more than one concrete bundle.

`SequenceConfigBase` ¶

Bases: Struct

Shared sequence-generation settings for a dataset variant.

Attributes:

Name	Type	Description
`split`	`RawEntrySplitConfig \| None`	Optional raw-entry split mode to apply before grouping.
`step`	`int \| None`	Grouping-specific step between windows. `None` delegates to the grouping mode's default.
`train_fraction`	`TrainFraction`	Requested training fraction for the total sequence population.
`test_fraction`	`TestFraction`	Fixed test suffix fraction.

`__post_init__()` ¶

Validate cross-field split constraints.

Raises:

Type	Description
`ConfigError`	If the requested test suffix is invalid or leaves no room for the train prefix.
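One way to picture the cross-field constraint is a layout of train prefix, withheld middle, and fixed test suffix. The sketch below is an assumption-heavy illustration: the exact bounds, the rounding, and the `split_counts` helper are all hypothetical and may not match the real validation.

```python
def split_counts(total: int, train_fraction: float, test_fraction: float) -> tuple[int, int, int]:
    """Return (train, ignored, test) sequence counts for a prefix/suffix split."""
    if not 0.0 < test_fraction < 1.0:
        raise ValueError("test_fraction must lie strictly between 0 and 1")
    if train_fraction <= 0.0 or train_fraction + test_fraction > 1.0:
        raise ValueError("test suffix leaves no room for the train prefix")
    test = int(total * test_fraction)
    # The train prefix can never reach into the fixed test suffix.
    train = min(int(total * train_fraction), total - test)
    ignored = total - train - test
    return train, ignored, test
```

Under these assumptions, sequences between the train prefix and the test suffix are the ones counted as ignored in `SequenceSummary`.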

`apply(templated)` ¶

Build a configured sequence view from a templated dataset.

Parameters:

Name	Type	Description	Default
`templated`	`TemplatedDataset`	Built templated dataset to group into sequences.	required

Returns:

Name	Type	Description
`SequenceBuilder`	`SequenceBuilder`	Sequence builder with grouping and split settings applied.

`SplitApplicationOrder` ¶

Bases: str, Enum

When to apply a configured split relative to grouping.

Attributes:

Name	Type	Description
`AFTER_GROUPING`		Apply the split after grouping has produced sequences.
`BEFORE_GROUPING`		Apply the split on raw entries before grouping.

`StraddlingGroupPolicy` ¶

Bases: str, Enum

How to handle grouped rows that cross a raw-entry split boundary.

Attributes:

Name	Type	Description
`SPLIT_PARTIAL_SEQUENCES`		Emit one sequence per contiguous segment.
`ASSIGN_BY_FIRST_EVENT`		Assign the whole group by the first segment.
`ASSIGN_BY_LAST_EVENT`		Assign the whole group by the last segment.
`DROP_STRADDLERS`		Drop groups that span both sides of the split.
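The four policies can be compared on a single group whose raw positions cross the split boundary. This is a hedged sketch: the `Policy` enum values, the `assign_group` helper, and the convention that positions below `boundary` belong to the train side are assumptions for illustration only.

```python
from enum import Enum


class Policy(str, Enum):
    """Hypothetical mirror of StraddlingGroupPolicy; string values are made up."""

    SPLIT_PARTIAL_SEQUENCES = "split_partial_sequences"
    ASSIGN_BY_FIRST_EVENT = "assign_by_first_event"
    ASSIGN_BY_LAST_EVENT = "assign_by_last_event"
    DROP_STRADDLERS = "drop_straddlers"


def assign_group(positions: list[int], boundary: int, policy: Policy) -> list[tuple[str, list[int]]]:
    """Assign one non-empty group of sorted raw positions to train and/or test."""
    train = [p for p in positions if p < boundary]
    test = [p for p in positions if p >= boundary]
    if not train or not test:
        # The group sits entirely on one side, so no policy is needed.
        return [("train" if train else "test", positions)]
    if policy is Policy.SPLIT_PARTIAL_SEQUENCES:
        return [("train", train), ("test", test)]
    if policy is Policy.ASSIGN_BY_FIRST_EVENT:
        return [("train", positions)]  # a straddler's first event is on the train side
    if policy is Policy.ASSIGN_BY_LAST_EVENT:
        return [("test", positions)]  # a straddler's last event is on the test side
    return []  # DROP_STRADDLERS
```

With one boundary a straddling group has exactly two contiguous segments, so `SPLIT_PARTIAL_SEQUENCES` emits two sequences here.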

`SweepAxisConfig` ¶

Bases: Struct

One Cartesian-product axis for a sweep.

Attributes:

Name	Type	Description
`path`	`str`	Dot-separated override path rooted at `sweep`, `dataset`, or `model`.
`values`	`SweepOverrideValues`	Concrete values to apply at that path.

`__post_init__()` ¶

Validate the override axis shape.

`SweepConfig` ¶

Bases: Struct

Top-level experiment sweep configuration.

A sweep is now the authoritative experiment entrypoint. A config with no axes still represents one concrete run; axes expand that base definition into multiple concrete runs that differ only by validated overrides.

Attributes:

Name	Type	Description
`name`	`str`	Human-readable sweep name.
`dataset`	`str`	Referenced base dataset config name.
`model`	`str`	Referenced base model config name.
`results_root`	`Path`	Root directory for run outputs.
`description`	`str \| None`	Optional free-text sweep description.
`overrides`	`dict[str, Any]`	Fixed overrides applied to every concrete run generated from the sweep.
`axes`	`list[SweepAxisConfig]`	Cartesian-product axes for generating multiple concrete runs.
`max_workers`	`WorkerCount`	Maximum number of concrete runs to execute in parallel. `"auto"` caps parallelism to the concrete run count and the machine CPU count.

`__post_init__()` ¶

Validate override and execution settings.

Raises:

Type	Description
`ConfigError`	If override paths are malformed or execution settings are invalid.
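The Cartesian-product expansion described above can be sketched with `itertools.product`. The helpers `set_path`, `expand_runs`, and `auto_workers` are hypothetical, as is the choice to key fixed overrides by dot-separated path; the real sweep loader validates paths and decodes typed configs rather than plain dicts.

```python
import copy
import itertools
import os


def set_path(config: dict, dotted_path: str, value: object) -> None:
    """Set a nested value addressed by a dot-separated path."""
    *parents, leaf = dotted_path.split(".")
    node = config
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def expand_runs(base: dict, overrides: dict, axes: list[tuple[str, list]]) -> list[dict]:
    """Expand a base config into one concrete config per axis combination."""
    runs = []
    # product() over zero axes yields a single empty combination: one run.
    for combo in itertools.product(*(values for _, values in axes)):
        run = copy.deepcopy(base)
        for path, value in overrides.items():
            set_path(run, path, value)
        for (path, _), value in zip(axes, combo):
            set_path(run, path, value)
        runs.append(run)
    return runs


def auto_workers(run_count: int) -> int:
    """Cap parallelism at the run count and the machine CPU count."""
    return max(1, min(run_count, os.cpu_count() or 1))
```

Two axes with two and three values produce six concrete runs, and a sweep with no axes still produces exactly one.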

`TimeSequenceConfig` ¶

Bases: SequenceConfigBase

Time-window sequence configuration.

Attributes:

Name	Type	Description
`time_span_ms`	`int`	Duration of each emitted time window in milliseconds.

`__post_init__()` ¶

Validate cross-field split constraints.

Raises:

Type	Description
`ConfigError`	If the requested test suffix is invalid or leaves no room for the train prefix.

`apply(templated)` ¶

Build a configured sequence view from a templated dataset.

Parameters:

Name	Type	Description	Default
`templated`	`TemplatedDataset`	Built templated dataset to group into sequences.	required

Returns:

Name	Type	Description
`SequenceBuilder`	`SequenceBuilder`	Sequence builder with grouping and split settings applied.
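A minimal sketch of time-window grouping, assuming events are `(timestamp_ms, message)` pairs sorted by timestamp and that windows are non-overlapping and anchored at the first event. `time_windows` is a hypothetical helper, and the real grouping may anchor or step windows differently.

```python
from itertools import groupby


def time_windows(events: list[tuple[int, str]], time_span_ms: int) -> list[list[str]]:
    """Group chronologically sorted events into windows of time_span_ms."""
    if time_span_ms <= 0:
        raise ValueError("time_span_ms must be positive")
    if not events:
        return []
    origin = events[0][0]
    # Events share a window when they fall into the same time_span_ms bucket.
    buckets = groupby(events, key=lambda event: (event[0] - origin) // time_span_ms)
    return [[message for _, message in bucket] for _, bucket in buckets]
```

Empty time buckets produce no window in this sketch, so quiet periods simply leave a gap between consecutive windows.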

`load_experiment_bundles(sweep_config_path)` ¶

Load a dataset-owned experiment matrix and expand it into bundles.

Parameters:

Name	Type	Description	Default
`sweep_config_path`	`Path`	Dataset manifest TOML path to resolve.	required

Returns:

Type	Description
`list[ExperimentBundle]`	Fully resolved concrete runs derived from the manifest or inline scenario.

Raises:

Type	Description
`ConfigError`	If the manifest does not decode or is missing its root `experiments` directory.

`load_experiment_registry(registry_path, *, repo_root=None)` ¶

Load and validate the named experiment registry.

Parameters:

Name	Type	Description	Default
`registry_path`	`Path`	Path to the registry TOML file.	required
`repo_root`	`Path \| None`	Repository root used to resolve relative paths.	`None`

Returns:

Name	Type	Description
`ExperimentRegistry`	`ExperimentRegistry`	Validated registry with resolved model sets and logical experiments.

Raises:

Type	Description
`ConfigError`	If the registry file is missing or malformed, or if any referenced dataset/model config cannot be resolved.

`resolve_registry_experiment(name, *, registry_path, repo_root=None)` ¶

Resolve one registry experiment into concrete bundles.

Parameters:

Name	Type	Description	Default
`name`	`str`	Registry experiment name to resolve.	required
`registry_path`	`Path`	Path to the registry TOML file.	required
`repo_root`	`Path \| None`	Repository root used to resolve relative paths.	`None`

Returns:

Name	Type	Description
`ResolvedRegistryExperiment`	`ResolvedRegistryExperiment`	Logical registry entry plus concrete bundle expansion.

`serialise_config(value)` ¶

Convert config structs into builtins for hashing and manifests.

Parameters:

Name	Type	Description	Default
`value`	`object`	Config object or struct to serialise.	required

Returns:

Type	Description
`dict[str, object]`	JSON-like builtins representation of the config.

Raises:

Type	Description
`TypeError`	If msgspec returns a non-dict payload unexpectedly.

`experiments.models`¶

Experiment model runtime exports.

`EvaluationUnit` ¶

Bases: str, Enum

Stable evaluation and prediction units used in metric reports.

Attributes:

Name	Type	Description
`EVENT`		Individual log events.
`SEQUENCE`		Whole sequence or case abstractions.
`WINDOW`		Fixed-size sliding windows.
`STREAM`		A generic stream segment.
`NEXT_EVENT`		Next-event prediction samples.
`CLUSTER`		Human triage or clustering units.
`CHRONOLOGICAL_EVENT_STREAM`		Chronologically ordered event stream slices.
`CONTINUOUS_EVENT_STREAM`		Continuous event-stream slices.

`ExperimentModelConfig` ¶

Bases: Struct

Tagged experiment-model config base.

`detector` `property` ¶

Return the detector name encoded in the tagged config type.

Raises:

Type	Description
`ConfigError`	If the config type does not define a string detector tag.

`__post_init__()` ¶

Reject direct construction so msgspec metadata remains authoritative.

Raises:

Type	Description
`TypeError`	If a model config is constructed directly.

`build_detector()` ¶

Construct the runtime detector for this config.

Raises:

Type	Description
`NotImplementedError`	Always, until implemented by a concrete model config.

`MetricScope` ¶

Bases: str, Enum

Stable metric blocks reported by experiment runs.

Attributes:

Name	Type	Description
`EVENT_LEVEL_DETECTION`		Event-granularity binary detection metrics.
`SEQUENCE_LEVEL_DETECTION`		Sequence-granularity binary detection metrics.
`WINDOW_LEVEL_DETECTION`		Sliding-window binary detection metrics.
`STREAM_LEVEL_DETECTION`		Continuous-stream binary detection metrics.
`NEXT_EVENT_PREDICTION`		Next-event modelling and hit-rate metrics.
`CLUSTER_LEVEL_TRIAGE`		Cluster or review-group triage metrics.
`MANUAL_WORKLOAD_REDUCTION`		Manual review workload reduction metrics.
`SEMI_AUTOMATIC_WORKLOAD_REDUCTION`		Semi-automatic workload reduction metrics.

`MetricStatus` ¶

Bases: str, Enum

Validity status for one metric block.

Attributes:

Name	Type	Description
`VALID`		The block passed validation and can be treated as headline data.
`INVALID`		The block failed validation and should not be promoted.
`NOT_APPLICABLE`		The scope does not apply to this run.
`DIAGNOSTIC_ONLY`		The block is informative but not a headline metric.

`ModelRunSummary` `dataclass` ¶

Detector outputs and run summaries.

Attributes:

Name	Type	Description
`metrics`	`dict[str, Any]`	Aggregate run metrics.
`model_manifest`	`ModelManifest`	Detector manifest for the run.
`sequence_summary`	`SequenceSummary`	Split and label counts for the run.

`ProgressHint` `dataclass` ¶

Exact bounded-progress metadata for a sequence stage.

Attributes:

Name	Type	Description
`total`	`int`	Exact number of items expected in the stage.
`unit`	`str \| None`	Optional unit label shown beside the count.

`RunProgressPlan` `dataclass` ¶

Shared bounded-progress hints for one experiment model run.

Attributes:

Name	Type	Description
`train`	`ProgressHint \| None`	Exact fit-stage metadata when known.
`score`	`ProgressHint \| None`	Exact test-scoring metadata when known.

`SequenceSummary` `dataclass` ¶

Counts describing the generated sequence dataset.

Attributes:

Name	Type	Description
`sequence_count`	`int`	Total number of generated sequences.
`train_sequence_count`	`int`	Number of train-split sequences.
`test_sequence_count`	`int`	Number of test-split sequences.
`train_label_counts`	`dict[int, int]`	Train label histogram.
`test_label_counts`	`dict[int, int]`	Test label histogram.
`ignored_label_counts`	`dict[int, int]`	Label histogram for sequences withheld from the current train prefix.
`ignored_sequence_count`	`int`	Number of sequences withheld from the current train prefix between the train pool and the fixed test suffix.

`model_names()` ¶

Return supported built-in detector/model names.

Returns:

Type	Description
`tuple[str, ...]`	Detector names in registration order.

`resolve_model_config_type(name)` `cached` ¶

Resolve a built-in model-config type by detector name.

Parameters:

Name	Type	Description	Default
`name`	`str`	Registered detector name.	required

Returns:

Type	Description
`type[ExperimentModelConfig]`	Registered config type for the detector.

Raises:

Type	Description
`KeyError`	If `name` does not match a built-in detector.
`ConfigError`	If the detector module is present but its optional backend is not installed.
`ModuleNotFoundError`	If the detector module fails to import for a reason unrelated to the registered optional dependencies.
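The three failure modes can be sketched with a cached, lazily importing resolver. Everything here is hypothetical: the `_DETECTORS` table, the module names in it, and the fact that the sketch returns the imported module rather than a config type. It only illustrates how a missing optional backend might be told apart from an unrelated import failure via `ModuleNotFoundError.name`.

```python
import importlib
from functools import cache
from types import ModuleType


class ConfigError(ValueError):
    """Local stand-in for experiments.ConfigError."""


# Hypothetical registry: detector name -> (module to import, optional backends).
_DETECTORS: dict[str, tuple[str, frozenset[str]]] = {
    "builtin": ("json", frozenset()),
    "needs_backend": ("hypothetical_backend_pkg_xyz", frozenset({"hypothetical_backend_pkg_xyz"})),
    "broken": ("hypothetical_missing_pkg_xyz", frozenset()),
}


@cache
def resolve_detector_module(name: str) -> ModuleType:
    """Import the module registered for a detector name, caching successes."""
    module_name, optional_backends = _DETECTORS[name]  # KeyError if unknown
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        if exc.name in optional_backends:
            # A known optional dependency is missing: report it as a config issue.
            raise ConfigError(f"detector {name!r} requires optional backend {exc.name!r}") from exc
        raise  # unrelated import failure: surface it unchanged
```

Because `functools.cache` stores only successful results, a failed lookup is retried on the next call.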

`run_model(*, sequence_factory, config, prediction_output, logger, progress_plan=None)` ¶

Fit the configured detector and stream predictions to disk.

Parameters:

Name	Type	Description	Default
`sequence_factory`	`SequenceFactory`	Callable producing the full sequence stream and exposing whether the stream is split-ordered.	required
`config`	`ExperimentModelConfig`	Model config used to build the detector.	required
`prediction_output`	`PredictionOutputConfig`	Prediction stream settings.	required
`logger`	`Logger`	Logger for progress messages.	required
`progress_plan`	`RunProgressPlan \| None`	Exact bounded fit/scoring metadata when the caller can provide it cheaply.	`None`

Returns:

Name	Type	Description
`ModelRunSummary`	`ModelRunSummary`	Metrics, manifest, and sequence summary for the run.

`experiments.results`¶

Result-directory management and manifest utilities.

`ResultPaths` `dataclass` ¶

Concrete artifact paths inside a single run directory.

The run fingerprint is derived from the fully resolved config so repeated executions of the same experiment land under one deterministic fingerprint root. Keeping all artifact paths together avoids ad-hoc filename drift across result writers.

Attributes:

Name	Type	Description
`run_fingerprint`	`str`	Stable fingerprint for the resolved run config.
`run_root`	`Path`	Deterministic fingerprint directory for the concrete run family.
`run_dir`	`Path`	Root directory containing all artifacts for the run.
`config_path`	`Path`	Serialised normalised concrete experiment config path.
`dataset_manifest_path`	`Path`	Dataset provenance manifest path.
`metrics_path`	`Path`	Detector metrics output path.
`predictions_path`	`Path`	Prediction records output path.
`environment_path`	`Path`	Environment/provenance metadata path.
`run_log_path`	`Path`	Captured run log path.

`for_bundle(bundle, *, run_attempt=None)` `classmethod` ¶

Create deterministic result paths for the experiment bundle.

Parameters:

Name	Type	Description	Default
`bundle`	`ExperimentBundle`	Resolved experiment bundle.	required
`run_attempt`	`int \| None`	Optional 1-based attempt number written beneath the fingerprint root. When omitted, the concrete run writes directly to the fingerprint directory.	`None`

Returns:

Name	Type	Description
`ResultPaths`	`ResultPaths`	Deterministic run artifact paths for the bundle.
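A deterministic fingerprint of this kind can be sketched by hashing a canonical JSON dump of the normalised config. The hash function, the 16-character truncation, and the `attempt-N` directory name are assumptions chosen for illustration; the package's actual fingerprint scheme and directory layout may differ.

```python
from __future__ import annotations

import hashlib
import json
from pathlib import Path


def run_fingerprint(normalised_config: dict) -> str:
    """Hash a JSON-serialisable config so equal configs share a fingerprint."""
    # sort_keys makes the dump independent of dict insertion order.
    payload = json.dumps(normalised_config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def run_dir(results_root: Path, normalised_config: dict, run_attempt: int | None = None) -> Path:
    """Return the fingerprint root, or an attempt directory beneath it."""
    root = results_root / run_fingerprint(normalised_config)
    if run_attempt is None:
        return root
    if run_attempt < 1:
        raise ValueError("run_attempt is 1-based")
    return root / f"attempt-{run_attempt}"
```

Repeated executions with the same resolved config land under the same root in this sketch, and attempts only add one directory level beneath it.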

`ResultWriteContext` `dataclass` ¶

Inputs needed to persist one concrete experiment result bundle.

Attributes:

Name	Type	Description
`bundle`	`ExperimentBundle`	Resolved concrete experiment bundle.
`templated`	`TemplatedDataset`	Materialised templated dataset view.
`sequences`	`SequenceBuilder`	Sequence builder used to replay the run.
`model_summary`	`ModelRunSummary`	Model-side summary for the completed run.
`result_paths`	`ResultPaths`	Deterministic output paths for the bundle.
`debug_reporting`	`bool`	Whether verbose diagnostics should be written.

`build_dataset_manifest(*, context, split_summary=None, raw_entry_split_summary=None)` ¶

Build a provenance manifest for the preprocessed dataset and sequences.

Parameters:

Name	Type	Description	Default
`context`	`ResultWriteContext`	Resolved run inputs and persistence targets for the run.	required
`split_summary`	`SequenceSplitSummary \| None`	Optional precomputed split-summary metadata to reuse instead of replaying the builder.	`None`
`raw_entry_split_summary`	`RawEntrySplitSummary \| None`	Optional precomputed raw-entry split summary to reuse instead of replaying the builder.	`None`

Returns:

Type	Description
`dict[str, object]`	Dataset and sequence provenance manifest.

`build_environment_metadata(*, bundle, result_paths)` ¶

Capture the local environment for reproducibility and provenance.

Parameters:

Name	Type	Description	Default
`bundle`	`ExperimentBundle`	Resolved experiment bundle.	required
`result_paths`	`ResultPaths`	Materialised artifact paths for the run.	required

Returns:

Type	Description
`dict[str, object]`	Serialisable environment metadata.

`build_metric_metadata(*, bundle, sequences, model_summary, split_summary=None, raw_entry_split_summary=None)` ¶

Build the task metadata that accompanies persisted metric blocks.

Parameters:

Name	Type	Description	Default
`bundle`	`ExperimentBundle`	Experiment bundle being evaluated.	required
`sequences`	`SequenceBuilder`	Sequence builder used for the run.	required
`model_summary`	`ModelRunSummary`	Model-side summary for the run.	required
`split_summary`	`SequenceSplitSummary \| None`	Optional precomputed split-summary metadata to reuse when building the split policy.	`None`
`raw_entry_split_summary`	`RawEntrySplitSummary \| None`	Optional precomputed raw-entry split summary to reuse when building the split policy.	`None`

Returns:

Type	Description
`dict[str, object]`	Shared task metadata for the persisted dataset manifest and metrics report.

`build_run_metrics_report(*, bundle, sequences, model_summary, debug_reporting=False, cached_summaries=None)` ¶

Build the final task-aware metric report written to metrics.json.

Parameters:

Name	Type	Description	Default
`bundle`	`ExperimentBundle`	Experiment bundle being evaluated.	required
`sequences`	`SequenceBuilder`	Sequence builder used for the run.	required
`model_summary`	`ModelRunSummary`	Model-side summary for the run.	required
`debug_reporting`	`bool`	Whether to preserve the verbose diagnostic payloads in the written metrics report.	`False`
`cached_summaries`	`_SupportsResultSummaryCache \| None`	Optional precomputed summaries to reuse when building or persisting the report.	`None`

Returns:

Type	Description
`dict[str, object]`	Serialised task-aware metric report with metadata and canonical metric blocks.

`build_sequence_split_summary(sequences, *, sequence_summary)` ¶

Describe requested versus effective split semantics for one run.

Parameters:

Name	Type	Description	Default
`sequences`	`SequenceBuilder`	Sequence builder whose split semantics are being summarised.	required
`sequence_summary`	`SequenceSummary`	Aggregate split and label counts.	required

Returns:

Name	Type	Description
`SequenceSplitSummary`	`SequenceSplitSummary`	Requested and effective split metrics.

`prepare_result_paths(bundle, *, run_attempt=None)` ¶

Create deterministic result paths for the experiment bundle.

Parameters:

Name	Type	Description	Default
`bundle`	`ExperimentBundle`	Resolved experiment bundle.	required
`run_attempt`	`int \| None`	Optional 1-based attempt number written beneath the fingerprint root. When omitted, the concrete run writes directly to the fingerprint directory.	`None`

Returns:

Name	Type	Description
`ResultPaths`	`ResultPaths`	Deterministic run artifact paths for the bundle.
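
The `run_attempt` behaviour described above can be pictured with a small path helper. This is a sketch under stated assumptions, not the body of `prepare_result_paths`: the helper name `run_dir_sketch`, the `results_root` and `fingerprint` parameters, and the `attempt-<n>` directory naming are hypothetical and chosen only to show the "fingerprint root versus attempt subdirectory" distinction.

```python
from pathlib import Path
from typing import Optional


def run_dir_sketch(
    results_root: Path,
    fingerprint: str,
    run_attempt: Optional[int] = None,
) -> Path:
    """Return the directory a run would write to.

    Hypothetical layout: <results_root>/<fingerprint>[/attempt-<n>].
    """
    fingerprint_root = results_root / fingerprint
    if run_attempt is None:
        # No attempt number: write directly to the fingerprint directory.
        return fingerprint_root
    if run_attempt < 1:
        # Attempt numbers are 1-based, so 0 and negatives are rejected.
        raise ValueError("run_attempt must be a 1-based positive integer")
    return fingerprint_root / f"attempt-{run_attempt}"


if __name__ == "__main__":
    print(run_dir_sketch(Path("results"), "abc123"))
    print(run_dir_sketch(Path("results"), "abc123", run_attempt=2))
```

Because the path depends only on the fingerprint and the attempt number, re-running the same bundle resolves to the same location.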

`sha256_for_file(path)` ¶

Hash a file without loading it all into memory.

Parameters:

Name	Type	Description	Default
`path`	`Path`	File path to hash.	required

Returns:

Name	Type	Description
`str`	`str`	SHA-256 hex digest for the file contents.
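
Hashing "without loading it all into memory" is usually done by feeding the digest in fixed-size chunks. The sketch below shows that technique with `hashlib`; the name `sha256_for_file_sketch` and the `chunk_size` parameter are assumptions for the example, and the actual `sha256_for_file` may choose a different chunk size or reading strategy.

```python
import hashlib
from pathlib import Path


def sha256_for_file_sketch(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in fixed-size chunks so memory use stays bounded."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # iter() with a sentinel stops once read() returns b"" at end of file.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        sample = Path(tmp) / "sample.bin"
        sample.write_bytes(b"anomalog" * 1000)
        print(sha256_for_file_sketch(sample))
```

Peak memory is bounded by `chunk_size` rather than by the file size, which matters for large raw log archives.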

`stable_fingerprint(payload)` ¶

Return a deterministic fingerprint for a JSON-serialisable payload.

Parameters:

Name	Type	Description	Default
`payload`	`object`	JSON-serialisable payload to fingerprint.	required

Returns:

Name	Type	Description
`str`	`str`	SHA-256 fingerprint for the serialised payload.
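
One common way to get a deterministic fingerprint for a JSON-serialisable payload is to serialise it canonically (sorted keys, fixed separators) and hash the resulting bytes. The sketch below illustrates that idea; `stable_fingerprint_sketch` is a hypothetical name, and the exact canonicalisation used by `stable_fingerprint` is not documented here, so digests from this sketch should not be expected to match it.

```python
import hashlib
import json


def stable_fingerprint_sketch(payload: object) -> str:
    """Fingerprint a JSON-serialisable payload independent of dict key order."""
    canonical = json.dumps(
        payload,
        sort_keys=True,          # key order no longer affects the output
        separators=(",", ":"),   # fixed separators, no incidental whitespace
        ensure_ascii=True,       # stable byte representation for non-ASCII text
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    first = stable_fingerprint_sketch({"model": "naive_bayes", "seed": 0})
    second = stable_fingerprint_sketch({"seed": 0, "model": "naive_bayes"})
    print(first == second)
```

Sorting keys makes logically equal configurations hash identically, which is what lets a fingerprint serve as a deterministic directory name.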

`write_run_outputs(*, context, split_summary=None, raw_entry_split_summary=None, metric_report=None)` ¶

Persist the full experiment result bundle.

Parameters:

Name	Type	Description	Default
`context`	`ResultWriteContext`	Resolved inputs and persistence targets for the run.	required
`split_summary`	`SequenceSplitSummary \| None`	Optional precomputed split-summary metadata to reuse when building the manifest.	`None`
`raw_entry_split_summary`	`RawEntrySplitSummary \| None`	Optional precomputed raw-entry split summary to reuse when building the manifest.	`None`
`metric_report`	`dict[str, object] \| None`	Optional precomputed metric report to reuse when writing the metrics artifact.	`None`
