Experiments
The experiments package houses the repository-local configuration, model,
and result helpers that sit on top of AnomaLog preprocessing.
Use this page when you want the module surface for the experiment layer rather than the workflow overview in Experiments.
>>> from experiments import ConfigError
>>> from experiments.config import load_experiment_bundles
>>> from experiments.models import model_names
>>> "naive_bayes" in model_names()
True
experiments
Config-driven experiment tooling for AnomaLog.
ConfigError
Bases: ValueError
Raised when experiment configuration is invalid.
experiments.config
Public experiment config API.
CSVLabelReaderConfig
Bases: LabelReaderConfig
Read anomaly labels from a CSV file.
Attributes:
| Name | Type | Description |
|---|---|---|
| relative_path | Path | CSV path relative to the materialised dataset root. |
| entity_column | str | CSV column containing the entity/group id. |
| label_column | str | CSV column containing the integer anomaly label. |
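As a rough illustration of what the two column settings select, the sketch below reads entity labels with the standard-library csv module. It does not use the package's own reader: read_labels, the sample data, and the column names BlockId and Label are all hypothetical.

```python
import csv
import io


def read_labels(csv_text: str, entity_column: str, label_column: str) -> dict:
    """Map each entity id to its integer anomaly label."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # int() rejects non-integer labels early instead of storing raw strings.
    return {row[entity_column]: int(row[label_column]) for row in reader}


# Hypothetical two-row label file; the column names are illustrative only.
sample = "BlockId,Label\nblk_1,0\nblk_2,1\n"
labels = read_labels(sample, entity_column="BlockId", label_column="Label")
print(labels)
```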
CachePathsConfigModel
Bases: Struct
Cache/data root paths for dataset materialisation.
The configuration supports either an explicit data_root/cache_root
pair or a shorthand namespace. The shorthand expands to
data/<namespace> and .cache/<namespace> relative to the repository
root, which keeps the common case short while still allowing manual
overrides when needed.
Attributes:
| Name | Type | Description |
|---|---|---|
| namespace | str \| None | Optional shared suffix used for both roots. |
| data_root | Path \| None | Root for materialised raw datasets. |
| cache_root | Path \| None | Root for derived artifacts and cached outputs. |
__post_init__()
Validate the shorthand and explicit cache-path forms.
Raises:
| Type | Description |
|---|---|
| ConfigError | If the configuration mixes shorthand and explicit roots or omits one half of the explicit root pair. |
resolve(*, repo_root)
Resolve cache/data roots relative to the repository root.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| repo_root | Path | Repository root used to resolve relative cache paths. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| CachePathsConfig | CachePathsConfig | Concrete cache paths resolved against the repo root. |
Raises:
| Type | Description |
|---|---|
| ConfigError | If the config is invalid or incomplete. |
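The namespace shorthand and the explicit root pair can be pictured with a small standalone sketch. It uses a plain dataclass rather than the package's Struct, the class name CachePathsSketch is hypothetical, and treating a fully empty config as an error is an assumption rather than documented behaviour.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional, Tuple


class ConfigError(ValueError):
    """Stand-in for experiments.ConfigError."""


@dataclass(frozen=True)
class CachePathsSketch:
    namespace: Optional[str] = None
    data_root: Optional[Path] = None
    cache_root: Optional[Path] = None

    def __post_init__(self) -> None:
        explicit = (self.data_root, self.cache_root)
        if self.namespace is not None and any(p is not None for p in explicit):
            raise ConfigError("use either namespace or explicit roots, not both")
        # Assumption: with no namespace, both explicit roots must be present.
        if self.namespace is None and any(p is None for p in explicit):
            raise ConfigError("data_root and cache_root must be given together")

    def resolve(self, *, repo_root: Path) -> Tuple[Path, Path]:
        """Return (data_root, cache_root) resolved against repo_root."""
        if self.namespace is not None:
            # Shorthand: data/<namespace> and .cache/<namespace>.
            return (repo_root / "data" / self.namespace,
                    repo_root / ".cache" / self.namespace)
        # Joining with an absolute path keeps the absolute path unchanged.
        return repo_root / self.data_root, repo_root / self.cache_root


data_dir, cache_dir = CachePathsSketch(namespace="hdfs").resolve(repo_root=Path("repo"))
print(data_dir, cache_dir)
```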
ChronologicalStreamSequenceConfig
Bases: SequenceConfigBase
Chronological raw-entry stream grouping configuration.
Attributes:
| Name | Type | Description |
|---|---|---|
| chunk_size | int | Maximum number of raw entries per emitted chunk. |
| continuous_context | bool | Whether adjacent chunks should carry model state across sequence boundaries. |
__post_init__()
Validate the chunk size and shared split settings.
Raises:
| Type | Description |
|---|---|
| ConfigError | If the chunk size is not positive or the shared split settings are invalid. |
apply(templated)
Build a configured sequence view from a templated dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| templated | TemplatedDataset | Built templated dataset to group into sequences. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| SequenceBuilder | SequenceBuilder | Sequence builder with grouping and split settings applied. |
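To make the chunk_size bound concrete, here is a minimal sketch of chunking a chronological entry list into non-overlapping chunks. The helper chunk_entries is hypothetical, and ignoring the shared step and split settings is a simplifying assumption.

```python
from typing import Iterator, List, Sequence


def chunk_entries(entries: Sequence[str], chunk_size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks holding at most chunk_size entries each."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(entries), chunk_size):
        # The final chunk may be shorter than chunk_size.
        yield list(entries[start:start + chunk_size])


chunks = list(chunk_entries(["e1", "e2", "e3", "e4", "e5"], chunk_size=2))
print(chunks)
```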
DatasetSourceConfig
Bases: Struct
Tagged config base for materialising a dataset source.
build(*, repo_root)
Build the runtime dataset source.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| repo_root | Path | Repository root used to resolve relative paths. | required |
Raises:
| Type | Description |
|---|---|
| NotImplementedError | Always, until implemented by a concrete source config. |
manifest_entry(*, repo_root)
Return a stable source manifest entry.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| repo_root | Path | Repository root used to resolve relative paths. | required |
Raises:
| Type | Description |
|---|---|
| NotImplementedError | Always, until implemented by a concrete source config. |
DatasetVariantConfig
Bases: Struct
Dataset preprocessing and sequence-generation configuration.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Human-readable dataset variant name. |
| dataset_name | str | Dataset identifier used for runtime caches/artifacts. |
| preset | str \| None | Optional built-in dataset preset name. |
| source | DatasetSourceConfig \| None | Source config for custom datasets. |
| structured_parser | str \| None | Structured parser name for custom datasets. |
| template_parser | str | Template parser name. |
| label_reader | LabelReaderConfig \| None | Optional anomaly label reader config. |
| cache_paths | CachePathsConfigModel \| None | Optional cache/data root override. |
| evaluation_unit | EvaluationUnit \| None | Optional primary evaluation abstraction for the run's headline metrics. |
| sequence | SequenceConfigBase | Sequence grouping and split config. |
| description | str \| None | Optional free-text dataset description. |
__post_init__()
Validate the minimum dataset config required to build a spec.
Raises:
| Type | Description |
|---|---|
| ConfigError | If the dataset config omits required source or parser data. |
custom_dataset_components()
Return the validated source/parser pair for non-preset datasets.
Returns:
| Type | Description |
|---|---|
| tuple[DatasetSourceConfig, str] | Source config and structured parser name. |
Raises:
| Type | Description |
|---|---|
| ConfigError | If the config is not a valid custom dataset definition. |
source_summary(*, repo_root)
EntitySequenceConfig
Bases: SequenceConfigBase
Entity-based sequence configuration.
Attributes:
| Name | Type | Description |
|---|---|---|
| train_on_normal_entities_only | bool | Whether anomalous entities are excluded from the training split budget. |
| continuous_context | bool | Whether adjacent entity windows should carry state across sequence boundaries. |
__post_init__()
Validate cross-field split constraints.
Raises:
| Type | Description |
|---|---|
| ConfigError | If the requested test suffix is invalid or leaves no room for the train prefix. |
apply(templated)
Build a configured entity-grouped sequence view.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| templated | TemplatedDataset | Built templated dataset to group by entity. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| SequenceBuilder | SequenceBuilder | Entity-grouped builder with split settings applied. |
ExperimentBundle
Bases: Struct
Resolved concrete run config derived from a sweep or inline scenario.
Attributes:
| Name | Type | Description |
|---|---|---|
| experiments_root | Path | Root directory containing experiment configs. |
| repo_root | Path | Repository root used for path resolution. |
| sweep_path | Path | Resolved sweep config path. |
| dataset_path | Path | Resolved dataset config path. |
| model_path | Path | Resolved model config path. |
| sweep | ExperimentRunConfig | Decoded sweep or inline scenario config. |
| dataset | DatasetVariantConfig | Decoded dataset config. |
| model | ExperimentModelConfig | Decoded model config. |
| concrete_name | str | Deterministic label for the concrete run within the sweep. |
| run_group | str | Scheduling group used to batch compatible model runs together inside one manifest. |
| applied_overrides | dict[str, Any] | Fixed and axis overrides applied to derive the concrete run. |
| experiment_name | str \| None | Registry experiment name when the bundle was resolved from the named registry. |
| experiment_groups | tuple[str, ...] | Registry groups attached to the bundle when it came from the named registry. |
normalized_config()
with_experiment_metadata(*, experiment_name, experiment_groups)
Return a copy annotated with registry metadata.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| experiment_name | str | Registry entry name for the selected run. | required |
| experiment_groups | tuple[str, ...] | Registry groups attached to the selected run. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| ExperimentBundle | ExperimentBundle | Bundle annotated with registry provenance. |
ExperimentRegistry
dataclass
Validated registry of logical experiments and reusable model sets.
Attributes:
| Name | Type | Description |
|---|---|---|
| model_sets | tuple[ModelSetDefinition, ...] | Loaded model-set definitions. |
| experiments | tuple[RegisteredExperiment, ...] | Loaded logical experiments. |
__post_init__()
Build quick lookup tables for validated registry objects.
model_set(name)
Return one named model set.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Model-set name to resolve. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| ModelSetDefinition | ModelSetDefinition | Loaded model-set definition. |
Raises:
| Type | Description |
|---|---|
| ConfigError | If the model set name is not present in the registry. |
names()
require(name)
Return one named logical experiment.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Registry experiment name to resolve. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| RegisteredExperiment | RegisteredExperiment | Loaded logical experiment definition. |
Raises:
| Type | Description |
|---|---|
| ConfigError | If the experiment name is not present in the registry. |
resolve_experiment(name, *, registry_path, repo_root)
Resolve one logical experiment into concrete bundles.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Registry experiment name to resolve. | required |
| registry_path | Path | Path to the registry TOML file. | required |
| repo_root | Path | Repository root used to resolve relative paths. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| ResolvedRegistryExperiment | ResolvedRegistryExperiment | Logical registry entry plus concrete bundle expansion. |
select(*, names=(), groups=())
Select named experiments and/or group-filtered experiments.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| names | tuple[str, ...] | Explicit registry experiment names to include. | () |
| groups | tuple[str, ...] | Registry groups to include. | () |
Returns:
| Type | Description |
|---|---|
| tuple[RegisteredExperiment, ...] | Registry experiments in their original order. |
Raises:
| Type | Description |
|---|---|
| ConfigError | If a requested name or group is unknown. |
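The selection contract of select (match by name and/or group, keep the original order, reject unknown filters) can be sketched in isolation. Entry and select_entries are hypothetical stand-ins. Two things here are assumptions, not documented behaviour: names and groups are combined as a union, and empty filters return nothing.

```python
from dataclasses import dataclass
from typing import Tuple


class ConfigError(ValueError):
    """Stand-in for experiments.ConfigError."""


@dataclass(frozen=True)
class Entry:
    name: str
    groups: Tuple[str, ...] = ()


def select_entries(entries, *, names=(), groups=()):
    """Return entries matching any requested name or group, in registry order."""
    known_names = {entry.name for entry in entries}
    known_groups = {group for entry in entries for group in entry.groups}
    unknown = [n for n in names if n not in known_names]
    unknown += [g for g in groups if g not in known_groups]
    if unknown:
        raise ConfigError(f"unknown names or groups: {unknown}")
    wanted_names, wanted_groups = set(names), set(groups)
    # Iterating over entries (not over the filters) preserves registry order.
    return tuple(
        entry for entry in entries
        if entry.name in wanted_names or wanted_groups.intersection(entry.groups)
    )


registry = (Entry("a", ("smoke",)), Entry("b", ("full",)), Entry("c", ("smoke", "full")))
picked = select_entries(registry, names=("b",), groups=("smoke",))
print([entry.name for entry in picked])
```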
ExperimentRunConfig
Bases: Protocol
Shared runtime contract for dataset-owned experiment matrices.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Human-readable run name. |
| dataset | Any | Decoded dataset config. |
| models | list[Any] | Concrete model run entries embedded in the file. |
| results_root | Path | Root directory for run outputs. |
| description | str \| None | Optional free-text run description. |
| max_workers | WorkerCount | Maximum concurrent concrete runs. |
FixedSequenceConfig
Bases: SequenceConfigBase
Fixed-window sequence configuration.
Attributes:
| Name | Type | Description |
|---|---|---|
| window_size | int | Number of rows per fixed window. |
| window_basis | FixedWindowBasis | Whether fixed windows operate on the compacted structured rows or the raw line positions. |
| window_alignment_offset | int | Raw-position offset before the first full fixed window. |
__post_init__()
Validate cross-field split constraints.
Raises:
| Type | Description |
|---|---|
| ConfigError | If the requested test suffix is invalid or leaves no room for the train prefix. |
apply(templated)
Build a configured sequence view from a templated dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| templated | TemplatedDataset | Built templated dataset to group into sequences. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| SequenceBuilder | SequenceBuilder | Sequence builder with grouping and split settings applied. |
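One way to read window_size together with window_alignment_offset is sketched below. This is an interpretation, not the package's algorithm: it assumes non-overlapping windows, that rows before the offset form a leading partial window, and that a trailing partial window is kept. The helper fixed_windows is hypothetical.

```python
from typing import List, Sequence


def fixed_windows(rows: Sequence[int], window_size: int, offset: int = 0) -> List[List[int]]:
    """Group rows into fixed windows, starting the first full window at offset."""
    if window_size <= 0:
        raise ValueError("window_size must be positive")
    if not 0 <= offset < window_size:
        raise ValueError("offset must satisfy 0 <= offset < window_size")
    windows: List[List[int]] = []
    head = list(rows[:offset])
    if head:
        # Assumption: rows ahead of the alignment point form a partial window.
        windows.append(head)
    for start in range(offset, len(rows), window_size):
        windows.append(list(rows[start:start + window_size]))
    return windows


aligned = fixed_windows(list(range(7)), window_size=3)
shifted = fixed_windows(list(range(7)), window_size=3, offset=1)
print(aligned)
print(shifted)
```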
LabelReaderConfig
Bases: Struct
Tagged config base for anomaly-label readers.
build()
Build the runtime anomaly-label reader.
Raises:
| Type | Description |
|---|---|
| NotImplementedError | Always, until implemented by a concrete label-reader config. |
LocalDirSourceConfig
Bases: DatasetSourceConfig
Use an existing local directory as the dataset root.
Attributes:
| Name | Type | Description |
|---|---|---|
| path | Path | Source directory, relative to the repo when not absolute. |
| raw_logs_relpath | Path \| None | Optional raw-log path relative to the source directory. |
build(*, repo_root)
Build a local-directory dataset source.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| repo_root | Path | Repository root used to resolve relative source paths. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| LocalDirSource | LocalDirSource | Runtime local-directory source. |
manifest_entry(*, repo_root)
LocalZipSourceConfig
Bases: DatasetSourceConfig
Use a local zip archive as the dataset source.
Attributes:
| Name | Type | Description |
|---|---|---|
| zip_path | Path | Archive path, relative to the repo when not absolute. |
| raw_logs_relpath | Path \| None | Optional raw-log path relative to the extracted dataset root. |
| md5_checksum | str \| None | Optional checksum used to verify the archive. |
build(*, repo_root)
Build a local-zip dataset source.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| repo_root | Path | Repository root used to resolve relative source paths. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| LocalZipSource | LocalZipSource | Runtime local-zip source. |
manifest_entry(*, repo_root)
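The optional md5_checksum suggests an integrity check on the archive before use. The sketch below shows how such a check could look with hashlib. The helpers md5_for_file and verify_archive are hypothetical, and skipping verification when no checksum is configured is an assumption.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def md5_for_file(path: Path, chunk_bytes: int = 1 << 20) -> str:
    """Hash a file in fixed-size blocks so large archives stay out of memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_bytes), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_archive(path: Path, expected_md5: Optional[str]) -> bool:
    """Return True when no checksum is configured or the digest matches."""
    if expected_md5 is None:
        return True
    return md5_for_file(path) == expected_md5.lower()


with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "logs.zip"  # illustrative file, not a real zip
    archive.write_bytes(b"hello")
    ok = verify_archive(archive, hashlib.md5(b"hello").hexdigest())
    bad = verify_archive(archive, "0" * 32)
    skipped = verify_archive(archive, None)
print(ok, bad, skipped)
```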
RawEntryPrefixCountSplitConfig
Bases: RawEntrySplitConfigBase
Split by the first N raw entries in chronological order.
Attributes:
| Name | Type | Description |
|---|---|---|
| train_entry_count | int | Number of raw entries to keep in the train prefix. |
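A prefix-count split amounts to cutting a chronologically ordered list at a fixed index. The sketch below assumes entries are (timestamp, message) pairs and sorts them first. Both the helper split_by_prefix_count and the entry shape are hypothetical.

```python
from typing import List, Tuple

Entry = Tuple[int, str]  # (timestamp, message); an illustrative shape only


def split_by_prefix_count(entries: List[Entry], train_entry_count: int):
    """Return (train_prefix, remainder) after ordering entries by timestamp."""
    if train_entry_count < 0:
        raise ValueError("train_entry_count must not be negative")
    ordered = sorted(entries, key=lambda entry: entry[0])
    return ordered[:train_entry_count], ordered[train_entry_count:]


raw = [(30, "c"), (10, "a"), (20, "b"), (40, "d")]
train_prefix, remainder = split_by_prefix_count(raw, train_entry_count=3)
print(train_prefix, remainder)
```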
RawEntryPrefixFractionSplitConfig
RawEntryPrefixNormalFractionSplitConfig
RegisteredExperiment
dataclass
Logical experiment entry resolved from the registry.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Registry experiment name. |
| dataset | str | Dataset manifest name resolved by the registry. |
| models | tuple[str, ...] | Inline model references defined directly on the experiment. |
| model_sets | tuple[str, ...] | Shared model sets used by the experiment. |
| groups | tuple[str, ...] | Derived reporting and scheduling groups. |
| overrides | dict[str, dict[str, object]] | Experiment-specific model overrides keyed by model or model-set name. |
| description | str \| None | Optional human-readable description. |
RemoteZipSourceConfig
Bases: DatasetSourceConfig
Download a remote zip archive for the dataset.
Attributes:
| Name | Type | Description |
|---|---|---|
| url | str | Absolute URL of the dataset archive. |
| md5_checksum | str \| None | Optional checksum for the archive. |
| raw_logs_relpath | Path \| None | Optional raw-log path relative to the extracted dataset root. |
build(*, repo_root)
Build a remote-zip dataset source.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| repo_root | Path | Repository root. Unused for remote zip sources. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| RemoteZipSource | RemoteZipSource | Runtime remote-zip source. |
ResolvedRegistryExperiment
dataclass
Resolved registry experiment and its concrete bundles.
Attributes:
| Name | Type | Description |
|---|---|---|
| experiment | RegisteredExperiment | Logical registry entry. |
| bundles | tuple[ExperimentBundle, ...] | Concrete bundle expansion. |
bundle
property
Return the only bundle when a logical experiment expands to one.
Raises:
| Type | Description |
|---|---|
| ConfigError | If the logical experiment expands to more than one concrete bundle. |
SequenceConfigBase
Bases: Struct
Shared sequence-generation settings for a dataset variant.
Attributes:
| Name | Type | Description |
|---|---|---|
| split | RawEntrySplitConfig \| None | Optional raw-entry split mode to apply before grouping. |
| step | int \| None | Grouping-specific step between windows. |
| train_fraction | TrainFraction | Requested training fraction for the total sequence population. |
| test_fraction | TestFraction | Fixed test suffix fraction. |
__post_init__()
Validate cross-field split constraints.
Raises:
| Type | Description |
|---|---|
| ConfigError | If the requested test suffix is invalid or leaves no room for the train prefix. |
apply(templated)
Build a configured sequence view from a templated dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| templated | TemplatedDataset | Built templated dataset to group into sequences. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| SequenceBuilder | SequenceBuilder | Sequence builder with grouping and split settings applied. |
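The interplay between a train prefix and a fixed test suffix can be illustrated with simple arithmetic. The sketch below is an assumption about how the fractions might translate into counts (both measured against the total population, rounded down). The helper split_counts and its exact validation bounds are hypothetical.

```python
import math
from typing import Dict


class ConfigError(ValueError):
    """Stand-in for experiments.ConfigError."""


def split_counts(sequence_count: int, train_fraction: float, test_fraction: float) -> Dict[str, int]:
    """Return train-prefix, withheld-middle, and test-suffix sequence counts."""
    if not 0.0 < test_fraction < 1.0:
        raise ConfigError("test_fraction must lie strictly between 0 and 1")
    if not 0.0 < train_fraction <= 1.0 - test_fraction:
        raise ConfigError("train_fraction leaves no room beside the test suffix")
    test_count = math.floor(sequence_count * test_fraction)
    train_count = math.floor(sequence_count * train_fraction)
    # Sequences between the train prefix and the test suffix are withheld.
    ignored_count = sequence_count - train_count - test_count
    return {"train": train_count, "ignored": ignored_count, "test": test_count}


counts = split_counts(10, train_fraction=0.5, test_fraction=0.2)
print(counts)
```

With these inputs, five sequences train, two are reserved as the test suffix, and the three in between are withheld, which mirrors the ignored counts reported by SequenceSummary.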
SplitApplicationOrder
StraddlingGroupPolicy
How to handle grouped rows that cross a raw-entry split boundary.
Attributes:
| Name | Type | Description |
|---|---|---|
| SPLIT_PARTIAL_SEQUENCES |  | Emit one sequence per contiguous segment. |
| ASSIGN_BY_FIRST_EVENT |  | Assign the whole group by the first segment. |
| ASSIGN_BY_LAST_EVENT |  | Assign the whole group by the last segment. |
| DROP_STRADDLERS |  | Drop groups that span both sides of the split. |
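The four policies can be compared side by side on a single group that crosses the boundary. The sketch below is a standalone approximation: PolicySketch reuses the documented member names, but its string values, the place_group helper, and the index-based boundary are all hypothetical.

```python
from enum import Enum
from typing import List, Tuple


class PolicySketch(Enum):
    SPLIT_PARTIAL_SEQUENCES = "split_partial_sequences"
    ASSIGN_BY_FIRST_EVENT = "assign_by_first_event"
    ASSIGN_BY_LAST_EVENT = "assign_by_last_event"
    DROP_STRADDLERS = "drop_straddlers"


def place_group(positions: List[int], boundary: int, policy: PolicySketch) -> List[Tuple[str, List[int]]]:
    """Return (split, positions) sequences for one group.

    positions are sorted raw-entry indices; indices below boundary are train.
    """
    train = [p for p in positions if p < boundary]
    test = [p for p in positions if p >= boundary]
    if not train or not test:
        # The group sits on one side of the boundary, so no policy applies.
        return [("train", train)] if train else [("test", test)]
    if policy is PolicySketch.SPLIT_PARTIAL_SEQUENCES:
        return [("train", train), ("test", test)]
    if policy is PolicySketch.ASSIGN_BY_FIRST_EVENT:
        return [("train", positions)]
    if policy is PolicySketch.ASSIGN_BY_LAST_EVENT:
        return [("test", positions)]
    return []  # DROP_STRADDLERS discards the whole group


group = [1, 2, 8, 9]
for policy in PolicySketch:
    print(policy.name, place_group(group, boundary=5, policy=policy))
```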
SweepAxisConfig
SweepConfig
Bases: Struct
Top-level experiment sweep configuration.
A sweep is now the authoritative experiment entrypoint. A config with no axes still represents one concrete run; axes expand that base definition into multiple concrete runs that differ only by validated overrides.
Attributes:
| Name | Type | Description |
|---|---|---|
| name | str | Human-readable sweep name. |
| dataset | str | Referenced base dataset config name. |
| model | str | Referenced base model config name. |
| results_root | Path | Root directory for run outputs. |
| description | str \| None | Optional free-text sweep description. |
| overrides | dict[str, Any] | Fixed overrides applied to every concrete run generated from the sweep. |
| axes | list[SweepAxisConfig] | Cartesian-product axes for generating multiple concrete runs. |
| max_workers | WorkerCount | Maximum number of concrete runs to execute in parallel. |
__post_init__()
Validate override and execution settings.
Raises:
| Type | Description |
|---|---|
| ConfigError | If override paths are malformed or execution settings are invalid. |
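The Cartesian-product expansion described for axes can be sketched with itertools.product. The fields of SweepAxisConfig are not documented on this page, so the (override path, values) pairs, the dotted path strings, and the expand_sweep helper below are all hypothetical.

```python
import itertools
from typing import Any, Dict, List, Sequence, Tuple

Axis = Tuple[str, Sequence[Any]]  # hypothetical (override path, candidate values)


def expand_sweep(fixed_overrides: Dict[str, Any], axes: List[Axis]) -> List[Dict[str, Any]]:
    """Return one override mapping per concrete run."""
    paths = [path for path, _ in axes]
    runs = []
    # product() over zero axes yields a single empty combination, so a sweep
    # without axes still produces exactly one concrete run.
    for combo in itertools.product(*(values for _, values in axes)):
        overrides = dict(fixed_overrides)
        overrides.update(zip(paths, combo))
        runs.append(overrides)
    return runs


single = expand_sweep({"model.alpha": 1.0}, [])
grid = expand_sweep(
    {"model.alpha": 1.0},
    [("dataset.sequence.window_size", [10, 20]), ("model.seed", [0, 1, 2])],
)
print(len(single), len(grid))
```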
TimeSequenceConfig
Bases: SequenceConfigBase
Time-window sequence configuration.
Attributes:
| Name | Type | Description |
|---|---|---|
| time_span_ms | int | Duration of each emitted time window in milliseconds. |
__post_init__()
Validate cross-field split constraints.
Raises:
| Type | Description |
|---|---|
| ConfigError | If the requested test suffix is invalid or leaves no room for the train prefix. |
apply(templated)
Build a configured sequence view from a templated dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| templated | TemplatedDataset | Built templated dataset to group into sequences. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| SequenceBuilder | SequenceBuilder | Sequence builder with grouping and split settings applied. |
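A time-window grouping keyed on time_span_ms might look like the sketch below. It assumes tumbling (non-overlapping) windows anchored at the first event and skips empty windows. None of this is confirmed by the reference above, and the helper time_windows is hypothetical.

```python
from typing import Dict, List, Tuple


def time_windows(events: List[Tuple[int, str]], time_span_ms: int) -> List[List[str]]:
    """Group (timestamp_ms, message) events into consecutive time windows."""
    if time_span_ms <= 0:
        raise ValueError("time_span_ms must be positive")
    if not events:
        return []
    ordered = sorted(events, key=lambda event: event[0])
    origin = ordered[0][0]
    buckets: Dict[int, List[str]] = {}
    for timestamp, message in ordered:
        # Integer division maps each event to the index of its window.
        buckets.setdefault((timestamp - origin) // time_span_ms, []).append(message)
    return [buckets[index] for index in sorted(buckets)]


windows = time_windows([(0, "a"), (400, "b"), (1000, "c"), (2500, "d")], time_span_ms=1000)
print(windows)
```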
load_experiment_bundles(sweep_config_path)
Load a dataset-owned experiment matrix and expand it into bundles.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sweep_config_path | Path | Dataset manifest TOML path to resolve. | required |
Returns:
| Type | Description |
|---|---|
| list[ExperimentBundle] | Fully resolved concrete runs derived from the manifest or inline scenario. |
Raises:
| Type | Description |
|---|---|
| ConfigError | If the manifest does not decode or is missing its root |
load_experiment_registry(registry_path, *, repo_root=None)
Load and validate the named experiment registry.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| registry_path | Path | Path to the registry TOML file. | required |
| repo_root | Path \| None | Repository root used to resolve relative paths. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| ExperimentRegistry | ExperimentRegistry | Validated registry with resolved model sets and logical experiments. |
Raises:
| Type | Description |
|---|---|
| ConfigError | If the registry file is missing or malformed, or if any referenced dataset/model config cannot be resolved. |
resolve_registry_experiment(name, *, registry_path, repo_root=None)
Resolve one registry experiment into concrete bundles.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Registry experiment name to resolve. | required |
| registry_path | Path | Path to the registry TOML file. | required |
| repo_root | Path \| None | Repository root used to resolve relative paths. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| ResolvedRegistryExperiment | ResolvedRegistryExperiment | Logical registry entry plus concrete bundle expansion. |
serialise_config(value)
Convert config structs into builtins for hashing and manifests.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| value | object | Config object or struct to serialise. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, object] | JSON-like builtins representation of the config. |
Raises:
| Type | Description |
|---|---|
| TypeError | If msgspec returns a non-dict payload unexpectedly. |
experiments.models
Experiment model runtime exports.
EvaluationUnit
Stable evaluation and prediction units used in metric reports.
Attributes:
| Name | Type | Description |
|---|---|---|
| EVENT |  | Individual log events. |
| SEQUENCE |  | Whole sequence or case abstractions. |
| WINDOW |  | Fixed-size sliding windows. |
| STREAM |  | A generic stream segment. |
| NEXT_EVENT |  | Next-event prediction samples. |
| CLUSTER |  | Human triage or clustering units. |
| CHRONOLOGICAL_EVENT_STREAM |  | Chronologically ordered event stream slices. |
| CONTINUOUS_EVENT_STREAM |  | Continuous event-stream slices. |
ExperimentModelConfig
Bases: Struct
Tagged experiment-model config base.
detector
property
Return the detector name encoded in the tagged config type.
Raises:
| Type | Description |
|---|---|
| ConfigError | If the config type does not define a string detector tag. |
__post_init__()
Reject direct construction so msgspec metadata remains authoritative.
Raises:
| Type | Description |
|---|---|
| TypeError | If a model config is constructed directly. |
build_detector()
Construct the runtime detector for this config.
Raises:
| Type | Description |
|---|---|
| NotImplementedError | Always, until implemented by a concrete model config. |
MetricScope
Stable metric blocks reported by experiment runs.
Attributes:
| Name | Type | Description |
|---|---|---|
| EVENT_LEVEL_DETECTION |  | Event-granularity binary detection metrics. |
| SEQUENCE_LEVEL_DETECTION |  | Sequence-granularity binary detection metrics. |
| WINDOW_LEVEL_DETECTION |  | Sliding-window binary detection metrics. |
| STREAM_LEVEL_DETECTION |  | Continuous-stream binary detection metrics. |
| NEXT_EVENT_PREDICTION |  | Next-event modelling and hit-rate metrics. |
| CLUSTER_LEVEL_TRIAGE |  | Cluster or review-group triage metrics. |
| MANUAL_WORKLOAD_REDUCTION |  | Manual review workload reduction metrics. |
| SEMI_AUTOMATIC_WORKLOAD_REDUCTION |  | Semi-automatic workload reduction metrics. |
MetricStatus
Validity status for one metric block.
Attributes:
| Name | Type | Description |
|---|---|---|
| VALID |  | The block passed validation and can be treated as headline data. |
| INVALID |  | The block failed validation and should not be promoted. |
| NOT_APPLICABLE |  | The scope does not apply to this run. |
| DIAGNOSTIC_ONLY |  | The block is informative but not a headline metric. |
ModelRunSummary
dataclass
Detector outputs and run summaries.
Attributes:
| Name | Type | Description |
|---|---|---|
| metrics | dict[str, Any] | Aggregate run metrics. |
| model_manifest | ModelManifest | Detector manifest for the run. |
| sequence_summary | SequenceSummary | Split and label counts for the run. |
ProgressHint
dataclass
RunProgressPlan
dataclass
Shared bounded-progress hints for one experiment model run.
Attributes:
| Name | Type | Description |
|---|---|---|
| train | ProgressHint \| None | Exact fit-stage metadata when known. |
| score | ProgressHint \| None | Exact test-scoring metadata when known. |
SequenceSummary
dataclass
Counts describing the generated sequence dataset.
Attributes:
| Name | Type | Description |
|---|---|---|
| sequence_count | int | Total number of generated sequences. |
| train_sequence_count | int | Number of train-split sequences. |
| test_sequence_count | int | Number of test-split sequences. |
| train_label_counts | dict[int, int] | Train label histogram. |
| test_label_counts | dict[int, int] | Test label histogram. |
| ignored_label_counts | dict[int, int] | Label histogram for sequences withheld from the current train prefix. |
| ignored_sequence_count | int | Number of sequences withheld from the current train prefix between the train pool and the fixed test suffix. |
model_names()
resolve_model_config_type(name)
cached
Resolve a built-in model-config type by detector name.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Registered detector name. | required |
Returns:
| Type | Description |
|---|---|
| type[ExperimentModelConfig] | Registered config type for the detector. |
Raises:
| Type | Description |
|---|---|
| KeyError | If |
| ConfigError | If the detector module is present but its optional backend is not installed. |
| ModuleNotFoundError | If the detector module fails to import for a reason unrelated to the registered optional dependencies. |
run_model(*, sequence_factory, config, prediction_output, logger, progress_plan=None)
Fit the configured detector and stream predictions to disk.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequence_factory | SequenceFactory | Callable producing the full sequence stream and exposing whether the stream is split-ordered. | required |
| config | ExperimentModelConfig | Model config used to build the detector. | required |
| prediction_output | PredictionOutputConfig | Prediction stream settings. | required |
| logger | Logger | Logger for progress messages. | required |
| progress_plan | RunProgressPlan \| None | Exact bounded fit/scoring metadata when the caller can provide it cheaply. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| ModelRunSummary | ModelRunSummary | Metrics, manifest, and sequence summary for the run. |
experiments.results
Result-directory management and manifest utilities.
ResultPaths
dataclass
Concrete artifact paths inside a single run directory.
The run fingerprint is derived from the fully resolved config so repeated executions of the same experiment land under one deterministic fingerprint root. Keeping all artifact paths together avoids ad-hoc filename drift across result writers.
Attributes:
| Name | Type | Description |
|---|---|---|
| run_fingerprint | str | Stable fingerprint for the resolved run config. |
| run_root | Path | Deterministic fingerprint directory for the concrete run family. |
| run_dir | Path | Root directory containing all artifacts for the run. |
| config_path | Path | Serialised normalised concrete experiment config path. |
| dataset_manifest_path | Path | Dataset provenance manifest path. |
| metrics_path | Path | Detector metrics output path. |
| predictions_path | Path | Prediction records output path. |
| environment_path | Path | Environment/provenance metadata path. |
| run_log_path | Path | Captured run log path. |
for_bundle(bundle, *, run_attempt=None)
classmethod
Create deterministic result paths for the experiment bundle.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bundle | ExperimentBundle | Resolved experiment bundle. | required |
| run_attempt | int \| None | Optional 1-based attempt number written beneath the fingerprint root. When omitted, the concrete run writes directly to the fingerprint directory. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| ResultPaths | ResultPaths | Deterministic run artifact paths for the bundle. |
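The relationship between the fingerprint root and an optional attempt directory can be sketched as follows. Only the metrics.json filename and the 1-based attempt rule come from this page. The attempt-NNN directory name, the ResultPathsSketch class, and the paths_for helper are hypothetical.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class ResultPathsSketch:
    run_root: Path
    run_dir: Path
    metrics_path: Path


def paths_for(results_root: Path, fingerprint: str, run_attempt: Optional[int] = None) -> ResultPathsSketch:
    """Derive run paths from a results root and a precomputed fingerprint."""
    run_root = results_root / fingerprint
    if run_attempt is None:
        # No attempt number: artifacts go straight into the fingerprint root.
        run_dir = run_root
    else:
        if run_attempt < 1:
            raise ValueError("run_attempt is 1-based")
        # Hypothetical naming scheme for attempt subdirectories.
        run_dir = run_root / f"attempt-{run_attempt:03d}"
    return ResultPathsSketch(run_root, run_dir, run_dir / "metrics.json")


direct = paths_for(Path("results"), "abc123")
retry = paths_for(Path("results"), "abc123", run_attempt=2)
print(direct.run_dir, retry.run_dir)
```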
ResultWriteContext
dataclass
Inputs needed to persist one concrete experiment result bundle.
Attributes:
| Name | Type | Description |
|---|---|---|
| bundle | ExperimentBundle | Resolved concrete experiment bundle. |
| templated | TemplatedDataset | Materialised templated dataset view. |
| sequences | SequenceBuilder | Sequence builder used to replay the run. |
| model_summary | ModelRunSummary | Model-side summary for the completed run. |
| result_paths | ResultPaths | Deterministic output paths for the bundle. |
| debug_reporting | bool | Whether verbose diagnostics should be written. |
build_dataset_manifest(*, context, split_summary=None, raw_entry_split_summary=None)
Build a provenance manifest for the preprocessed dataset and sequences.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| context | ResultWriteContext | Resolved run inputs and persistence targets for the run. | required |
| split_summary | SequenceSplitSummary \| None | Optional precomputed split-summary metadata to reuse instead of replaying the builder. | None |
| raw_entry_split_summary | RawEntrySplitSummary \| None | Optional precomputed raw-entry split summary to reuse instead of replaying the builder. | None |
Returns:
| Type | Description |
|---|---|
| dict[str, object] | Dataset and sequence provenance manifest. |
build_environment_metadata(*, bundle, result_paths)
Capture the local environment for reproducibility and provenance.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bundle | ExperimentBundle | Resolved experiment bundle. | required |
| result_paths | ResultPaths | Materialised artifact paths for the run. | required |
Returns:
| Type | Description |
|---|---|
| dict[str, object] | Serialisable environment metadata. |
build_metric_metadata(*, bundle, sequences, model_summary, split_summary=None, raw_entry_split_summary=None)
Build the task metadata that accompanies persisted metric blocks.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bundle | ExperimentBundle | Experiment bundle being evaluated. | required |
| sequences | SequenceBuilder | Sequence builder used for the run. | required |
| model_summary | ModelRunSummary | Model-side summary for the run. | required |
| split_summary | SequenceSplitSummary \| None | Optional precomputed split-summary metadata to reuse when building the split policy. | None |
| raw_entry_split_summary | RawEntrySplitSummary \| None | Optional precomputed raw-entry split summary to reuse when building the split policy. | None |
Returns:
| Type | Description |
|---|---|
| dict[str, object] | Shared task metadata for the persisted dataset manifest and metrics report. |
build_run_metrics_report(*, bundle, sequences, model_summary, debug_reporting=False, cached_summaries=None)
Build the final task-aware metric report written to metrics.json.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bundle | ExperimentBundle | Experiment bundle being evaluated. | required |
| sequences | SequenceBuilder | Sequence builder used for the run. | required |
| model_summary | ModelRunSummary | Model-side summary for the run. | required |
| debug_reporting | bool | Whether to preserve the verbose diagnostic payloads in the written metrics report. | False |
| cached_summaries | _SupportsResultSummaryCache \| None | Optional precomputed summaries to reuse when building or persisting the report. | None |
Returns:
| Type | Description |
|---|---|
| dict[str, object] | Serialised task-aware metric report with metadata and canonical metric blocks. |
build_sequence_split_summary(sequences, *, sequence_summary)
Describe requested versus effective split semantics for one run.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| sequences | SequenceBuilder | Sequence builder whose split semantics are being summarised. | required |
| sequence_summary | SequenceSummary | Aggregate split and label counts. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| SequenceSplitSummary | SequenceSplitSummary | Requested and effective split metrics. |
prepare_result_paths(bundle, *, run_attempt=None)
Create deterministic result paths for the experiment bundle.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| bundle | ExperimentBundle | Resolved experiment bundle. | required |
| run_attempt | int \| None | Optional 1-based attempt number written beneath the fingerprint root. When omitted, the concrete run writes directly to the fingerprint directory. | None |
Returns:
| Name | Type | Description |
|---|---|---|
| ResultPaths | ResultPaths | Deterministic run artifact paths for the bundle. |
sha256_for_file(path)
stable_fingerprint(payload)
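Neither helper above carries a description on this page, so the following is only a guess at the general technique their names suggest: hashing a canonical JSON encoding for a key-order-independent fingerprint, and hashing file contents in blocks. The functions fingerprint_sketch and file_sha256_sketch are hypothetical and may differ from the real implementations.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def fingerprint_sketch(payload: dict) -> str:
    """Hash a canonical JSON encoding so key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def file_sha256_sketch(path: Path, chunk_bytes: int = 1 << 20) -> str:
    """Hash a file in fixed-size blocks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(chunk_bytes), b""):
            digest.update(block)
    return digest.hexdigest()


first = fingerprint_sketch({"model": "naive_bayes", "window_size": 20})
second = fingerprint_sketch({"window_size": 20, "model": "naive_bayes"})
changed = fingerprint_sketch({"model": "naive_bayes", "window_size": 40})

with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "metrics.json"
    artifact.write_bytes(b"{}")
    file_digest = file_sha256_sketch(artifact)
print(first == second, first == changed)
```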
write_run_outputs(*, context, split_summary=None, raw_entry_split_summary=None, metric_report=None)
Persist the full experiment result bundle.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| context | ResultWriteContext | Resolved run inputs and persistence targets for the run. | required |
| split_summary | SequenceSplitSummary \| None | Optional precomputed split-summary metadata to reuse when building the manifest. | None |
| raw_entry_split_summary | RawEntrySplitSummary \| None | Optional precomputed raw-entry split summary to reuse when building the manifest. | None |
| metric_report | dict[str, object] \| None | Optional precomputed metric report to reuse when writing the metrics artefact. | None |