V0.23 Scope - Split / Leakage Provenance Diagnostics#
Status: COMPLETE
v0.23 is a downstream supportability release for reporting provenance risks across user-supplied partitions.
Stable path:
user-supplied partition labels + FieldBatch/orbit-batch provenance
-> JSON-compatible split provenance report
-> leakage-risk diagnostics
-> optional downstream workflow summary integration
Public Runtime APIs#
v0.23 adds submodule-only runtime APIs:
pdelie.reporting.summarize_split_leakage_provenance(...)
pdelie.examples.run_split_leakage_provenance_example(...)
It also extends:
pdelie.reporting.summarize_downstream_discovery_workflow(..., split_provenance=None)
No root pdelie exports are added.
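The submodule-only rule can be spot-checked with a small generic helper. This is a minimal sketch, not part of pdelie; the helper name `exported_from_submodule_only` is hypothetical and it relies only on the standard-library `importlib`.

```python
import importlib


def exported_from_submodule_only(root_name, submodule_name, attr_name):
    """Return True if attr_name is on root_name.submodule_name but not on root_name.

    Hypothetical audit helper for illustration; it is not a pdelie API.
    """
    root = importlib.import_module(root_name)
    submodule = importlib.import_module(f"{root_name}.{submodule_name}")
    return hasattr(submodule, attr_name) and not hasattr(root, attr_name)
```

With pdelie installed, calling `exported_from_submodule_only("pdelie", "reporting", "summarize_split_leakage_provenance")` would be expected to return True if the rule above holds.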
Frozen Report Semantics#
Split provenance reports return JSON-compatible runtime reports with:
summary_schema_version = "0.1"
summary_type = "split_leakage_provenance"
partition_counts
sample_count
provenance_available
source_index_traceable
shift_index_traceable
duplicate_source_across_partitions
duplicate_shifted_source_across_partitions
identity_shift_cross_partition_overlap
partition_pair_diagnostics
risk_label
risk_reasons
component_statuses
extra_metrics
Frozen risk_label values:
no_detected_overlap
traceable_overlap
missing_provenance
inconclusive
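A downstream consumer might guard against schema drift with a check like the one below. This is a sketch under stated assumptions: the function `check_report_shape` is hypothetical and not part of pdelie, the key names and frozen values are copied from this page, and the value types of the remaining fields are not specified here, so they are not checked.

```python
import json

# Frozen risk_label values, as listed on this page.
FROZEN_RISK_LABELS = frozenset({
    "no_detected_overlap",
    "traceable_overlap",
    "missing_provenance",
    "inconclusive",
})

# Top-level report keys, as listed on this page.
EXPECTED_KEYS = frozenset({
    "summary_schema_version",
    "summary_type",
    "partition_counts",
    "sample_count",
    "provenance_available",
    "source_index_traceable",
    "shift_index_traceable",
    "duplicate_source_across_partitions",
    "duplicate_shifted_source_across_partitions",
    "identity_shift_cross_partition_overlap",
    "partition_pair_diagnostics",
    "risk_label",
    "risk_reasons",
    "component_statuses",
    "extra_metrics",
})


def check_report_shape(report):
    """Hypothetical consumer-side check; raises ValueError on a mismatch."""
    missing = EXPECTED_KEYS - set(report)
    if missing:
        raise ValueError(f"missing report keys: {sorted(missing)}")
    if report["summary_schema_version"] != "0.1":
        raise ValueError("unexpected summary_schema_version")
    if report["summary_type"] != "split_leakage_provenance":
        raise ValueError("unexpected summary_type")
    if report["risk_label"] not in FROZEN_RISK_LABELS:
        raise ValueError(f"unknown risk_label: {report['risk_label']!r}")
    # allow_nan=False makes json.dumps reject NaN and Infinity,
    # which are not valid JSON values.
    json.dumps(report, allow_nan=False)
```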
Interpretation#
v0.23 reports detectable provenance overlap only.
no_detected_overlap: available provenance shows no same-source or same-source/same-shift overlap across distinct partitions.
traceable_overlap: available provenance shows at least one source or orbit-derived sample appears across distinct partitions.
missing_provenance: sample count is known but source/shift provenance is absent or insufficient.
inconclusive: provenance is partially available but not enough to classify cleanly.
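The four labels can be illustrated with a toy classifier over per-sample partition labels and optional source indices. This is a simplified sketch, not the pdelie implementation: the function name `classify_split_risk`, the use of `None` for an unknown source, and the precedence of the checks (overlap among known sources wins over partial provenance) are all assumptions, and shift provenance is ignored.

```python
from collections import defaultdict


def classify_split_risk(partition_labels, source_indices=None):
    """Toy risk classifier; None in source_indices marks an unknown source."""
    if source_indices is None:
        return "missing_provenance"
    if len(source_indices) != len(partition_labels):
        raise ValueError("partition_labels and source_indices differ in length")
    if all(source is None for source in source_indices):
        return "missing_provenance"

    # Collect the set of partitions in which each known source appears.
    partitions_by_source = defaultdict(set)
    for label, source in zip(partition_labels, source_indices):
        if source is not None:
            partitions_by_source[source].add(label)

    if any(len(parts) > 1 for parts in partitions_by_source.values()):
        return "traceable_overlap"
    if any(source is None for source in source_indices):
        # Assumption: no overlap among known sources, but gaps remain.
        return "inconclusive"
    return "no_detected_overlap"
```

The label names are arbitrary here, consistent with the helper not requiring a fixed train/heldout/test vocabulary.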
The helper accepts user-supplied partition labels and optional source/sample metadata. It does not choose label names or require a fixed train/heldout/test vocabulary.
Orbit-Batch Provenance Policy#
When orbit_batch is supplied, it may be:
an OrbitBatchResult
a uniform_translation_orbit_batch report mapping
The report uses existing source_batch_indices, shift_indices, raw shifts, and normalized shifts when present. Duplicate shifts remain preserved. Identity-shift overlap is reported separately from source-only overlap.
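The distinction between source-only, source-and-shift, and identity-shift overlap can be sketched as follows. Everything here is illustrative rather than the library's logic: the function `overlap_diagnostics` is hypothetical, the returned value shapes (lists) are assumed because this page does not specify them, and the identity shift is assumed to be the sample whose shift index equals `identity_shift_index`.

```python
from collections import defaultdict


def overlap_diagnostics(partition_labels, source_batch_indices, shift_indices,
                        identity_shift_index=0):
    """Toy overlap report over per-sample labels, source indices, and shift indices."""
    if not (len(partition_labels) == len(source_batch_indices) == len(shift_indices)):
        raise ValueError("all per-sample sequences must have the same length")

    by_source = defaultdict(set)          # source -> partitions
    by_source_shift = defaultdict(set)    # (source, shift) -> partitions
    has_identity = set()                  # sources with an identity-shift sample
    for label, source, shift in zip(partition_labels, source_batch_indices, shift_indices):
        by_source[source].add(label)
        by_source_shift[(source, shift)].add(label)
        if shift == identity_shift_index:
            has_identity.add(source)

    dup_sources = sorted(s for s, parts in by_source.items() if len(parts) > 1)
    dup_shifted = sorted(k for k, parts in by_source_shift.items() if len(parts) > 1)
    # Assumed meaning: a source spans several partitions and its unshifted
    # copy is among the samples, so the original leaks across the split.
    identity_overlap = [s for s in dup_sources if s in has_identity]

    return {
        "duplicate_source_across_partitions": dup_sources,
        "duplicate_shifted_source_across_partitions": [list(k) for k in dup_shifted],
        "identity_shift_cross_partition_overlap": identity_overlap,
    }
```

Repeated shifts within one partition do not change the result, because only the set of partitions per source or per (source, shift) pair is compared.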
Non-goals#
v0.23 does not add:
split creation
train/test split management
leakage prevention
benchmark policy
downstream success criteria
automatic augmentation policy
file loaders
xarray.Dataset support
PDEBench or The Well adapters
multidimensional support
nonuniform-grid support
new PDEs
KS runtime promotion
weak-form expansion
time-translation APIs
neural or callable generator APIs
operator-facing APIs
root pdelie exports
Milestone Status#
Milestone 0: COMPLETE - scope freeze
Milestone 1: COMPLETE - semantics freeze
Milestone 2: COMPLETE - split provenance helper
Milestone 3: COMPLETE - orbit-batch risk coverage
Milestone 4: COMPLETE - workflow integration and example
Milestone 5: COMPLETE - API/public-surface audit
Milestone 6: COMPLETE - release gate and readiness
Release Gate Expectations#
v0.23 is complete only if:
partition labels are user supplied and validated as non-empty strings
partition/sample count mismatches raise typed validation errors
optional metadata is strictly JSON-compatible
no-overlap, source-overlap, source-and-shift-overlap, identity-shift-overlap, missing-provenance, and partial-provenance cases are tested
summarize_downstream_discovery_workflow(...) nests split provenance reports without changing split policy
example output is JSON only
new APIs are importable from their submodules only
root pdelie remains unchanged
no split manager, leakage-prevention helper, file loader, Dataset adapter, broad backend framework, new PDE, KS runtime API, weak-form expansion, time-translation API, neural/callable API, or operator API lands
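The input-validation expectations in this gate (non-empty string labels, typed errors on count mismatch, strict JSON metadata) can be sketched as below. The names `SplitValidationError`, `validate_partition_labels`, and `is_strict_json` are hypothetical; this page does not name the real exception type or the exact strictness rules, so the rules here (string keys only, no tuples, finite floats only) are assumptions.

```python
import math


class SplitValidationError(ValueError):
    """Hypothetical typed error; the actual pdelie exception is not named here."""


def validate_partition_labels(partition_labels, sample_count):
    """Require one non-empty string label per sample."""
    if len(partition_labels) != sample_count:
        raise SplitValidationError(
            f"expected {sample_count} labels, got {len(partition_labels)}"
        )
    for index, label in enumerate(partition_labels):
        if not isinstance(label, str) or not label:
            raise SplitValidationError(
                f"label at index {index} is not a non-empty string"
            )


def is_strict_json(value):
    """Return True only for values that map onto JSON without coercion."""
    if value is None or isinstance(value, (bool, int, str)):
        return True
    if isinstance(value, float):
        return math.isfinite(value)  # NaN and Infinity are not valid JSON
    if isinstance(value, list):
        return all(is_strict_json(item) for item in value)
    if isinstance(value, dict):
        return all(
            isinstance(key, str) and is_strict_json(item)
            for key, item in value.items()
        )
    return False  # tuples, sets, arrays, and custom objects are rejected
```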