V0.23 Scope - Split / Leakage Provenance Diagnostics

Status: COMPLETE

v0.23 is a downstream supportability release for reporting provenance risks across user-supplied partitions.

Stable path:

user-supplied partition labels + FieldBatch/orbit-batch provenance
-> JSON-compatible split provenance report
-> leakage-risk diagnostics
-> optional downstream workflow summary integration

Public Runtime APIs

v0.23 adds submodule-only runtime APIs:

  • pdelie.reporting.summarize_split_leakage_provenance(...)

  • pdelie.examples.run_split_leakage_provenance_example(...)

It also extends:

  • pdelie.reporting.summarize_downstream_discovery_workflow(..., split_provenance=None)

No root pdelie exports are added.

Frozen Report Semantics

Split provenance reports are JSON-compatible runtime reports with the following fields:

  • summary_schema_version = "0.1"

  • summary_type = "split_leakage_provenance"

  • partition_counts

  • sample_count

  • provenance_available

  • source_index_traceable

  • shift_index_traceable

  • duplicate_source_across_partitions

  • duplicate_shifted_source_across_partitions

  • identity_shift_cross_partition_overlap

  • partition_pair_diagnostics

  • risk_label

  • risk_reasons

  • component_statuses

  • extra_metrics

Frozen risk_label values:

  • no_detected_overlap

  • traceable_overlap

  • missing_provenance

  • inconclusive
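
As a rough illustration of the shape described above, the sketch below builds a hypothetical report and checks it against the field list and the frozen risk_label values. The field names and the two frozen string values come from this document; every other value, the value types (booleans, lists, mappings), and the helper name check_report_shape are assumptions made for illustration, not pdelie output or pdelie API.

```python
import json

# Frozen risk_label values, copied from the list above.
FROZEN_RISK_LABELS = frozenset({
    "no_detected_overlap",
    "traceable_overlap",
    "missing_provenance",
    "inconclusive",
})

# Hypothetical report: keys are from this document, values and value types are made up.
example_report = {
    "summary_schema_version": "0.1",
    "summary_type": "split_leakage_provenance",
    "partition_counts": {"fold_a": 3, "fold_b": 1},
    "sample_count": 4,
    "provenance_available": True,
    "source_index_traceable": True,
    "shift_index_traceable": True,
    "duplicate_source_across_partitions": False,
    "duplicate_shifted_source_across_partitions": False,
    "identity_shift_cross_partition_overlap": False,
    "partition_pair_diagnostics": [],
    "risk_label": "no_detected_overlap",
    "risk_reasons": [],
    "component_statuses": {},
    "extra_metrics": {},
}

EXPECTED_KEYS = frozenset(example_report)


def check_report_shape(report):
    """Loosely check a report against the field list (illustrative helper only)."""
    missing = EXPECTED_KEYS - report.keys()
    if missing:
        raise ValueError(f"missing report fields: {sorted(missing)}")
    if report["risk_label"] not in FROZEN_RISK_LABELS:
        raise ValueError(f"unexpected risk_label: {report['risk_label']!r}")
    # allow_nan=False rejects NaN/Infinity, which are not valid JSON.
    return json.loads(json.dumps(report, allow_nan=False))


round_tripped = check_report_shape(example_report)
```

A JSON round trip is used here only as a cheap way to confirm that the report contains nothing beyond plain JSON types.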

Interpretation

v0.23 reports detectable provenance overlap only.

  • no_detected_overlap: available provenance shows no same-source or same-source/same-shift overlap across distinct partitions.

  • traceable_overlap: available provenance shows that at least one source or orbit-derived sample appears in more than one partition.

  • missing_provenance: the sample count is known, but source/shift provenance is absent or insufficient.

  • inconclusive: provenance is partially available but insufficient to support any of the other three labels.

The helper accepts user-supplied partition labels and optional source/sample metadata. It does not choose label names or require a fixed train/heldout/test vocabulary.
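
The four labels can be read as a small decision procedure. The sketch below is one plausible reading, assuming a per-sample source index where None means "unknown"; the function name classify_split, the None convention, and the order in which the cases are checked are all assumptions, and this is not how pdelie is known to implement the classification.

```python
from collections import defaultdict


def classify_split(partition_labels, source_indices):
    """Toy risk classification over per-sample partition labels and source indices.

    A source index of None marks a sample whose provenance is unknown.
    """
    if len(partition_labels) != len(source_indices):
        raise ValueError("expected one source index per sample")

    known = [
        (label, source)
        for label, source in zip(partition_labels, source_indices)
        if source is not None
    ]
    if not known:
        # Sample count is known, but nothing can be traced.
        return "missing_provenance"

    partitions_by_source = defaultdict(set)
    for label, source in known:
        partitions_by_source[source].add(label)

    if any(len(labels) > 1 for labels in partitions_by_source.values()):
        # At least one source is visible in more than one partition.
        return "traceable_overlap"
    if len(known) < len(partition_labels):
        # Some samples are traceable, some are not: no clean verdict.
        return "inconclusive"
    return "no_detected_overlap"


# Label names are arbitrary; nothing here assumes train/heldout/test.
clean = classify_split(["fold_a", "fold_a", "fold_b"], [0, 1, 2])
leaky = classify_split(["fold_a", "fold_b"], [0, 0])
```

Note that in this reading a detected overlap takes precedence over partially missing provenance, since an overlap that is visible remains an overlap regardless of what is unknown elsewhere.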

Orbit-Batch Provenance Policy

When orbit_batch is supplied, it may be:

  • an OrbitBatchResult

  • a uniform_translation_orbit_batch report mapping

The report uses existing source_batch_indices, shift_indices, raw shifts, and normalized shifts when present. Duplicate shifts are preserved. Identity-shift overlap is reported separately from source-only overlap.
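
To make the distinction between source-only, source-and-shift, and identity-shift overlap concrete, here is a minimal sketch over plain per-sample lists. It assumes that a raw shift of 0 denotes the identity shift and that "identity-shift overlap" means the unshifted copy of one source appears in more than one partition; both are guesses at the intended semantics. The function name overlap_flags and its inputs are hypothetical, and only the returned key names are taken from this document.

```python
from collections import defaultdict


def _spans_partitions(groups):
    """True if any key was seen under more than one partition label."""
    return any(len(labels) > 1 for labels in groups.values())


def overlap_flags(partition_labels, source_batch_indices, shifts):
    """Compute three overlap flags from per-sample labels, sources, and raw shifts."""
    if not (len(partition_labels) == len(source_batch_indices) == len(shifts)):
        raise ValueError("all inputs must have one entry per sample")

    by_source = defaultdict(set)
    by_source_and_shift = defaultdict(set)
    by_identity_source = defaultdict(set)
    for label, source, shift in zip(partition_labels, source_batch_indices, shifts):
        by_source[source].add(label)
        by_source_and_shift[(source, shift)].add(label)
        if shift == 0:  # assumption: a zero shift is the identity shift
            by_identity_source[source].add(label)

    return {
        "duplicate_source_across_partitions": _spans_partitions(by_source),
        "duplicate_shifted_source_across_partitions": _spans_partitions(by_source_and_shift),
        "identity_shift_cross_partition_overlap": _spans_partitions(by_identity_source),
    }


# Source 0 is split across partitions with the same 0.5 shift on both sides,
# but its unshifted copy only appears in "train".
flags = overlap_flags(
    ["train", "train", "test", "test"],
    [0, 0, 0, 1],
    [0.0, 0.5, 0.5, 0.0],
)
```

Duplicate entries are left in the input lists as they are; the sketch only groups them, which mirrors the policy of preserving duplicate shifts rather than deduplicating them.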

Non-goals

v0.23 does not add:

  • split creation

  • train/test split management

  • leakage prevention

  • benchmark policy

  • downstream success criteria

  • automatic augmentation policy

  • file loaders

  • xarray.Dataset support

  • PDEBench or The Well adapters

  • multidimensional support

  • nonuniform-grid support

  • new PDEs

  • KS runtime promotion

  • weak-form expansion

  • time-translation APIs

  • neural or callable generator APIs

  • operator-facing APIs

  • root pdelie exports

Milestone Status

  • Milestone 0: COMPLETE - scope freeze

  • Milestone 1: COMPLETE - semantics freeze

  • Milestone 2: COMPLETE - split provenance helper

  • Milestone 3: COMPLETE - orbit-batch risk coverage

  • Milestone 4: COMPLETE - workflow integration and example

  • Milestone 5: COMPLETE - API/public-surface audit

  • Milestone 6: COMPLETE - release gate and readiness

Release Gate Expectations

v0.23 is complete only if:

  • partition labels are user supplied and validated as non-empty strings

  • partition/sample count mismatches raise typed validation errors

  • optional metadata is strictly JSON-compatible

  • no-overlap, source-overlap, source-and-shift-overlap, identity-shift-overlap, missing-provenance, and partial-provenance cases are tested

  • summarize_downstream_discovery_workflow(...) nests split provenance reports without changing split policy

  • example output is JSON only

  • new APIs are importable from their submodules only

  • root pdelie remains unchanged

  • no split manager, leakage-prevention helper, file loader, Dataset adapter, broad backend framework, new PDE, KS runtime API, weak-form expansion, time-translation API, neural/callable API, or operator API lands
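
The first three gate items describe input validation. A minimal sketch of what such checks could look like follows; SplitValidationError and validate_split_inputs are hypothetical names (this document does not name the typed error pdelie raises), and json.dumps with allow_nan=False is only an approximation of "strictly JSON-compatible", since it still accepts tuples and non-string mapping keys.

```python
import json


class SplitValidationError(ValueError):
    """Hypothetical typed validation error; the real pdelie error type is not named here."""


def validate_split_inputs(partition_labels, sample_count, metadata=None):
    """Check labels, label/sample count agreement, and metadata JSON compatibility."""
    labels = list(partition_labels)
    for label in labels:
        if not isinstance(label, str) or not label:
            raise SplitValidationError(
                f"partition label must be a non-empty string, got {label!r}"
            )
    if len(labels) != sample_count:
        raise SplitValidationError(
            f"got {len(labels)} partition labels for {sample_count} samples"
        )
    if metadata is not None:
        try:
            # allow_nan=False makes NaN/Infinity an error instead of invalid JSON output.
            json.dumps(metadata, allow_nan=False)
        except (TypeError, ValueError) as exc:
            raise SplitValidationError(
                "metadata is not strictly JSON-compatible"
            ) from exc
    return labels


validated = validate_split_inputs(["fold_a", "fold_b"], 2, {"note": "ok"})
```

Subclassing ValueError keeps the error catchable by generic handlers while still letting tests assert on the specific type.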