V0.6 Scope#

Summary#

v0.6 is the first symmetry-guided PDE discovery utilities release for PDELie.

It does not introduce a new numerical regime.
It does not promote KdV to a stable PDE path.
It does not make weak-form methods, operator methods, external dataset ingestion, or broad adapters part of the stable library.

Instead, it asks:

Can PDELie expose a small, generic public-library layer that supports controlled symmetry-guided PDE discovery workflows in the existing canonical Heat/Burgers regime?

Frozen release definition:

PDE data -> generator family -> translation-canonical inputs -> sparse PDE discovery -> recovery metrics

v0.6 is therefore a discovery-utility release, not a broader data-ingestion release and not a broader numerics release.


Stable Scope#

Stable scope for v0.6:

  • uniform rectilinear grids only

  • synthetic PDE data only

  • scalar, periodic Heat/Burgers regime only

  • polynomial Lie point generator families only

  • current to_pysindy_trajectories(...) bridge remains the stable discovery entry shape

  • discovery recovery metrics

  • one thin PySINDy discovery adapter

  • one translation-canonical discovery-input builder

  • simple robustness utilities

  • one compact v0.6 release gate

  • no new stable canonical object

Deferred from stable v0.6:

  • stable KdV path

  • external dataset-ingestion axis

  • weak-form methods

  • operator methods

  • broad adapters

  • paper-specific experiment orchestration


Core User Story#

v0.6 should support the generic public-library portion of this workflow:

  1. start from canonical Heat/Burgers FieldBatch data

  2. start from a known, imported/coerced, or discovered translation generator family

  3. build translation-canonical discovery inputs

  4. run one thin sparse discovery adapter

  5. evaluate recovery against a known target equation with generic metrics

Unsupported in stable v0.6:

  • arbitrary symbolic-term alias resolution

  • full differential-invariant generation

  • multiple discovery backends

  • broad dataset-loading workflows

  • manuscript-facing tables, figures, or claim logic


Milestone 1 — Discovery recovery metrics#

Status: Complete

Public runtime API:

  • pdelie.discovery.evaluate_discovery_recovery(target_terms, discovered_terms, *, support_epsilon=1e-8, train_residual=None, heldout_residual=None) -> dict[str, object]

Frozen input policy:

  • target_terms and discovered_terms are mappings of canonical term string to scalar coefficient

  • exact string equality defines support identity

  • aliases, symbolic simplification, and term normalization are out of scope

  • callers are responsible for consistent naming

  • term keys must be non-empty strings

  • coefficients must be finite

  • non-finite coefficients or invalid term keys raise SchemaValidationError

  • support_epsilon must be finite and non-negative

  • support is defined by abs(coef) > support_epsilon

Frozen edge-case policy:

  • empty target support + empty discovered support = exact

  • empty target support + non-empty discovered support = failed

  • non-empty target support + empty discovered support = failed

  • classification is support-based only:

    • exact

    • partial

    • failed

Frozen outputs:

  • support precision / recall / F1

  • support exact-match flag

  • coefficient L2, relative-L2, and Linf error on union support

  • sparsity

  • train residual norm if provided

  • held-out residual norm if provided

  • normalized held-out residual if provided

  • equation-string summary


Milestone 2 — Thin PySINDy discovery adapter#

Status: Complete

Public runtime API:

  • pdelie.discovery.fit_pysindy_discovery(trajectories, time_values, feature_names, *, config=None) -> dict[str, object]

Frozen scope:

  • PySINDy only

  • continuous-time only

  • accepts only the current trajectory shape from to_pysindy_trajectories(...)

  • all trajectories must share identical shape as a frozen M2 simplification, not as a general PySINDy limitation

  • feature_names are the input trajectory columns and must be unique non-empty strings

  • default config is a private runtime deterministic PySINDy profile

  • config must be None in v0.6 M2

  • this is not a general discovery-backend framework

  • this is not yet a canonical PDE-level u_t = ... equation extractor

  • this returns a runtime backend report dict, not a JSON-compatible artifact schema

Frozen extraction policy:

  • returns library_feature_names

  • returns a dense 2D coefficients matrix with rows aligned to feature_names and columns aligned to library_feature_names

  • coefficients remain runtime NumPy arrays in v0.6 M2

  • returns sparse backend-native equation_terms

  • returns backend-native debug equation_strings

  • default coefficient threshold is 1e-8

  • raw multi-target backend matrices are out of scope

  • canonical PDE term extraction is out of scope

  • equation_terms and equation_strings must not be fed directly into evaluate_discovery_recovery(...) without a later canonicalization step

Frozen failure policy:

  • missing dependency remains ImportError

  • invalid inputs remain typed validation errors

  • backend fitting failures return status="failed" with stable failure information

Frozen returned fields:

  • status

  • backend

  • feature_names

  • library_feature_names

  • coefficients

  • equation_terms

  • equation_strings

  • fit_config

  • fit_diagnostics

  • failure_reason when failed


Milestone 3 — Translation-canonical discovery inputs#

Status: Complete

Public runtime API:

  • pdelie.discovery.build_translation_canonical_discovery_inputs(field, *, generator_family=None, invariant_spec_template=None) -> dict[str, object]

Frozen scope:

  • translation/canonical path only

  • current invariant-application path only

  • scalar variable only

  • periodic x only

  • masked fields are rejected

  • supports known/oracle translation families

  • supports imported/coerced translation families

  • supports discovered translation families

  • rejects unsupported non-translation families and non-uniform translation-like families

  • invariant_spec_template is explicit template mode, not direct fixed-shift mode

  • template parameters may be {} or {"axis": "x"}

  • template parameters must not include shift

  • no full differential-invariant generation

Frozen alignment behavior:

  • alignment is per sample

  • alignment uses the initial-time slice only

  • alignment uses values[batch, 0, :, 0]

  • peak index selection uses first-index tie-breaking

  • shift is x[0] - x[peak_index]

  • output shifts are deterministic and reported in batch order

  • alignment policy is a deterministic heuristic canonicalization rule, not a strong invariant-theoretic guarantee

Frozen implementation policy:

  • split batch inputs into single-sample fields

  • apply the existing InvariantApplier one sample at a time

  • reassemble into one batched FieldBatch

  • append exactly one batch-level preprocess-log entry instead of concatenating per-sample logs

Frozen returned fields:

  • transformed_field

  • trajectories

  • time_values

  • feature_names

  • generator_metadata

  • construction_method

  • alignment_policy

  • alignment_shifts

  • provenance


Milestone 4 — Robustness utilities#

Status: Complete

Public runtime APIs under pdelie.data:

  • add_gaussian_noise(field, *, std_fraction, seed) -> FieldBatch

  • subsample_time(field, *, stride) -> FieldBatch

  • subsample_x(field, *, stride) -> FieldBatch

  • split_batch_train_heldout(field, *, train_size, seed) -> tuple[FieldBatch, FieldBatch]

Public runtime API under pdelie.discovery:

  • summarize_recovery_grid(records) -> list[dict[str, object]]

Frozen utility policy:

  • FieldBatch in / FieldBatch out

  • deterministic noise, subsampling, and splitting

  • metadata is deep-copied

  • existing preprocess-log entries are deep-copied before appending new entries

  • coords are copied

  • var_names are copied

  • masks are preserved

  • NaN values remain NaN

  • noise is applied only to finite unmasked values

  • subsampling is stride-only

  • subsample_time may leave one time point

  • subsample_x must leave at least two x-points

  • train/heldout split is deterministic and preserves original order within each split

  • train_size is an integer count only

  • integer-like parameters accept Python int and NumPy integer types but reject bool

  • no dataframe object

  • no plot layer

  • no table-rendering system

  • summarize_recovery_grid(...) is a runtime convenience API, not a canonical artifact schema and not a manuscript table format

Frozen noise policy:

  • add_gaussian_noise(...) computes its reference RMS from finite unmasked values only

  • masked entries remain unchanged

  • existing non-finite values remain unchanged

  • no eligible finite unmasked values raises SchemaValidationError

Frozen recovery-grid policy:

  • summarize_recovery_grid(...) accepts nested records of the form {"conditions": {...}, "recovery": {...}}

  • condition values must be JSON-scalar values only

  • condition floats must be finite

  • grouped output rows are sorted deterministically using typed condition sort keys

  • recovery classification must be one of:

    • exact

    • partial

    • failed

  • required recovery metrics must be finite numeric scalars

  • optional residual means are included only when every record in a group contains that field

Frozen preprocess-log policy:

  • each utility appends exactly one new plain-dict entry

  • each entry includes at least:

    • operation

    • parameters

  • no broader preprocess-log schema is introduced in v0.6


Milestone 5 — V0.6 release gate#

Status: Complete

The v0.6 release gate remains compact and representative only.

Frozen representative slices:

  • one discovery-metrics slice

  • one deterministic Heat smoke fit

  • one deterministic Burgers smoke fit

  • one raw/vanilla input slice

  • one oracle/known translation family input slice

  • one imported/coerced translation family input slice

  • one robustness slice

Frozen release-gate policy:

  • no discovery superiority claim

  • no exact PySINDy model-string assertions

  • no KdV stable-surface addition

  • no broad performance claim beyond finite, reproducible, structurally valid outputs

  • the dedicated v0_6-release-gate CI job is the authoritative release gate for this milestone


Release Gate#

v0.6 is releasable only if:

  • discovery recovery metrics are deterministic and typed

  • the thin PySINDy adapter runs reproducibly in the current scalar periodic regime

  • translation-canonical inputs are deterministic for representative known/imported translation families

  • robustness utilities preserve FieldBatch validity, coordinate-copying behavior, mask handling, and preprocess-log behavior

  • Heat/Burgers stable paths still pass unchanged

  • no new canonical object is introduced

  • no stable KdV surface is added

  • no weak-form, operator, broad-adapter, or manuscript-facing feature is required


KdV Policy#

KdV is explicitly deferred from stable v0.6 scope.

v0.5 established only a tests-first feasibility result. That does not promote KdV into the stable library.

Stable v0.6 therefore does not include:

  • stable third-derivative backend support

  • stable synthetic KdV data generation

  • stable KdV residual evaluation

  • stable KdV discovery coverage

  • any stable KdV public API surface

KdV may be reconsidered in a later release only under an explicit scope reset.


Paper / Private Repo Boundary#

The public pdelie repo may contain only generic discovery utilities in v0.6.

Allowed in public v0.6 scope:

  • generic recovery metrics

  • one thin PySINDy adapter

  • one translation-canonical discovery-input builder

  • generic robustness utilities

  • compact release-gate coverage

Not part of public v0.6 scope:

  • paper-specific experiment matrices

  • manuscript thresholds

  • manuscript figures or tables

  • representative-aware losses

  • private orchestration comparing many methods

  • venue-specific presentation logic


Explicit Non-goals#

  • new canonical discovery objects

  • root exports from pdelie.__init__

  • general discovery-backend framework

  • general invariant-theory engine

  • stable KdV promotion

  • external dataset-ingestion axis

  • weak-form methods

  • operator methods

  • broad adapters

  • dataframe / plotting / manuscript layer

  • paper-specific experiment matrix, thresholds, figures, or manuscript logic