# V0.6 Scope

## Summary

`v0.6` is the first **symmetry-guided PDE discovery utilities** release for PDELie.

It does **not** introduce a new numerical regime. It does **not** promote KdV to a stable PDE path. It does **not** make weak-form methods, operator methods, external dataset ingestion, or broad adapters part of the stable library.

Instead, it asks:

> Can PDELie expose a small, generic public-library layer that supports controlled symmetry-guided PDE discovery workflows in the existing canonical Heat/Burgers regime?

Frozen release definition:

`PDE data -> generator family -> translation-canonical inputs -> sparse PDE discovery -> recovery metrics`

`v0.6` is therefore a discovery-utility release, not a broader data-ingestion release and not a broader numerics release.

---

## Stable Scope

Stable scope for `v0.6`:

- uniform rectilinear grids only
- synthetic PDE data only
- scalar, periodic Heat/Burgers regime only
- polynomial Lie point generator families only
- current `to_pysindy_trajectories(...)` bridge remains the stable discovery entry shape
- discovery recovery metrics
- one thin PySINDy discovery adapter
- one translation-canonical discovery-input builder
- simple robustness utilities
- one compact `v0.6` release gate
- no new stable canonical object

Deferred from stable `v0.6`:

- stable KdV path
- external dataset-ingestion axis
- weak-form methods
- operator methods
- broad adapters
- paper-specific experiment orchestration

---

## Core User Story

`v0.6` should support the generic public-library portion of this workflow:

1. start from canonical Heat/Burgers `FieldBatch` data
2. start from a known, imported/coerced, or discovered translation generator family
3. build translation-canonical discovery inputs
4. run one thin sparse discovery adapter
5. evaluate recovery against a known target equation with generic metrics

Unsupported in stable `v0.6`:

- arbitrary symbolic-term alias resolution
- full differential-invariant generation
- multiple discovery backends
- broad dataset-loading workflows
- manuscript-facing tables, figures, or claim logic

---

## Milestone 1 — Discovery recovery metrics

**Status:** Complete

Public runtime API:

- `pdelie.discovery.evaluate_discovery_recovery(target_terms, discovered_terms, *, support_epsilon=1e-8, train_residual=None, heldout_residual=None) -> dict[str, object]`

Frozen input policy:

- `target_terms` and `discovered_terms` are mappings of canonical term string to scalar coefficient
- exact string equality defines support identity
- aliases, symbolic simplification, and term normalization are out of scope
- callers are responsible for consistent naming
- term keys must be non-empty strings
- coefficients must be finite
- non-finite coefficients or invalid term keys raise `SchemaValidationError`
- `support_epsilon` must be finite and non-negative
- support is defined by `abs(coef) > support_epsilon`

Frozen edge-case policy:

- empty target support + empty discovered support = `exact`
- empty target support + non-empty discovered support = `failed`
- non-empty target support + empty discovered support = `failed`
- classification is support-based only:
  - `exact`
  - `partial`
  - `failed`

Frozen outputs:

- support precision / recall / F1
- support exact-match flag
- coefficient L2, relative-L2, and Linf error on union support
- sparsity
- train residual norm if provided
- held-out residual norm if provided
- normalized held-out residual if provided
- equation-string summary

---

## Milestone 2 — Thin PySINDy discovery adapter

**Status:** Complete

Public runtime API:

- `pdelie.discovery.fit_pysindy_discovery(trajectories, time_values, feature_names, *, config=None) -> dict[str, object]`

Frozen scope:

- PySINDy only
- continuous-time only
- accepts only the current trajectory shape from `to_pysindy_trajectories(...)`
- all trajectories must share identical shape as a frozen M2 simplification, not as a general PySINDy limitation
- `feature_names` are the input trajectory columns and must be unique non-empty strings
- default config is a private runtime deterministic PySINDy profile
- `config` must be `None` in `v0.6` M2
- this is not a general discovery-backend framework
- this is not yet a canonical PDE-level `u_t = ...` equation extractor
- this returns a runtime backend report dict, not a JSON-compatible artifact schema

Frozen extraction policy:

- returns `library_feature_names`
- returns a dense 2D `coefficients` matrix with rows aligned to `feature_names` and columns aligned to `library_feature_names`
- `coefficients` remain runtime NumPy arrays in `v0.6` M2
- returns sparse backend-native `equation_terms`
- returns backend-native debug `equation_strings`
- default coefficient threshold is `1e-8`
- raw multi-target backend matrices are out of scope
- canonical PDE term extraction is out of scope
- `equation_terms` and `equation_strings` must not be fed directly into `evaluate_discovery_recovery(...)` without a later canonicalization step

Frozen failure policy:

- missing dependency remains `ImportError`
- invalid inputs remain typed validation errors
- backend fitting failures return `status="failed"` with stable failure information

Frozen returned fields:

- `status`
- `backend`
- `feature_names`
- `library_feature_names`
- `coefficients`
- `equation_terms`
- `equation_strings`
- `fit_config`
- `fit_diagnostics`
- `failure_reason` when failed

---

## Milestone 3 — Translation-canonical discovery inputs

**Status:** Complete

Public runtime API:

- `pdelie.discovery.build_translation_canonical_discovery_inputs(field, *, generator_family=None, invariant_spec_template=None) -> dict[str, object]`

Frozen scope:

- translation/canonical path only
- current invariant-application path only
- scalar variable only
- periodic `x` only
- masked fields are rejected
- supports known/oracle translation families
- supports imported/coerced translation families
- supports discovered translation families
- rejects unsupported non-translation families and non-uniform translation-like families
- `invariant_spec_template` is explicit template mode, not direct fixed-shift mode
- template parameters may be `{}` or `{"axis": "x"}`
- template parameters must not include `shift`
- no full differential-invariant generation

Frozen alignment behavior:

- alignment is per sample
- alignment uses the initial-time slice only
- alignment uses `values[batch, 0, :, 0]`
- peak index selection uses first-index tie-breaking
- shift is `x[0] - x[peak_index]`
- output shifts are deterministic and reported in batch order
- alignment policy is a deterministic heuristic canonicalization rule, not a strong invariant-theoretic guarantee

Frozen implementation policy:

- split batch inputs into single-sample fields
- apply the existing `InvariantApplier` one sample at a time
- reassemble into one batched `FieldBatch`
- append exactly one batch-level preprocess-log entry instead of concatenating per-sample logs

Frozen returned fields:

- `transformed_field`
- `trajectories`
- `time_values`
- `feature_names`
- `generator_metadata`
- `construction_method`
- `alignment_policy`
- `alignment_shifts`
- `provenance`

---

## Milestone 4 — Robustness utilities

**Status:** Complete

Public runtime APIs under `pdelie.data`:

- `add_gaussian_noise(field, *, std_fraction, seed) -> FieldBatch`
- `subsample_time(field, *, stride) -> FieldBatch`
- `subsample_x(field, *, stride) -> FieldBatch`
- `split_batch_train_heldout(field, *, train_size, seed) -> tuple[FieldBatch, FieldBatch]`

Public runtime API under `pdelie.discovery`:

- `summarize_recovery_grid(records) -> list[dict[str, object]]`

Frozen utility policy:

- `FieldBatch` in / `FieldBatch` out
- deterministic noise, subsampling, and splitting
- metadata is deep-copied
- existing preprocess-log entries are deep-copied before appending new entries
- coords are copied
- `var_names` are copied
- masks are preserved
- `NaN` values remain `NaN`
- noise is applied only to finite unmasked values
- subsampling is stride-only
- `subsample_time` may leave one time point
- `subsample_x` must leave at least two x-points
- train/heldout split is deterministic and preserves original order within each split
- `train_size` is an integer count only
- integer-like parameters accept Python `int` and NumPy integer types but reject `bool`
- no dataframe object
- no plot layer
- no table-rendering system
- `summarize_recovery_grid(...)` is a runtime convenience API, not a canonical artifact schema and not a manuscript table format

Frozen noise policy:

- `add_gaussian_noise(...)` computes its reference RMS from finite unmasked values only
- masked entries remain unchanged
- existing non-finite values remain unchanged
- no eligible finite unmasked values raises `SchemaValidationError`

Frozen recovery-grid policy:

- `summarize_recovery_grid(...)` accepts nested records of the form `{"conditions": {...}, "recovery": {...}}`
- condition values must be JSON-scalar values only
- condition floats must be finite
- grouped output rows are sorted deterministically using typed condition sort keys
- recovery `classification` must be one of:
  - `exact`
  - `partial`
  - `failed`
- required recovery metrics must be finite numeric scalars
- optional residual means are included only when every record in a group contains that field

Frozen preprocess-log policy:

- each utility appends exactly one new plain-dict entry
- each entry includes at least:
  - `operation`
  - `parameters`
- no broader preprocess-log schema is introduced in `v0.6`

---

## Milestone 5 — V0.6 release gate

**Status:** Complete

The `v0.6` release gate remains compact and representative only.
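
As a rough picture of what a compact robustness check in this spirit might look like, the sketch below applies the frozen Milestone 4 noise policy to a plain NumPy array and asserts reproducibility and `NaN` preservation. `noisy_copy` is a hypothetical stand-in, not `pdelie.data.add_gaussian_noise`; masks, `FieldBatch` handling, and preprocess-log entries are omitted, and scaling the noise standard deviation as `std_fraction * RMS` is an assumption based on the parameter name.

```python
import numpy as np


def noisy_copy(values, std_fraction, seed):
    """Hypothetical stand-in for a seeded Gaussian-noise utility (not the PDELie API)."""
    values = np.asarray(values, dtype=float)
    finite = np.isfinite(values)
    if not finite.any():
        # The real utility is documented to raise SchemaValidationError here.
        raise ValueError("no finite values to derive a reference RMS from")
    # Reference RMS is computed from finite entries only.
    rms = np.sqrt(np.mean(values[finite] ** 2))
    rng = np.random.default_rng(seed)
    out = values.copy()
    # ASSUMPTION: noise standard deviation is std_fraction * RMS.
    out[finite] += rng.normal(0.0, std_fraction * rms, size=int(finite.sum()))
    return out


# A tiny periodic sample with one pre-existing NaN.
x = np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False)
clean = np.sin(x)
clean[5] = np.nan

first = noisy_copy(clean, std_fraction=0.05, seed=0)
second = noisy_copy(clean, std_fraction=0.05, seed=0)

# Same seed, same output; the NaN stays NaN; everything else stays finite.
assert np.array_equal(first, second, equal_nan=True)
assert np.isnan(first[5])
assert np.isfinite(np.delete(first, 5)).all()
```

The checks only assert structural properties (shape, finiteness, reproducibility), not any particular noise values.
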

Frozen representative slices:

- one discovery-metrics slice
- one deterministic Heat smoke fit
- one deterministic Burgers smoke fit
- one raw/vanilla input slice
- one oracle/known translation family input slice
- one imported/coerced translation family input slice
- one robustness slice

Frozen release-gate policy:

- no discovery superiority claim
- no exact PySINDy model-string assertions
- no KdV stable-surface addition
- no broad performance claim beyond finite, reproducible, structurally valid outputs
- the dedicated `v0_6-release-gate` CI job is the authoritative release gate for this milestone

---

## Release Gate

`v0.6` is releasable only if:

- discovery recovery metrics are deterministic and typed
- the thin PySINDy adapter runs reproducibly in the current scalar periodic regime
- translation-canonical inputs are deterministic for representative known/imported translation families
- robustness utilities preserve `FieldBatch` validity, coordinate-copying behavior, mask handling, and preprocess-log behavior
- Heat/Burgers stable paths still pass unchanged
- no new canonical object is introduced
- no stable KdV surface is added
- no weak-form, operator, broad-adapter, or manuscript-facing feature is required

---

## KdV Policy

KdV is explicitly deferred from stable `v0.6` scope.

`v0.5` established only a tests-first feasibility result. That does **not** promote KdV into the stable library.

Stable `v0.6` therefore does **not** include:

- stable third-derivative backend support
- stable synthetic KdV data generation
- stable KdV residual evaluation
- stable KdV discovery coverage
- any stable KdV public API surface

KdV may be reconsidered in a later release only under an explicit scope reset.

---

## Paper / Private Repo Boundary

The public `pdelie` repo may contain only generic discovery utilities in `v0.6`.

Allowed in public `v0.6` scope:

- generic recovery metrics
- one thin PySINDy adapter
- one translation-canonical discovery-input builder
- generic robustness utilities
- compact release-gate coverage

Not part of public `v0.6` scope:

- paper-specific experiment matrices
- manuscript thresholds
- manuscript figures or tables
- representative-aware losses
- private orchestration comparing many methods
- venue-specific presentation logic

---

## Explicit Non-goals

- new canonical discovery objects
- root exports from `pdelie.__init__`
- general discovery-backend framework
- general invariant-theory engine
- stable KdV promotion
- external dataset-ingestion axis
- weak-form methods
- operator methods
- broad adapters
- dataframe / plotting / manuscript layer
- paper-specific experiment matrix, thresholds, figures, or manuscript logic
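
For reference, the support-based policy frozen in Milestone 1 can be illustrated with a small standalone sketch. It is not the `pdelie.discovery.evaluate_discovery_recovery` implementation: `recovery_sketch` and the term strings are hypothetical, and two choices are assumptions rather than documented behavior, namely that `partial` means "supports overlap but differ" and that precision and recall are `1.0` when both supports are empty and `0.0` when only one is.

```python
import math


def support(terms, eps=1e-8):
    """Term names whose coefficient satisfies abs(coef) > eps (strict)."""
    return {name for name, coef in terms.items() if abs(coef) > eps}


def recovery_sketch(target_terms, discovered_terms, support_epsilon=1e-8):
    """Hypothetical illustration of support-based recovery metrics."""
    target = support(target_terms, support_epsilon)
    found = support(discovered_terms, support_epsilon)
    hits = target & found

    if target == found:
        classification = "exact"      # also covers empty target + empty discovered
    elif hits:
        classification = "partial"    # ASSUMPTION: overlapping but unequal supports
    else:
        classification = "failed"     # covers both one-sided-empty edge cases

    # ASSUMPTION: conventions for zero denominators.
    precision = len(hits) / len(found) if found else float(not target)
    recall = len(hits) / len(target) if target else float(not found)
    total = precision + recall
    f1 = 2.0 * precision * recall / total if total else 0.0

    # Coefficient errors on the union support; absent terms count as 0.0.
    union = sorted(target | found)
    diffs = [discovered_terms.get(k, 0.0) - target_terms.get(k, 0.0) for k in union]
    return {
        "classification": classification,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "coef_l2": math.sqrt(sum(d * d for d in diffs)),
        "coef_linf": max((abs(d) for d in diffs), default=0.0),
    }


# Illustrative Burgers-like target; the term strings are made up for this example.
target = {"u*u_x": -1.0, "u_xx": 0.1}
discovered = {"u*u_x": -0.98, "u_xx": 0.11, "u_x": 1e-10}  # u_x is below epsilon
report = recovery_sketch(target, discovered)
```

Because support identity is exact string equality, a discovered key such as `"u_x*u"` would count as a different term from `"u*u_x"` in this sketch, which is why callers are responsible for consistent naming.
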