This document is non-normative.
All contracts are defined in ../specs/SPEC.md.
INTEROPERABILITY & BENCHMARKING CONTEXT#
PURPOSE#
This document defines:
external datasets to support
competing / complementary methods
data formats and canonical representation
preprocessing invariants
benchmarking and verification protocols
Goal:
Make pdelie a hub that connects PDE data → symmetry generators → invariants → downstream methods.
NOT a silo.
CORE DESIGN PRINCIPLE#
Canonical internal representation#
ALL external data must be converted to a unified format:
FieldBatch(
    values,          # array-like
    dims,            # ("batch", "time", spatial..., "var")
    coords,          # coordinate arrays (authoritative)
    var_names,       # ["u", "v", ...]
    metadata,        # structured metadata (see below)
    preprocess_log   # transformations applied
)
FieldBatch contract (STRICT)#
dims MUST be explicit and authoritative
spatial axes MUST be ordered and named (x, y, z)
time is optional (stationary PDEs allowed)
grids MUST be structured rectilinear in v0.x
coordinates MUST specify:
node-centered vs cell-centered
domain bounds
metadata MUST include:
PDE family (if known)
boundary conditions (periodic, Dirichlet, Neumann)
grid regularity (uniform/nonuniform)
parameter tags (per-trajectory coefficients)
multivariate fields:
encoded via var axis (channel-last)
missing data:
MUST be represented via masks or NaNs
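A minimal sketch of how the contract above might be enforced at construction time. The field names follow the FieldBatch listing; the dataclass layout, the `SPATIAL_ORDER` constant, the rule that `time` directly follows `batch`, and the specific checks are assumptions for illustration, not the contract defined in ../specs/SPEC.md. Metadata contents are not validated here.

```python
from dataclasses import dataclass, field
from typing import Any

import numpy as np

SPATIAL_ORDER = ("x", "y", "z")  # hypothetical constant for the required axis order


@dataclass
class FieldBatch:
    """Illustrative container; attribute names follow the listing above."""

    values: np.ndarray
    dims: tuple[str, ...]
    coords: dict[str, np.ndarray]
    var_names: list[str]
    metadata: dict[str, Any] = field(default_factory=dict)
    preprocess_log: list[dict[str, Any]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if len(self.dims) != self.values.ndim:
            raise ValueError("dims must name every axis of values")
        if self.dims[0] != "batch" or self.dims[-1] != "var":
            raise ValueError("expected 'batch' first and 'var' last (channel-last)")
        inner = self.dims[1:-1]
        if "time" in inner and inner[0] != "time":
            raise ValueError("'time', when present, must directly follow 'batch'")
        spatial = tuple(d for d in inner if d != "time")
        if spatial != SPATIAL_ORDER[: len(spatial)]:
            raise ValueError("spatial axes must be named and ordered x, y, z")
        if self.values.shape[-1] != len(self.var_names):
            raise ValueError("var axis length must match var_names")
        for name in inner:
            axis = self.dims.index(name)
            if name not in self.coords or len(self.coords[name]) != self.values.shape[axis]:
                raise ValueError(f"coords[{name!r}] missing or wrong length")


x = np.linspace(0.0, 1.0, 16, endpoint=False)
t = np.linspace(0.0, 0.1, 5)
fb = FieldBatch(
    values=np.zeros((2, 5, 16, 1)),
    dims=("batch", "time", "x", "var"),
    coords={"time": t, "x": x},
    var_names=["u"],
    metadata={"boundary_conditions": "periodic", "grid_regularity": "uniform"},
)
```

Validating in `__post_init__` means a malformed batch fails where it is built rather than deep inside a later pipeline stage.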
Canonical pipeline objects#
ALL stages must produce structured outputs:
FieldBatch
DerivativeBatch
ResidualEvaluator
GeneratorFamily
InvariantMap
InvariantLibrary
DiscoveryResult
VerificationReport
These are stable contracts, not implementation details.
Residual abstraction (CORE)#
All symmetry fitting MUST be defined relative to a residual:
class ResidualEvaluator:
    def evaluate(self, field: FieldBatch) -> ResidualBatch: ...
Supported residual types:
analytic PDE residual
weak-form residual
learned surrogate residual
operator pushforward residual
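As an illustration of the first residual type, the sketch below evaluates the analytic residual of the 1D heat equation, r = u_t - nu * u_xx, on a periodic uniform grid. The class name `HeatResidual`, the raw-array signature (used instead of FieldBatch to keep the example self-contained), and the finite-difference choices are assumptions, not the library's API.

```python
import numpy as np


class HeatResidual:
    """Analytic residual r = u_t - nu * u_xx on a periodic, uniform grid.

    Hypothetical stand-in: takes raw arrays instead of a FieldBatch.
    """

    def __init__(self, nu: float) -> None:
        self.nu = nu

    def evaluate(self, u: np.ndarray, t: np.ndarray, x: np.ndarray) -> np.ndarray:
        # u has shape (time, x); x is periodic with the endpoint excluded.
        dx = x[1] - x[0]
        u_t = np.gradient(u, t, axis=0)
        u_xx = (np.roll(u, -1, axis=1) - 2.0 * u + np.roll(u, 1, axis=1)) / dx**2
        return u_t - self.nu * u_xx


nu = 0.1
x = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
t = np.linspace(0.0, 1.0, 201)

# Exact solution of u_t = nu * u_xx: a decaying sine mode.
u_exact = np.exp(-nu * t)[:, None] * np.sin(x)[None, :]
# Same shape but the wrong decay rate, so it does not solve the PDE.
u_wrong = np.exp(-3.0 * nu * t)[:, None] * np.sin(x)[None, :]

evaluator = HeatResidual(nu)
r_exact = np.abs(evaluator.evaluate(u_exact, t, x)).max()
r_wrong = np.abs(evaluator.evaluate(u_wrong, t, x)).max()
print(f"max |r| exact: {r_exact:.2e}, wrong decay rate: {r_wrong:.2e}")
```

The residual of the exact solution is limited only by discretisation error, which gives a baseline against which transformed fields can be compared.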
SUPPORTED DATA FORMATS#
Tier 1 (MUST support)#
HDF5
NumPy (.npz)
in-memory NumPy arrays
xarray Dataset / DataArray
Tier 2 (SHOULD support)#
netCDF
Zarr
Mathematica HDF5 exports
Tier 3 (DO NOT prioritize)#
custom solver-specific formats
proprietary binary formats
DATASET ADAPTERS#
All adapters MUST convert external data into the canonical FieldBatch format.
Required adapters#
from_hdf5_pdebench(...)
from_hdf5_thewell(...)
from_numpy(...)
from_xarray(...)
from_wolfram_hdf5(...)
from_sympy_expression(...)
Export adapters#
to_xarray(...)
to_netcdf(...)
to_zarr(...)
to_pysindy_library(...)
to_neuraloperator_dataset(...)
to_json_report(...)
KEY DATA SOURCES#
1. PDEBench#
structured HDF5 PDE rollouts
canonical benchmark
2. The Well#
large-scale multi-physics dataset
stress testing only in v0.x
3. Wolfram / Mathematica#
ground-truth (GT) symmetry validation
exact PDE control
4. SymPy#
symbolic validation only
5. RealPDEBench (future)#
real + simulated PDE data
paired experiments
Use later for:
robustness validation
SCOPE (v0.x)#
Stable:
structured-grid PDE data
Lie point symmetries only
polynomial generator parameterisations
small PDE set (heat, Burgers, wave)
Experimental:
neural generators
weak-form advanced variants
operator symmetry
IDENTIFIABILITY CONVENTIONS#
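Generators are not unique: each is defined only up to scale, and any invertible linear recombination of a basis spans the same symmetry algebra.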
Therefore:
generators MUST be normalised (e.g. unit norm)
comparison MUST be via span, not coefficients
closure MUST be evaluated via Lie bracket residual
approximate symmetries MUST be labeled explicitly
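The first two conventions can be sketched as follows, assuming each generator is stored as a row of coefficients over a shared basis. The function names `normalise` and `span_distance`, and the use of principal angles as the span metric, are illustrative choices rather than anything fixed by this document. The Lie bracket closure check is not covered here.

```python
import numpy as np


def normalise(generators: np.ndarray) -> np.ndarray:
    """Scale each row (one generator's coefficient vector) to unit L2 norm."""
    norms = np.linalg.norm(generators, axis=1, keepdims=True)
    return generators / norms


def span_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Sine of the largest principal angle between the row spaces of a and b.

    Assumes a and b have the same number of linearly independent rows.
    Returns about 0 for identical spans and 1 when some direction is orthogonal.
    """
    qa, _ = np.linalg.qr(a.T)  # orthonormal basis of span(a)
    qb, _ = np.linalg.qr(b.T)  # orthonormal basis of span(b)
    cosines = np.linalg.svd(qa.T @ qb, compute_uv=False)
    smallest = np.clip(cosines.min(), 0.0, 1.0)
    return float(np.sqrt(1.0 - smallest**2))


reference = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0]])
# Different coefficients, same span: rows are recombinations of the reference.
recombined = normalise(np.array([[1.0, 1.0, 0.0, 0.0],
                                 [2.0, -2.0, 0.0, 0.0]]))
# Genuinely different span: the second generator points elsewhere.
different = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0, 0.0]])

print(span_distance(reference, recombined))
print(span_distance(reference, different))
```

A coefficient-wise comparison would report `recombined` as a mismatch even though it describes the same symmetries, which is why the comparison is done on spans.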
PREPROCESSING (CRITICAL INVARIANT)#
Preprocessing is a transformation and MUST be tracked.
Allowed before configured symmetry diagnostics#
dtype conversion
coordinate harmonization
mild denoising
Restricted#
normalisation
amplitude scaling
aggressive smoothing
REQUIRED: preprocessing log#
{
"transform_type": "...",
"parameters": {...},
"invertible": true/false
}
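One way to keep the log in step with the data is to route every transform through a single helper that appends an entry with the three keys above. The helper name `apply_logged` and its signature are hypothetical; only the entry keys come from this document.

```python
import json
from typing import Any, Callable

import numpy as np


def apply_logged(
    values: np.ndarray,
    log: list[dict[str, Any]],
    transform_type: str,
    fn: Callable[..., np.ndarray],
    parameters: dict[str, Any],
    invertible: bool,
) -> tuple[np.ndarray, list[dict[str, Any]]]:
    """Apply fn(values, **parameters) and return the result with an extended log."""
    entry = {
        "transform_type": transform_type,
        "parameters": parameters,
        "invertible": invertible,
    }
    json.dumps(entry)  # raises TypeError if the entry is not JSON-serialisable
    return fn(values, **parameters), log + [entry]


values = np.arange(6, dtype=np.float64).reshape(2, 3)
log: list[dict[str, Any]] = []

# Amplitude scaling is on the restricted list, so recording it matters.
values, log = apply_logged(
    values, log, "amplitude_scaling",
    lambda v, factor: v * factor, {"factor": 0.5}, invertible=True,
)
# Down-casting loses precision, so it is logged as non-invertible.
values, log = apply_logged(
    values, log, "dtype_conversion",
    lambda v, dtype: v.astype(dtype), {"dtype": "float32"}, invertible=False,
)
print(json.dumps(log, indent=2))
```

Returning a new list instead of mutating the old one keeps earlier snapshots of the log valid, which helps when several derived batches share a common history.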
Preprocessing Modes#
Mode 1: physical#
minimal transforms
Mode 2: analysis#
smoothing/interpolation
Mode 3: ml_standardized#
normalization/batching
DERIVATIVE PROVENANCE#
Each DerivativeBatch MUST include:
backend (spectral / finite diff / weak)
smoothing parameters
boundary assumptions
stencil / spectral config
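A sketch of a spectral derivative that returns its provenance alongside the result. The function name `spectral_dx` and the dictionary keys are assumptions chosen to mirror the four items above; the actual DerivativeBatch layout is defined in the spec, not here.

```python
import numpy as np


def spectral_dx(u: np.ndarray, x: np.ndarray) -> tuple[np.ndarray, dict]:
    """First x-derivative along the last axis via FFT, plus a provenance record.

    Assumes x is a uniform periodic grid with the endpoint excluded.
    """
    n = x.size
    dx = float(x[1] - x[0])
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=dx)  # angular wavenumbers
    if n % 2 == 0:
        k[n // 2] = 0.0  # drop the Nyquist mode, whose odd derivative is ambiguous
    u_x = np.fft.ifft(1j * k * np.fft.fft(u, axis=-1), axis=-1).real
    provenance = {
        "backend": "spectral",
        "smoothing": None,  # no filtering applied in this sketch
        "boundary_assumptions": "periodic",
        "config": {"n_modes": n, "dx": dx, "nyquist_zeroed": n % 2 == 0},
    }
    return u_x, provenance


x = np.linspace(0.0, 2.0 * np.pi, 64, endpoint=False)
u = np.sin(3.0 * x)
u_x, provenance = spectral_dx(u, x)
max_error = np.abs(u_x - 3.0 * np.cos(3.0 * x)).max()
print(provenance["backend"], max_error)
```

Recording the boundary assumption matters most here: the same call on non-periodic data would run without error but yield derivatives contaminated by Gibbs oscillations.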
VERIFICATION PROTOCOL (STRICT)#
Every symmetry claim MUST report:
norm used (L2 / relative / normalized)
ε-range for finite transforms
held-out initial conditions
held-out parameter sets
error vs ε curve
residual error vs baseline
Verification must distinguish:
exact symmetry
approximate symmetry
failure
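The error-vs-ε curve and the three-way classification can be sketched on the 1D heat equation. Spatial translation is a known symmetry, while rescaling time alone is not. The thresholds in `classify`, the relative L2 norm, and the ε-range are illustrative assumptions, and held-out initial conditions and parameter sets are omitted for brevity.

```python
import numpy as np

NU = 0.1
x = np.linspace(0.0, 2.0 * np.pi, 128, endpoint=False)
t = np.linspace(0.0, 1.0, 201)
T, X = np.meshgrid(t, x, indexing="ij")


def solution(tt: np.ndarray, xx: np.ndarray) -> np.ndarray:
    """Exact solution of u_t = NU * u_xx (decaying sine mode)."""
    return np.exp(-NU * tt) * np.sin(xx)


def residual_norm(u: np.ndarray) -> float:
    """Relative L2 norm of u_t - NU * u_xx (periodic central differences in x)."""
    dx = x[1] - x[0]
    u_t = np.gradient(u, t, axis=0)
    u_xx = (np.roll(u, -1, axis=1) - 2.0 * u + np.roll(u, 1, axis=1)) / dx**2
    return float(np.linalg.norm(u_t - NU * u_xx) / np.linalg.norm(u))


def classify(curve, baseline, exact_factor=2.0, approx_factor=100.0) -> str:
    """Hypothetical thresholds on the worst residual relative to the baseline."""
    worst = max(curve)
    if worst <= exact_factor * baseline:
        return "exact"
    if worst <= approx_factor * baseline:
        return "approximate"
    return "failure"


epsilons = np.linspace(-0.5, 0.5, 11)
baseline = residual_norm(solution(T, X))  # discretisation error only

# Finite transform x -> x + eps (a symmetry of the heat equation).
shift_curve = [residual_norm(solution(T, X + eps)) for eps in epsilons]
# Finite transform t -> exp(eps) * t with x fixed (not a symmetry).
tscale_curve = [residual_norm(solution(np.exp(eps) * T, X)) for eps in epsilons]

print("translation:", classify(shift_curve, baseline))
print("time scaling:", classify(tscale_curve, baseline))
```

Comparing against the untransformed baseline rather than against zero separates symmetry violation from derivative error, which is the distinction the "residual error vs baseline" item asks for.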
FAILURE MODES#
dataset symmetry ≠ PDE symmetry
derivative noise
overexpressive generators
conditioning vs symmetry confusion
LIBRARY POSITIONING#
pdelie is:
A bridge from PDE data → symmetry → invariants → downstream methods.
ROADMAP#
v0.1#
FieldBatch contract
polynomial symmetry detection
spectral derivatives
PDEBench integration
v0.2#
invariant coordinate pipeline
weak-form derivatives
v0.3#
NeuralOperator integration
operator symmetry (experimental)
FINAL INSTRUCTION FOR AGENT#
When extending the code:
Respect canonical contracts
Track all transformations
Validate all results numerically
Use simplest correct implementation
Distinguish stable vs experimental code