Note
This page renders committed notebook outputs. The Read the Docs build does not execute notebook code.
Downstream task template: symmetry-aware inputs without paper policy#
Current surface: V0.29.
Purpose#
Reusable template for external users: start from generated or imported data, optionally materialize a translation orbit, validate a generator, export arrays, and plug in a downstream method.
What you will learn#
How to adapt generated or external arrays into canonical FieldBatch objects.
Where optional orbit materialization belongs in a workflow.
How to validate the generator candidate used by a downstream task.
What PDELie deliberately leaves to the user: split policy, leakage control, thresholds, and claims.
Required extras#
The optional PySINDy cells need the .[downstream] or .[test] extra; the core data, import, and validation examples still run without Jupyter as a runtime dependency.
Expected runtime#
About 1 minute when PySINDy is installed; faster when the optional fit is skipped.
Out of scope#
No paper-specific logic, no operator-learning code, no broad adapters, no PDEBench/The Well loaders, no train/test automation.
These notebooks are tutorials, not API contracts. Example outputs are runtime summaries, not canonical paper artifacts.
[1]:
from pathlib import Path
import sys
ROOT = Path.cwd()
if not (ROOT / "pyproject.toml").exists():
    ROOT = ROOT.parent
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))
import importlib.util
import numpy as np
from notebooks._tutorial_utils import confidence_card, print_cards, pretty_json
from pdelie.data import from_numpy, generate_heat_1d_field_batch, split_batch_train_heldout
from pdelie.discovery import (
    evaluate_discovery_recovery,
    fit_pysindy_discovery,
    summarize_discovery_bridge_output,
    summarize_discovery_result,
    to_pysindy_trajectories,
)
from pdelie.invariants import build_uniform_translation_orbit_batch
from pdelie.reporting import (
    summarize_downstream_discovery_workflow,
    summarize_field_batch_readiness,
    summarize_generator_confidence,
    summarize_generator_fit_diagnostics,
    summarize_split_leakage_provenance,
    summarize_verification_report,
)
from pdelie.residuals import HeatResidualEvaluator
from pdelie.symmetry import fit_translation_generator, validate_symmetry_candidate
from pdelie.verification import verify_translation_generator
CONFIG = {
    "fit_epsilon": 1e-4,
    "orbit_shifts": [0.0, np.pi / 8.0, -np.pi / 8.0],
    "use_orbit_batch": True,
}
CONFIG
[1]:
{'fit_epsilon': 0.0001,
'orbit_shifts': [0.0, 0.39269908169872414, -0.39269908169872414],
'use_orbit_batch': True}
1. Create a train/heldout split before optional orbit materialization#
The orbit helper records source/shift provenance, but it does not choose split policy. Split first when leakage matters.
[2]:
field = generate_heat_1d_field_batch(batch_size=6, num_times=17, num_points=32, seed=680)
train, heldout = split_batch_train_heldout(field, train_size=3, seed=681)
if CONFIG["use_orbit_batch"]:
    orbit = build_uniform_translation_orbit_batch(
        train,
        shifts=CONFIG["orbit_shifts"],
        source_field_id="heat_train_seed_680_split_681",
    )
    downstream_field = orbit.field
    orbit_report = orbit.report
else:
    downstream_field = train
    orbit_report = None
print(pretty_json({
    "train_shape": list(train.values.shape),
    "downstream_shape": list(downstream_field.values.shape),
    "orbit_report_type": None if orbit_report is None else orbit_report["summary_type"],
    "leakage_policy": "caller-owned; split before materialization in this template",
}))
{
"downstream_shape": [
9,
17,
32,
1
],
"leakage_policy": "caller-owned; split before materialization in this template",
"orbit_report_type": "uniform_translation_orbit_batch",
"train_shape": [
3,
17,
32,
1
]
}
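The split-then-materialize order above can be illustrated without PDELie. The following is a minimal NumPy sketch, not the implementation of build_uniform_translation_orbit_batch: the helper name `materialize_translation_orbit`, the random stand-in data, and the assumed domain length of 2π are all hypothetical. It stacks spectrally shifted copies of the training samples only and records a (source, shift) pair per row, so a caller can check that no heldout sample contributed to the orbit.

```python
import numpy as np

def materialize_translation_orbit(values, shifts, domain_length):
    """Stack spectrally shifted copies of `values` with layout (batch, time, x, var).

    Hypothetical helper: assumes a uniform, periodic, endpoint-excluded x grid.
    Returns the stacked array and one (source_index, shift) record per row.
    """
    num_points = values.shape[2]
    # Angular wavenumbers for a periodic grid of the given length.
    k = 2.0 * np.pi * np.fft.fftfreq(num_points, d=domain_length / num_points)
    spectrum = np.fft.fft(values, axis=2)
    copies, provenance = [], []
    for shift in shifts:
        # u(x - shift) corresponds to multiplying mode k by exp(-i k shift).
        phase = np.exp(-1j * k * shift).reshape(1, 1, -1, 1)
        copies.append(np.fft.ifft(spectrum * phase, axis=2).real)
        provenance.extend((source, float(shift)) for source in range(values.shape[0]))
    return np.concatenate(copies, axis=0), provenance

rng = np.random.default_rng(0)
all_values = rng.standard_normal((6, 17, 32, 1))  # stand-in data, not a heat solution
train_ids, heldout_ids = [0, 1, 2], [3, 4, 5]     # split first, caller-owned policy

orbit_values, provenance = materialize_translation_orbit(
    all_values[train_ids],
    shifts=[0.0, np.pi / 8.0, -np.pi / 8.0],
    domain_length=2.0 * np.pi,  # assumption: the tutorial does not state the domain
)
# Map orbit rows back to original sample ids and confirm no heldout source is present.
orbit_sources = {train_ids[source] for source, _ in provenance}
print(orbit_values.shape, orbit_sources.isdisjoint(heldout_ids))
```

Three training samples times three shifts give nine rows, matching the shape pattern in the output above. Materializing before splitting would instead let shifted copies of one sample land on both sides of the split.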
2. Fit and validate the generator used by the workflow#
[3]:
evaluator = HeatResidualEvaluator()
train_readiness = summarize_field_batch_readiness(
    train,
    residual_evaluator=evaluator,
    expected_equation="heat_1d",
)
generator = fit_translation_generator(downstream_field, evaluator, epsilon=CONFIG["fit_epsilon"])
verification = verify_translation_generator(heldout, generator, evaluator)
validation = validate_symmetry_candidate(
    heldout,
    generator,
    residual_evaluator=evaluator,
    source_candidate_id="downstream_template_generator",
)
fit_summary = summarize_generator_fit_diagnostics(generator)
verification_summary = summarize_verification_report(verification)
generator_confidence = summarize_generator_confidence(
    generator=generator,
    fit_diagnostics=fit_summary,
    verification=verification,
    candidate_validation=validation,
    thresholds={"verification_first_error": 1e-5},
    extra_metrics={"workflow_role": "downstream_preprocessing_gate"},
)
card = confidence_card(
    label="downstream template generator",
    fit=fit_summary,
    verification=verification_summary,
    validation=validation,
)
print_cards([card])
print(pretty_json({
    "field_readiness": train_readiness["readiness_label"],
    "generator_confidence": generator_confidence["confidence_label"],
}, max_chars=1800))
[
{
"candidate_kind": "generator_family",
"condition_number": 108587.97472105523,
"evidence_label": "direct_svd_in_tolerance",
"first_epsilon": 0.0001,
"first_error": 1.3904654612463122e-08,
"fit_mode": "svd",
"label": "downstream template generator",
"max_error": 1.385275387507436e-05,
"reference_fallback_used": false,
"selected_span_distance": 2.501245182569677e-05,
"singular_value_count": 4,
"svd_span_distance": 2.501245182569677e-05,
"validation_conclusion": "validated",
"verification_classification": "exact"
}
]
{
"field_readiness": "not_ready",
"generator_confidence": "strong"
}
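Because thresholds are left to the caller, a downstream project will usually wrap its own gate around the reported metrics. The sketch below is one hypothetical way to do that: `passes_gate` is not a PDELie function, and the dictionary simply copies two numbers from the card printed above. It shows that the same metrics can pass or fail depending on which upper bounds the caller chooses.

```python
def passes_gate(card, thresholds):
    """Return (passed, failures) for caller-owned upper bounds on card metrics.

    Hypothetical helper: `thresholds` maps a metric name to its maximum allowed value.
    """
    failures = {
        name: card[name]
        for name, limit in thresholds.items()
        if card[name] > limit
    }
    return not failures, failures

# Values copied from the printed card; they are not recomputed here.
card = {"first_error": 1.3904654612463122e-08, "max_error": 1.385275387507436e-05}

# Gate only on the first-epsilon error, mirroring the 1e-5 threshold used above.
passed, failures = passes_gate(card, {"first_error": 1e-5})

# A stricter caller policy that also bounds the maximum error.
strict_passed, strict_failures = passes_gate(card, {"first_error": 1e-5, "max_error": 1e-5})
print(passed, strict_passed, sorted(strict_failures))
```

Keeping the gate in the experiment layer makes the threshold choice visible in the caller's own code rather than implied by the library.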
3. Build backend-native trajectories#
to_pysindy_trajectories(...) produces a narrow bridge format. Its output is not a PDELie canonical object.
[4]:
trajectories, time_values, feature_names = to_pysindy_trajectories(downstream_field)
bridge_summary = summarize_discovery_bridge_output(
    trajectories,
    time_values,
    feature_names,
    source_field_id="downstream_field_after_optional_orbit",
    provenance={"orbit_materialized": CONFIG["use_orbit_batch"]},
)
print(pretty_json({
    "num_trajectories": len(trajectories),
    "trajectory_shape": list(trajectories[0].shape),
    "num_feature_names": len(feature_names),
    "bridge_summary_type": bridge_summary["summary_type"],
    "bridge_finite": bridge_summary["finite"],
    "bridge_strictly_increasing_time": bridge_summary["strictly_increasing_time"],
}, max_chars=2500))
{
"bridge_finite": true,
"bridge_strictly_increasing_time": true,
"bridge_summary_type": "discovery_bridge_output",
"num_feature_names": 32,
"num_trajectories": 9,
"trajectory_shape": [
17,
32
]
}
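The shapes in the output above (nine trajectories, each 17 by 32, with 32 feature names) are consistent with flattening the x and var axes of each sample into one feature axis. The sketch below reproduces that layout in plain NumPy under that assumption. `flatten_to_trajectories` and its feature-name scheme are hypothetical and are not the implementation of to_pysindy_trajectories.

```python
import numpy as np

def flatten_to_trajectories(values, time):
    """Turn (batch, time, x, var) values into a list of (time, x * var) arrays.

    Hypothetical helper: also checks the two properties reported by the bridge
    summary above, finite values and a strictly increasing time axis.
    """
    values = np.asarray(values, dtype=float)
    time = np.asarray(time, dtype=float)
    if values.ndim != 4 or time.ndim != 1 or values.shape[1] != time.shape[0]:
        raise ValueError("expected (batch, time, x, var) values and a matching 1D time axis")
    if not np.all(np.isfinite(values)):
        raise ValueError("values must be finite")
    if not np.all(np.diff(time) > 0.0):
        raise ValueError("time must be strictly increasing")
    _, num_times, num_points, num_vars = values.shape
    # C-order reshape: the feature index is point * num_vars + var.
    trajectories = [sample.reshape(num_times, num_points * num_vars) for sample in values]
    names = [f"u{var}_x{point}" for point in range(num_points) for var in range(num_vars)]
    return trajectories, time, names

demo_values = np.zeros((9, 17, 32, 1))
demo_time = np.linspace(0.0, 1.0, 17)
demo_trajectories, demo_time_values, demo_names = flatten_to_trajectories(demo_values, demo_time)
print(len(demo_trajectories), demo_trajectories[0].shape, len(demo_names))
```

The result is a backend-native list of arrays with no coordinates, metadata, or preprocess log attached, which is why it should not be treated as a canonical artifact.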
4. Optional PySINDy smoke fit#
If PySINDy is installed, run the backend adapter. Either way, keep recovery metrics separate from generator-confidence metrics.
[5]:
if importlib.util.find_spec("pysindy") is None:
    discovery = {"status": "failed", "backend": "pysindy", "reason": "pysindy is not installed"}
else:
    discovery = fit_pysindy_discovery(trajectories, time_values, feature_names)
# Tiny paper-agnostic recovery-metric example over caller-supplied canonical terms.
recovery = evaluate_discovery_recovery(
    target_terms={"u_xx": 1.0},
    discovered_terms={"u_xx": 0.98, "u": 0.01},
    support_epsilon=0.05,
)
backend_neutral_result = {
    "status": "success",
    "backend": "tutorial_manual_sparse_result",
    "feature_names": ["u"],
    "equation_terms": {"u": {"u_xx": 0.98, "u": 0.01}},
    "equation_strings": {"u": "u_t = 0.98 u_xx + 0.01 u"},
    "coefficients": np.asarray([[0.98, 0.01]], dtype=float),
    "diagnostics": {"source": "paper_agnostic_tutorial_placeholder"},
}
discovery_summary = summarize_discovery_result(
    backend_neutral_result,
    target_terms={"u": {"u_xx": 1.0}},
    support_epsilon=0.05,
)
partitions = ["train"] * downstream_field.values.shape[0]
split_provenance = summarize_split_leakage_provenance(
    partitions=partitions,
    orbit_batch=orbit_report,
    source_report_id="downstream_template_split_before_orbit",
    extra_metrics={"policy_owner": "user"},
)
workflow = summarize_downstream_discovery_workflow(
    field_readiness=train_readiness,
    generator_confidence=generator_confidence,
    orbit_batch=orbit_report,
    discovery_inputs=bridge_summary,
    discovery_result=discovery_summary,
    split_provenance=split_provenance,
    extra_metrics={"paper_policy": "not_managed_by_pdelie"},
)
print(pretty_json({
    "optional_pysindy_status": discovery["status"],
    "manual_recovery_classification": recovery["classification"],
    "discovery_summary_status": discovery_summary["status"],
    "split_risk_label": split_provenance["risk_label"],
    "workflow_summary_type": workflow["summary_type"],
}, max_chars=3500))
{
"discovery_summary_status": "success",
"manual_recovery_classification": "exact",
"optional_pysindy_status": "success",
"split_risk_label": "no_detected_overlap",
"workflow_summary_type": "downstream_discovery_workflow"
}
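To make the role of support_epsilon concrete, here is a small standalone sketch of a support comparison. It is one plausible reading of a recovery metric, not the definition used by evaluate_discovery_recovery: the function `support_recovery` and its returned keys are hypothetical. Terms whose absolute coefficient is at or below the epsilon are treated as absent, so the small `u` coefficient from the cell above does not count as a spurious term.

```python
def support_recovery(target_terms, discovered_terms, support_epsilon):
    """Compare term supports after discarding coefficients at or below the epsilon.

    Hypothetical helper: returns missing and spurious term names plus the largest
    coefficient error over the union of both supports.
    """
    target = {name for name, coeff in target_terms.items() if abs(coeff) > support_epsilon}
    found = {name for name, coeff in discovered_terms.items() if abs(coeff) > support_epsilon}
    union = target | found
    max_error = max(
        (abs(discovered_terms.get(name, 0.0) - target_terms.get(name, 0.0)) for name in union),
        default=0.0,
    )
    return {
        "missing": sorted(target - found),
        "spurious": sorted(found - target),
        "support_match": target == found,
        "max_coefficient_error": max_error,
    }

# Same caller-supplied terms as in the cell above.
result = support_recovery(
    target_terms={"u_xx": 1.0},
    discovered_terms={"u_xx": 0.98, "u": 0.01},
    support_epsilon=0.05,
)
print(result["support_match"], result["missing"], result["spurious"])
```

A metric like this describes equation recovery only. It says nothing about whether the fitted generator is trustworthy, which is why the two kinds of evidence are reported separately.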
5. Adapting this to your own PDE data#
Checklist for external data:
ensure dims can be interpreted as batch/time/x/var
use from_numpy(...), from_xarray(...), or V0.29 from_xarray_dataset(...) for one explicit scalar Dataset variable
ensure x is uniform periodic and endpoint-excluded before using spectral or invariant tools
supply metadata tags that match the residual evaluator you plan to use
validate finite unmasked scalar values
keep file loaders, nonuniform grids, multidimensional data, PDEBench/The Well, and operator-learning data outside the current stable notebook path
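Two items in the checklist above, the grid condition and the finite-values condition, can be pre-checked with plain NumPy before any import call. The sketch below is a hypothetical pre-flight helper, not part of PDELie. The function name `check_periodic_grid`, the tolerance, and the assumption that the caller knows the periodic domain length are all illustrative.

```python
import numpy as np

def check_periodic_grid(x, values, domain_length, rtol=1e-9):
    """Report whether x is uniform and endpoint-excluded and whether values are finite.

    Hypothetical helper: for N points on a periodic domain of length L, an
    endpoint-excluded grid has spacing L / N, so N * spacing equals L.
    """
    x = np.asarray(x, dtype=float)
    if x.ndim != 1 or x.size < 2:
        raise ValueError("x must be a 1D array with at least two points")
    spacing = np.diff(x)
    uniform = bool(spacing[0] > 0.0 and np.allclose(spacing, spacing[0], rtol=rtol, atol=0.0))
    endpoint_excluded = bool(uniform and np.isclose(x.size * spacing[0], domain_length, rtol=rtol))
    finite = bool(np.all(np.isfinite(values)))
    return {"uniform": uniform, "endpoint_excluded": endpoint_excluded, "finite": finite}

length = 2.0 * np.pi  # assumed domain length for this illustration
good_x = np.linspace(0.0, length, 32, endpoint=False)
bad_x = np.linspace(0.0, length, 32)  # includes the duplicated periodic endpoint
sample = np.sin(good_x)

good_report = check_periodic_grid(good_x, sample, length)
bad_report = check_periodic_grid(bad_x, sample, length)
print(good_report, bad_report)
```

A grid that includes the periodic endpoint still looks uniform, which is why the endpoint check is separate: the duplicated point would otherwise be counted twice by spectral tools.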
[6]:
external_like_values = field.values[:1].copy()
external_metadata = dict(field.metadata)
external_metadata["parameter_tags"] = dict(field.metadata["parameter_tags"])
external_metadata["source"] = "tutorial_external_like_array"
external_like = from_numpy(
    external_like_values,
    dims=("batch", "time", "x", "var"),
    coords={"time": field.coords["time"], "x": field.coords["x"]},
    var_name=field.var_names[0],
    metadata=external_metadata,
)
readiness = summarize_field_batch_readiness(
    external_like,
    residual_evaluator=HeatResidualEvaluator(),
    expected_equation="heat_1d",
)
print(pretty_json({
    "imported_shape": list(external_like.values.shape),
    "readiness_label": readiness["readiness_label"],
    "readiness_components": {
        name: status["status"]
        for name, status in readiness["component_statuses"].items()
    },
    "parameter_tags": external_like.metadata["parameter_tags"],
    "preprocess_tail": external_like.preprocess_log[-1],
}, max_chars=3000))
{
"imported_shape": [
1,
17,
32,
1
],
"parameter_tags": {
"nu": 0.1
},
"preprocess_tail": {
"operation": "from_numpy",
"parameters": {
"canonical_shape": [
1,
17,
32,
1
],
"imported_shape": [
1,
17,
32,
1
],
"injected_batch_axis": false,
"injected_var_axis": false,
"mask_provided": false,
"source_layout": [
"batch",
"time",
"x",
"var"
]
}
},
"readiness_components": {
"expected_equation": "failed",
"field": "passed",
"mask": "passed",
"metadata": "passed",
"residual_preflight": "passed",
"time_coordinate": "passed",
"values": "passed",
"x_coordinate": "passed"
},
"readiness_label": "not_ready"
}
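The readiness label alone does not say which check held the batch back. A minimal caller-side sketch follows, assuming component statuses are plain strings as in the printed output above. The dictionary is copied from that output rather than produced by PDELie, and how PDELie aggregates components into the label is not shown here.

```python
# Component statuses copied from the printed readiness output.
readiness_components = {
    "expected_equation": "failed",
    "field": "passed",
    "mask": "passed",
    "metadata": "passed",
    "residual_preflight": "passed",
    "time_coordinate": "passed",
    "values": "passed",
    "x_coordinate": "passed",
}

# List every component that did not pass so the caller knows what to inspect first.
not_passed = sorted(
    name for name, status in readiness_components.items() if status != "passed"
)
print(not_passed)
```

In this run the only entry is the expected-equation component, so the metadata and equation tags are the first place to look when adapting the template to external data.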
Recap#
The reusable pattern is: validate or construct a canonical FieldBatch, decide split policy outside PDELie, optionally materialize a translation orbit with provenance, fit and validate a generator, then hand arrays to downstream code.
Common pitfalls#
Materializing orbits before deciding train/heldout policy.
Letting downstream thresholds become hidden PDELie assumptions.
Treating backend-native arrays or labels as canonical artifacts.
Applying spectral/invariant tools to nonuniform or multidimensional data without a supported adapter.
Extension ideas#
Replace the generated Heat field with your own from_numpy(...), from_xarray(...), or explicit scalar from_xarray_dataset(...) data.
Use Fisher-KPP when your workflow needs a reaction-diffusion strong-path example.
Compare downstream recovery with and without orbit materialization, but keep the success criteria in your own experiment layer.
What to read/run next#
Return to 00_pde_timeseries_to_generators.ipynb for the core evidence flow, or use this notebook as a template for your own project.