Note

This page renders committed notebook outputs. The Read the Docs build does not execute notebook code.

Downstream task template: symmetry-aware inputs without paper policy#

Current surface: V0.29.

Purpose#

Reusable template for external users: start from generated or imported data, optionally materialize a translation orbit, validate a generator, export arrays, and plug in a downstream method.

What you will learn#

How to adapt generated or external arrays into canonical FieldBatch objects.
Where optional orbit materialization belongs in a workflow.
How to validate the generator candidate used by a downstream task.
What PDELie deliberately leaves to the user: split policy, leakage control, thresholds, and claims.

Required extras#

Install .[downstream] or .[test] to run the optional PySINDy cells. The core data, import, and validation examples do not need Jupyter as a runtime dependency.

Expected runtime#

About 1 minute when PySINDy is installed; faster when the optional fit is skipped.

Out of scope#

No paper-specific logic, no operator-learning code, no broad adapters, no PDEBench/The Well loaders, no train/test automation.

These notebooks are tutorials, not API contracts. Example outputs are runtime summaries, not canonical paper artifacts.

[1]:

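import importlib.util
import sys
from pathlib import Path

# Put the repository root on sys.path so the local `notebooks` package imports
# whether this runs from the root or from one directory below it.
ROOT = Path.cwd()
if not (ROOT / "pyproject.toml").exists():
    ROOT = ROOT.parent
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))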

import numpy as np

from notebooks._tutorial_utils import confidence_card, pretty_json, print_cards
from pdelie.data import from_numpy, generate_heat_1d_field_batch, split_batch_train_heldout
from pdelie.discovery import (
    evaluate_discovery_recovery,
    fit_pysindy_discovery,
    summarize_discovery_bridge_output,
    summarize_discovery_result,
    to_pysindy_trajectories,
)
from pdelie.invariants import build_uniform_translation_orbit_batch
from pdelie.reporting import (
    summarize_downstream_discovery_workflow,
    summarize_field_batch_readiness,
    summarize_generator_confidence,
    summarize_generator_fit_diagnostics,
    summarize_split_leakage_provenance,
    summarize_verification_report,
)
from pdelie.residuals import HeatResidualEvaluator
from pdelie.symmetry import fit_translation_generator, validate_symmetry_candidate
from pdelie.verification import verify_translation_generator

CONFIG = {
    "fit_epsilon": 1e-4,
    "orbit_shifts": [0.0, np.pi / 8.0, -np.pi / 8.0],
    "use_orbit_batch": True,
}
CONFIG

[1]:

{'fit_epsilon': 0.0001,
 'orbit_shifts': [0.0, 0.39269908169872414, -0.39269908169872414],
 'use_orbit_batch': True}

1. Create a train/heldout split before optional orbit materialization#

The orbit helper records source and shift provenance, but it does not choose a split policy. Split first when leakage matters: otherwise shifted copies of the same source sample can land in both train and heldout.

[2]:

field = generate_heat_1d_field_batch(batch_size=6, num_times=17, num_points=32, seed=680)
train, heldout = split_batch_train_heldout(field, train_size=3, seed=681)
if CONFIG["use_orbit_batch"]:
    orbit = build_uniform_translation_orbit_batch(
        train,
        shifts=CONFIG["orbit_shifts"],
        source_field_id="heat_train_seed_680_split_681",
    )
    downstream_field = orbit.field
    orbit_report = orbit.report
else:
    downstream_field = train
    orbit_report = None
print(pretty_json({
    "train_shape": list(train.values.shape),
    "downstream_shape": list(downstream_field.values.shape),
    "orbit_report_type": None if orbit_report is None else orbit_report["summary_type"],
    "leakage_policy": "caller-owned; split before materialization in this template",
}))

{
  "downstream_shape": [
    9,
    17,
    32,
    1
  ],
  "leakage_policy": "caller-owned; split before materialization in this template",
  "orbit_report_type": "uniform_translation_orbit_batch",
  "train_shape": [
    3,
    17,
    32,
    1
  ]
}
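The split-then-materialize order can be illustrated without PDELie. This is a minimal NumPy sketch, not the library's implementation: it assumes random stand-in data, uses np.roll with integer grid shifts in place of build_uniform_translation_orbit_batch(...), and tracks provenance in a hypothetical source_index array.

```python
import numpy as np

rng = np.random.default_rng(681)
# Stand-in data with the canonical (batch, time, x, var) layout.
values = rng.normal(size=(6, 17, 32, 1))

# 1. Split on source samples first.
perm = rng.permutation(values.shape[0])
train_idx, heldout_idx = perm[:3], perm[3:]
train = values[train_idx]

# 2. Materialize shifted copies of the train samples only.
#    Integer grid shifts are an assumption made for this sketch.
grid_shifts = [0, 2, -2]
orbit = np.concatenate([np.roll(train, s, axis=2) for s in grid_shifts], axis=0)

# 3. Record which source sample each orbit member came from.
source_index = np.tile(train_idx, len(grid_shifts))

# No orbit member descends from a heldout sample.
leak_free = set(source_index.tolist()).isdisjoint(heldout_idx.tolist())
print(orbit.shape, leak_free)
```

Reversing steps 1 and 2 would let a sample and its shifted copy fall on opposite sides of the split, which is the leakage this template avoids.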

2. Fit and validate the generator used by the workflow#

[3]:

evaluator = HeatResidualEvaluator()
train_readiness = summarize_field_batch_readiness(
    train,
    residual_evaluator=evaluator,
    expected_equation="heat_1d",
)
generator = fit_translation_generator(downstream_field, evaluator, epsilon=CONFIG["fit_epsilon"])
verification = verify_translation_generator(heldout, generator, evaluator)
validation = validate_symmetry_candidate(
    heldout,
    generator,
    residual_evaluator=evaluator,
    source_candidate_id="downstream_template_generator",
)
fit_summary = summarize_generator_fit_diagnostics(generator)
verification_summary = summarize_verification_report(verification)
generator_confidence = summarize_generator_confidence(
    generator=generator,
    fit_diagnostics=fit_summary,
    verification=verification,
    candidate_validation=validation,
    thresholds={"verification_first_error": 1e-5},
    extra_metrics={"workflow_role": "downstream_preprocessing_gate"},
)
card = confidence_card(
    label="downstream template generator",
    fit=fit_summary,
    verification=verification_summary,
    validation=validation,
)
print_cards([card])
print(pretty_json({
    "field_readiness": train_readiness["readiness_label"],
    "generator_confidence": generator_confidence["confidence_label"],
}, max_chars=1800))

[
  {
    "candidate_kind": "generator_family",
    "condition_number": 108587.97472105523,
    "evidence_label": "direct_svd_in_tolerance",
    "first_epsilon": 0.0001,
    "first_error": 1.3904654612463122e-08,
    "fit_mode": "svd",
    "label": "downstream template generator",
    "max_error": 1.385275387507436e-05,
    "reference_fallback_used": false,
    "selected_span_distance": 2.501245182569677e-05,
    "singular_value_count": 4,
    "svd_span_distance": 2.501245182569677e-05,
    "validation_conclusion": "validated",
    "verification_classification": "exact"
  }
]
{
  "field_readiness": "not_ready",
  "generator_confidence": "strong"
}
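The idea behind checking a translation generator can be shown on an analytic heat solution. This is an illustrative NumPy sketch, not what verify_translation_generator(...) does internally: it assumes a 2π-periodic endpoint-excluded grid and a band-limited solution, and the helpers heat_residual and translate are hypothetical names. If translation is a symmetry, shifting the solution should leave the size of the PDE residual unchanged.

```python
import numpy as np

def angular_wavenumbers(x):
    """Angular wavenumbers for a uniform, periodic, endpoint-excluded grid."""
    return 2.0 * np.pi * np.fft.fftfreq(x.size, d=x[1] - x[0])

def heat_residual(u, t, x, nu):
    """Residual u_t - nu * u_xx for u with shape (time, x)."""
    k = angular_wavenumbers(x)
    u_xx = np.fft.ifft(-(k**2) * np.fft.fft(u, axis=1), axis=1).real
    u_t = np.gradient(u, t, axis=0, edge_order=2)  # finite differences in time
    return u_t - nu * u_xx

def translate(u, x, shift):
    """Return u(x - shift) via a Fourier phase shift (band-limited input)."""
    k = angular_wavenumbers(x)
    return np.fft.ifft(np.exp(-1j * k * shift) * np.fft.fft(u, axis=1), axis=1).real

nu = 0.1
x = np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False)
t = np.linspace(0.0, 1.0, 17)
T, X = np.meshgrid(t, x, indexing="ij")
# Exact solution of u_t = nu * u_xx built from two Fourier modes.
u = np.exp(-nu * T) * np.sin(X) + 0.5 * np.exp(-4.0 * nu * T) * np.cos(2.0 * X)

shift = 0.3  # deliberately not a multiple of the grid spacing
base_norm = np.linalg.norm(heat_residual(u, t, x, nu))
shifted_norm = np.linalg.norm(heat_residual(translate(u, x, shift), t, x, nu))
drift = abs(shifted_norm - base_norm)
print(f"base={base_norm:.3e} shifted={shifted_norm:.3e} drift={drift:.3e}")
```

The base residual is small but nonzero because the time derivative uses finite differences. The quantity of interest is the drift between the two norms, and any threshold on it belongs to the caller.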

3. Build backend-native trajectories#

to_pysindy_trajectories(...) returns a narrow bridge format: backend-native arrays for PySINDy. Its output is not a PDELie canonical object.

[4]:

trajectories, time_values, feature_names = to_pysindy_trajectories(downstream_field)
bridge_summary = summarize_discovery_bridge_output(
    trajectories,
    time_values,
    feature_names,
    source_field_id="downstream_field_after_optional_orbit",
    provenance={"orbit_materialized": CONFIG["use_orbit_batch"]},
)
print(pretty_json({
    "num_trajectories": len(trajectories),
    "trajectory_shape": list(trajectories[0].shape),
    "num_feature_names": len(feature_names),
    "bridge_summary_type": bridge_summary["summary_type"],
    "bridge_finite": bridge_summary["finite"],
    "bridge_strictly_increasing_time": bridge_summary["strictly_increasing_time"],
}, max_chars=2500))

{
  "bridge_finite": true,
  "bridge_strictly_increasing_time": true,
  "bridge_summary_type": "discovery_bridge_output",
  "num_feature_names": 32,
  "num_trajectories": 9,
  "trajectory_shape": [
    17,
    32
  ]
}
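The printed shapes suggest one (time, x) array per batch member and one feature name per grid point. The sketch below reproduces that layout in plain NumPy. It is an assumption inferred from the output above, not the code behind to_pysindy_trajectories(...), and both the helper name and the feature labels are hypothetical.

```python
import numpy as np

def to_trajectory_list(values, time):
    """Hypothetical bridge: (batch, time, x, 1) array -> list of (time, x) arrays."""
    values = np.asarray(values, dtype=float)
    time = np.asarray(time, dtype=float)
    if values.ndim != 4 or values.shape[-1] != 1:
        raise ValueError("expected a scalar field with shape (batch, time, x, 1)")
    if time.shape != (values.shape[1],) or not np.all(np.diff(time) > 0):
        raise ValueError("time must match the time axis and be strictly increasing")
    if not np.all(np.isfinite(values)):
        raise ValueError("values must be finite")
    # Drop the singleton variable axis; each grid point becomes one feature column.
    trajectories = [sample[..., 0] for sample in values]
    feature_names = [f"u_x{i}" for i in range(values.shape[2])]  # illustrative labels
    return trajectories, time, feature_names

values = np.zeros((9, 17, 32, 1))
time = np.linspace(0.0, 1.0, 17)
trajectories, time_values, feature_names = to_trajectory_list(values, time)
print(len(trajectories), trajectories[0].shape, len(feature_names))
```

The result is a plain list of arrays with no metadata or preprocess log, which is why such output should not be treated as a canonical artifact.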

4. Optional PySINDy smoke fit#

If PySINDy is installed, run the backend adapter. Either way, keep recovery metrics separate from generator-confidence metrics.

[5]:

if importlib.util.find_spec("pysindy") is None:
    discovery = {"status": "failed", "backend": "pysindy", "reason": "pysindy is not installed"}
else:
    discovery = fit_pysindy_discovery(trajectories, time_values, feature_names)

# Tiny paper-agnostic recovery-metric example over caller-supplied canonical terms.
recovery = evaluate_discovery_recovery(
    target_terms={"u_xx": 1.0},
    discovered_terms={"u_xx": 0.98, "u": 0.01},
    support_epsilon=0.05,
)

# Hand-written placeholder result; it is independent of the optional PySINDy fit above.
backend_neutral_result = {
    "status": "success",
    "backend": "tutorial_manual_sparse_result",
    "feature_names": ["u"],
    "equation_terms": {"u": {"u_xx": 0.98, "u": 0.01}},
    "equation_strings": {"u": "u_t = 0.98 u_xx + 0.01 u"},
    "coefficients": np.asarray([[0.98, 0.01]], dtype=float),
    "diagnostics": {"source": "paper_agnostic_tutorial_placeholder"},
}
discovery_summary = summarize_discovery_result(
    backend_neutral_result,
    target_terms={"u": {"u_xx": 1.0}},
    support_epsilon=0.05,
)

partitions = ["train"] * downstream_field.values.shape[0]
split_provenance = summarize_split_leakage_provenance(
    partitions=partitions,
    orbit_batch=orbit_report,
    source_report_id="downstream_template_split_before_orbit",
    extra_metrics={"policy_owner": "user"},
)
workflow = summarize_downstream_discovery_workflow(
    field_readiness=train_readiness,
    generator_confidence=generator_confidence,
    orbit_batch=orbit_report,
    discovery_inputs=bridge_summary,
    discovery_result=discovery_summary,
    split_provenance=split_provenance,
    extra_metrics={"paper_policy": "not_managed_by_pdelie"},
)
print(pretty_json({
    "optional_pysindy_status": discovery["status"],
    "manual_recovery_classification": recovery["classification"],
    "discovery_summary_status": discovery_summary["status"],
    "split_risk_label": split_provenance["risk_label"],
    "workflow_summary_type": workflow["summary_type"],
}, max_chars=3500))

{
  "discovery_summary_status": "success",
  "manual_recovery_classification": "exact",
  "optional_pysindy_status": "success",
  "split_risk_label": "no_detected_overlap",
  "workflow_summary_type": "downstream_discovery_workflow"
}
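A recovery metric of this kind can be kept entirely in caller code. The following is a hedged sketch of one possible support comparison. It does not claim to reproduce the classification rules of evaluate_discovery_recovery(...), and the function name and returned keys are hypothetical.

```python
def support_recovery(target_terms, discovered_terms, support_epsilon):
    """Compare term supports after dropping coefficients at or below the threshold."""
    target = {name for name, coef in target_terms.items() if abs(coef) > support_epsilon}
    found = {name for name, coef in discovered_terms.items() if abs(coef) > support_epsilon}
    shared = target & found
    max_error = max(
        (abs(target_terms[name] - discovered_terms[name]) for name in shared),
        default=0.0,
    )
    return {
        "support_match": target == found,
        "missing_terms": sorted(target - found),
        "spurious_terms": sorted(found - target),
        "max_coefficient_error": max_error,
    }

result = support_recovery(
    target_terms={"u_xx": 1.0},
    discovered_terms={"u_xx": 0.98, "u": 0.01},
    support_epsilon=0.05,
)
print(result)
```

With these inputs the small u coefficient falls below support_epsilon, so the supports match and only the u_xx coefficient error remains. None of this says anything about generator confidence, which is assessed separately.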

5. Adapting this to your own PDE data#

Checklist for external data:

Ensure dims can be interpreted as batch/time/x/var.
Use from_numpy(...), from_xarray(...), or the V0.29 from_xarray_dataset(...) for one explicit scalar Dataset variable.
Ensure x is uniform, periodic, and endpoint-excluded before using spectral or invariant tools.
Supply metadata tags that match the residual evaluator you plan to use.
Validate that scalar values are finite and unmasked.
Keep file loaders, nonuniform grids, multidimensional data, PDEBench/The Well, and operator-learning data outside the current stable notebook path.
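Several items on this checklist can be checked in plain NumPy before any import call. The sketch below is one way to do so. It is not part of PDELie: the function name, the returned keys, and the caller-supplied period argument are all assumptions made for illustration.

```python
import numpy as np

def check_external_array(values, time, x, period):
    """Hypothetical pre-import checks; returns a dict of named booleans."""
    values = np.asarray(values, dtype=float)
    time = np.asarray(time, dtype=float)
    x = np.asarray(x, dtype=float)

    # Layout must be (batch, time, x, var) with matching coordinate lengths.
    dims_ok = values.ndim == 4 and values.shape[1:3] == (time.size, x.size)
    time_ok = time.size >= 2 and bool(np.all(np.diff(time) > 0))

    x_uniform = False
    x_endpoint_excluded = False
    if x.size >= 2:
        dx = np.diff(x)
        x_uniform = bool(np.all(dx > 0) and np.allclose(dx, dx[0], rtol=1e-9, atol=0.0))
        # Endpoint-excluded: one more step past the last point spans exactly one period.
        x_endpoint_excluded = x_uniform and bool(np.isclose(x[-1] + dx[0] - x[0], period))

    return {
        "dims": dims_ok,
        "time_strictly_increasing": time_ok,
        "x_uniform": x_uniform,
        "x_endpoint_excluded": x_endpoint_excluded,
        "values_finite": bool(np.all(np.isfinite(values))),
    }

time = np.linspace(0.0, 1.0, 17)
x_good = np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False)
x_bad = np.linspace(0.0, 2.0 * np.pi, 32)  # repeats the periodic endpoint
values = np.exp(-0.1 * time)[None, :, None, None] * np.sin(x_good)[None, None, :, None]

good_report = check_external_array(values, time, x_good, period=2.0 * np.pi)
bad_report = check_external_array(values, time, x_bad, period=2.0 * np.pi)
print(good_report)
print(bad_report)
```

A grid that includes the periodic endpoint is still uniform but fails the endpoint check, which is the case that most often slips through when arrays come from np.linspace with default arguments.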

[6]:

external_like_values = field.values[:1].copy()
external_metadata = dict(field.metadata)
external_metadata["parameter_tags"] = dict(field.metadata["parameter_tags"])
external_metadata["source"] = "tutorial_external_like_array"
external_like = from_numpy(
    external_like_values,
    dims=("batch", "time", "x", "var"),
    coords={"time": field.coords["time"], "x": field.coords["x"]},
    var_name=field.var_names[0],
    metadata=external_metadata,
)
readiness = summarize_field_batch_readiness(
    external_like,
    residual_evaluator=HeatResidualEvaluator(),
    expected_equation="heat_1d",
)
print(pretty_json({
    "imported_shape": list(external_like.values.shape),
    "readiness_label": readiness["readiness_label"],
    "readiness_components": {
        name: status["status"]
        for name, status in readiness["component_statuses"].items()
    },
    "parameter_tags": external_like.metadata["parameter_tags"],
    "preprocess_tail": external_like.preprocess_log[-1],
}, max_chars=3000))

{
  "imported_shape": [
    1,
    17,
    32,
    1
  ],
  "parameter_tags": {
    "nu": 0.1
  },
  "preprocess_tail": {
    "operation": "from_numpy",
    "parameters": {
      "canonical_shape": [
        1,
        17,
        32,
        1
      ],
      "imported_shape": [
        1,
        17,
        32,
        1
      ],
      "injected_batch_axis": false,
      "injected_var_axis": false,
      "mask_provided": false,
      "source_layout": [
        "batch",
        "time",
        "x",
        "var"
      ]
    }
  },
  "readiness_components": {
    "expected_equation": "failed",
    "field": "passed",
    "mask": "passed",
    "metadata": "passed",
    "residual_preflight": "passed",
    "time_coordinate": "passed",
    "values": "passed",
    "x_coordinate": "passed"
  },
  "readiness_label": "not_ready"
}

Recap#

The reusable pattern is: validate or construct a canonical FieldBatch, decide split policy outside PDELie, optionally materialize a translation orbit with provenance, fit and validate a generator, then hand arrays to downstream code.

Common pitfalls#

Materializing orbits before deciding train/heldout policy.
Letting downstream thresholds become hidden PDELie assumptions.
Treating backend-native arrays or labels as canonical artifacts.
Applying spectral/invariant tools to nonuniform or multidimensional data without a supported adapter.

Extension ideas#

Replace the generated Heat field with your own from_numpy(...), from_xarray(...), or explicit scalar from_xarray_dataset(...) data.
Use Fisher-KPP when your workflow needs a reaction-diffusion strong-path example.
Compare downstream recovery with and without orbit materialization, but keep the success criteria in your own experiment layer.

What to read/run next#

Return to 00_pde_timeseries_to_generators.ipynb for the core evidence flow, or use this notebook as a template for your own project.
