Note

This page renders committed notebook outputs. The Read the Docs build does not execute notebook code.

Downstream task template: symmetry-aware inputs without paper policy#

Current surface: V0.29.

Purpose#

Reusable template for external users: start from generated or imported data, optionally materialize a translation orbit, validate a generator, export arrays, and plug in a downstream method.

What you will learn#

  • How to adapt generated or external arrays into canonical FieldBatch objects.

  • Where optional orbit materialization belongs in a workflow.

  • How to validate the generator candidate used by a downstream task.

  • What PDELie deliberately leaves to the user: split policy, leakage control, thresholds, and claims.

Required extras#

Install .[downstream] or .[test] to run the optional PySINDy cells. The core data, import, and validation examples do not require PySINDy, and they run without Jupyter as a runtime dependency.

Expected runtime#

About 1 minute when PySINDy is installed; faster when the optional fit is skipped.

Out of scope#

No paper-specific logic, no operator-learning code, no broad adapters, no PDEBench/The Well loaders, no train/test automation.

These notebooks are tutorials, not API contracts. Example outputs are runtime summaries, not canonical paper artifacts.

[1]:
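import importlib.util
import sys
from pathlib import Path

import numpy as np

# Make the repository root importable so `notebooks._tutorial_utils` resolves
# whether the kernel starts in the root or one directory below it.
ROOT = Path.cwd()
if not (ROOT / "pyproject.toml").exists():
    ROOT = ROOT.parent
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))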

from notebooks._tutorial_utils import confidence_card, print_cards, pretty_json
from pdelie.data import from_numpy, generate_heat_1d_field_batch, split_batch_train_heldout
from pdelie.discovery import (
    evaluate_discovery_recovery,
    fit_pysindy_discovery,
    summarize_discovery_bridge_output,
    summarize_discovery_result,
    to_pysindy_trajectories,
)
from pdelie.invariants import build_uniform_translation_orbit_batch
from pdelie.reporting import (
    summarize_downstream_discovery_workflow,
    summarize_field_batch_readiness,
    summarize_generator_confidence,
    summarize_generator_fit_diagnostics,
    summarize_split_leakage_provenance,
    summarize_verification_report,
)
from pdelie.residuals import HeatResidualEvaluator
from pdelie.symmetry import fit_translation_generator, validate_symmetry_candidate
from pdelie.verification import verify_translation_generator

CONFIG = {
    "fit_epsilon": 1e-4,
    "orbit_shifts": [0.0, np.pi / 8.0, -np.pi / 8.0],
    "use_orbit_batch": True,
}
CONFIG

[1]:
{'fit_epsilon': 0.0001,
 'orbit_shifts': [0.0, 0.39269908169872414, -0.39269908169872414],
 'use_orbit_batch': True}

1. Create a train/heldout split before optional orbit materialization#

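The orbit helper records source and shift provenance, but it does not choose a split policy. Split first when leakage matters: otherwise shifted copies of one source sample can land on both sides of the split. In this template, 3 train samples and 3 shifts yield the 9 downstream samples reported in the output below.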

[2]:
field = generate_heat_1d_field_batch(batch_size=6, num_times=17, num_points=32, seed=680)
train, heldout = split_batch_train_heldout(field, train_size=3, seed=681)
if CONFIG["use_orbit_batch"]:
    orbit = build_uniform_translation_orbit_batch(
        train,
        shifts=CONFIG["orbit_shifts"],
        source_field_id="heat_train_seed_680_split_681",
    )
    downstream_field = orbit.field
    orbit_report = orbit.report
else:
    downstream_field = train
    orbit_report = None
print(pretty_json({
    "train_shape": list(train.values.shape),
    "downstream_shape": list(downstream_field.values.shape),
    "orbit_report_type": None if orbit_report is None else orbit_report["summary_type"],
    "leakage_policy": "caller-owned; split before materialization in this template",
}))

{
  "downstream_shape": [
    9,
    17,
    32,
    1
  ],
  "leakage_policy": "caller-owned; split before materialization in this template",
  "orbit_report_type": "uniform_translation_orbit_batch",
  "train_shape": [
    3,
    17,
    32,
    1
  ]
}
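
The split-then-materialize order used above can be illustrated without PDELie. The following is a minimal NumPy sketch, not the library's implementation: `spectral_shift` and `materialize_orbit` are hypothetical helpers, the random array is stand-in data, and the domain length of 2π is an assumption (this page does not state the grid extent). It shows that building the orbit from the train partition alone ties every shifted copy to a train source.

```python
import numpy as np


def spectral_shift(u, shift, length=2.0 * np.pi):
    """Return u(x - shift) for periodic samples along the last axis."""
    n = u.shape[-1]
    # Angular wavenumbers for a periodic grid of extent `length` (assumed 2*pi).
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    shifted = np.fft.ifft(np.fft.fft(u, axis=-1) * np.exp(-1j * k * shift), axis=-1)
    return shifted.real


def materialize_orbit(values, source_ids, shifts):
    """Stack one shifted copy of every sample per shift; keep source/shift labels."""
    rows, provenance = [], []
    for shift in shifts:
        for sample, source_id in zip(values, source_ids):
            rows.append(spectral_shift(sample, shift))
            provenance.append((int(source_id), float(shift)))
    return np.stack(rows), provenance


rng = np.random.default_rng(0)
all_values = rng.normal(size=(6, 17, 32))  # (batch, time, x), stand-in data

# Split first: the orbit is built from train sources only.
order = rng.permutation(6)
train_ids, heldout_ids = order[:3], order[3:]
shifts = [0.0, np.pi / 8.0, -np.pi / 8.0]
orbit_values, provenance = materialize_orbit(all_values[train_ids], train_ids, shifts)

# No orbit row should trace back to a heldout source.
orbit_sources = {source_id for source_id, _ in provenance}
leaked_sources = orbit_sources & {int(i) for i in heldout_ids}
print(orbit_values.shape, sorted(leaked_sources))
```

Reversing the order (shifting all six samples and then splitting the 18 rows at random) would allow shifted copies of one source to appear in both partitions, which is the leakage the template avoids.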

2. Fit and validate the generator used by the workflow#

[3]:
evaluator = HeatResidualEvaluator()
train_readiness = summarize_field_batch_readiness(
    train,
    residual_evaluator=evaluator,
    expected_equation="heat_1d",
)
generator = fit_translation_generator(downstream_field, evaluator, epsilon=CONFIG["fit_epsilon"])
verification = verify_translation_generator(heldout, generator, evaluator)
validation = validate_symmetry_candidate(
    heldout,
    generator,
    residual_evaluator=evaluator,
    source_candidate_id="downstream_template_generator",
)
fit_summary = summarize_generator_fit_diagnostics(generator)
verification_summary = summarize_verification_report(verification)
generator_confidence = summarize_generator_confidence(
    generator=generator,
    fit_diagnostics=fit_summary,
    verification=verification,
    candidate_validation=validation,
    thresholds={"verification_first_error": 1e-5},
    extra_metrics={"workflow_role": "downstream_preprocessing_gate"},
)
card = confidence_card(
    label="downstream template generator",
    fit=fit_summary,
    verification=verification_summary,
    validation=validation,
)
print_cards([card])
print(pretty_json({
    "field_readiness": train_readiness["readiness_label"],
    "generator_confidence": generator_confidence["confidence_label"],
}, max_chars=1800))

[
  {
    "candidate_kind": "generator_family",
    "condition_number": 108587.97472105523,
    "evidence_label": "direct_svd_in_tolerance",
    "first_epsilon": 0.0001,
    "first_error": 1.3904654612463122e-08,
    "fit_mode": "svd",
    "label": "downstream template generator",
    "max_error": 1.385275387507436e-05,
    "reference_fallback_used": false,
    "selected_span_distance": 2.501245182569677e-05,
    "singular_value_count": 4,
    "svd_span_distance": 2.501245182569677e-05,
    "validation_conclusion": "validated",
    "verification_classification": "exact"
  }
]
{
  "field_readiness": "not_ready",
  "generator_confidence": "strong"
}
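
As a library-independent illustration of what a translation check measures, the sketch below evaluates a heat residual u_t - ν u_xx on an exact solution and on a translated copy of it. It is not how `verify_translation_generator` or `HeatResidualEvaluator` are implemented: `heat_residual` is a hypothetical helper, the 2π domain and the time grid are assumptions, and ν = 0.1 mirrors the `nu` parameter tag printed in section 5 of this notebook.

```python
import numpy as np

NU = 0.1  # diffusivity; mirrors the `nu` parameter tag shown in this notebook


def heat_residual(u, t, x, nu=NU):
    """Residual u_t - nu * u_xx for samples u[time, x] on a periodic grid."""
    n = x.size
    length = n * (x[1] - x[0])  # extent of an endpoint-excluded periodic grid
    k = 2.0 * np.pi * np.fft.fftfreq(n, d=length / n)
    u_xx = np.fft.ifft(-(k**2) * np.fft.fft(u, axis=-1), axis=-1).real
    u_t = np.gradient(u, t, axis=0)  # finite differences in time
    return u_t - nu * u_xx


# Assumed grids: 32 periodic points on [0, 2*pi) and 17 time samples.
x = np.linspace(0.0, 2.0 * np.pi, 32, endpoint=False)
t = np.linspace(0.0, 0.16, 17)

# Exact heat solution: each Fourier mode k decays like exp(-nu * k**2 * t).
u = np.exp(-NU * t)[:, None] * np.sin(x)[None, :]
u = u + 0.5 * np.exp(-4.0 * NU * t)[:, None] * np.cos(2.0 * x)[None, :]

# On this grid a shift of pi / 8 is exactly two grid points, i.e. a circular roll.
u_shifted = np.roll(u, 2, axis=-1)

base_error = np.max(np.abs(heat_residual(u, t, x)))
shifted_error = np.max(np.abs(heat_residual(u_shifted, t, x)))
print(f"base={base_error:.2e} shifted={shifted_error:.2e}")
```

The residual that remains comes from the finite-difference time derivative, not from the translation. The two errors agree to rounding, which is what a translation symmetry of the heat equation predicts.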

3. Build backend-native trajectories#

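to_pysindy_trajectories(...) returns a narrow bridge format for the PySINDy backend: a list of per-sample trajectory arrays, the time values, and the feature names. Its output is not a PDELie canonical object.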

[4]:
trajectories, time_values, feature_names = to_pysindy_trajectories(downstream_field)
bridge_summary = summarize_discovery_bridge_output(
    trajectories,
    time_values,
    feature_names,
    source_field_id="downstream_field_after_optional_orbit",
    provenance={"orbit_materialized": CONFIG["use_orbit_batch"]},
)
print(pretty_json({
    "num_trajectories": len(trajectories),
    "trajectory_shape": list(trajectories[0].shape),
    "num_feature_names": len(feature_names),
    "bridge_summary_type": bridge_summary["summary_type"],
    "bridge_finite": bridge_summary["finite"],
    "bridge_strictly_increasing_time": bridge_summary["strictly_increasing_time"],
}, max_chars=2500))

{
  "bridge_finite": true,
  "bridge_strictly_increasing_time": true,
  "bridge_summary_type": "discovery_bridge_output",
  "num_feature_names": 32,
  "num_trajectories": 9,
  "trajectory_shape": [
    17,
    32
  ]
}
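
The shapes reported above are consistent with flattening each batch entry into one (time, features) array with one feature per spatial point. A minimal sketch of such a layout follows. It is an assumption-based illustration rather than the body of `to_pysindy_trajectories`; the helper name and the `var{v}_x{p}` feature labels are hypothetical.

```python
import numpy as np


def to_trajectory_list(values, time):
    """Flatten (batch, time, x, var) values into per-sample (time, feature) arrays."""
    values = np.asarray(values, dtype=float)
    time = np.asarray(time, dtype=float)
    _, num_times, num_points, num_vars = values.shape
    if time.shape != (num_times,):
        raise ValueError("time must have one entry per time sample")
    trajectories = [
        sample.reshape(num_times, num_points * num_vars) for sample in values
    ]
    # Hypothetical labels, ordered like the reshape: x index major, variable minor.
    feature_names = [
        f"var{v}_x{p}" for p in range(num_points) for v in range(num_vars)
    ]
    # Basic sanity checks analogous to the summary fields printed above.
    checks = {
        "finite": bool(np.all(np.isfinite(values))),
        "strictly_increasing_time": bool(np.all(np.diff(time) > 0.0)),
    }
    return trajectories, feature_names, checks


# Stand-in data with the same canonical shape as the downstream field above.
demo_values = np.zeros((9, 17, 32, 1))
demo_time = np.linspace(0.0, 1.0, 17)
demo_trajectories, demo_names, demo_checks = to_trajectory_list(demo_values, demo_time)
print(len(demo_trajectories), demo_trajectories[0].shape, len(demo_names), demo_checks)
```

Keeping the flattening in one small function makes it easy to see that the list-of-arrays form carries no FieldBatch metadata, which is why it should not be stored as a canonical artifact.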

4. Optional PySINDy smoke fit#

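If PySINDy is installed, run the backend adapter; otherwise record why the fit did not run. Either way, keep recovery metrics separate from generator-confidence metrics. The recovery and summary calls in this cell use hand-written placeholder terms, not the output of the PySINDy fit.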

[5]:
if importlib.util.find_spec("pysindy") is None:
    discovery = {"status": "failed", "backend": "pysindy", "reason": "pysindy is not installed"}
else:
    discovery = fit_pysindy_discovery(trajectories, time_values, feature_names)

# Tiny paper-agnostic recovery-metric example over caller-supplied canonical terms.
recovery = evaluate_discovery_recovery(
    target_terms={"u_xx": 1.0},
    discovered_terms={"u_xx": 0.98, "u": 0.01},
    support_epsilon=0.05,
)

backend_neutral_result = {
    "status": "success",
    "backend": "tutorial_manual_sparse_result",
    "feature_names": ["u"],
    "equation_terms": {"u": {"u_xx": 0.98, "u": 0.01}},
    "equation_strings": {"u": "u_t = 0.98 u_xx + 0.01 u"},
    "coefficients": np.asarray([[0.98, 0.01]], dtype=float),
    "diagnostics": {"source": "paper_agnostic_tutorial_placeholder"},
}
discovery_summary = summarize_discovery_result(
    backend_neutral_result,
    target_terms={"u": {"u_xx": 1.0}},
    support_epsilon=0.05,
)

partitions = ["train"] * downstream_field.values.shape[0]
split_provenance = summarize_split_leakage_provenance(
    partitions=partitions,
    orbit_batch=orbit_report,
    source_report_id="downstream_template_split_before_orbit",
    extra_metrics={"policy_owner": "user"},
)
workflow = summarize_downstream_discovery_workflow(
    field_readiness=train_readiness,
    generator_confidence=generator_confidence,
    orbit_batch=orbit_report,
    discovery_inputs=bridge_summary,
    discovery_result=discovery_summary,
    split_provenance=split_provenance,
    extra_metrics={"paper_policy": "not_managed_by_pdelie"},
)
print(pretty_json({
    "optional_pysindy_status": discovery["status"],
    "manual_recovery_classification": recovery["classification"],
    "discovery_summary_status": discovery_summary["status"],
    "split_risk_label": split_provenance["risk_label"],
    "workflow_summary_type": workflow["summary_type"],
}, max_chars=3500))

{
  "discovery_summary_status": "success",
  "manual_recovery_classification": "exact",
  "optional_pysindy_status": "success",
  "split_risk_label": "no_detected_overlap",
  "workflow_summary_type": "downstream_discovery_workflow"
}
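
One plausible reading of a support-based recovery metric is sketched below: a term counts as active only when its coefficient magnitude exceeds `support_epsilon`, and the active sets of target and discovered terms are then compared. This is an assumption about the semantics, not the definition used by `evaluate_discovery_recovery`; `compare_terms` and its output keys are hypothetical.

```python
def compare_terms(target_terms, discovered_terms, support_epsilon):
    """Compare sparse term dictionaries by active support and coefficient error."""

    def support(terms):
        # A term is active only if its coefficient magnitude exceeds the threshold.
        return {name for name, coef in terms.items() if abs(coef) > support_epsilon}

    target_support = support(target_terms)
    discovered_support = support(discovered_terms)
    names = set(target_terms) | set(discovered_terms)
    # Terms absent from one dictionary are treated as having coefficient zero.
    max_abs_error = max(
        abs(discovered_terms.get(name, 0.0) - target_terms.get(name, 0.0))
        for name in names
    )
    return {
        "missing": sorted(target_support - discovered_support),
        "spurious": sorted(discovered_support - target_support),
        "support_matches": target_support == discovered_support,
        "max_abs_coefficient_error": max_abs_error,
    }


# Same illustrative numbers as the tutorial cell above.
result = compare_terms(
    target_terms={"u_xx": 1.0},
    discovered_terms={"u_xx": 0.98, "u": 0.01},
    support_epsilon=0.05,
)
print(result)
```

Under this reading the small `u` coefficient falls below the threshold, so the supports match even though the coefficients differ slightly. The threshold itself is a caller-owned choice, in line with the template's policy of leaving thresholds to the user.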

5. Adapting this to your own PDE data#

Checklist for external data:

  • ensure dims can be interpreted as batch/time/x/var

  • use from_numpy(...), from_xarray(...), or, for one explicit scalar Dataset variable, the V0.29 from_xarray_dataset(...)

  • ensure x is uniform periodic and endpoint-excluded before using spectral or invariant tools

  • supply metadata tags that match the residual evaluator you plan to use

  • validate that values are finite, unmasked, and scalar

  • keep file loaders, nonuniform grids, multidimensional data, PDEBench/The Well, and operator-learning data outside the current stable notebook path
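
The grid and value items in the checklist above can be pre-checked on raw arrays before any import call. The sketch below is one way to do that with NumPy alone; `check_periodic_grid` and `check_values` are hypothetical helpers (not PDELie functions), and the period of 2π is an assumption chosen for the example.

```python
import numpy as np


def check_periodic_grid(x, period, rtol=1e-9):
    """Check that x is uniformly spaced and omits the duplicated periodic endpoint."""
    x = np.asarray(x, dtype=float)
    dx = np.diff(x)
    uniform = bool(np.allclose(dx, dx[0], rtol=rtol, atol=0.0))
    # Endpoint-excluded: n points with spacing dx cover exactly one period.
    endpoint_excluded = bool(np.isclose(x.size * dx[0], period, rtol=rtol, atol=0.0))
    return {"uniform": uniform, "endpoint_excluded": endpoint_excluded}


def check_values(values):
    """Check that values are finite and not stored as a masked array."""
    return {
        "finite": bool(np.all(np.isfinite(np.asarray(values, dtype=float)))),
        "unmasked": not isinstance(values, np.ma.MaskedArray),
    }


period = 2.0 * np.pi  # assumed domain length
good_x = np.linspace(0.0, period, 32, endpoint=False)
bad_x = np.linspace(0.0, period, 32)  # includes the duplicated endpoint

good_grid = check_periodic_grid(good_x, period)
bad_grid = check_periodic_grid(bad_x, period)
value_report = check_values(np.sin(good_x))
print(good_grid, bad_grid, value_report)
```

A grid that includes the endpoint is still uniform, so the two checks are reported separately: only the endpoint test distinguishes it from a grid that spectral tools can treat as periodic.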

[6]:
external_like_values = field.values[:1].copy()
external_metadata = dict(field.metadata)
external_metadata["parameter_tags"] = dict(field.metadata["parameter_tags"])
external_metadata["source"] = "tutorial_external_like_array"
external_like = from_numpy(
    external_like_values,
    dims=("batch", "time", "x", "var"),
    coords={"time": field.coords["time"], "x": field.coords["x"]},
    var_name=field.var_names[0],
    metadata=external_metadata,
)
readiness = summarize_field_batch_readiness(
    external_like,
    residual_evaluator=HeatResidualEvaluator(),
    expected_equation="heat_1d",
)
print(pretty_json({
    "imported_shape": list(external_like.values.shape),
    "readiness_label": readiness["readiness_label"],
    "readiness_components": {
        name: status["status"]
        for name, status in readiness["component_statuses"].items()
    },
    "parameter_tags": external_like.metadata["parameter_tags"],
    "preprocess_tail": external_like.preprocess_log[-1],
}, max_chars=3000))

{
  "imported_shape": [
    1,
    17,
    32,
    1
  ],
  "parameter_tags": {
    "nu": 0.1
  },
  "preprocess_tail": {
    "operation": "from_numpy",
    "parameters": {
      "canonical_shape": [
        1,
        17,
        32,
        1
      ],
      "imported_shape": [
        1,
        17,
        32,
        1
      ],
      "injected_batch_axis": false,
      "injected_var_axis": false,
      "mask_provided": false,
      "source_layout": [
        "batch",
        "time",
        "x",
        "var"
      ]
    }
  },
  "readiness_components": {
    "expected_equation": "failed",
    "field": "passed",
    "mask": "passed",
    "metadata": "passed",
    "residual_preflight": "passed",
    "time_coordinate": "passed",
    "values": "passed",
    "x_coordinate": "passed"
  },
  "readiness_label": "not_ready"
}
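
The `injected_batch_axis` and `injected_var_axis` flags in the log above suggest that an import can add singleton axes when the source array lacks them. The sketch below shows that kind of canonicalization for a raw array with named axes. It is a hedged illustration and not the implementation of `from_numpy`; `to_canonical_layout` is a hypothetical helper.

```python
import numpy as np

CANONICAL_DIMS = ("batch", "time", "x", "var")


def to_canonical_layout(values, dims):
    """Reorder and pad an array so its axes read (batch, time, x, var)."""
    values = np.asarray(values)
    dims = tuple(dims)
    if values.ndim != len(dims) or len(set(dims)) != len(dims):
        raise ValueError("dims must name each axis exactly once")
    unknown = set(dims) - set(CANONICAL_DIMS)
    if unknown:
        raise ValueError(f"unsupported dims: {sorted(unknown)}")
    if "time" not in dims or "x" not in dims:
        raise ValueError("time and x axes are required")
    # Put the axes that are present into canonical relative order.
    present = [name for name in CANONICAL_DIMS if name in dims]
    values = np.transpose(values, [dims.index(name) for name in present])
    # Insert singleton axes for any canonical dimension that is missing.
    injected = {}
    for axis, name in enumerate(CANONICAL_DIMS):
        injected[name] = name not in dims
        if injected[name]:
            values = np.expand_dims(values, axis)
    return values, injected


# A single scalar field stored as (x, time), a common external layout.
raw = np.arange(32 * 17, dtype=float).reshape(32, 17)
canonical, injected = to_canonical_layout(raw, dims=("x", "time"))
print(canonical.shape, injected)
```

Returning the injection flags alongside the array keeps the reshaping auditable, in the same spirit as the preprocess log printed above.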

Recap#

The reusable pattern is: validate or construct a canonical FieldBatch, decide split policy outside PDELie, optionally materialize a translation orbit with provenance, fit and validate a generator, then hand arrays to downstream code.

Common pitfalls#

  • Materializing orbits before deciding train/heldout policy.

  • Letting downstream thresholds become hidden PDELie assumptions.

  • Treating backend-native arrays or labels as canonical artifacts.

  • Applying spectral/invariant tools to nonuniform or multidimensional data without a supported adapter.

Extension ideas#

  • Replace the generated Heat field with your own from_numpy(...), from_xarray(...), or explicit scalar from_xarray_dataset(...) data.

  • Use Fisher-KPP when your workflow needs a reaction-diffusion strong-path example.

  • Compare downstream recovery with and without orbit materialization, but keep the success criteria in your own experiment layer.

What to read/run next#

Return to 00_pde_timeseries_to_generators.ipynb for the core evidence flow, or use this notebook as a template for your own project.