Note

This page renders committed notebook outputs. The Read the Docs build does not execute notebook code.

Robustness sweeps as diagnostic evidence#

Current surface: V0.29.

Purpose#

Treat perturbation sweeps as diagnostics, not guarantees: inspect residual RMS, conditioning, span distance, and verification error together.

What you will learn#

  • How noise, subsampling, and fit epsilon change generator evidence.

  • How to read singular values and condition numbers as fit-health diagnostics.

  • How Heat and Burgers use the same confidence-card language without implying universal robustness.
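As a rough illustration of the second point, the sketch below uses a plain NumPy matrix with one near-null direction. It is not the pdelie fit: how `fit_translation_generator` assembles its linear system is not shown on this page, so the matrix, the noise level, and the helper `fit_health` are all assumptions made for illustration. The point is only that a large condition number can accompany a well-recovered direction, and that noise can lower the condition number while making the recovered direction worse.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a fit matrix (200 rows, 4 coefficients).
# Column 0 is almost zero, so e0 = [1, 0, 0, 0] is a near-null direction.
reference = np.array([1.0, 0.0, 0.0, 0.0])
clean = rng.normal(size=(200, 4))
clean[:, 0] = 1e-6 * rng.normal(size=200)

# Perturb every entry, as measurement noise would.
noisy = clean + 0.5 * rng.normal(size=clean.shape)


def fit_health(matrix, reference):
    """Return (condition number, sign-invariant error of the null estimate)."""
    _, singular_values, vt = np.linalg.svd(matrix, full_matrices=False)
    condition_number = singular_values[0] / singular_values[-1]
    estimate = vt[-1]  # right singular vector of the smallest singular value
    error = min(np.linalg.norm(estimate - reference),
                np.linalg.norm(estimate + reference))
    return float(condition_number), float(error)


cond_clean, err_clean = fit_health(clean, reference)
cond_noisy, err_noisy = fit_health(noisy, reference)
print(f"clean: cond={cond_clean:.3g}, direction error={err_clean:.3g}")
print(f"noisy: cond={cond_noisy:.3g}, direction error={err_noisy:.3g}")
```

In this toy setting the clean matrix has the larger condition number and the smaller direction error, which is why a condition number is better read next to span distance and verification error than on its own.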

Required extras#

Matplotlib for compact diagnostic plots; included in .[test].

Expected runtime#

About 1-2 minutes.

Out of scope#

No robustness theorem, no benchmark success claim, no automatic threshold policy, no new fitting algorithm.

These notebooks are tutorials, not API contracts. Example outputs are runtime summaries, not canonical paper artifacts.

[1]:
import sys
from pathlib import Path

# Make the repository root importable whether the notebook is run from the
# root or from a directory one level below it.
ROOT = Path.cwd()
if not (ROOT / "pyproject.toml").exists():
    ROOT = ROOT.parent
if str(ROOT) not in sys.path:
    sys.path.insert(0, str(ROOT))

import matplotlib.pyplot as plt
import numpy as np

from notebooks._tutorial_utils import confidence_card, print_cards
from pdelie.data import (
    add_gaussian_noise,
    generate_burgers_1d_field_batch,
    generate_heat_1d_field_batch,
    subsample_time,
    subsample_x,
)
from pdelie.derivatives import compute_spectral_fd_derivatives
from pdelie.reporting import (
    summarize_generator_fit_diagnostics,
    summarize_residual_batch,
    summarize_verification_report,
)
from pdelie.residuals import BurgersResidualEvaluator, HeatResidualEvaluator
from pdelie.symmetry import fit_translation_generator
from pdelie.verification import verify_translation_generator

CONFIG = {
    "fit_epsilons": [1e-5, 1e-4, 1e-3],
    "noise_fraction": 0.02,
    "span_tolerance": 5e-2,
}
CONFIG

[1]:
{'fit_epsilons': [1e-05, 0.0001, 0.001],
 'noise_fraction': 0.02,
 'span_tolerance': 0.05}

1. Build matched train/heldout variants#

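Each record below is fit on one training variant and verified on the held-out batch that received the same perturbation. Because both batches share a noise level and grid, differences between records reflect perturbation sensitivity rather than a shape mismatch between training and held-out data.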

[2]:
base_train = generate_heat_1d_field_batch(batch_size=4, num_times=33, num_points=64, seed=620)
base_heldout = generate_heat_1d_field_batch(batch_size=3, num_times=33, num_points=64, seed=621)
variants = {
    "clean": (base_train, base_heldout),
    "noisy_2pct": (
        add_gaussian_noise(base_train, std_fraction=CONFIG["noise_fraction"], seed=622),
        add_gaussian_noise(base_heldout, std_fraction=CONFIG["noise_fraction"], seed=623),
    ),
    "time_stride_2": (subsample_time(base_train, stride=2), subsample_time(base_heldout, stride=2)),
    "x_stride_2": (subsample_x(base_train, stride=2), subsample_x(base_heldout, stride=2)),
}
{key: value[0].values.shape for key, value in variants.items()}

[2]:
{'clean': (4, 33, 64, 1),
 'noisy_2pct': (4, 33, 64, 1),
 'time_stride_2': (4, 17, 64, 1),
 'x_stride_2': (4, 33, 32, 1)}
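The shapes above can be reproduced on a bare array. The following is a minimal sketch that assumes the batch axes are (batch, time, x, channel), as the printed shapes suggest. The helper `add_noise` and its scaling rule (noise standard deviation as a fraction of the data's standard deviation) are assumptions for illustration and may differ from `pdelie.data.add_gaussian_noise`.

```python
import numpy as np

rng = np.random.default_rng(622)

# Hypothetical stand-in for a field batch with axes (batch, time, x, channel).
field = rng.normal(size=(4, 33, 64, 1))


def add_noise(values, std_fraction, rng):
    """Add zero-mean Gaussian noise scaled to a fraction of the data's std.

    Assumption: illustrative scaling rule, not necessarily pdelie's.
    """
    scale = std_fraction * values.std()
    return values + rng.normal(scale=scale, size=values.shape)


noisy = add_noise(field, std_fraction=0.02, rng=rng)
time_stride_2 = field[:, ::2]     # keep every second time sample
x_stride_2 = field[:, :, ::2]     # keep every second grid point

shapes = {
    "clean": field.shape,
    "noisy_2pct": noisy.shape,
    "time_stride_2": time_stride_2.shape,
    "x_stride_2": x_stride_2.shape,
}
noise_ratio = float((noisy - field).std() / field.std())
print(shapes)
print(f"noise std / data std = {noise_ratio:.3f}")
```

A stride of 2 over 33 time samples keeps 17 of them because both endpoints survive. Subsampling leaves the values untouched but doubles the step between samples, so any effect on the diagnostics comes through the derivative estimates rather than through the data values.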

2. Sweep fit epsilon and keep the evidence visible#

[3]:
evaluator = HeatResidualEvaluator()
records = []
for variant_name, (train, heldout) in variants.items():
    # The PDE residual depends only on the data variant, so summarize it once
    # and reuse it for every fit epsilon.
    residual = evaluator.evaluate(train, compute_spectral_fd_derivatives(train))
    residual_summary = summarize_residual_batch(residual)
    for epsilon in CONFIG["fit_epsilons"]:
        generator = fit_translation_generator(train, evaluator, epsilon=epsilon)
        verification = verify_translation_generator(
            heldout,
            generator,
            evaluator,
            span_tolerance=CONFIG["span_tolerance"],
        )
        fit_summary = summarize_generator_fit_diagnostics(generator)
        verification_summary = summarize_verification_report(verification)
        card = confidence_card(
            label=f"{variant_name} epsilon={epsilon:g}",
            residual=residual_summary,
            fit=fit_summary,
            verification=verification_summary,
        )
        card["variant"] = variant_name
        card["fit_epsilon"] = epsilon
        records.append(card)

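# Print only the first six cards (clean and noisy_2pct); the two subsampled
# variants stay in `records`.
print_cards(records[:6])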

[
  {
    "condition_number": 1592402.5811672113,
    "evidence_label": "direct_svd_in_tolerance",
    "first_epsilon": 0.0001,
    "first_error": 4.514761151902394e-09,
    "fit_epsilon": 1e-05,
    "fit_mode": "svd",
    "label": "clean epsilon=1e-05",
    "max_error": 4.49785044070442e-06,
    "reference_fallback_used": false,
    "residual_max_abs": 7.923377265028897e-05,
    "residual_rms": 1.5219882569749286e-05,
    "selected_span_distance": 1.0096847977793663e-05,
    "singular_value_count": 4,
    "svd_span_distance": 1.0096847977793663e-05,
    "variant": "clean",
    "verification_classification": "exact"
  },
  {
    "condition_number": 1592402.2067103665,
    "evidence_label": "direct_svd_in_tolerance",
    "first_epsilon": 0.0001,
    "first_error": 4.514761151902394e-09,
    "fit_epsilon": 0.0001,
    "fit_mode": "svd",
    "label": "clean epsilon=0.0001",
    "max_error": 4.49785044070442e-06,
    "reference_fallback_used": false,
    "residual_max_abs": 7.923377265028897e-05,
    "residual_rms": 1.5219882569749286e-05,
    "selected_span_distance": 1.0096841488075772e-05,
    "singular_value_count": 4,
    "svd_span_distance": 1.0096841488075772e-05,
    "variant": "clean",
    "verification_classification": "exact"
  },
  {
    "condition_number": 1592402.1679358482,
    "evidence_label": "direct_svd_in_tolerance",
    "first_epsilon": 0.0001,
    "first_error": 4.514761151902394e-09,
    "fit_epsilon": 0.001,
    "fit_mode": "svd",
    "label": "clean epsilon=0.001",
    "max_error": 4.49785044070442e-06,
    "reference_fallback_used": false,
    "residual_max_abs": 7.923377265028897e-05,
    "residual_rms": 1.5219882569749286e-05,
    "selected_span_distance": 1.009684104362181e-05,
    "singular_value_count": 4,
    "svd_span_distance": 1.009684104362181e-05,
    "variant": "clean",
    "verification_classification": "exact"
  },
  {
    "condition_number": 21.417299025350534,
    "evidence_label": "direct_svd_out_of_tolerance",
    "first_epsilon": 0.0001,
    "first_error": 0.00043939046954815053,
    "fit_epsilon": 1e-05,
    "fit_mode": "svd",
    "label": "noisy_2pct epsilon=1e-05",
    "max_error": 0.43939046954755706,
    "reference_fallback_used": false,
    "residual_max_abs": 14.710257947945559,
    "residual_rms": 1.9607738077410153,
    "selected_span_distance": 1.1298559903521876,
    "singular_value_count": 4,
    "svd_span_distance": 1.1298559903521876,
    "variant": "noisy_2pct",
    "verification_classification": "failed"
  },
  {
    "condition_number": 21.41729902524864,
    "evidence_label": "direct_svd_out_of_tolerance",
    "first_epsilon": 0.0001,
    "first_error": 0.00043939046954725487,
    "fit_epsilon": 0.0001,
    "fit_mode": "svd",
    "label": "noisy_2pct epsilon=0.0001",
    "max_error": 0.43939046954667976,
    "reference_fallback_used": false,
    "residual_max_abs": 14.710257947945559,
    "residual_rms": 1.9607738077410153,
    "selected_span_distance": 1.1298559903545764,
    "singular_value_count": 4,
    "svd_span_distance": 1.1298559903545764,
    "variant": "noisy_2pct",
    "verification_classification": "failed"
  },
  {
    "condition_number": 21.41729902525099,
    "evidence_label": "direct_svd_out_of_tolerance",
    "first_epsilon": 0.0001,
    "first_error": 0.0004393904695468112,
    "fit_epsilon": 0.001,
    "fit_mode": "svd",
    "label": "noisy_2pct epsilon=0.001",
    "max_error": 0.43939046954663763,
    "reference_fallback_used": false,
    "residual_max_abs": 14.710257947945559,
    "residual_rms": 1.9607738077410153,
    "selected_span_distance": 1.1298559903547096,
    "singular_value_count": 4,
    "svd_span_distance": 1.1298559903547096,
    "variant": "noisy_2pct",
    "verification_classification": "failed"
  }
]
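In these six cards the fit epsilon barely moves any diagnostic, while 2% noise moves all of them. Residual RMS goes from about 1.5e-05 to about 1.96, the selected span distance goes from about 1.0e-05 to about 1.13 (far above the 0.05 span tolerance), and the verification classification changes from exact to failed. The condition number is lower for the noisy fit (about 21 against about 1.6e6). One way to quantify the epsilon insensitivity is sketched below, using values copied from the cards; `relative_spread` and `epsilon_sensitivity` are hypothetical helpers, not part of pdelie.

```python
import numpy as np

# Values copied from the six cards printed above.
cards = [
    {"variant": "clean", "fit_epsilon": 1e-5,
     "condition_number": 1592402.5811672113,
     "selected_span_distance": 1.0096847977793663e-05},
    {"variant": "clean", "fit_epsilon": 1e-4,
     "condition_number": 1592402.2067103665,
     "selected_span_distance": 1.0096841488075772e-05},
    {"variant": "clean", "fit_epsilon": 1e-3,
     "condition_number": 1592402.1679358482,
     "selected_span_distance": 1.009684104362181e-05},
    {"variant": "noisy_2pct", "fit_epsilon": 1e-5,
     "condition_number": 21.417299025350534,
     "selected_span_distance": 1.1298559903521876},
    {"variant": "noisy_2pct", "fit_epsilon": 1e-4,
     "condition_number": 21.41729902524864,
     "selected_span_distance": 1.1298559903545764},
    {"variant": "noisy_2pct", "fit_epsilon": 1e-3,
     "condition_number": 21.41729902525099,
     "selected_span_distance": 1.1298559903547096},
]


def relative_spread(values):
    """(max - min) / |mean| of a sequence of numbers."""
    values = np.asarray(values, dtype=float)
    return float((values.max() - values.min()) / abs(values.mean()))


def epsilon_sensitivity(cards, key):
    """Relative spread of `key` across fit epsilons, grouped by variant."""
    by_variant = {}
    for card in cards:
        by_variant.setdefault(card["variant"], []).append(card[key])
    return {name: relative_spread(vals) for name, vals in by_variant.items()}


cond_spread = epsilon_sensitivity(cards, "condition_number")
span_spread = epsilon_sensitivity(cards, "selected_span_distance")
for name in cond_spread:
    print(f"{name}: cond spread={cond_spread[name]:.1e}, "
          f"span spread={span_spread[name]:.1e}")
```

A small spread here says only that these two variants are insensitive to fit epsilon over the three values tried; it says nothing about other epsilons or other perturbations.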

3. Plot the diagnostics rather than hiding them#

The point is not to declare one scalar winner. You want to see when conditioning, residual error, and verification disagree.

[4]:
fig, axes = plt.subplots(1, 3, figsize=(13, 3.2))
for variant_name in variants:
    rows = [row for row in records if row["variant"] == variant_name]
    eps = np.asarray([row["fit_epsilon"] for row in rows], dtype=float)
    cond = np.asarray([row["condition_number"] for row in rows], dtype=float)
    span = np.asarray([row["selected_span_distance"] for row in rows], dtype=float)
    err = np.asarray([row["first_error"] for row in rows], dtype=float)
    axes[0].loglog(eps, cond, marker="o", label=variant_name)
    axes[1].loglog(eps, span, marker="o", label=variant_name)
    axes[2].loglog(eps, err, marker="o", label=variant_name)
axes[0].set_title("condition number")
axes[1].set_title("selected span distance")
axes[2].set_title("first verification error")
for ax in axes:
    ax.set_xlabel("fit epsilon")
    ax.grid(True, which="both", alpha=0.3)
axes[0].legend(fontsize=8)
plt.tight_layout()

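[Figure: three log-log panels plotted against fit epsilon, one line per variant. Panel titles: condition number, selected span distance, first verification error.]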
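A disagreement between diagnostics can also be made explicit without choosing thresholds, by checking whether different fields order the cards the same way. The sketch below does this for the two epsilon=0.0001 cards printed in section 2; `ranking` is a hypothetical helper and the comparison is illustrative, not a selection rule.

```python
# Values copied from the clean and noisy_2pct cards at fit epsilon 0.0001.
cards = [
    {"label": "clean epsilon=0.0001",
     "condition_number": 1592402.2067103665,
     "selected_span_distance": 1.0096841488075772e-05,
     "first_error": 4.514761151902394e-09,
     "verification_classification": "exact"},
    {"label": "noisy_2pct epsilon=0.0001",
     "condition_number": 21.41729902524864,
     "selected_span_distance": 1.1298559903545764,
     "first_error": 0.00043939046954725487,
     "verification_classification": "failed"},
]


def ranking(cards, key):
    """Card labels ordered from the smallest to the largest value of `key`."""
    return [card["label"] for card in sorted(cards, key=lambda c: c[key])]


keys = ("condition_number", "selected_span_distance", "first_error")
rankings = {key: ranking(cards, key) for key in keys}

# Fields whose ordering differs from the span-distance ordering.
disagreeing = [key for key in keys
               if rankings[key] != rankings["selected_span_distance"]]
print(disagreeing)
```

Here span distance and first verification error both put the clean card first, while the condition number puts the noisy card first. A single scalar would hide exactly this kind of split.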

4. Same diagnostic language on Burgers#

Burgers is included to show that the confidence card is a reporting pattern, not a Heat-specific metric. The exact thresholds remain task-specific.

[5]:
burgers_train = generate_burgers_1d_field_batch(batch_size=4, num_times=33, num_points=64, seed=624)
burgers_heldout = generate_burgers_1d_field_batch(batch_size=3, num_times=33, num_points=64, seed=625)
burgers_evaluator = BurgersResidualEvaluator()
burgers_residual = burgers_evaluator.evaluate(burgers_train, compute_spectral_fd_derivatives(burgers_train))
burgers_generator = fit_translation_generator(burgers_train, burgers_evaluator, epsilon=CONFIG["fit_epsilons"][1])
# No span_tolerance is passed here, so verification uses the function's default.
burgers_verification = verify_translation_generator(burgers_heldout, burgers_generator, burgers_evaluator)

burgers_card = confidence_card(
    label="burgers diagnostic check",
    residual=summarize_residual_batch(burgers_residual),
    fit=summarize_generator_fit_diagnostics(burgers_generator),
    verification=summarize_verification_report(burgers_verification),
)
print_cards([burgers_card])

[
  {
    "condition_number": 1199.3791068536707,
    "evidence_label": "reference_fallback",
    "first_epsilon": 0.0001,
    "first_error": 2.540177684328114e-09,
    "fit_mode": "reference_fallback",
    "label": "burgers diagnostic check",
    "max_error": 2.5137922401083095e-06,
    "reference_fallback_used": true,
    "residual_max_abs": 9.002696193464998e-06,
    "residual_rms": 9.971877293088462e-07,
    "selected_span_distance": 0.0,
    "singular_value_count": 4,
    "svd_span_distance": 0.4443193452662649,
    "verification_classification": "exact"
  }
]
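This card reads differently from the Heat cards: `reference_fallback_used` is true, the direct SVD span distance is about 0.44, and the selected span distance is 0.0. The "exact" verification therefore presumably describes the fallback generator rather than the direct SVD fit. A small reading aid that keeps that distinction visible might look like the sketch below; `span_evidence_note` is a hypothetical helper, not part of pdelie or the tutorial utilities, and the field values are copied from the card above.

```python
# Fields copied from the Burgers card printed above.
burgers_card = {
    "fit_mode": "reference_fallback",
    "reference_fallback_used": True,
    "selected_span_distance": 0.0,
    "svd_span_distance": 0.4443193452662649,
    "verification_classification": "exact",
}


def span_evidence_note(card):
    """Say which fit the card's selected span distance refers to."""
    if card["reference_fallback_used"]:
        return (
            "reference fallback used: direct SVD span distance "
            f"{card['svd_span_distance']:.3g}, "
            f"selected span distance {card['selected_span_distance']:.3g}"
        )
    return f"direct SVD fit: span distance {card['selected_span_distance']:.3g}"


note = span_evidence_note(burgers_card)
print(f"{note}; verification={burgers_card['verification_classification']}")
```

Reporting both distances side by side avoids presenting a fallback result as if the direct fit had recovered the span.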

Recap#

A robustness sweep is useful because it exposes disagreements between residual health, fit conditioning, span recovery, and finite-transform verification.

Common pitfalls#

  • Reporting only the best-looking coefficient vector.

  • Turning one perturbation sweep into a robustness guarantee.

  • Comparing variants with different residual targets or incompatible shapes.

  • Treating condition numbers as pass/fail without context.

Extension ideas#

  • Add a Fisher-KPP row and compare against Heat/Burgers.

  • Sweep verification epsilon grids separately from fit epsilon.

  • Save confidence cards as your own experiment logs outside the notebook.
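For the last idea, one possible approach is an append-only JSON Lines file, sketched below with the standard library only. It assumes each card is a plain dict of JSON-serializable values (the printed cards suggest this, but NumPy scalars would need converting first); the file name and the helpers `append_cards` and `load_cards` are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def append_cards(path, cards):
    """Append one JSON object per line so earlier runs are never overwritten."""
    with Path(path).open("a", encoding="utf-8") as handle:
        for card in cards:
            handle.write(json.dumps(card, sort_keys=True) + "\n")


def load_cards(path):
    """Read every non-empty line back into a list of dicts."""
    with Path(path).open(encoding="utf-8") as handle:
        return [json.loads(line) for line in handle if line.strip()]


# A trimmed copy of the Burgers card shown earlier on this page.
cards = [
    {
        "label": "burgers diagnostic check",
        "condition_number": 1199.3791068536707,
        "reference_fallback_used": True,
        "svd_span_distance": 0.4443193452662649,
        "verification_classification": "exact",
    }
]

with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "confidence_cards.jsonl"
    append_cards(log_path, cards)   # first run
    append_cards(log_path, cards)   # a later run appends instead of replacing
    restored = load_cards(log_path)

print(len(restored), restored[0]["label"])
```

Appending rather than rewriting keeps the less flattering runs in the log, which guards against the first pitfall listed above.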

What to read/run next#

Run 03_portability_round_trips.ipynb to see why serialized generators should be revalidated.