Numerical Validation Methodology

Purpose

This document defines the validation protocol used to support numerical claims about FluxGraph model integration behavior. It is the authoritative source for the criteria that numerical evidence must meet.

Systems Under Test

The current validation scope targets the thermal_mass model with two canonical scenarios:

thermal.cooling.v1
thermal.forced_response.v1

Each scenario is evaluated with two integration methods:

forward_euler
rk4
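For reference, the two methods advance the state as shown in the minimal scalar Python sketch below, written for a generic right-hand side f(t, y). The names forward_euler_step, rk4_step, and integrate are hypothetical and are not FluxGraph's API; the sketch assumes a fixed step dt that divides the time horizon evenly.

```python
from typing import Callable, List

# Right-hand side of a scalar ODE: dy/dt = f(t, y)
Rhs = Callable[[float, float], float]


def forward_euler_step(f: Rhs, t: float, y: float, dt: float) -> float:
    """One explicit Euler step (first order): y + dt * f(t, y)."""
    return y + dt * f(t, y)


def rk4_step(f: Rhs, t: float, y: float, dt: float) -> float:
    """One classical fourth-order Runge-Kutta step."""
    k1 = f(t, y)
    k2 = f(t + dt / 2, y + dt * k1 / 2)
    k3 = f(t + dt / 2, y + dt * k2 / 2)
    k4 = f(t + dt, y + dt * k3)
    return y + dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6


def integrate(step, f: Rhs, y0: float, t_end: float, dt: float) -> List[float]:
    """Return the sampled trajectory [y(0), y(dt), ..., y(t_end)].

    Assumes t_end is an integer multiple of dt (hypothetical helper).
    """
    n_steps = round(t_end / dt)
    ys = [y0]
    for i in range(n_steps):
        ys.append(step(f, i * dt, ys[-1], dt))
    return ys
```

On a smooth problem such as dy/dt = -y, halving dt should roughly halve the forward_euler error and shrink the rk4 error by about a factor of 16; that ratio is what the Convergence Estimation section quantifies.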

Reference Solution

Validation uses the analytical solution of the first-order linear thermal model:

dT/dt = (P - h*(T - T_amb)) / C

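where T is the temperature, T_amb the ambient temperature, P the heat input (power), h the heat-transfer coefficient, and C the heat capacity. Closed-form references are used for both the cooling and the forced-response scenarios.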
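For a constant heat input P, the model has time constant tau = C/h and steady state T_inf = T_amb + P/h, so T(t) = T_inf + (T0 - T_inf) * exp(-t/tau); the cooling case is P = 0. The Python sketch below assumes constant P, h, and C with h, C > 0. The actual thermal.forced_response.v1 scenario may use a different input profile, and the name thermal_reference is hypothetical.

```python
import math


def thermal_reference(t: float, T0: float, T_amb: float, P: float, h: float, C: float) -> float:
    """Closed-form T(t) for dT/dt = (P - h*(T - T_amb)) / C.

    Hypothetical helper: assumes P, h, C are constant and h, C > 0.
    Cooling corresponds to P = 0.
    """
    T_inf = T_amb + P / h  # steady-state temperature
    tau = C / h            # thermal time constant
    return T_inf + (T0 - T_inf) * math.exp(-t / tau)
```

Differentiating gives dT/dt = -(T - T_inf)/tau = (P - h*(T - T_amb))/C, which recovers the model equation.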

Error Metrics

For each (scenario, method, dt) run, three errors are computed against the reference solution:

L2 error over the sampled trajectory
Linf error over the sampled trajectory
Final-time absolute error
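The three metrics can be computed from aligned samples of the numerical and reference trajectories. The sketch below assumes a dt-weighted discrete L2 norm, sqrt(dt * sum(e_i^2)); the suite may instead use an unweighted or RMS variant, and the helper name error_metrics is hypothetical.

```python
import math
from typing import Dict, Sequence


def error_metrics(numeric: Sequence[float], reference: Sequence[float], dt: float) -> Dict[str, float]:
    """Pointwise errors between two aligned trajectories, summarized three ways."""
    if not numeric or len(numeric) != len(reference):
        raise ValueError("trajectories must be non-empty and sampled at the same times")
    errs = [abs(a - b) for a, b in zip(numeric, reference)]
    return {
        # Assumed dt-weighted discrete L2 norm over the sampled trajectory
        "l2": math.sqrt(dt * sum(e * e for e in errs)),
        # Largest pointwise error over the sampled trajectory
        "linf": max(errs),
        # Absolute error at the final sample
        "final_abs": errs[-1],
    }
```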

Convergence Estimation

Observed order is estimated by linear regression on log-log scale:

log(error) = p * log(dt) + b

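The fitted slope p is the observed order of convergence (b is the intercept). It is reported separately for each error norm: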

observed_order_l2
observed_order_linf
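The fit is ordinary least squares on (log dt, log error) pairs. The dependency-free sketch below is illustrative only: observed_order is a hypothetical name, and the validation script may use a library fitter instead. It requires at least two distinct dt values and strictly positive errors.

```python
import math
from typing import Sequence, Tuple


def observed_order(dts: Sequence[float], errors: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(error) = p * log(dt) + b; returns (p, b)."""
    if len(dts) != len(errors) or len(dts) < 2:
        raise ValueError("need at least two (dt, error) pairs")
    if any(e <= 0 for e in errors):
        raise ValueError("errors must be strictly positive to take logs")
    xs = [math.log(dt) for dt in dts]
    ys = [math.log(e) for e in errors]
    x_mean = sum(xs) / len(xs)
    y_mean = sum(ys) / len(ys)
    sxx = sum((x - x_mean) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("dt values must not all be equal")
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    p = sxy / sxx
    b = y_mean - p * x_mean
    return p, b
```

For errors that scale exactly as 3 * dt**2, the fit recovers p = 2 and b = log(3).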

Evidence Thresholds

Current CI-enforced minima on observed_order_linf:

forward_euler >= 0.9
rk4 >= 3.5

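These thresholds are conservative guards: forward Euler is formally first order and classical RK4 is fourth order, so the minima of 0.9 and 3.5 sit below the theoretical values of 1 and 4. They are not intended as publication claims by themselves; publication claims should cite full artifact sets.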
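A CI gate over these minima might look like the sketch below, applied to the observed_order_linf values of each scenario. The constant and function names are hypothetical, and the dictionary layout only mirrors the threshold list above; it is not the schema of validation_evaluation.json.

```python
from typing import Dict, List

# Hypothetical encoding of the CI minima on observed_order_linf
MIN_OBSERVED_ORDER_LINF: Dict[str, float] = {"forward_euler": 0.9, "rk4": 3.5}


def order_failures(observed_order_linf: Dict[str, float]) -> List[str]:
    """Return one message per method whose observed order is missing or below its minimum."""
    failures = []
    for method, minimum in MIN_OBSERVED_ORDER_LINF.items():
        p = observed_order_linf.get(method)
        if p is None:
            failures.append(f"{method}: no observed order reported")
        elif p < minimum:
            failures.append(f"{method}: observed_order_linf={p:.3f} < {minimum}")
    return failures
```

Treating a missing method as a failure keeps a silently skipped run from passing the gate.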

Reproducibility

Local run:

python scripts/run_numerical_validation.py --preset dev-release --enforce-order

Windows run:

python .\scripts\run_numerical_validation.py --preset dev-windows-release --config Release --enforce-order

CI run:

GitHub Actions workflow: .github/workflows/numerical-validation-evidence.yml
Output artifact directory: artifacts/validation/<timestamp_or_runid>_*

Each evidence run includes:

validation_results.json
validation_results.csv
validation_evaluation.json
stdout/stderr and build/configure logs

Threats to Validity

The current suite validates only one model family (thermal_mass).
Convergence is measured on fixed dt sweeps and selected parameter regimes.
Floating-point behavior may vary slightly across compilers/architectures.
Threshold-based pass/fail does not replace full statistical or multi-platform analysis.

Next Extensions

Add validation scenarios for additional models as they are introduced.
Extend validation to stochastic transforms (e.g., noise distribution checks).
Add a publication-grade plotting/notebook pipeline over the stored CSV results.