Performance regression suite

SimVX ships an opt-in performance regression suite: high-level, feature-oriented benchmarks that answer the questions a developer actually asks. What frame rate do I get with N objects? How many particles can I push? How many physics bodies fit before the step blows the frame budget? The answers let you spot regressions after a large rework, understand the engine’s limits, and compare against other engines.

These benchmarks measure speed only. They do not verify correctness, so they are kept out of normal test runs and only execute when you explicitly ask for them.

How it differs from the normal tests

  • Marked @pytest.mark.perf. Every package’s default addopts carries -m "not perf", so a plain pytest run deselects the benchmarks and never runs them. A command-line -m perf overrides that and runs only the benchmarks.

  • Results are local-only. Each run appends to a per-machine history and is compared against that machine’s own pinned baseline. Nothing is committed – absolute numbers are hardware-specific, so the gate is relative drift on this machine, not a magic threshold.

  • The whole .perf/ directory is gitignored.

Running

# Everything, compared to this machine's baseline (fails on regression):
uv run python tools/run_benchmarks.py

# Pin the current numbers as the new baseline (after an intentional change):
uv run python tools/run_benchmarks.py --update-baseline

# Record + report drift but never fail:
uv run python tools/run_benchmarks.py --report-only

# Only one area, or a custom tolerance:
uv run python tools/run_benchmarks.py --suite rendering
uv run python tools/run_benchmarks.py --tolerance 0.15

To check a single area fast, run its file directly (graphics suites need pytest-forked for Vulkan process isolation):

uv run --with pytest-forked --package simvx-graphics pytest -m perf \
    packages/graphics/tests/test_perf_rendering.py -s

uv run --package simvx-core pytest -m perf \
    packages/core/tests/test_perf_physics.py -s

GPU capability gating (3-state policy)

GPU benchmarks (and all GPU tests) are gated by the vulkan marker. The graphics conftest auto-applies it to any test that uses a GPU fixture (headless_app / capture / require_vulkan), so “uses a GPU fixture” and “is a GPU test” are the same fact; see simvx.graphics.gpu_gate. There are three outcomes, and none of them is a silent false-green:

  • GPU present → the test runs.

  • GPU absent (no vulkan binding or no device) → the test skips by default (so a GPU-less contributor or CI host stays green), unless --require-gpu is passed, which makes absence a failure. tools/run_benchmarks.py passes --require-gpu automatically (override with --allow-gpu-skip), because invoking the runner is an explicit request to measure: an empty GPU run must not look green.

  • GPU broken (a device is present but init raises) → the test fails regardless of --require-gpu: a broken driver is a defect, not an absence.
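
The three outcomes above can be read as a small decision table. The following is a minimal sketch of that policy only; GpuState and gate_outcome are hypothetical names chosen for illustration and are not the API of simvx.graphics.gpu_gate.

```python
from enum import Enum


class GpuState(Enum):
    PRESENT = "present"  # device found and initialised
    ABSENT = "absent"    # no vulkan binding or no device
    BROKEN = "broken"    # device present but init raised


def gate_outcome(state: GpuState, require_gpu: bool = False) -> str:
    """Return "run", "skip" or "fail" for a vulkan-marked test (illustrative)."""
    if state is GpuState.PRESENT:
        return "run"
    if state is GpuState.BROKEN:
        # A broken driver is a defect, so --require-gpu makes no difference.
        return "fail"
    # ABSENT: green by default, a failure only when a GPU was demanded.
    return "fail" if require_gpu else "skip"
```

Under this reading, tools/run_benchmarks.py behaves like require_gpu=True unless --allow-gpu-skip is given.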

A test that drives the device without a GPU fixture is caught at runtime: if it errors with a missing-GPU message while ungated, the report gains a hint to add the fixture or @pytest.mark.vulkan.

Where results live

The results directory is resolved from $SIMVX_PERF_DIR and defaults to <repo>/.perf/:

.perf/history/<host>/<suite>.jsonl   append-only, one line per measured point
.perf/baseline/<host>.json           {record_key: record} pinned baseline
.perf/reports/<timestamp>.{md,json}  consolidated drift report per run
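
As a rough illustration of that layout, the paths could be derived as below. This is a sketch under assumptions: perf_paths is a hypothetical helper, and treating an empty $SIMVX_PERF_DIR the same as an unset one is a guess, not documented behaviour.

```python
import os
from pathlib import Path


def perf_paths(repo_root: Path, host: str, suite: str) -> dict[str, Path]:
    """Build the per-machine result paths (hypothetical helper)."""
    # Assumption: an empty SIMVX_PERF_DIR falls back to <repo>/.perf.
    base = Path(os.environ.get("SIMVX_PERF_DIR") or repo_root / ".perf")
    return {
        "history": base / "history" / host / f"{suite}.jsonl",
        "baseline": base / "baseline" / f"{host}.json",
        "reports": base / "reports",
    }
```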

A record is keyed by (suite, name, backend, count, metric). It stamps the machine and build (host, CPU, GPU, Python build including whether it is free-threaded, git commit), the headline metric, and percentiles. The run_benchmarks.py report is a table of value vs baseline vs Δ vs verdict (ok / regressed / improved / new).
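
To make the record shape concrete, here is a minimal sketch of a keyed record serialised as one history line. PerfRecord and its field names are assumptions for illustration; only the five key components come from the description above, and the real schema may differ.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PerfRecord:
    # The five components that identify a series.
    suite: str
    name: str
    backend: str
    count: int
    metric: str
    # The measurement and its context (field names are hypothetical).
    value: float
    percentiles: dict = field(default_factory=dict)
    machine: dict = field(default_factory=dict)

    def key(self) -> tuple:
        """Identity used to match a new point against the baseline."""
        return (self.suite, self.name, self.backend, self.count, self.metric)

    def to_jsonl(self) -> str:
        """One JSON object on a single line, ready to append to history."""
        return json.dumps(asdict(self), sort_keys=True)
```

Because backend and count are part of the key, the same benchmark measured at two counts, or on two backends, produces separate series rather than overwriting each other.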

Pass / fail

The first run on a machine has no baseline, so every point is seeded and passes. Once a baseline is pinned with --update-baseline, subsequent runs compare against it and fail when a metric moves more than the tolerance (default 20%) in the worse direction. For frame-time metrics lower is better; limit-style metrics (e.g. max objects at 60 FPS) set lower_is_better=False so that higher is better.
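
The comparison described above amounts to a signed relative drift checked against the tolerance. A minimal sketch follows; verdict is a hypothetical function, and using the same tolerance symmetrically for "improved" is an assumption, since only the regression side is specified.

```python
def verdict(value, baseline, tolerance=0.20, lower_is_better=True):
    """Classify one measured point against its pinned baseline (illustrative)."""
    if baseline is None:
        return "new"  # no baseline yet: the point is seeded and passes
    # Relative drift; assumes a non-zero baseline.
    drift = (value - baseline) / baseline
    if not lower_is_better:
        drift = -drift  # flip the sign so positive always means "worse"
    if drift > tolerance:
        return "regressed"
    if drift < -tolerance:  # assumption: symmetric band for improvements
        return "improved"
    return "ok"
```

For example, a frame time going from 10.0 ms to 13.0 ms is a 30% move in the worse direction and would fail at the default tolerance, while an object limit dropping from 10,000 to 7,000 fails in the same way once lower_is_better=False.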

When you intentionally change performance, re-pin with --update-baseline and note the reason in your commit.

Optional acceleration backends

Some subsystems have (or will have) an accelerated path alongside a pure-Python fallback. Benchmarks stamp a backend onto each record so the two never mix in history:

  • Physics records backend="python" today. When the Jolt path lands, its arm records backend="jolt" and skips when the optional dependency is absent.

  • The occlusion-culling benchmark records occlusion_on / occlusion_off as backends so the Hi-Z cull’s payoff (drawn vs total instances) is tracked as its own series.
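
One way an arm can skip when its optional dependency is absent is to probe for the module before measuring. The sketch below uses only the standard library; the mapping and the module name simvx_jolt_placeholder are made up for illustration and do not name a real package.

```python
import importlib.util
from typing import Optional


def backend_available(module_name: Optional[str]) -> bool:
    """True for the pure-Python path (no module needed), else only if importable."""
    if module_name is None:
        return True
    # find_spec returns None when a top-level module is not installed.
    return importlib.util.find_spec(module_name) is not None


# Hypothetical mapping: backend label -> optional module that provides it.
BACKENDS = {"python": None, "jolt": "simvx_jolt_placeholder"}

# Only these backends would be measured and stamped onto records.
runnable = [name for name, mod in BACKENDS.items() if backend_available(mod)]
```

Inside a pytest test, pytest.importorskip achieves the same effect by skipping the test when the import fails.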

Authoring a benchmark

Use the shared harness in simvx.core.testing.benchmark and the perf_recorder fixture:

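import pytest
from simvx.core.testing import bench_headless_render

# MyScene stands for the scene class under test; define or import it here.

# Keep every test in this module out of default runs.
pytestmark = pytest.mark.perf


# Marked explicitly because the test requests no GPU fixture.
@pytest.mark.vulkan
def test_my_scene(perf_recorder):
    result = bench_headless_render(MyScene, frames=60, warmup=10, n=10_000,
                                   telemetry_keys=("transform_high_water",))
    # Appends to history and compares against this machine's baseline.
    perf_recorder.record(result)

  • bench_scene_runner(root, frames) – CPU-only (no GPU), via SceneRunner.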

  • bench_headless_render(scene_or_cls, frames, ...) – full headless Vulkan render; copies named app.last_telemetry keys (GPU phase times, occlusion counts, instance high-water) into the result.

  • perf_recorder.record(result) appends to history, compares to baseline, and (on teardown) fails the test on regression unless --perf-report-only.

Run each benchmark over a sweep of counts to capture the scaling curve, and note any engine ceiling you hit (e.g. the GPU draw-batch cap) in the test rather than silently staying under it.