# Performance regression suite

SimVX ships an **opt-in performance regression suite**: high-level, feature-oriented benchmarks that answer the questions a developer actually perceives -- *what frame rate do I get with N objects, how many particles can I push, how many physics bodies before the step blows the frame budget* -- so you can spot regressions after a large rework, understand the engine's limits, and compare against other engines.

These benchmarks **measure speed only**. They do not verify correctness, so they are kept out of normal test runs and only execute when you explicitly ask for them.

## How it differs from the normal tests

- Marked `@pytest.mark.perf`. Every package's default `addopts` carries `-m "not perf"`, so a plain `pytest` run **never** collects them. A command-line `-m perf` overrides that and runs **only** the benchmarks.
- Results are **local-only**. Each run appends to a per-machine history and is compared against that machine's own pinned baseline. Nothing is committed -- absolute numbers are hardware-specific, so the gate is *relative drift on this machine*, not a magic threshold.
- The whole `.perf/` directory is gitignored.

## Running

```bash
# Everything, compared to this machine's baseline (fails on regression):
uv run python tools/run_benchmarks.py

# Pin the current numbers as the new baseline (after an intentional change):
uv run python tools/run_benchmarks.py --update-baseline

# Record + report drift but never fail:
uv run python tools/run_benchmarks.py --report-only

# Only one area, or a custom tolerance:
uv run python tools/run_benchmarks.py --suite rendering
uv run python tools/run_benchmarks.py --tolerance 0.15
```

To check a single area fast, run its file directly (graphics suites need `pytest-forked` for Vulkan process isolation):

```bash
uv run --with pytest-forked --package simvx-graphics pytest -m perf \
    packages/graphics/tests/test_perf_rendering.py -s
uv run --package simvx-core pytest -m perf \
    packages/core/tests/test_perf_physics.py -s
```

### GPU capability gating (3-state policy)

GPU benchmarks (and all GPU tests) are gated by the `vulkan` marker, which the graphics `conftest` **auto-applies** to any test using a GPU fixture (`headless_app` / `capture` / `require_vulkan`), so "uses a GPU fixture" and "is a GPU test" are the same fact: see `simvx.graphics.gpu_gate`. There are three outcomes, never a silent false-green:

- **GPU present** → the test runs.
- **GPU absent** (no `vulkan` binding or no device) → the test **skips** by default (so a GPU-less contributor or CI host stays green), **unless `--require-gpu`** is passed, which makes absence a **failure**. `tools/run_benchmarks.py` passes `--require-gpu` automatically (override with `--allow-gpu-skip`), because invoking the runner is an explicit request to measure: an empty GPU run must not look green.
- **GPU broken** (a device is present but init raises) → the test **fails** regardless of `--require-gpu`: a broken driver is a defect, not an absence.

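
The three outcomes above reduce to a small decision function. The sketch below is illustrative only: `GpuState` and `gate_outcome` are hypothetical names, not the API of `simvx.graphics.gpu_gate`, and the real gate may be structured differently.

```python
from enum import Enum


class GpuState(Enum):
    """Hypothetical enum for the three situations the policy distinguishes."""

    PRESENT = "present"  # binding importable and a device initialises
    ABSENT = "absent"    # no `vulkan` binding or no device
    BROKEN = "broken"    # a device is present but init raises


def gate_outcome(state: GpuState, require_gpu: bool = False) -> str:
    """Return 'run', 'skip', or 'fail' for a GPU-gated test."""
    if state is GpuState.PRESENT:
        return "run"
    if state is GpuState.BROKEN:
        # A broken driver is a defect, so it fails with or without --require-gpu.
        return "fail"
    # Absent: green by default, a failure when the caller asked to measure.
    return "fail" if require_gpu else "skip"
```

Only the absent case depends on `--require-gpu`; the other two are unconditional, which is what rules out a silent false-green.
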
A test that drives the device without a GPU fixture is caught at runtime: if it errors with a missing-GPU message while ungated, the report gains a hint to add the fixture or `@pytest.mark.vulkan`.

## Where results live

Resolved from `$SIMVX_PERF_DIR`, defaulting to `/.perf/`:

```
.perf/history//.jsonl       append-only, one line per measured point
.perf/baseline/.json        {record_key: record} pinned baseline
.perf/reports/.{md,json}    consolidated drift report per run
```

A record is keyed by `(suite, name, backend, count, metric)` and stamps the machine (host, CPU, GPU, Python build incl. free-threaded, git commit), the headline metric, and percentiles. The `run_benchmarks.py` report is a table of *value vs baseline vs Δ vs verdict* (`ok` / `regressed` / `improved` / `new`).

## Pass / fail

The first run on a machine has no baseline: every point is **seeded** and passes. After you `--update-baseline`, subsequent runs compare against it and **fail** when a metric moves more than the tolerance (default 20%) in the worse direction. For frame-time metrics lower is better; limit-style metrics (e.g. *max objects at 60 FPS*) set `lower_is_better=False` so higher is better.

When you intentionally change performance, re-pin with `--update-baseline` and note the reason in your commit.

## Optional acceleration backends

Some subsystems have (or will have) an accelerated path alongside a pure-Python fallback. Benchmarks stamp a `backend` onto each record so the two never mix in history:

- **Physics** records `backend="python"` today. When the Jolt path lands, its arm records `backend="jolt"` and skips when the optional dependency is absent.
- The occlusion-culling benchmark records `occlusion_on` / `occlusion_off` as backends so the Hi-Z cull's payoff (drawn vs total instances) is tracked as its own series.

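
To make the record key and the pass/fail rule concrete, here is a minimal sketch of how a verdict could be derived. `RecordKey` and `verdict` are hypothetical names, not the harness's own code; the rule for `improved` (more than the tolerance in the better direction) and the example names `"bodies"` and `"step_ms"` are assumptions, and the baseline is assumed to be a positive number.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RecordKey:
    """Hypothetical stand-in for the (suite, name, backend, count, metric) key."""

    suite: str
    name: str
    backend: str
    count: int
    metric: str


def verdict(value: float, baseline: Optional[float],
            tolerance: float = 0.20, lower_is_better: bool = True) -> str:
    """Classify one measured point against its pinned baseline."""
    if baseline is None:
        return "new"  # no baseline for this key yet: the point is seeded
    # Relative change, oriented so that a positive drift always means "worse".
    drift = (value - baseline) / baseline  # assumes baseline > 0
    if not lower_is_better:
        drift = -drift
    if drift > tolerance:
        return "regressed"
    if drift < -tolerance:  # assumption: symmetric threshold for "improved"
        return "improved"
    return "ok"


# Because `backend` is part of the key, a jolt record never looks up the
# python baseline, so the two series stay separate.
baseline = {RecordKey("physics", "bodies", "python", 1000, "step_ms"): 4.0}
jolt_key = RecordKey("physics", "bodies", "jolt", 1000, "step_ms")
print(verdict(1.5, baseline.get(jolt_key)))  # prints "new"
```

Orienting the drift so that positive always means worse lets frame-time metrics and limit-style metrics share one comparison.
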
## Authoring a benchmark

Use the shared harness in `simvx.core.testing.benchmark` and the `perf_recorder` fixture:

```python
import pytest
from simvx.core.testing import bench_headless_render

pytestmark = pytest.mark.perf


@pytest.mark.vulkan
def test_my_scene(perf_recorder):
    result = bench_headless_render(MyScene, frames=60, warmup=10, n=10_000,
                                   telemetry_keys=("transform_high_water",))
    perf_recorder.record(result)
```

- `bench_scene_runner(root, frames)` -- CPU-only (no GPU), via `SceneRunner`.
- `bench_headless_render(scene_or_cls, frames, ...)` -- full headless Vulkan render; copies named `app.last_telemetry` keys (GPU phase times, occlusion counts, instance high-water) into the result.
- `perf_recorder.record(result)` appends to history, compares to baseline, and (on teardown) fails the test on regression unless `--perf-report-only`.

Keep counts as a sweep to capture the scaling curve, and note any engine ceiling you hit (e.g. the GPU draw-batch cap) in the test rather than silently staying under it.