Performance regression suite
SimVX ships an opt-in performance regression suite: high-level, feature-oriented benchmarks that answer the questions a developer actually perceives – what frame rate do I get with N objects, how many particles can I push, how many physics bodies before the step blows the frame budget – so you can spot regressions after a large rework, understand the engine’s limits, and compare against other engines.
These benchmarks measure speed only. They do not verify correctness, so they are kept out of normal test runs and only execute when you explicitly ask for them.
How it differs from the normal tests
- Marked @pytest.mark.perf. Every package's default addopts carries -m "not perf", so a plain pytest run never collects them. A command-line -m perf overrides that and runs only the benchmarks.
- Results are local-only. Each run appends to a per-machine history and is compared against that machine's own pinned baseline. Nothing is committed – absolute numbers are hardware-specific, so the gate is relative drift on this machine, not a magic threshold. The whole .perf/ directory is gitignored.
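Why the command-line -m perf wins over the default can be illustrated with a small stand-in. This is only a sketch: it uses the standard library's argparse rather than pytest itself, and it assumes the configured addopts are placed ahead of the command-line arguments so that the last -m value is the one kept. The helper name effective_marker is hypothetical.

```python
from __future__ import annotations

import argparse
import shlex

# Default options as described above (assumed to be merged before CLI args).
ADDOPTS = '-m "not perf"'


def effective_marker(cli_args: list[str]) -> str | None:
    """Return the marker expression left after merging defaults and CLI args."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-m", dest="markexpr", default=None)
    # Defaults come first, so a later -m on the command line replaces them.
    merged = shlex.split(ADDOPTS) + cli_args
    namespace, _unknown = parser.parse_known_args(merged)
    return namespace.markexpr


print(effective_marker([]))              # plain run keeps "not perf"
print(effective_marker(["-m", "perf"]))  # explicit -m perf replaces it
```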
Running
# Everything, compared to this machine's baseline (fails on regression):
uv run python tools/run_benchmarks.py
# Pin the current numbers as the new baseline (after an intentional change):
uv run python tools/run_benchmarks.py --update-baseline
# Record + report drift but never fail:
uv run python tools/run_benchmarks.py --report-only
# Only one area, or a custom tolerance:
uv run python tools/run_benchmarks.py --suite rendering
uv run python tools/run_benchmarks.py --tolerance 0.15
To check a single area fast, run its file directly (graphics suites need pytest-forked
for Vulkan process isolation):
uv run --with pytest-forked --package simvx-graphics pytest -m perf \
packages/graphics/tests/test_perf_rendering.py -s
uv run --package simvx-core pytest -m perf \
packages/core/tests/test_perf_physics.py -s
GPU capability gating (3-state policy)
GPU benchmarks (and all GPU tests) are gated by the vulkan marker, which the
graphics conftest auto-applies to any test using a GPU fixture
(headless_app / capture / require_vulkan), so “uses a GPU fixture” and “is a
GPU test” are the same fact: see simvx.graphics.gpu_gate. There are three
outcomes, never a silent false-green:
- GPU present → the test runs.
- GPU absent (no vulkan binding or no device) → the test skips by default (so a GPU-less contributor or CI host stays green), unless --require-gpu is passed, which makes absence a failure. tools/run_benchmarks.py passes --require-gpu automatically (override with --allow-gpu-skip), because invoking the runner is an explicit request to measure: an empty GPU run must not look green.
- GPU broken (a device is present but init raises) → the test fails regardless of --require-gpu: a broken driver is a defect, not an absence.
A test that drives the device without a GPU fixture is caught at runtime: if it
errors with a missing-GPU message while ungated, the report gains a hint to add
the fixture or @pytest.mark.vulkan.
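The three-state policy can be written down as a small decision function. This is a minimal sketch of the rules as described in this section, not the actual interface of simvx.graphics.gpu_gate; the names GpuState and gate_outcome are hypothetical.

```python
from enum import Enum


class GpuState(Enum):
    PRESENT = "present"  # binding available and a device initialises
    ABSENT = "absent"    # no vulkan binding, or no device
    BROKEN = "broken"    # a device is present but init raises


def gate_outcome(state: GpuState, require_gpu: bool = False) -> str:
    """Map a probed GPU state to 'run', 'skip' or 'fail'."""
    if state is GpuState.PRESENT:
        return "run"
    if state is GpuState.BROKEN:
        # A broken driver is a defect, so --require-gpu does not matter here.
        return "fail"
    # Absent: skip by default, fail only when a GPU was explicitly required.
    return "fail" if require_gpu else "skip"


for state in GpuState:
    print(state.value, gate_outcome(state), gate_outcome(state, require_gpu=True))
```

Only the absent case depends on the flag, which is why the runner passing --require-gpu turns an empty GPU run into a failure without affecting the other two outcomes.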
Where results live
Resolved from $SIMVX_PERF_DIR, defaulting to <repo>/.perf/:
.perf/history/<host>/<suite>.jsonl     append-only, one line per measured point
.perf/baseline/<host>.json             {record_key: record} pinned baseline
.perf/reports/<timestamp>.{md,json}    consolidated drift report per run
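The layout above can be resolved in a few lines. The following is a sketch only: the function names are hypothetical, the host string is taken as a parameter because this page does not say how it is derived, and treating an empty $SIMVX_PERF_DIR as unset is an assumption.

```python
import os
from pathlib import Path


def perf_dir(repo_root: Path) -> Path:
    """Return $SIMVX_PERF_DIR if set, otherwise <repo>/.perf."""
    override = os.environ.get("SIMVX_PERF_DIR")
    # Assumption: an empty value falls back to the default location.
    return Path(override) if override else repo_root / ".perf"


def history_path(root: Path, host: str, suite: str) -> Path:
    """Append-only JSONL history for one suite on one machine."""
    return root / "history" / host / f"{suite}.jsonl"


def baseline_path(root: Path, host: str) -> Path:
    """Pinned baseline for one machine."""
    return root / "baseline" / f"{host}.json"


root = perf_dir(Path("."))
print(history_path(root, "examplehost", "rendering"))
print(baseline_path(root, "examplehost"))
```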
A record is keyed by (suite, name, backend, count, metric) and stamps the machine
(host, CPU, GPU, Python build incl. free-threaded, git commit), the headline metric, and
percentiles. The run_benchmarks.py report is a table of value vs baseline vs Δ vs
verdict (ok / regressed / improved / new).
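To make the record shape concrete, here is a hedged sketch of a keyed record and an append-only JSONL write. The class PerfRecord, the helper append_history, and the field names beyond the five key parts are assumptions for illustration; the machine stamp and percentiles are left out, and the values are invented purely for the example.

```python
from __future__ import annotations

import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class PerfRecord:
    suite: str
    name: str
    backend: str
    count: int
    metric: str
    value: float  # headline metric (illustrative field name)
    lower_is_better: bool = True

    def key(self) -> tuple[str, str, str, int, str]:
        """Identity of a measured point: (suite, name, backend, count, metric)."""
        return (self.suite, self.name, self.backend, self.count, self.metric)


def append_history(path: Path, record: PerfRecord) -> None:
    """Append one JSON line per measured point; earlier lines are never rewritten."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record)) + "\n")


# Illustrative usage with made-up values, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    history = Path(tmp) / "history" / "examplehost" / "physics.jsonl"
    rec = PerfRecord("physics", "step", "python", 1000, "frame_ms", 4.2)
    append_history(history, rec)
    append_history(history, rec)
    lines = history.read_text(encoding="utf-8").splitlines()

print(rec.key())
print(len(lines))
```

Because backend is part of the key, two measurements that differ only in backend land in separate series.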
Pass / fail
The first run on a machine has no baseline: every point is seeded and passes. After
you pin a baseline with --update-baseline, subsequent runs compare against it and fail when a metric
moves more than the tolerance (default 20%) in the worse direction. For frame-time metrics
lower is better; limit-style metrics (e.g. max objects at 60 FPS) set
lower_is_better=False so higher is better.
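The comparison rule can be sketched as a single function. This is an illustration under stated assumptions, not the runner's code: the function name verdict is hypothetical, the threshold for "improved" is assumed to mirror the regression tolerance, and the baseline is assumed to be non-zero.

```python
from __future__ import annotations


def verdict(
    value: float,
    baseline: float | None,
    tolerance: float = 0.20,
    lower_is_better: bool = True,
) -> str:
    """Classify a measurement as 'new', 'ok', 'regressed' or 'improved'."""
    if baseline is None:
        return "new"  # no pinned baseline yet: nothing to compare against
    # Relative change against the baseline (assumes baseline != 0).
    change = (value - baseline) / baseline
    # Orient the change so that a positive number always means "worse".
    worse = change if lower_is_better else -change
    if worse > tolerance:
        return "regressed"
    if worse < -tolerance:  # assumption: symmetric threshold for improvement
        return "improved"
    return "ok"


print(verdict(12.5, 10.0))                         # frame time up 25%
print(verdict(700, 1000, lower_is_better=False))   # limit metric down 30%
print(verdict(11.0, 10.0, tolerance=0.15))         # within a 15% tolerance
```

A frame time rising from 10.0 to 12.5 is a 25% move in the worse direction and exceeds the default 20% tolerance, while the same 25% rise in a limit-style metric would count as a gain.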
When you intentionally change performance, re-pin with --update-baseline and note the
reason in your commit.
Optional acceleration backends
Some subsystems have (or will have) an accelerated path alongside a pure-Python fallback.
Benchmarks stamp a backend onto each record so the two never mix in history:
- Physics records backend="python" today. When the Jolt path lands, its arm records backend="jolt" and skips when the optional dependency is absent.
- The occlusion-culling benchmark records occlusion_on / occlusion_off as backends so the Hi-Z cull's payoff (drawn vs total instances) is tracked as its own series.
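One way a benchmark could decide which backend arms to run is to probe for the optional module before measuring. The sketch below uses only the standard library; the function name available_backends and the module name passed for the Jolt arm are hypothetical placeholders, since this page does not name the real dependency.

```python
import importlib.util


def available_backends(optional: dict[str, str]) -> list[str]:
    """Return backend labels whose modules are importable.

    'python' is always present; each optional backend is listed only when
    its top-level module can be found.
    """
    backends = ["python"]
    for backend, module_name in optional.items():
        # find_spec returns None for a missing top-level module.
        if importlib.util.find_spec(module_name) is not None:
            backends.append(backend)
    return backends


# Hypothetical module name: stands in for whatever the Jolt binding is called.
print(available_backends({"jolt": "hypothetical_jolt_binding"}))
```

Inside a pytest benchmark the absent case would more naturally become a skip, for example through pytest.importorskip, so the missing arm shows up as skipped rather than silently dropped.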