# Performance regression suite

SimVX ships an **opt-in performance regression suite**: high-level, feature-oriented
benchmarks that answer the questions a developer actually perceives -- *what frame rate do
I get with N objects, how many particles can I push, how many physics bodies before the
step blows the frame budget* -- so you can spot regressions after a large rework,
understand the engine's limits, and compare against other engines.

These benchmarks **measure speed only**. They do not verify correctness, so they are kept
out of normal test runs and only execute when you explicitly ask for them.

## How it differs from the normal tests

- Marked `@pytest.mark.perf`. Every package's default `addopts` carries `-m "not perf"`, so
  a plain `pytest` run **never** runs them (they are deselected). A command-line `-m perf`
  overrides that and runs **only** the benchmarks.
- Results are **local-only**. Each run appends to a per-machine history and is compared
  against that machine's own pinned baseline. Nothing is committed -- absolute numbers are
  hardware-specific, so the gate is *relative drift on this machine*, not a magic threshold.
- The whole `.perf/` directory is gitignored.

## Running

```bash
# Everything, compared to this machine's baseline (fails on regression):
uv run python tools/run_benchmarks.py

# Pin the current numbers as the new baseline (after an intentional change):
uv run python tools/run_benchmarks.py --update-baseline

# Record + report drift but never fail:
uv run python tools/run_benchmarks.py --report-only

# Only one area, or a custom tolerance:
uv run python tools/run_benchmarks.py --suite rendering
uv run python tools/run_benchmarks.py --tolerance 0.15
```

To check a single area quickly, run its file directly (graphics suites need `pytest-forked`
for Vulkan process isolation):

```bash
uv run --with pytest-forked --package simvx-graphics pytest -m perf \
    packages/graphics/tests/test_perf_rendering.py -s

uv run --package simvx-core pytest -m perf \
    packages/core/tests/test_perf_physics.py -s
```

### GPU capability gating (3-state policy)

GPU benchmarks (and all GPU tests) are gated by the `vulkan` marker, which the
graphics `conftest` **auto-applies** to any test using a GPU fixture
(`headless_app` / `capture` / `require_vulkan`), so "uses a GPU fixture" and "is a
GPU test" are the same fact (see `simvx.graphics.gpu_gate`). There are three
outcomes, never a silent false-green:

- **GPU present** → the test runs.
- **GPU absent** (no `vulkan` binding or no device) → the test **skips** by default
  (so a GPU-less contributor or CI host stays green), **unless `--require-gpu`** is
  passed, which makes absence a **failure**. `tools/run_benchmarks.py` passes
  `--require-gpu` automatically (override with `--allow-gpu-skip`), because invoking
  the runner is an explicit request to measure: an empty GPU run must not look green.
- **GPU broken** (a device is present but init raises) → the test **fails** regardless
  of `--require-gpu`: a broken driver is a defect, not an absence.

A test that drives the device without a GPU fixture is caught at runtime: if it
errors with a missing-GPU message while ungated, the report gains a hint to add
the fixture or `@pytest.mark.vulkan`.
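
The three outcomes above reduce to a small decision table. The sketch below is only an illustration of that policy, not the code in `simvx.graphics.gpu_gate`; the names `GpuState` and `gate_outcome` are hypothetical.

```python
from enum import Enum


class GpuState(Enum):
    """Hypothetical probe result for the Vulkan device."""
    PRESENT = "present"  # binding importable and a device initialised
    ABSENT = "absent"    # no `vulkan` binding, or no device found
    BROKEN = "broken"    # a device exists but initialisation raised


def gate_outcome(state: GpuState, require_gpu: bool = False) -> str:
    """Return 'run', 'skip', or 'fail' for a vulkan-marked test."""
    if state is GpuState.PRESENT:
        return "run"
    if state is GpuState.BROKEN:
        # A broken driver is a defect, so --require-gpu is irrelevant here.
        return "fail"
    # Absent: skip by default, fail when the caller demanded a GPU.
    return "fail" if require_gpu else "skip"
```

Under this model, `tools/run_benchmarks.py` corresponds to `require_gpu=True` unless `--allow-gpu-skip` is given.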

## Where results live

Resolved from `$SIMVX_PERF_DIR`, defaulting to `<repo>/.perf/`:

```
.perf/history/<host>/<suite>.jsonl   append-only, one line per measured point
.perf/baseline/<host>.json           {record_key: record} pinned baseline
.perf/reports/<timestamp>.{md,json}  consolidated drift report per run
```
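
As a rough sketch of how that layout could be resolved (the helper names `perf_dir`, `history_path`, and `baseline_path` are hypothetical and not part of SimVX's API; only the environment variable and the directory layout come from the description above):

```python
import os
from pathlib import Path


def perf_dir(repo_root: Path, env=None) -> Path:
    """Use $SIMVX_PERF_DIR when set, otherwise <repo>/.perf/."""
    env = os.environ if env is None else env
    override = env.get("SIMVX_PERF_DIR")
    return Path(override) if override else repo_root / ".perf"


def history_path(root: Path, host: str, suite: str) -> Path:
    """Append-only JSONL history, one file per host and suite."""
    return root / "history" / host / f"{suite}.jsonl"


def baseline_path(root: Path, host: str) -> Path:
    """Pinned baseline, one JSON file per host."""
    return root / "baseline" / f"{host}.json"
```

Keying both paths by host is what keeps results from different machines from ever being compared with each other.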

A record is keyed by `(suite, name, backend, count, metric)` and stamps the machine
(host, CPU, GPU, Python build incl. free-threaded, git commit), the headline metric, and
percentiles. The `run_benchmarks.py` report is a table of *value vs baseline vs Δ vs
verdict* (`ok` / `regressed` / `improved` / `new`).
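
To make the keying concrete, here is a minimal sketch of such a key. The `RecordKey` class, the `/` separator used to turn it into a JSON object key, and the metric name `frame_ms` are all assumptions for illustration; the actual on-disk encoding may differ.

```python
from typing import NamedTuple


class RecordKey(NamedTuple):
    """Hypothetical model of the five-part record key."""
    suite: str
    name: str
    backend: str
    count: int
    metric: str

    def as_str(self) -> str:
        # JSON object keys must be strings; the separator is an assumption.
        return "/".join(str(part) for part in self)


python_arm = RecordKey("physics", "step", "python", 1000, "frame_ms")
jolt_arm = python_arm._replace(backend="jolt")

# Same benchmark and count, different backend: two distinct series.
print(python_arm.as_str())
print(jolt_arm.as_str())
```

Because `backend` and `count` are part of the key, each point of a sweep and each backend gets its own baseline entry.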

## Pass / fail

The first run on a machine has no baseline: every point is **seeded** and passes. After
you pin a baseline with `--update-baseline`, subsequent runs compare against it and **fail**
when a metric moves more than the tolerance (default 20%) in the worse direction. For frame-time metrics
lower is better; limit-style metrics (e.g. *max objects at 60 FPS*) set
`lower_is_better=False` so higher is better.
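
The comparison can be sketched as a relative-drift check. This is an illustrative reconstruction, not the suite's implementation: the function name `verdict` is hypothetical, and using the same tolerance to label a point `improved` is an assumption.

```python
from typing import Optional


def verdict(value: float, baseline: Optional[float],
            tolerance: float = 0.20, lower_is_better: bool = True) -> str:
    """Classify a measurement against its pinned baseline."""
    if baseline is None:
        return "new"  # no baseline yet: the point is seeded and passes
    if baseline == 0:
        raise ValueError("relative drift is undefined for a zero baseline")
    delta = (value - baseline) / abs(baseline)
    # Normalise so that a positive number always means "got worse".
    worse = delta if lower_is_better else -delta
    if worse > tolerance:
        return "regressed"
    if worse < -tolerance:  # assumption: symmetric threshold for "improved"
        return "improved"
    return "ok"
```

For example, a frame time that goes from 10.0 ms to 12.5 ms is a 25% drift in the worse direction and would be classed as `regressed` at the default tolerance, while a limit-style metric dropping from 1000 to 700 objects regresses only because `lower_is_better=False` flips the sign.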

When you intentionally change performance, re-pin with `--update-baseline` and note the
reason in your commit.

## Optional acceleration backends

Some subsystems have (or will have) an accelerated path alongside a pure-Python fallback.
Benchmarks stamp a `backend` onto each record so the two never mix in history:

- **Physics** records `backend="python"` today. When the Jolt path lands, its arm records
  `backend="jolt"` and skips when the optional dependency is absent.
- The occlusion-culling benchmark records `occlusion_on` / `occlusion_off` as backends so
  the Hi-Z cull's payoff (drawn vs total instances) is tracked as its own series.

## Authoring a benchmark

Use the shared harness in `simvx.core.testing.benchmark` and the `perf_recorder` fixture:

```python
import pytest
from simvx.core.testing import bench_headless_render

pytestmark = pytest.mark.perf


@pytest.mark.vulkan
def test_my_scene(perf_recorder):
    result = bench_headless_render(MyScene, frames=60, warmup=10, n=10_000,
                                   telemetry_keys=("transform_high_water",))
    perf_recorder.record(result)
```

- `bench_scene_runner(root, frames)` -- CPU-only (no GPU), via `SceneRunner`.
- `bench_headless_render(scene_or_cls, frames, ...)` -- full headless Vulkan render; copies
  named `app.last_telemetry` keys (GPU phase times, occlusion counts, instance high-water)
  into the result.
- `perf_recorder.record(result)` appends to history, compares to baseline, and (on teardown)
  fails the test on regression unless `--perf-report-only` is passed.

Keep counts as a sweep to capture the scaling curve, and note any engine ceiling you hit
(e.g. the GPU draw-batch cap) in the test rather than silently staying under it.