`simvx.core.testing.benchmark`¶

Performance-benchmark harness for SimVX’s opt-in regression suite.

This module is the single home for the benchmark plumbing shared by every @pytest.mark.perf suite (across packages) and by tools/run_benchmarks.py:

class:

BenchmarkResult – one measured data point (frame timing + metadata).
func:

bench_scene_runner / :func:bench_headless_render – run a scene and measure it (CPU-only via :class:~simvx.core.testing.SceneRunner, or a full headless Vulkan render; the latter lazy-imports simvx.graphics).
class:

MachineInfo – host/CPU/GPU/interpreter snapshot stamped onto records.
class:

HistoryStore – append-only per-machine history + pinned baselines.
class:

PerfRecorder – records results, compares against the machine’s own baseline, and (outside report-only mode) fails when a metric regresses.

These utilities measure speed only: correctness lives in the normal suites. Results are stored locally under the perf directory (:func:perf_dir) and are never committed – absolute numbers are machine-specific, so regressions are judged against this machine’s own recorded baseline, not a hardcoded threshold.

Module Contents¶

Classes¶

`BenchmarkResult`	One measured benchmark point.
`MachineInfo`	Host + interpreter + GPU snapshot stamped onto every record.
`PerfRecord`	A flattened, JSON-serialisable benchmark record (one JSONL line).
`HistoryStore`	Append-only per-machine benchmark history with pinned baselines.
`PerfRecorder`	Records benchmark results, compares to the machine baseline, gates on drift.

Functions¶

`bench_scene_runner`	Benchmark a scene with :class:`SceneRunner` (CPU only, no GPU).
`bench_headless_render`	Benchmark a scene with full Vulkan rendering (headless).
`perf_dir`	Resolve the local perf directory (`$SIMVX_PERF_DIR` or `<repo>/.perf`).
`record_key`	Stable identity for a measured point across runs.
`compare`	Classify `current` against `baseline` -> ok / regressed / improved / new.
`make_perf_recorder`	Construct a :class:`PerfRecorder` (used by the `perf_recorder` fixture).
`perf_pytest_addoption`	Register the shared perf CLI options on a pytest parser.

Data¶

`__all__`
`DEFAULT_TOLERANCE`

API¶

simvx.core.testing.benchmark.__all__¶: [‘BenchmarkResult’, ‘HistoryStore’, ‘MachineInfo’, ‘PerfRecord’, ‘PerfRecorder’, ‘bench_headless_ren…

simvx.core.testing.benchmark.DEFAULT_TOLERANCE¶: 0.2

class simvx.core.testing.benchmark.BenchmarkResult[source]¶

One measured benchmark point.

The headline metric is :meth:metric_value. By default it is the average frame time in ms (metric="frame_ms", lower is better). A limit-style bench (e.g. “max sprites at 60 FPS”) sets metric, value and lower_is_better=False so the comparison treats higher as better.

name: str¶: None

count: int¶: 0

total_ms: float¶: 0.0

frames: int¶: 0

samples_ms: list[float]¶: ‘field(…)’

gpu_ms: float | None¶: None

backend: str¶: ‘default’

metric: str¶: ‘frame_ms’

value: float | None¶: None

lower_is_better: bool¶: True

extra: dict[str, float]¶: ‘field(…)’

property avg_frame_ms: float[source]¶

property fps: float[source]¶

property per_object_us: float[source]¶: Microseconds per object per frame.

property p50_ms: float[source]¶

property p95_ms: float[source]¶

property p99_ms: float[source]¶

property min_ms: float[source]¶

property max_ms: float[source]¶

metric_value() → float[source]¶: The headline value the baseline comparison is made against.

report() → str[source]¶

simvx.core.testing.benchmark.bench_scene_runner(root, frames: int = 120, draw: bool = False, *, backend: str = 'default') → simvx.core.testing.benchmark.BenchmarkResult[source]¶: Benchmark a scene with :class:SceneRunner (CPU only, no GPU).

simvx.core.testing.benchmark.bench_headless_render(scene_or_cls, frames: int = 60, width: int = 1280, height: int = 720, *, warmup: int = 0, telemetry_keys: tuple[str, ...] = (), backend: str = 'default', **kwargs) → simvx.core.testing.benchmark.BenchmarkResult[source]¶

Benchmark a scene with full Vulkan rendering (headless).

Lazy-imports simvx.graphics so the core package stays render-agnostic. telemetry_keys names entries from app.last_telemetry to copy into result.extra (e.g. "occlusion_drawn", "transform_high_water").

class simvx.core.testing.benchmark.MachineInfo[source]¶

Host + interpreter + GPU snapshot stamped onto every record.

host: str¶: None

os: str¶: None

cpu: str¶: None

cpu_count: int¶: None

ram_gb: float¶: None

python: str¶: None

free_threaded: bool¶: None

gpu: str¶: ‘unknown’

git_commit: str¶: ‘unknown’

classmethod capture() → simvx.core.testing.benchmark.MachineInfo[source]¶

to_dict() → dict[str, Any][source]¶

simvx.core.testing.benchmark.perf_dir() → pathlib.Path[source]¶: Resolve the local perf directory ($SIMVX_PERF_DIR or <repo>/.perf).

simvx.core.testing.benchmark.record_key(suite: str, name: str, backend: str, count: int, metric: str) → str[source]¶: Stable identity for a measured point across runs.

class simvx.core.testing.benchmark.PerfRecord[source]¶

A flattened, JSON-serialisable benchmark record (one JSONL line).

ts: str¶: None

suite: str¶: None

name: str¶: None

backend: str¶: None

count: int¶: None

metric: str¶: None

value: float¶: None

lower_is_better: bool¶: None

avg_frame_ms: float¶: None

fps: float¶: None

p95_ms: float¶: None

p99_ms: float¶: None

gpu_ms: float | None¶: None

extra: dict[str, float]¶: None

machine: dict[str, Any]¶: None

property key: str[source]¶

classmethod from_result(suite: str, result: simvx.core.testing.benchmark.BenchmarkResult, machine: simvx.core.testing.benchmark.MachineInfo) → simvx.core.testing.benchmark.PerfRecord[source]¶

to_dict() → dict[str, Any][source]¶

class simvx.core.testing.benchmark.HistoryStore(root: pathlib.Path | None = None, machine: simvx.core.testing.benchmark.MachineInfo | None = None)[source]¶

Append-only per-machine benchmark history with pinned baselines.

Layout under :func:perf_dir::

history/<host>/<suite>.jsonl   one appended line per measured point
baseline/<host>.json           {record_key: record_dict} pinned baseline

Initialization

append(record: simvx.core.testing.benchmark.PerfRecord) → None[source]¶

baselines() → dict[str, dict[str, Any]][source]¶

baseline(key: str) → dict[str, Any] | None[source]¶

update_baselines(records: dict[str, dict[str, Any]]) → None[source]¶: Merge records (key -> record dict) into the pinned baseline file.

simvx.core.testing.benchmark.compare(current: float, baseline: float, lower_is_better: bool, tol: float = DEFAULT_TOLERANCE) → str[source]¶: Classify current against baseline -> ok / regressed / improved / new.

class simvx.core.testing.benchmark.PerfRecorder(suite: str, *, store: simvx.core.testing.benchmark.HistoryStore | None = None, tol: float = DEFAULT_TOLERANCE, update_baseline: bool = False, report_only: bool = False, emit: collections.abc.Callable[[str], None] = print)[source]¶

Records benchmark results, compares to the machine baseline, gates on drift.

A perf test takes the perf_recorder fixture and calls

Meth:: record once per measured point. On fixture teardown
Meth:: finish runs: it pins the baseline when --update-perf-baseline was given, and otherwise raises if any metric regressed beyond tolerance (unless --perf-report-only). The first run on a machine has no baseline, so points are seeded and pass.