simvx.core.testing.benchmark

Performance-benchmark harness for SimVX’s opt-in regression suite.

This module is the single home for the benchmark plumbing shared by every @pytest.mark.perf suite (across packages) and by tools/run_benchmarks.py:

  • BenchmarkResult – one measured data point (frame timing + metadata).

  • bench_scene_runner / bench_headless_render – run a scene and measure it (CPU-only via simvx.core.testing.SceneRunner, or a full headless Vulkan render; the latter lazy-imports simvx.graphics).

  • MachineInfo – host/CPU/GPU/interpreter snapshot stamped onto records.

  • HistoryStore – append-only per-machine history + pinned baselines.

  • PerfRecorder – records results, compares against the machine’s own baseline, and (outside report-only mode) fails when a metric regresses.

These utilities measure speed only; correctness lives in the normal suites. Results are stored locally under the perf directory (see perf_dir()) and are never committed: absolute numbers are machine-specific, so regressions are judged against this machine’s own recorded baseline, not a hardcoded threshold.

Module Contents

Classes

BenchmarkResult

One measured benchmark point.

MachineInfo

Host + interpreter + GPU snapshot stamped onto every record.

PerfRecord

A flattened, JSON-serialisable benchmark record (one JSONL line).

HistoryStore

Append-only per-machine benchmark history with pinned baselines.

PerfRecorder

Records benchmark results, compares to the machine baseline, gates on drift.

Functions

bench_scene_runner

Benchmark a scene with SceneRunner (CPU only, no GPU).

bench_headless_render

Benchmark a scene with full Vulkan rendering (headless).

perf_dir

Resolve the local perf directory ($SIMVX_PERF_DIR or <repo>/.perf).

record_key

Stable identity for a measured point across runs.

compare

Classify current against baseline -> ok / regressed / improved / new.

make_perf_recorder

Construct a PerfRecorder (used by the perf_recorder fixture).

perf_pytest_addoption

Register the shared perf CLI options on a pytest parser.

Data

API

simvx.core.testing.benchmark.__all__

[‘BenchmarkResult’, ‘HistoryStore’, ‘MachineInfo’, ‘PerfRecord’, ‘PerfRecorder’, ‘bench_headless_ren…

simvx.core.testing.benchmark.DEFAULT_TOLERANCE

0.2

class simvx.core.testing.benchmark.BenchmarkResult

One measured benchmark point.

The headline metric is returned by metric_value(). By default it is the average frame time in ms (metric="frame_ms", lower is better). A limit-style bench (e.g. “max sprites at 60 FPS”) sets metric, value and lower_is_better=False so that the comparison treats higher as better.
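The two styles of benchmark point can be illustrated with a small stand-in. This is a sketch, not the library class: the name SketchResult, the definition of avg_frame_ms as total_ms / frames, the fallback rule in metric_value, and the metric name and numbers in the limit-style example are all assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class SketchResult:
    """Hypothetical stand-in for BenchmarkResult (headline fields only)."""

    name: str
    total_ms: float = 0.0
    frames: int = 0
    metric: str = "frame_ms"
    value: float | None = None
    lower_is_better: bool = True

    @property
    def avg_frame_ms(self) -> float:
        # Assumed definition: total measured time divided by frame count.
        return self.total_ms / self.frames if self.frames else 0.0

    def metric_value(self) -> float:
        # Assumed rule: an explicit value wins, else the average frame time.
        return self.value if self.value is not None else self.avg_frame_ms


# Default timing bench: the headline is average frame time, lower is better.
timing = SketchResult("cubes", total_ms=240.0, frames=120)

# Limit-style bench: the headline is a count, so higher is better.
# The metric name and value are made up for this example.
limit = SketchResult(
    "sprites", metric="max_sprites_60fps", value=18000.0, lower_is_better=False
)
```

Here timing.metric_value() is the 2.0 ms average, while limit.metric_value() is the explicit count.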

name: str

None

count: int

0

total_ms: float

0.0

frames: int

0

samples_ms: list[float]

‘field(…)’

gpu_ms: float | None

None

backend: str

‘default’

metric: str

‘frame_ms’

value: float | None

None

lower_is_better: bool

True

extra: dict[str, float]

‘field(…)’

property avg_frame_ms: float

property fps: float

property per_object_us: float

Microseconds per object per frame.

property p50_ms: float

property p95_ms: float

property p99_ms: float

property min_ms: float

property max_ms: float

metric_value() -> float

The headline value the baseline comparison is made against.

report() -> str
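The percentile and per-object properties summarise the per-frame samples. The source does not say which percentile method is used, so the sketch below assumes nearest-rank; the helper names percentile_ms and per_object_us_from are hypothetical.

```python
import math


def percentile_ms(samples_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile (an assumed method, not confirmed by the source)."""
    if not samples_ms:
        return 0.0
    ordered = sorted(samples_ms)
    # Nearest-rank: the smallest sample with at least pct% of samples at or below it.
    rank = max(1, math.ceil(pct / 100.0 * len(ordered)))
    return ordered[rank - 1]


def per_object_us_from(avg_frame_ms: float, count: int) -> float:
    """Microseconds per object per frame: convert ms to us, divide by object count."""
    return avg_frame_ms * 1000.0 / count if count else 0.0


samples = [4.0, 5.0, 5.0, 6.0, 20.0]  # five frames, one of them slow
p50 = percentile_ms(samples, 50)
p95 = percentile_ms(samples, 95)
```

With these five samples the median stays at 5.0 ms while the 95th percentile picks up the 20.0 ms outlier, which is why tail percentiles are reported alongside the average.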

simvx.core.testing.benchmark.bench_scene_runner(root, frames: int = 120, draw: bool = False, *, backend: str = 'default') -> simvx.core.testing.benchmark.BenchmarkResult

Benchmark a scene with SceneRunner (CPU only, no GPU).
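A CPU-only frame benchmark of this kind generally reduces to timing a per-frame step in a loop. The following is a generic sketch of that idea, not the function's actual implementation: time_frames and fake_step are hypothetical names, and fake_step merely stands in for advancing a scene by one frame.

```python
import time
from collections.abc import Callable


def time_frames(step: Callable[[], None], frames: int = 120) -> list[float]:
    """Call step() once per frame and return the per-frame durations in ms."""
    samples_ms: list[float] = []
    for _ in range(frames):
        start = time.perf_counter()
        step()
        samples_ms.append((time.perf_counter() - start) * 1000.0)
    return samples_ms


def fake_step() -> None:
    # Placeholder workload standing in for one scene update.
    sum(i * i for i in range(1000))


samples = time_frames(fake_step, frames=10)
avg_frame_ms = sum(samples) / len(samples)
```

time.perf_counter is used because it is monotonic and has the highest available resolution for short intervals.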

simvx.core.testing.benchmark.bench_headless_render(scene_or_cls, frames: int = 60, width: int = 1280, height: int = 720, *, warmup: int = 0, telemetry_keys: tuple[str, ...] = (), backend: str = 'default', **kwargs) -> simvx.core.testing.benchmark.BenchmarkResult

Benchmark a scene with full Vulkan rendering (headless).

Lazy-imports simvx.graphics so the core package stays render-agnostic. telemetry_keys names entries from app.last_telemetry to copy into result.extra (e.g. "occlusion_drawn", "transform_high_water").
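The telemetry_keys behaviour amounts to copying a chosen subset of a telemetry mapping into the result. A minimal sketch, assuming the telemetry is a flat dict of numbers and that keys missing from it are silently skipped (neither is confirmed by the source); pick_telemetry and the sample values are hypothetical.

```python
def pick_telemetry(
    last_telemetry: dict[str, float], keys: tuple[str, ...]
) -> dict[str, float]:
    """Copy only the requested keys; keys absent from the telemetry are skipped."""
    return {k: float(last_telemetry[k]) for k in keys if k in last_telemetry}


# Made-up telemetry snapshot for illustration.
telemetry = {"occlusion_drawn": 412, "transform_high_water": 2048, "unrelated": 7}
extra = pick_telemetry(telemetry, ("occlusion_drawn", "transform_high_water", "missing"))
```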

class simvx.core.testing.benchmark.MachineInfo

Host + interpreter + GPU snapshot stamped onto every record.

host: str

None

os: str

None

cpu: str

None

cpu_count: int

None

ram_gb: float

None

python: str

None

free_threaded: bool

None

gpu: str

‘unknown’

git_commit: str

‘unknown’

classmethod capture() -> simvx.core.testing.benchmark.MachineInfo

to_dict() -> dict[str, Any]

simvx.core.testing.benchmark.perf_dir() -> pathlib.Path

Resolve the local perf directory ($SIMVX_PERF_DIR or <repo>/.perf).
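The resolution order (environment override first, then a .perf directory at the repository root) could look roughly like the sketch below. How the repository root is actually located is not stated in the source; walking up to the nearest directory containing .git is an assumption, as is the helper name resolve_perf_dir.

```python
from __future__ import annotations

import os
from pathlib import Path


def resolve_perf_dir(start: Path | None = None) -> Path:
    """Return $SIMVX_PERF_DIR if set, else <repo>/.perf (repo lookup is assumed)."""
    override = os.environ.get("SIMVX_PERF_DIR")
    if override:
        return Path(override)
    here = (start or Path.cwd()).resolve()
    # Assumption: the repo root is the nearest ancestor that contains ".git".
    for candidate in (here, *here.parents):
        if (candidate / ".git").exists():
            return candidate / ".perf"
    # No repo marker found: fall back to the starting directory.
    return here / ".perf"
```

Setting SIMVX_PERF_DIR is the simplest way to point several checkouts at one shared history on the same machine.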

simvx.core.testing.benchmark.record_key(suite: str, name: str, backend: str, count: int, metric: str) -> str

Stable identity for a measured point across runs.
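The key only needs to be deterministic and to differ whenever any of the five identifying fields differs. The exact string format is not given in the source; the "::"-joined form below is purely an assumed example, and sketch_record_key is a hypothetical name.

```python
def sketch_record_key(suite: str, name: str, backend: str, count: int, metric: str) -> str:
    """Join the identifying fields into one string (the format here is assumed)."""
    return "::".join((suite, name, backend, str(count), metric))


key_a = sketch_record_key("core", "cubes", "default", 1000, "frame_ms")
key_b = sketch_record_key("core", "cubes", "default", 5000, "frame_ms")
```

Because count is part of the key, the same scene measured at 1000 and 5000 objects is tracked as two separate points, each with its own baseline.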

class simvx.core.testing.benchmark.PerfRecord

A flattened, JSON-serialisable benchmark record (one JSONL line).

ts: str

None

suite: str

None

name: str

None

backend: str

None

count: int

None

metric: str

None

value: float

None

lower_is_better: bool

None

avg_frame_ms: float

None

fps: float

None

p95_ms: float

None

p99_ms: float

None

gpu_ms: float | None

None

extra: dict[str, float]

None

machine: dict[str, Any]

None

property key: str

classmethod from_result(suite: str, result: simvx.core.testing.benchmark.BenchmarkResult, machine: simvx.core.testing.benchmark.MachineInfo) -> simvx.core.testing.benchmark.PerfRecord

to_dict() -> dict[str, Any]

class simvx.core.testing.benchmark.HistoryStore(root: pathlib.Path | None = None, machine: simvx.core.testing.benchmark.MachineInfo | None = None)

Append-only per-machine benchmark history with pinned baselines.

Layout under perf_dir():

history/<host>/<suite>.jsonl   one appended line per measured point
baseline/<host>.json           {record_key: record_dict} pinned baseline

append(record: simvx.core.testing.benchmark.PerfRecord) -> None

baselines() -> dict[str, dict[str, Any]]

baseline(key: str) -> dict[str, Any] | None

update_baselines(records: dict[str, dict[str, Any]]) -> None

Merge records (key -> record dict) into the pinned baseline file.
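The on-disk layout described for HistoryStore (one JSONL file per suite for history, one JSON file per host for baselines) can be mimicked with the standard library. This is an illustrative sketch under that layout only: append_record, merge_baselines, the host name "devbox", and the record contents are hypothetical, and the real class may differ in details such as encoding or write ordering.

```python
import json
import tempfile
from pathlib import Path


def append_record(root: Path, host: str, suite: str, record: dict) -> None:
    """Append one JSON line to history/<host>/<suite>.jsonl."""
    path = root / "history" / host / f"{suite}.jsonl"
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record) + "\n")


def merge_baselines(root: Path, host: str, records: dict[str, dict]) -> dict[str, dict]:
    """Merge key -> record into baseline/<host>.json and return the merged map."""
    path = root / "baseline" / f"{host}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    current = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    current.update(records)  # new keys are added, existing keys are overwritten
    path.write_text(json.dumps(current, indent=2), encoding="utf-8")
    return current


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    append_record(root, "devbox", "core", {"name": "cubes", "value": 2.0})
    append_record(root, "devbox", "core", {"name": "cubes", "value": 2.1})
    history_file = root / "history" / "devbox" / "core.jsonl"
    history_lines = history_file.read_text(encoding="utf-8").splitlines()
    merge_baselines(root, "devbox", {"k1": {"value": 2.0}})
    pinned = merge_baselines(root, "devbox", {"k2": {"value": 5.0}})
```

History only ever grows, while the baseline file holds exactly one record per key, so pinning a new baseline for one point leaves the others untouched.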

simvx.core.testing.benchmark.compare(current: float, baseline: float, lower_is_better: bool, tol: float = DEFAULT_TOLERANCE) -> str

Classify current against baseline -> ok / regressed / improved / new.
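One plausible reading of this classification is a relative-change test against the tolerance. The sketch below assumes that reading: the symmetric treatment of "improved", the use of a missing or non-positive baseline to mean "new", and the name sketch_compare are assumptions rather than the documented behaviour.

```python
from __future__ import annotations


def sketch_compare(
    current: float, baseline: float | None, lower_is_better: bool, tol: float = 0.2
) -> str:
    """Classify a measurement against its baseline using relative change."""
    if baseline is None or baseline <= 0:
        return "new"  # assumed: nothing usable to compare against
    worse_by = (current - baseline) / baseline
    if not lower_is_better:
        worse_by = -worse_by  # flip so that positive always means "got worse"
    if worse_by > tol:
        return "regressed"
    if worse_by < -tol:
        return "improved"
    return "ok"


slower = sketch_compare(13.0, 10.0, lower_is_better=True)    # frame time up 30%
steady = sketch_compare(11.0, 10.0, lower_is_better=True)    # within 20% tolerance
fewer = sketch_compare(7.0, 10.0, lower_is_better=False)     # capacity down 30%
```

The same drop from 10 to 7 counts as an improvement for a frame-time metric but as a regression for a limit-style metric, which is the role of lower_is_better.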

class simvx.core.testing.benchmark.PerfRecorder(suite: str, *, store: simvx.core.testing.benchmark.HistoryStore | None = None, tol: float = DEFAULT_TOLERANCE, update_baseline: bool = False, report_only: bool = False, emit: collections.abc.Callable[[str], None] = print)

Records benchmark results, compares to the machine baseline, gates on drift.

A perf test takes the perf_recorder fixture and calls record() once per measured point. On fixture teardown, finish() runs: it pins the baseline when --update-perf-baseline was given, and otherwise raises if any metric regressed beyond tolerance (unless --perf-report-only). The first run on a machine has no baseline, so points are seeded and pass.
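The record-then-finish flow can be sketched with an in-memory stand-in. Everything here is hypothetical and simplified: SketchRecorder keeps baselines in a plain dict, handles lower-is-better metrics only, and raises AssertionError, whereas the exception type and storage used by the real class are not stated in the source.

```python
class SketchRecorder:
    """Hypothetical in-memory stand-in for the record()/finish() flow."""

    def __init__(self, baselines, *, tol=0.2, update_baseline=False, report_only=False):
        self.baselines = baselines  # key -> pinned value
        self.tol = tol
        self.update_baseline = update_baseline
        self.report_only = report_only
        self.seen = {}
        self.regressed = []

    def record(self, key, value):
        self.seen[key] = value
        base = self.baselines.get(key)
        if base is None:
            self.baselines[key] = value  # first run: seed the point and pass
            return "new"
        if value > base * (1 + self.tol):  # lower-is-better metrics only
            self.regressed.append(key)
            return "regressed"
        return "ok"

    def finish(self):
        if self.update_baseline:
            self.baselines.update(self.seen)  # pin current values, never fail
            return
        if self.regressed and not self.report_only:
            raise AssertionError(f"perf regression: {self.regressed}")


rec = SketchRecorder({"cubes": 10.0})
first = rec.record("sprites", 5.0)  # no baseline yet, so it is seeded
second = rec.record("cubes", 13.0)  # 30% slower than the pinned 10.0
try:
    rec.finish()
    gated = False
except AssertionError:
    gated = True
```

Deferring the failure to finish() means every point in a suite is measured and reported before the test fails, rather than stopping at the first regression.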

record(result: simvx.core.testing.benchmark.BenchmarkResult, *, tol: float | None = None) -> str

finish() -> None

simvx.core.testing.benchmark.make_perf_recorder(suite: str, *, update_baseline: bool = False, report_only: bool = False, tol: float = DEFAULT_TOLERANCE) -> simvx.core.testing.benchmark.PerfRecorder

Construct a PerfRecorder (used by the perf_recorder fixture).

simvx.core.testing.benchmark.perf_pytest_addoption(parser) -> None

Register the shared perf CLI options on a pytest parser.

Conftests call this so every package exposes the same flags. Guarded so a second call (e.g. multiple conftests under one root) does not error.
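A guard of this kind could be written by tolerating the error pytest raises for a duplicate option name. The sketch below is an assumption about the approach, not the module's code: add_perf_options and FakeParser are hypothetical, the help strings are invented, and only the two flags named elsewhere in this document are registered. FakeParser exists solely so that the example runs without pytest installed.

```python
def add_perf_options(parser) -> None:
    """Register the two perf flags; a repeat call is tolerated."""
    options = (
        ("--update-perf-baseline", "Pin the current results as the baseline."),
        ("--perf-report-only", "Report regressions without failing the run."),
    )
    for flag, help_text in options:
        try:
            parser.addoption(flag, action="store_true", default=False, help=help_text)
        except ValueError:
            # pytest raises ValueError when an option name is already registered.
            pass


class FakeParser:
    """Tiny stand-in that mimics the duplicate-name check."""

    def __init__(self):
        self.flags = []

    def addoption(self, flag, **attrs):
        if flag in self.flags:
            raise ValueError(f"option {flag} already added")
        self.flags.append(flag)


parser = FakeParser()
add_perf_options(parser)
add_perf_options(parser)  # second call, e.g. from another conftest
```

In a real conftest.py the call would sit inside the pytest_addoption(parser) hook.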