Source code for simvx.graphics.renderer.render_packet

"""CPU-side per-frame render snapshot for the pipelined render thread.

A :class:`RenderPacket` is an owned, immutable-by-convention snapshot of the
renderer's per-frame submission state, taken on the MAIN thread right after
``SceneAdapter.submit_scene`` populates that state. In pipelined render mode the
packet is handed to the render thread, which reconstructs the renderer's
per-frame lists from it and then records + submits the GPU frame, while the main
thread is free to simulate the next frame and rebuild the renderer's (now empty)
lists without mutating the in-flight packet.

This is the FOUNDATION wave: the packet, its :func:`extract_render_packet`
builder, and the :class:`RenderPacketRing` double-buffer are defined and unit
tested, but no render thread is spawned and the synchronous frame loop is
unchanged. The default (synchronous) path never constructs a packet, so it stays
byte-identical.

Field provenance
----------------
Every field mirrors a per-frame attribute the GPU frame reads, each cleared in
``Renderer.begin_frame`` (forward.py ~605-616) and rebuilt by the scene adapter:

``instances``
    ``Renderer._instances`` (forward.py ~64): list of
    ``(MeshHandle, transform, material_id, viewport_id)``. Consumed by
    ``_upload_transforms``, the shadow pass, occlusion cull, and the forward
    draw. The 4x4 ``transform`` arrays are copied so the main thread may rebuild
    next frame without touching the in-flight packet.
``skinned_instances``
    ``Renderer._skinned_instances`` (forward.py ~176):
    ``(MeshHandle, transform, material_id, joint_matrices)``. Both numpy arrays
    per entry are copied.
``shader_material_submissions``
    ``Renderer._shader_material_submissions`` (forward.py ~120):
    ``(MeshHandle, transform, material_id, shader_material)``. The transform is
    copied; ``shader_material`` is a shared, frame-stable object and is
    referenced, not copied.
``particle_submissions``
    ``Renderer._particle_submissions`` (forward.py ~109): ``(data, count)``.
    ``data`` is copied.
``gpu_particle_submissions``
    ``Renderer._gpu_particle_submissions`` (forward.py ~113):
    ``(emitter_id, emitter_config)``. ``emitter_config`` is a per-frame dict;
    shallow-copied so a downstream mutation of the dict object is isolated.
``materials``
    ``Renderer._materials`` (forward.py ~105, set via ``set_materials``). Copied.
``lights``
    ``Renderer._lights`` (forward.py ~106, set via ``set_lights``). Copied.
``viewports``
    Snapshot of ``Renderer.viewport_manager.viewports`` (scene_adapter ~785).
    Each entry is a :class:`ViewportSnapshot` carrying the viewport id plus an
    owned copy of the camera view/proj matrices, rect, and the (shared, not
    copied) render-target handle.
``structure_version``
    ``tree._structure_version`` at extract time. The velocity and occlusion
    passes guard prev<->cur pairing on this; the render thread must see the
    value that matched the snapshotted instance ordering, not a later one.
``draw2d_ops``
    Snapshot of ``Draw2D._ops`` (draw2d.py ~170): the ordered immediate-mode 2D
    op list a HUD/overlay scene rebuilds every frame on the MAIN thread
    (``Draw2D._reset(); tree.render(Draw2D)``). ``Op`` is an immutable
    ``NamedTuple`` whose ``verts``/``indices`` lists Draw2D builds fresh per emit
    and never mutates after appending, so a shallow ``list(...)`` copy is a
    faithful owned snapshot. ``install_packet`` binds it as the op source the
    render thread's ``Draw2DPass`` reads, so the render thread never touches the
    live global the main thread is concurrently clearing + rebuilding.
``frame_index``
    Monotonic producer frame counter, for ordering / telemetry / debugging.

Subsystem submission buffers (packetised this wave)
---------------------------------------------------
These mirror per-frame submission lists that live inside lazily-created
subsystem passes. Each is rebuilt every frame on the MAIN thread (during
``adapter.submit_scene``) and read during recording, so the render thread must
read an OWNED copy rather than the live list the main thread is clearing.

``tilemap_layers``
    Owned snapshot of ``Renderer._tilemap_pass._submissions``
    (tilemap_pass.py ~42, populated by ``submit_layer`` from
    ``SceneAdapter._submit_tilemaps``). Each entry is
    ``(tile_data: ndarray, tileset_texture_id: int, tile_size: tuple)``; the
    ``tile_data`` structured array is copied (the main thread reuses TileMap
    layer buffers across frames).
``light2d_lights`` / ``light2d_occluders``
    Owned snapshots of ``Renderer._light2d_pass._lights`` (list of per-light
    dicts, light2d_pass.py ~65) and ``._occluders`` (list of polygon
    vertex-lists, ~66). Each dict / polygon is shallow-copied so a downstream
    mutation of the per-frame entry is isolated from the packet.
``text_vertices`` / ``text_indices``
    Owned copies of ``Renderer._text_renderer.vertices`` / ``.indices``
    (text_renderer.py ~316-326): the CPU MSDF geometry the shared TextRenderer
    builds each frame from ``draw_text`` calls and that
    ``OverlayRenderer.render_text`` uploads + draws via ``TextPass``. The arrays
    are copied because the TextRenderer reuses its vertex/index buffers next
    frame (``begin_frame`` resets ``_char_count`` and overwrites in place).

Scene-render-unit (SRU) plans
-----------------------------
``subviewport_srus``
    Ordered list of :class:`SubViewportSRU` plans, one per live SubViewport,
    captured on the MAIN thread by :meth:`SubViewportManager.build_srus` (which
    reuses the P1 topological ``order_subviewports`` so producers precede
    consumers). Each plan owns the SubViewport's submitted opaque/skinned
    instances (transform copies), its camera view/proj matrices, its isolated
    Draw2D op list, target identity, and the update-mode decision, so the render
    thread can record the offscreen SRU WITHOUT walking the live tree. Empty when
    no SubViewport is present or in the synchronous path (which still walks the
    tree live, byte-identically).

State intentionally NOT snapshotted (and why)
---------------------------------------------
- ``_main_base`` / arena slice info: reservation is a GPU-SSBO concern owned
  entirely by the render thread (``reserve_main_slice`` + ``_upload_transforms``
  run there from the packet's ``instances``). The main thread never reserves.
- Per-frame HDR / post flags (``_hdr_rendered``): recomputed by the render
  thread inside ``pre_render``/``render`` from the snapshotted lists; they are
  outputs of recording, not inputs to it.
- Reflection-probe capture (``ReflectionProbePass.update_probes``): DEFERRED.
  The probe path is deeply GPU-stateful (per-probe source cubemaps, six face
  renders, an IBL compute convolution with record-time descriptor mutation, and
  cube-array copies with many intra-cmd barriers). Packetising it faithfully
  would require snapshotting six face instance-lists per probe plus replaying the
  convolution/copy state machine on the render thread, which is out of scope for
  this wave. Probe-bearing scenes keep the safe skip + one-time warning in
  pipelined mode (app.py ``_warn_pipelined_unpacketised`` / ``pre_render_fn``).

Note: ``Draw2D._ops`` (immediate-mode 2D HUD/overlay) was previously in the
"not snapshotted" set and raced; it is now ``draw2d_ops`` (installed via
``install_packet``). The subsystem buffers (tilemap / 2D-light / 3D-overlay
text) and SubViewport SRUs documented above are now packet-owned AND consumed by
the render thread: ``Renderer.install_packet`` binds them onto the per-frame
override attributes the pass read-sites consult, and the pipelined ``pre_render``
replays the SubViewport SRUs from the plan via
``SceneAdapter.render_sru_from_plan`` (no live-tree walk). Only reflection-probe
capture remains deferred in pipelined mode (see above). The synchronous path
installs no packet, so every read-site falls back to the live state and stays
byte-identical.
"""

from __future__ import annotations

import threading
from dataclasses import dataclass, field
from typing import TYPE_CHECKING, Any

import numpy as np

if TYPE_CHECKING:
    from simvx.core import SceneTree

    from ..types import MeshHandle
    from .forward import Renderer

__all__ = [
    "RenderPacket",
    "RenderPacketRing",
    "SubViewportSRU",
    "ViewportSnapshot",
    "extract_render_packet",
]


@dataclass(frozen=True, slots=True)
class ViewportSnapshot:
    """Owned snapshot of one viewport's camera + rect for a single frame.

    Matrices are copied so the render thread reads a stable view/proj even after
    the main thread rebuilds the live ``Viewport`` next frame (TAA jitter and the
    motion-blur matrix update both mutate ``camera_proj`` in place during
    recording, which must happen on the render thread's owned copy).
    """

    vp_id: int
    x: int
    y: int
    width: int
    height: int
    camera_view: np.ndarray
    camera_proj: np.ndarray
    render_target: Any | None


@dataclass(frozen=True, slots=True)
class SubViewportSRU:
    """Owned plan to record one SubViewport offscreen without walking the tree.

    Captured on the MAIN thread by :meth:`SubViewportManager.build_srus` from a
    SubViewport's already-submitted offscreen scene. The render thread (next
    wave) replays it: reserve a transform-SSBO slice for ``instances`` +
    ``skinned_instances``, write those transforms, record the draws into the
    SubViewport's ``renderer`` target with ``camera_view`` / ``camera_proj``,
    then overlay ``draw2d_ops``.

    The producer-before-consumer ordering is encoded by this list's position in
    :attr:`RenderPacket.subviewport_srus` (P1 topo sort), so a consumer SRU
    appears after the producer it samples.

    Fields
    ------
    sru_id
        Stable identity (``id(node)``) keying the frustum-visibility cache so
        SRUs never collide. Mirrors ``render_to_target(sru_id=...)``.
    renderer
        The SubViewport's :class:`SubViewportRenderer` (shared GPU target, not
        copied: it owns the offscreen image whose bindless slot the main scene
        samples). The render thread records into it; the main thread does not
        mutate it concurrently (create/resize happen during extract, before the
        plan is built).
    width / height
        The SRU's offscreen extent at capture time (drives viewport + the SSBO
        transform write).
    clear_colour
        Per-frame clear colour (``transparent_bg`` decides RGBA), copied.
    camera_view / camera_proj
        Owned copies of the SRU camera's view + projection matrices (``None``
        for a 2D-only SubViewport, which uses the screen-size path).
    screen_size
        ``(width, height)`` float screen size override the 2D path needs.
    instances / skinned_instances
        Owned snapshots of the offscreen scene's submitted opaque / skinned
        instances (same shape as :attr:`RenderPacket.instances` /
        ``skinned_instances``; transform + joint arrays copied).
    draw2d_ops
        Owned copy of the SubViewport subtree's isolated Draw2D op list (the 2D
        overlay drawn on top of its 3D content via ``render_draw2d``).
    """

    sru_id: int
    renderer: Any
    width: int
    height: int
    clear_colour: tuple[float, float, float, float]
    camera_view: np.ndarray | None
    camera_proj: np.ndarray | None
    screen_size: tuple[float, float]
    instances: list[tuple[MeshHandle, np.ndarray, int, int]]
    skinned_instances: list[tuple[MeshHandle, np.ndarray, int, np.ndarray]]
    draw2d_ops: list[Any] = field(default_factory=list)


@dataclass(frozen=True, slots=True)
class RenderPacket:
    """Owned snapshot of the renderer's per-frame submission state.

    Construct via :func:`extract_render_packet`. All mutable numpy data is copied
    at extract time, so the main thread may rebuild the renderer's per-frame
    lists for the next frame while this packet is in flight on the render thread.
    """

    frame_index: int
    structure_version: int
    instances: list[tuple[MeshHandle, np.ndarray, int, int]]
    skinned_instances: list[tuple[MeshHandle, np.ndarray, int, np.ndarray]]
    shader_material_submissions: list[tuple[MeshHandle, np.ndarray, int, Any]]
    particle_submissions: list[tuple[np.ndarray, int]]
    gpu_particle_submissions: list[tuple[int, dict]]
    materials: np.ndarray
    lights: np.ndarray
    viewports: list[ViewportSnapshot] = field(default_factory=list)
    # Owned snapshot of ``Draw2D._ops`` (immutable NamedTuples, shallow-copied).
    # The render thread's Draw2DPass reads this instead of the live global.
    draw2d_ops: list[Any] = field(default_factory=list)
    # Owned snapshots of the subsystem-pass per-frame submission buffers. See the
    # module docstring "Subsystem submission buffers" section for each source.
    tilemap_layers: list[tuple[np.ndarray, int, tuple[float, float]]] = field(default_factory=list)
    light2d_lights: list[dict] = field(default_factory=list)
    light2d_occluders: list[list[tuple[float, float]]] = field(default_factory=list)
    text_vertices: np.ndarray | None = None
    text_indices: np.ndarray | None = None
    # Ordered SubViewport SRU plans (producers first). Empty in the synchronous
    # path and for scenes with no SubViewport. See ``SubViewportSRU``.
    subviewport_srus: list[SubViewportSRU] = field(default_factory=list)


def extract_render_packet(
    renderer: Renderer,
    tree: SceneTree,
    *,
    frame_index: int = 0,
    sub_viewports: Any = None,
) -> RenderPacket:
    """Snapshot the renderer's current per-frame state into an owned RenderPacket.

    Call on the MAIN thread immediately after ``adapter.submit_scene(tree)``,
    before ``Renderer.begin_frame`` clears the lists for the next frame. Numpy
    arrays are copied (ownership transferred to the packet) so the producer can
    safely rebuild the live lists while this packet is consumed by the render
    thread.

    Args:
        renderer: The forward renderer whose per-frame lists to snapshot.
        tree: The scene tree (read for ``_structure_version`` and SubViewport
            discovery when ``sub_viewports`` is supplied).
        frame_index: Monotonic producer frame counter stamped on the packet.
        sub_viewports: Optional :class:`SubViewportManager`. When supplied the
            packet captures ordered SubViewport SRU plans (so the render thread
            can record them without walking the tree). ``None`` (the default and
            the unit-test path) yields an empty ``subviewport_srus``.
    """
    from ..draw2d import Draw2D

    # Shallow copy of the immutable-Op list: the main thread clears + rebuilds
    # Draw2D._ops every frame, so the render thread must read this owned copy.
    draw2d_ops = list(Draw2D._ops)

    instances = [(mh, t.copy(), mid, vid) for (mh, t, mid, vid) in renderer._instances]
    skinned = [(mh, t.copy(), mid, j.copy()) for (mh, t, mid, j) in renderer._skinned_instances]
    shader_subs = [(mh, t.copy(), mid, sm) for (mh, t, mid, sm) in renderer._shader_material_submissions]
    particles = [(data.copy(), count) for (data, count) in renderer._particle_submissions]
    gpu_particles = [(eid, dict(cfg)) for (eid, cfg) in renderer._gpu_particle_submissions]

    tilemap_layers, light2d_lights, light2d_occluders, text_vertices, text_indices = _extract_subsystems(renderer)

    subviewport_srus = sub_viewports.build_srus(tree) if sub_viewports is not None else []

    viewports = [
        ViewportSnapshot(
            vp_id=vp_id,
            x=vp.x,
            y=vp.y,
            width=vp.width,
            height=vp.height,
            camera_view=np.array(vp.camera_view, dtype=np.float32, copy=True),
            camera_proj=np.array(vp.camera_proj, dtype=np.float32, copy=True),
            render_target=vp.render_target,
        )
        for vp_id, vp in renderer.viewport_manager.viewports
    ]

    return RenderPacket(
        frame_index=frame_index,
        structure_version=int(getattr(tree, "_structure_version", -1)),
        instances=instances,
        skinned_instances=skinned,
        shader_material_submissions=shader_subs,
        particle_submissions=particles,
        gpu_particle_submissions=gpu_particles,
        materials=np.array(renderer._materials, copy=True),
        lights=np.array(renderer._lights, copy=True),
        viewports=viewports,
        draw2d_ops=draw2d_ops,
        tilemap_layers=tilemap_layers,
        light2d_lights=light2d_lights,
        light2d_occluders=light2d_occluders,
        text_vertices=text_vertices,
        text_indices=text_indices,
        subviewport_srus=subviewport_srus,
    )


def _extract_subsystems(
    renderer: Renderer,
) -> tuple[
    list[tuple[np.ndarray, int, tuple[float, float]]],
    list[dict],
    list[list[tuple[float, float]]],
    np.ndarray | None,
    np.ndarray | None,
]:
    """Owned snapshots of the tilemap / 2D-light / 3D-overlay-text submission buffers.

    Each subsystem pass is lazily created, so a getter may return ``None``;
    absent passes yield empty / ``None`` fields. Arrays are copied because the
    main thread reuses the underlying buffers next frame (TileMap layer arrays,
    the TextRenderer's vertex/index buffers); per-light/occluder Python entries
    are shallow-copied so a downstream mutation is isolated from the packet.
    """
    tilemap_layers: list[tuple[np.ndarray, int, tuple[float, float]]] = []
    tm = renderer._tilemap_pass
    if tm is not None:
        tilemap_layers = [(data.copy(), tex_id, tuple(size)) for (data, tex_id, size) in tm._submissions]

    light2d_lights: list[dict] = []
    light2d_occluders: list[list[tuple[float, float]]] = []
    l2d = renderer._light2d_pass
    if l2d is not None:
        light2d_lights = [dict(light) for light in l2d._lights]
        light2d_occluders = [list(poly) for poly in l2d._occluders]

    text_vertices: np.ndarray | None = None
    text_indices: np.ndarray | None = None
    tr = renderer._text_renderer
    if tr is not None and tr.has_text:
        text_vertices = np.array(tr.vertices, copy=True)
        text_indices = np.array(tr.indices, copy=True)

    return tilemap_layers, light2d_lights, light2d_occluders, text_vertices, text_indices
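
The per-entry shallow copies used for the 2D-light and GPU-particle dicts isolate the packet from rebinding and clearing on the live list, but not from mutation of nested values. A minimal sketch of that boundary, using only the standard library; ``live_lights``, ``packet_lights`` and the dict keys are hypothetical stand-ins, not the renderer's real attributes:

```python
# Hypothetical per-frame light entries standing in for a pass's live list.
live_lights = [{"pos": (1.0, 2.0), "radius": 5.0, "tags": ["static"]}]

# Shallow per-entry copy: each dict is duplicated, its values are shared.
packet_lights = [dict(light) for light in live_lights]

# Rebinding a key on the live entry does not reach the packet copy.
live_entry = live_lights[0]
live_entry["radius"] = 9.0

# Clearing the live list (as a per-frame reset would) leaves the packet intact.
live_lights.clear()

# Caveat: a nested mutable value is still shared by a shallow copy.
live_entry["tags"].append("moved")

print(packet_lights[0]["radius"])  # 5.0
print(packet_lights[0]["tags"])    # ['static', 'moved']
```

The second print shows why a shallow copy is only sufficient when nested values are treated as immutable after submission; if that assumption did not hold for some entry type, a deeper copy would be needed.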


class RenderPacketRing:
    """Bounded double-buffered handoff between the main and render threads.

    The ring IS the double-buffer: a fixed ``capacity`` (default 2) of packet
    slots with backpressure. ``submit`` blocks the producer (main thread) once it
    is one frame ahead, bounding latency to +1 frame (D6). ``acquire`` blocks the
    consumer (render thread) until a packet is available. The render thread calls
    ``release`` after it has finished the GPU frame for a packet, freeing the
    slot so the producer may run ahead again.

    Shutdown is cooperative: ``close`` wakes any blocked thread. After close,
    ``submit`` raises and ``acquire`` drains remaining packets then returns
    ``None`` so the consumer loop exits cleanly.
    """

    def __init__(self, capacity: int = 2):
        if capacity < 1:
            raise ValueError(f"RenderPacketRing capacity must be >= 1, got {capacity}")
        self._capacity = capacity
        self._lock = threading.Lock()
        self._not_empty = threading.Condition(self._lock)
        self._not_full = threading.Condition(self._lock)
        self._queue: list[RenderPacket] = []
        # In-flight = queued (not yet acquired) + acquired-but-not-released. The
        # producer blocks when in-flight would exceed capacity, so a 2-slot ring
        # lets the producer be exactly one frame ahead of the consumer.
        self._in_flight = 0
        self._closed = False

    @property
    def capacity(self) -> int:
        return self._capacity

    @property
    def closed(self) -> bool:
        with self._lock:
            return self._closed

    def submit(self, packet: RenderPacket) -> None:
        """Producer: enqueue a packet, blocking while the ring is full.

        Raises ``RuntimeError`` if the ring has been closed.
        """
        with self._not_full:
            while self._in_flight >= self._capacity and not self._closed:
                self._not_full.wait()
            if self._closed:
                raise RuntimeError("submit on closed RenderPacketRing")
            self._queue.append(packet)
            self._in_flight += 1
            self._not_empty.notify()

    def acquire(self, timeout: float | None = None) -> RenderPacket | None:
        """Consumer: dequeue the next packet, blocking until one is available.

        Returns ``None`` if the ring is closed and drained (consumer should
        exit), or if ``timeout`` elapses with no packet.
        """
        with self._not_empty:
            while not self._queue and not self._closed:
                if not self._not_empty.wait(timeout):
                    return None
            if self._queue:
                return self._queue.pop(0)
            # Closed and drained.
            return None

    def release(self) -> None:
        """Consumer: signal that the most recently acquired packet's GPU frame is done.

        Frees one in-flight slot so the producer may run ahead again.
        """
        with self._not_full:
            if self._in_flight > 0:
                self._in_flight -= 1
            self._not_full.notify()

    def close(self) -> None:
        """Mark the ring closed and wake all blocked threads for clean shutdown."""
        with self._lock:
            self._closed = True
            self._not_empty.notify_all()
            self._not_full.notify_all()

    def pending(self) -> int:
        """Number of packets queued but not yet acquired (diagnostics/tests)."""
        with self._lock:
            return len(self._queue)
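
The submit / acquire / release / close protocol described for the ring can be exercised end to end with a small producer/consumer loop. The sketch below is self-contained, so it uses ``MiniRing``, a hypothetical simplified stand-in (one condition variable instead of two, no timeout), rather than importing ``RenderPacketRing``; plain integers stand in for packets.

```python
import threading


class MiniRing:
    """Hypothetical stand-in mirroring the submit/acquire/release/close protocol."""

    def __init__(self, capacity=2):
        self._capacity = capacity
        self._cond = threading.Condition()
        self._queue = []
        self._in_flight = 0  # queued + acquired-but-not-released
        self._closed = False

    def submit(self, packet):
        with self._cond:
            # Backpressure: wait while every slot is in flight.
            while self._in_flight >= self._capacity and not self._closed:
                self._cond.wait()
            if self._closed:
                raise RuntimeError("submit on closed ring")
            self._queue.append(packet)
            self._in_flight += 1
            self._cond.notify_all()

    def acquire(self):
        with self._cond:
            while not self._queue and not self._closed:
                self._cond.wait()
            # Drain what is left after close, then signal exit with None.
            return self._queue.pop(0) if self._queue else None

    def release(self):
        with self._cond:
            if self._in_flight > 0:
                self._in_flight -= 1
            self._cond.notify_all()

    def close(self):
        with self._cond:
            self._closed = True
            self._cond.notify_all()


ring = MiniRing(capacity=2)
rendered = []


def render_loop():
    # Consumer: acquire, "record + submit the GPU frame", then release the slot.
    while (packet := ring.acquire()) is not None:
        rendered.append(packet)
        ring.release()


worker = threading.Thread(target=render_loop)
worker.start()
for frame_index in range(5):  # producer blocks whenever two frames are in flight
    ring.submit(frame_index)
ring.close()
worker.join()
print(rendered)  # [0, 1, 2, 3, 4]
```

Calling ``release`` only after the frame work is finished is what bounds the producer to one frame ahead; releasing at acquire time would let the producer overwrite state the consumer is still reading.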