Source code for simvx.graphics.render2d.submit

"""Desktop GPU submit for a :class:`PublishedItemView` (design §2.6, P1.6).

The submit path of the build-once 2D pipeline: it consumes a frozen,
render-thread-readable
:class:`~simvx.graphics.render2d.publish.PublishedItemView` and draws it through
the 2D Vulkan pipelines. :class:`ItemSubmitter` reuses the ``Draw2DPass``
adjacent coalescer (for SubViewport targets); :class:`BindlessItemSubmitter` is
the co-batched main-framebuffer path (P3b).

What it does (design §2.6 / §3 Decision D / §10 P1 row)
-------------------------------------------------------
1. **Order.** Read the published draw order (``view.order``: physical row indices
   in back-to-front ``(layer, seq)`` order) -- the global sort already ran on the
   game thread; nothing re-sorts here.
2. **Resolve + transform.** For each item in order, resolve its captured local
   geometry (verts/indices) and emit an op tuple. The captured geometry is
   **world-space, camera-free** (the op-adapter bridge runs each node's
   ``on_draw`` with an identity ``Draw2D`` transform, and nodes bake their own
   world position into the coordinates they pass -- exactly as the legacy
   ``_draw_self`` does, which never pushes the node transform either). So the
   per-item ``transform`` column is redundant for the bridge geometry and is NOT
   re-applied here (re-applying it would double the world transform). The
   **camera** affine -- the only thing the legacy ``_xf`` ever carries during the
   tree walk (``scene_tree.render`` pushes ``(zoom,0,0,zoom, pan_x, pan_y)``) --
   is applied uniformly to every world-content item's verts, exactly mirroring
   the legacy bake. P3a's native per-node emission makes the verts truly local
   and lets the transform column drive the GPU instead; until then this is the
   render-target-agnostic, behaviour-preserving submit.
3. **Adjacent batch + draw.** Hand the ordered ops to the existing
   :meth:`Draw2DPass.render` (design §3 Decision D: "adjacent batcher first;
   bindless is P3b"). That reuses the legacy coalescer (consecutive items sharing
   ``(pipeline, clip, blend, texture)`` collapse into one GPU draw), the existing
   per-blend FILL/TEX pipelines, the LINE/TEXT pipelines, and the host-visible
   vertex/index buffers with their fence discipline -- so a sprite/shape scene is
   byte-comparable with the legacy path and the draw-call count matches.
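
A minimal sketch of the adjacent-coalescing rule in step 3, not the real
``Draw2DPass`` code: ``SketchOp`` and ``coalesce_adjacent`` are hypothetical
names, and the op is reduced to its batch key. It illustrates that only
neighbouring ops merge, so a key that reappears later starts a new draw:

```python
from itertools import groupby
from typing import NamedTuple, Optional, Tuple


class SketchOp(NamedTuple):
    # Hypothetical stand-in for the real Op tuple: batch-key fields only.
    pipeline: str
    clip: Optional[Tuple[int, int, int, int]]
    blend: str
    texture: int


def coalesce_adjacent(ops):
    # groupby merges only consecutive equal keys, so draw order never changes.
    return [(key, len(list(run))) for key, run in groupby(ops)]


ops = [
    SketchOp("tex", None, "alpha", 3),
    SketchOp("tex", None, "alpha", 3),
    SketchOp("tex", None, "alpha", 7),  # texture change breaks the run
    SketchOp("tex", None, "alpha", 3),  # same key as the first run, not adjacent
]
runs = coalesce_adjacent(ops)
draw_count = len(runs)
```

Four ops collapse into three runs here; sorting by key would give two draws but
would break painter order, which is why the coalescer only merges neighbours.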

Version-keyed upload (the clean-frame fast path, design §4)
----------------------------------------------------------
A :class:`PublishedItemView` carries a monotonic ``version`` that the publisher
only bumps on a dirty frame (a clean frame republishes the SAME view object).
:class:`ItemSubmitter` caches the ops it built last frame keyed by
``(id(view), version, camera)``; if the next frame's view is the same object with
the same version and camera, the cached ops are reused verbatim -- zero rebuild, zero
re-resolve -- and only the unchanged GPU buffers are re-uploaded by the reused
``Draw2DPass`` machinery (which itself skips work when the op list is identical
in shape). This is the §4 "clean frame uploads nothing" contract realised at the
submit boundary: a static scene with a still camera does no per-item CPU work.
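
The reuse rule can be sketched in isolation. This is an illustration under
assumptions, not the submitter itself: ``VersionKeyedCache`` and ``FakeView`` are
hypothetical, and the build callback stands in for ``build_item_ops``. The key
mirrors the one ``ItemSubmitter._ops_for`` uses (view identity, version, camera):

```python
class VersionKeyedCache:
    # Hypothetical helper: remembers one built result and the key it was built under.
    def __init__(self, build):
        self._build = build
        self._key = None
        self._value = None
        self.build_count = 0
        self.reuse_count = 0

    def get(self, view, camera):
        key = (id(view), view.version, camera)
        if key == self._key and self._value is not None:
            self.reuse_count += 1
            return self._value
        self._value = self._build(view, camera)
        self._key = key
        self.build_count += 1
        return self._value


class FakeView:
    # Hypothetical stand-in for a published view: only the version matters here.
    def __init__(self, version):
        self.version = version


identity = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
panned = (1.0, 0.0, 0.0, 1.0, 32.0, 0.0)
cache = VersionKeyedCache(lambda view, camera: ["ops", view.version, camera])
view = FakeView(version=1)
first = cache.get(view, identity)
second = cache.get(view, identity)  # clean frame: same view, same camera
third = cache.get(view, panned)     # camera moved: rebuild
```

The second call returns the very same list object as the first; only the camera
change forces a second build.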

Camera is **not** baked into the published verts (those stay camera-free, so a
later camera pan rebuilds one affine, not N item rows -- Decision B); it is
applied here at submit, in the per-frame mechanism the legacy path uses.
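
As a worked example of that affine (a sketch with made-up numbers;
``apply_camera`` is a hypothetical position-only helper), a zoom of 2 with a pan
of ``(10, -5)`` maps a unit quad as follows:

```python
def apply_camera(points, camera):
    # x' = a*x + b*y + tx, y' = c*x + d*y + ty (same layout as CameraAffine).
    a, b, c, d, tx, ty = camera
    return [(a * x + b * y + tx, c * x + d * y + ty) for x, y in points]


zoom, pan_x, pan_y = 2.0, 10.0, -5.0
camera = (zoom, 0.0, 0.0, zoom, pan_x, pan_y)
world_quad = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
screen_quad = apply_camera(world_quad, camera)
```

Because the published verts stay camera-free, a pan changes only the six floats
of ``camera``; the world-space quad itself is untouched.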

Text (GLYPH) renders natively as of P3a: the op-adapter bridge runs the one 2D
text layout (kerned MSDF quads) for ``draw_text``, so GLYPH items carry real
indexed glyph geometry and draw through the existing (non-bindless) TEXT pipeline
-- the placeholder is gone. A ``SCREEN_SPACE``-flagged GLYPH item (a screen-
pinned Text2D) skips the camera affine. Bindless co-batching of glyph runs with
sprites is P3b.
"""

from __future__ import annotations

from typing import TYPE_CHECKING, Any

import numpy as np

from ..draw2d_ops import Op, OpKind
from ..draw2d_vertex import UI2D_VERTEX_DTYPE
from .item_list import BlendMode, ItemFlags, PipelineKind

if TYPE_CHECKING:
    from .publish import PublishedItemView

__all__ = [
    "ItemSubmitter",
    "BindlessItemSubmitter",
    "CameraAffine",
    "build_item_ops",
    "build_bindless_geometry",
    "item_in_hdr_lane",
]

# A 2D affine matching draw2d's ``_xf``: x' = a*x + b*y + tx, y' = c*x + d*y + ty.
CameraAffine = tuple[float, float, float, float, float, float]
_IDENTITY_CAMERA: CameraAffine = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)

# PipelineKind -> legacy OpKind (the integer codes already agree; this is the
# rename TEXTURED->TEX / GLYPH->TEXT made explicit for the op tuple).
_PIPELINE_TO_OPKIND = {
    int(PipelineKind.FILL): OpKind.FILL,
    int(PipelineKind.LINE): OpKind.LINE,
    int(PipelineKind.GLYPH): OpKind.TEXT,
    int(PipelineKind.TEXTURED): OpKind.TEX,
}

_MODE_TO_BLEND = {
    int(BlendMode.ALPHA): "alpha",
    int(BlendMode.ADD): "add",
    int(BlendMode.MULTIPLY): "multiply",
}


def _apply_affine(verts: list[tuple], cam: CameraAffine) -> list[tuple]:
    """Apply the camera affine to an op's verts (pos only; uv + colour pass through).

    Mirrors the legacy ``Draw2D._xf_pt`` bake: each vertex is an 8-float tuple
    ``(x, y, u, v, r, g, b, a)``; only ``(x, y)`` are transformed.
    """
    a, b, c, d, tx, ty = cam
    out = []
    for v in verts:
        x, y = v[0], v[1]
        out.append((a * x + b * y + tx, c * x + d * y + ty, *v[2:]))
    return out


def build_item_ops(
    view: PublishedItemView,
    *,
    camera: CameraAffine = _IDENTITY_CAMERA,
) -> list[Op]:
    """Build the ordered legacy-``Op`` list for a published view (design §2.6).

    Walks ``view.order`` (the published back-to-front draw order), resolves each
    item's captured local geometry, applies the camera affine to the vertex
    positions, and emits one :class:`Op` per item with its scissor (read straight
    off the published clip-scope table), blend mode, and texture slot. The
    resulting list is exactly the shape the legacy ``Draw2DPass`` adjacent
    coalescer consumes, so the same draws result.

    GLYPH (text) items now carry real kerned MSDF glyph geometry (P3a native
    emission) and render through the existing TEXT pipeline (the bindless co-batch
    is P3b). A ``SCREEN_SPACE``-flagged item (a screen-pinned Text2D) is exempt
    from the camera affine, mirroring the deleted overlay's camera-free text.
    """
    if view.count == 0:
        return []
    cols = view.columns
    pipeline = cols["pipeline"]
    clip_scope = cols["clip_scope"]
    blend = cols["blend"]
    texture = cols["texture"]
    flags = cols["flags"]
    geometry = view.geometry
    clips = view.clips
    has_camera = camera != _IDENTITY_CAMERA

    ops: list[Op] = []
    for row in view.order:
        i = int(row)
        kind = _PIPELINE_TO_OPKIND[int(pipeline[i])]
        geom = geometry[int(cols["geometry"][i])]
        verts = geom.verts
        if not verts:
            continue
        screen_space = bool(int(flags[i]) & int(ItemFlags.SCREEN_SPACE))
        if has_camera and not screen_space:
            verts = _apply_affine(verts, camera)
        scissor = clips.scissor(int(clip_scope[i]))
        ops.append(
            Op(
                kind,
                scissor,
                verts,
                geom.indices,
                int(texture[i]),
                _MODE_TO_BLEND.get(int(blend[i]), "alpha"),
            )
        )
    return ops


class ItemSubmitter:
    """Render-thread-owned submit of a :class:`PublishedItemView` (design §2.6, §4).

    Holds the version-keyed op cache and delegates the GPU work to a
    :class:`~simvx.graphics.renderer.draw2d_pass.Draw2DPass` (its pipelines,
    buffers, and adjacent coalescer). One submitter per draw target (the main
    framebuffer; a SubViewport gets its own, mirroring how SRUs snapshot a
    per-target view).

    The submitter is the seam where the published, immutable item columns meet the
    existing 2D GPU machinery: it never touches the live game-thread store (only
    the frozen view), and it caches the built ops by the published ``version``
    (plus the view identity and the camera the ops were built under) so a clean
    frame -- same view object, same camera -- does zero per-item CPU work (the §4
    "clean frame uploads nothing" fast path at the submit boundary).
    """

    __slots__ = ("_draw2d_pass", "_cache_key", "_cached_ops", "_build_count", "_reuse_count")

    def __init__(self, draw2d_pass: Any) -> None:
        self._draw2d_pass = draw2d_pass
        self._cache_key: tuple | None = None
        self._cached_ops: list[Op] | None = None
        self._build_count = 0
        self._reuse_count = 0

    def _ops_for(self, view: PublishedItemView, camera: CameraAffine) -> list[Op]:
        # Version-keyed reuse: a clean frame republishes the same view object
        # (same version), so identical (version, camera) reuses the built ops.
        key = (id(view), view.version, camera)
        if self._cache_key == key and self._cached_ops is not None:
            self._reuse_count += 1
            return self._cached_ops
        ops = build_item_ops(view, camera=camera)
        self._cache_key = key
        self._cached_ops = ops
        self._build_count += 1
        return ops

    def render(
        self,
        cmd: Any,
        view: PublishedItemView,
        width: int,
        height: int,
        *,
        ui_width: int = 0,
        ui_height: int = 0,
        camera: CameraAffine = _IDENTITY_CAMERA,
    ) -> None:
        """Submit ``view`` to ``cmd`` through the 2D pipelines (design §2.6).

        Builds (or reuses) the ordered op list for the view + camera, then hands
        it to :meth:`Draw2DPass.render`, which coalesces adjacent ops and records
        the draws with its existing per-blend pipelines, host-visible buffers, and
        scissor discipline. ``width``/``height`` are the framebuffer extent;
        ``ui_width``/``ui_height`` the UI coordinate space (HiDPI), exactly as the
        legacy pass takes them.
        """
        ops = self._ops_for(view, camera)
        self._draw2d_pass.render(cmd, width, height, ui_width, ui_height, ops=ops)

    @property
    def last_frame_draw_count(self) -> int:
        """Draws issued by the underlying pass last frame (post-coalesce)."""
        return int(self._draw2d_pass.last_frame_draw_count)

    @property
    def build_count(self) -> int:
        """Frames that rebuilt the op list (dirty/changed view or camera)."""
        return self._build_count

    @property
    def reuse_count(self) -> int:
        """Frames that reused the cached op list (clean: same version + camera)."""
        return self._reuse_count


def camera_affine_from_tree(tree: Any) -> CameraAffine:
    """Return the active Camera2D's submit affine for ``tree`` (design §2.3 / §7.3).

    Delegates to the one Camera2D mapping, :meth:`Camera2D.canvas_transform`,
    which ``SceneTree.render`` bakes and ``world_to_screen`` inverts -- so the
    item-pipeline view, the legacy bake, and hit-testing are provably the same
    matrix. With no active Camera2D, identity (the legacy walk pushes nothing).
    """
    cam = getattr(tree, "_current_camera_2d", None)
    if cam is None:
        return _IDENTITY_CAMERA
    return cam.canvas_transform(tree._screen_size)


# ---------------------------------------------------------------------------
# Bindless co-batcher (design §3 Decision D, P3b)
#
# This is the PERF increment: instead of emitting one legacy Op per item and
# letting the adjacent coalescer break a run on every texture/pipeline change,
# the co-batcher groups consecutive items in published draw order that share only
# (topology, clip_scope, blend) into ONE draw -- ACROSS different textures and
# across sprite + glyph + fill -- because texture_id and is_msdf now travel PER
# VERTEX (the ABI change). Glyph (IS_MSDF) items reference the MSDF atlas's
# bindless slot; sprites their own slot; fills tex_id = -1. Painter order is
# preserved: items stay in (layer, seq) order and a run only ever MERGES adjacent
# compatible items -- it never reorders, so overlapping translucent items keep
# their relative order.
# ---------------------------------------------------------------------------

_FLAG_IS_MSDF = 1  # matches ui2d.frag FLAG_IS_MSDF / BindlessDraw2DPass.FLAG_IS_MSDF

# N1 (2D-in-HDR) submit lanes. ``"all"`` is the legacy single-pass behaviour (every
# item, used when post-processing is off so everything composites to the swapchain).
# With post on the view is submitted twice: ``"hdr"`` draws the world-space lane into
# the HDR target before tonemap (so it gets exposure/tonemap/bloom), ``"ldr"`` draws
# the screen-space lane onto the swapchain after the tonemap blit (authored LDR).
_LANE_ALL = "all"
_LANE_HDR = "hdr"
_LANE_LDR = "ldr"


def item_in_hdr_lane(item_flags: int) -> bool:
    """Return whether an item belongs to the HDR (world) lane (N1, design §13 N1).

    By role: a world-space item (``SCREEN_SPACE`` clear) is HDR-eligible; a
    screen-space item (HUD/UI) is not. The per-node ``hdr`` override forces it
    either way: ``HDR_OPT_IN`` -> always HDR, ``HDR_OPT_OUT`` -> always LDR. The
    override wins over the screen-space role; opt-in wins over opt-out (a node
    cannot meaningfully request both).
    """
    if item_flags & int(ItemFlags.HDR_OPT_IN):
        return True
    if item_flags & int(ItemFlags.HDR_OPT_OUT):
        return False
    return not (item_flags & int(ItemFlags.SCREEN_SPACE))


def build_bindless_geometry(
    view: PublishedItemView,
    *,
    camera: CameraAffine = _IDENTITY_CAMERA,
    atlas_slot: int = -1,
    lane: str = _LANE_ALL,
    only_band: int | None = None,
    exclude_bands: frozenset[int] | None = None,
    min_band: int | None = None,
    max_band: int | None = None,
) -> tuple[np.ndarray, np.ndarray, np.ndarray, list]:
    """Build the co-batched geometry + batch list for a published view (design §3 D).

    Returns ``(tri_verts, tri_indices, line_verts, batches)`` where ``tri_verts`` /
    ``line_verts`` are :data:`UI2D_VERTEX_DTYPE` arrays (40-byte vertices with
    per-vertex ``tex_id`` + ``flags``, camera already applied), ``tri_indices`` a
    ``uint32`` index stream, and ``batches`` a list of
    :class:`~simvx.graphics.renderer.bindless_draw2d_pass.BindlessBatch` runs in
    draw order.

    Grouping rule (the co-batch): consecutive items sharing
    ``(topology, clip_scope, blend)`` merge into one batch even when their textures
    differ and even when sprite and glyph alternate. A LINE-topology item breaks
    the run (different pipeline) and emits a line batch.

    ``atlas_slot`` is the MSDF atlas's bindless slot; an IS_MSDF item's per-vertex
    ``tex_id`` is set to it (its captured geometry already carries atlas UVs).

    Band filter (per-CanvasLayer post, design §5.6): ``only_band`` keeps ONLY items
    whose ``layer`` column equals it (a single post-processed CanvasLayer band);
    ``exclude_bands`` drops items in any of those bands (the global swapchain path
    skips post-processed bands, which are composited by their own chain);
    ``min_band``/``max_band`` (inclusive) keep only items in a band interval (the
    global path draws each plain-band SEGMENT between two post bands so the
    composites interleave at the right z-slot). Like the lane filter, banding only
    ever SHORTENS a co-batch run -- it never reorders -- so painter order within
    the kept set is intact. All default ``None`` (no band filter) so the global
    path is byte-identical when the feature is unused.
    """
    from ..renderer.bindless_draw2d_pass import BindlessBatch

    if view.count == 0:
        return (
            np.empty(0, dtype=UI2D_VERTEX_DTYPE),
            np.empty(0, dtype=np.uint32),
            np.empty(0, dtype=UI2D_VERTEX_DTYPE),
            [],
        )

    cols = view.columns
    pipeline = cols["pipeline"]
    clip_scope = cols["clip_scope"]
    blend = cols["blend"]
    texture = cols["texture"]
    flags = cols["flags"]
    layer = cols["layer"]
    geometry = view.geometry
    clips = view.clips
    a, b, c, d, tx, ty = camera
    has_camera = camera != _IDENTITY_CAMERA

    # Triangle stream (indexed) + line stream (non-indexed), with per-batch runs.
    tri_v: list = []  # list of per-item vertex arrays (structured)
    tri_i: list = []  # list of per-item index arrays (uint32, global)
    line_v: list = []
    batches: list = []
    tri_vert_cursor = 0  # next free vertex row in the triangle buffer
    tri_idx_cursor = 0  # next free index in the index buffer
    line_vert_cursor = 0

    sentinel = object()
    run_line = False
    run_clip: Any = sentinel
    run_blend = -1
    run_vert_start = 0  # base vertex of the current run (vertexOffset)
    run_idx_start = 0
    run_count = 0  # idx count (tri) or vert count (line)

    def _flush() -> None:
        nonlocal run_count
        if run_count == 0:
            return
        scissor = clips.scissor(int(run_clip)) if run_clip is not None else None
        batches.append(
            BindlessBatch(
                clip=scissor,
                blend=_MODE_TO_BLEND.get(int(run_blend), "alpha"),
                vert_offset=run_vert_start,
                idx_offset=run_idx_start,
                count=run_count,
                line=run_line,
            )
        )
        run_count = 0

    for row in view.order:
        i = int(row)
        geom = geometry[int(cols["geometry"][i])]
        verts = geom.verts
        if not verts:
            continue
        kind = int(pipeline[i])
        is_line = kind == int(PipelineKind.LINE)
        item_clip = int(clip_scope[i])
        item_blend = int(blend[i])
        item_flags = int(flags[i])

        # N1 lane filter: skip items not in the requested lane. Filtering preserves
        # draw order within the lane (items stay in published (layer, seq) order);
        # it can only shorten a co-batch run when the two lanes interleave, never
        # reorder, so painter order is intact.
        if lane != _LANE_ALL and item_in_hdr_lane(item_flags) != (lane == _LANE_HDR):
            continue

        # Per-band post filter (design §5.6): the post-band chain keeps only its own
        # band; the global swapchain path drops every post-processed band. Same
        # order-preserving "shorten a run, never reorder" property as the lane filter.
        if only_band is not None and int(layer[i]) != only_band:
            continue
        if exclude_bands is not None and int(layer[i]) in exclude_bands:
            continue
        if min_band is not None and int(layer[i]) < min_band:
            continue
        if max_band is not None and int(layer[i]) > max_band:
            continue

        is_msdf = bool(item_flags & int(ItemFlags.IS_MSDF))
        screen_space = bool(item_flags & int(ItemFlags.SCREEN_SPACE))

        # Per-vertex tex_id + flags (the ABI): MSDF glyphs sample the atlas slot,
        # textured sprites their own slot, fills/lines -1.
        if is_msdf:
            tex_id = atlas_slot
            vflags = _FLAG_IS_MSDF
        else:
            tex_id = int(texture[i])
            vflags = 0

        # Pack the item's verts into the structured UI2D vertex (camera applied
        # to position only, unless screen-space).
        n = len(verts)
        arr = np.asarray(verts, dtype=np.float32).reshape(n, 8)
        struct = np.empty(n, dtype=UI2D_VERTEX_DTYPE)
        px = arr[:, 0]
        py = arr[:, 1]
        if has_camera and not screen_space:
            struct["position"][:, 0] = a * px + b * py + tx
            struct["position"][:, 1] = c * px + d * py + ty
        else:
            struct["position"] = arr[:, :2]
        struct["uv"] = arr[:, 2:4]
        struct["colour"] = arr[:, 4:8]
        struct["tex_id"] = tex_id
        struct["flags"] = vflags

        # Run compatibility: same topology + clip + blend. Lines (different
        # pipeline) never merge with triangles; a glyph and a sprite DO merge.
        same_run = (
            run_count != 0
            and is_line == run_line
            and item_clip == run_clip
            and (is_line or item_blend == run_blend)
        )
        if not same_run:
            _flush()
            run_line = is_line
            run_clip = item_clip
            run_blend = item_blend
            if is_line:
                run_vert_start = line_vert_cursor
            else:
                run_vert_start = tri_vert_cursor
                run_idx_start = tri_idx_cursor

        if is_line:
            line_v.append(struct)
            line_vert_cursor += n
            run_count += n
        else:
            # Offset the op-local indices by this item's base within the run so
            # the concatenated index stream is valid relative to run_vert_start
            # (which becomes vkCmdDrawIndexed's vertexOffset).
            idx = np.asarray(geom.indices, dtype=np.uint32)
            local_base = tri_vert_cursor - run_vert_start
            tri_i.append(idx + local_base)
            tri_v.append(struct)
            tri_vert_cursor += n
            tri_idx_cursor += len(idx)
            run_count += len(idx)

    _flush()

    tri_verts = np.concatenate(tri_v) if tri_v else np.empty(0, dtype=UI2D_VERTEX_DTYPE)
    tri_indices = np.concatenate(tri_i) if tri_i else np.empty(0, dtype=np.uint32)
    line_verts = np.concatenate(line_v) if line_v else np.empty(0, dtype=UI2D_VERTEX_DTYPE)
    return tri_verts, tri_indices, line_verts, batches


class BindlessItemSubmitter:
    """Render-thread-owned bindless co-batched submit of a published view (design §3 D).

    The P3b counterpart of :class:`ItemSubmitter`: it builds the co-batched
    geometry (one draw per ``(topology, clip, blend)`` run, across textures and
    sprite/glyph) and hands it to a
    :class:`~simvx.graphics.renderer.bindless_draw2d_pass.BindlessDraw2DPass`. The
    MSDF atlas is registered into the bindless array by the pass; the submitter
    asks the pass for the slot each frame (it is stable, refreshed only when the
    atlas re-uploads).

    Version-keyed reuse: a clean frame republishes the same view object (same
    version + camera + atlas slot) so the built geometry is reused verbatim (zero
    rebuild) -- the §4 "clean frame uploads nothing" fast path at the bindless
    submit boundary.
    """

    __slots__ = ("_pass", "_cache_key", "_cached", "_build_count", "_reuse_count")

    def __init__(self, bindless_pass: Any) -> None:
        self._pass = bindless_pass
        self._cache_key: tuple | None = None
        self._cached: tuple | None = None
        self._build_count = 0
        self._reuse_count = 0

    def _geometry_for(
        self,
        view: PublishedItemView,
        camera: CameraAffine,
        atlas_slot: int,
        lane: str,
        only_band: int | None,
        exclude_bands: frozenset[int] | None,
        min_band: int | None,
        max_band: int | None,
    ):
        key = (id(view), view.version, camera, atlas_slot, lane, only_band, exclude_bands, min_band, max_band)
        if self._cache_key == key and self._cached is not None:
            self._reuse_count += 1
            return self._cached
        built = build_bindless_geometry(
            view,
            camera=camera,
            atlas_slot=atlas_slot,
            lane=lane,
            only_band=only_band,
            exclude_bands=exclude_bands,
            min_band=min_band,
            max_band=max_band,
        )
        self._cache_key = key
        self._cached = built
        self._build_count += 1
        return built

    def render(
        self,
        cmd: Any,
        view: PublishedItemView,
        width: int,
        height: int,
        *,
        ui_width: int = 0,
        ui_height: int = 0,
        camera: CameraAffine = _IDENTITY_CAMERA,
        lane: str = _LANE_ALL,
        only_band: int | None = None,
        exclude_bands: frozenset[int] | None = None,
        min_band: int | None = None,
        max_band: int | None = None,
    ) -> None:
        """Submit ``view`` to ``cmd`` through the bindless co-batched pipeline.

        The MSDF atlas bindless slot is registered OUTSIDE the render pass (the
        renderer calls :meth:`BindlessDraw2DPass.sync_atlas_slot` in
        ``pre_render``, because ``register_texture`` must not run during command
        recording). Here we only READ the already-synced slot.

        ``lane`` (N1) selects which items to draw: ``"all"`` (post off, everything
        to the swapchain), ``"hdr"`` (world lane into the HDR target before
        tonemap), or ``"ldr"`` (screen lane onto the swapchain after tonemap). One
        submitter instance is bound to one render pass, so the HDR and LDR lanes
        use distinct submitters; the cache key carries ``lane`` so each reuses its
        own geometry.

        ``only_band`` / ``exclude_bands`` / ``min_band`` / ``max_band``
        (per-CanvasLayer post) filter by the ``layer`` band: the post chain submits
        ``only_band``; the global path passes ``exclude_bands`` (or a
        ``min_band``/``max_band`` segment) to interleave with the composites. All
        default ``None``.
        """
        atlas_slot = self._pass.atlas_slot
        tri_verts, tri_indices, line_verts, batches = self._geometry_for(
            view,
            camera,
            atlas_slot,
            lane,
            only_band,
            exclude_bands,
            min_band,
            max_band,
        )
        self._pass.render(
            cmd,
            width,
            height,
            ui_width,
            ui_height,
            verts=tri_verts,
            indices=tri_indices,
            line_verts=line_verts,
            batches=batches,
        )

    @property
    def last_frame_draw_count(self) -> int:
        return int(self._pass.last_frame_draw_count)

    @property
    def build_count(self) -> int:
        return self._build_count

    @property
    def reuse_count(self) -> int:
        return self._reuse_count