"""Desktop GPU submit for a :class:`PublishedItemView` (design §2.6, P1.6).

The submit path of the build-once 2D pipeline: it consumes a frozen,
render-thread-readable
:class:`~simvx.graphics.render2d.publish.PublishedItemView` and draws it through
the 2D Vulkan pipelines. :class:`ItemSubmitter` reuses the ``Draw2DPass``
adjacent coalescer (for SubViewport targets); :class:`BindlessItemSubmitter` is
the co-batched main-framebuffer path (P3b).

What it does (design §2.6 / §3 Decision D / §10 P1 row)
-------------------------------------------------------

1. **Order.** Read the published draw order (``view.order``: physical row indices
   in back-to-front ``(layer, seq)`` order) -- the global sort already ran on the
   game thread; nothing re-sorts here.
2. **Resolve + transform.** For each item in order, resolve its captured local
   geometry (verts/indices) and emit an op tuple. The captured geometry is
   **world-space, camera-free** (the op-adapter bridge runs each node's
   ``on_draw`` with an identity ``Draw2D`` transform, and nodes bake their own
   world position into the coordinates they pass -- exactly as the legacy
   ``_draw_self`` does, which never pushes the node transform either). So the
   per-item ``transform`` column is redundant for the bridge geometry and is NOT
   re-applied here (re-applying it would double the world transform). The
   **camera** affine -- the only thing the legacy ``_xf`` ever carries during the
   tree walk (``scene_tree.render`` pushes ``(zoom,0,0,zoom, pan_x, pan_y)``) --
   is applied uniformly to every world-content item's verts, exactly mirroring
   the legacy bake. P3a's native per-node emission makes the verts truly local
   and lets the transform column drive the GPU instead; until then this is the
   render-target-agnostic, behaviour-preserving submit.
3. **Adjacent batch + draw.** Hand the ordered ops to the existing
   :meth:`Draw2DPass.render` (design §3 Decision D: "adjacent batcher first;
   bindless is P3b"). That reuses the legacy coalescer (consecutive items sharing
   ``(pipeline, clip, blend, texture)`` collapse into one GPU draw), the existing
   per-blend FILL/TEX pipelines, the LINE/TEXT pipelines, and the host-visible
   vertex/index buffers with their fence discipline -- so a sprite/shape scene is
   byte-comparable with the legacy path and the draw-call count matches.
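
As an illustration of step 3, the adjacent coalescing rule can be sketched in
plain Python. This is a minimal sketch, not the ``Draw2DPass`` implementation:
``coalesce_adjacent`` is a hypothetical helper, and the 6-tuple op layout
``(kind, scissor, verts, indices, texture, blend)`` is assumed from the order of
the ``Op(...)`` arguments in :func:`build_item_ops`.

```python
from itertools import groupby


def coalesce_adjacent(ops):
    # Hypothetical helper: merge only NEIGHBOURING ops whose
    # (kind, scissor, texture, blend) match. groupby never reorders,
    # so painter order is preserved.
    runs = []
    for key, group in groupby(ops, key=lambda op: (op[0], op[1], op[4], op[5])):
        runs.append((key, len(list(group))))
    return runs


# Placeholder geometry: the coalescing key ignores verts and indices.
verts = [(0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0)]
ops = [
    ("fill", None, verts, [0], -1, "alpha"),
    ("fill", None, verts, [0], -1, "alpha"),
    ("tex", None, verts, [0], 3, "alpha"),
    ("fill", None, verts, [0], -1, "alpha"),
]
runs = coalesce_adjacent(ops)
draw_sizes = [count for _, count in runs]
```

Because only neighbours merge, the trailing fill stays a separate draw even
though it matches the first two: four ops become three draws here.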

Version-keyed upload (the clean-frame fast path, design §4)
-----------------------------------------------------------

A :class:`PublishedItemView` carries a monotonic ``version`` that the publisher
only bumps on a dirty frame (a clean frame republishes the SAME view object).
:class:`ItemSubmitter` caches the ops it built last frame keyed by
``(id(view), version, camera)``; if the next frame passes the same view object
with the same version and the same camera, the cached ops are reused verbatim --
zero rebuild, zero re-resolve -- and only the unchanged GPU buffers are
re-uploaded by the reused ``Draw2DPass`` machinery (which itself skips work when
the op list is identical in shape). This is the §4 "clean frame uploads nothing"
contract realised at the submit boundary: a static scene with a still camera
does no per-item CPU work.
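
The reuse rule can be sketched without any GPU pass. This is a minimal sketch
under stated assumptions: ``OpCache`` and ``FakeView`` are hypothetical names
used only here, and the example bumps ``version`` in place for brevity, whereas
the real publisher hands over a new frozen view on a dirty frame.

```python
class OpCache:
    # Hypothetical single-entry cache keyed like ItemSubmitter._ops_for.
    def __init__(self, build):
        self._build = build
        self._key = None
        self._value = None
        self.build_count = 0
        self.reuse_count = 0

    def get(self, view, camera):
        key = (id(view), view.version, camera)
        if key == self._key and self._value is not None:
            self.reuse_count += 1
            return self._value
        self._value = self._build(view, camera)
        self._key = key
        self.build_count += 1
        return self._value


class FakeView:
    # Hypothetical stand-in for a published view: only carries a version.
    def __init__(self, version):
        self.version = version


identity = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)
cache = OpCache(lambda view, camera: ["op-a", "op-b"])
view = FakeView(version=1)

first = cache.get(view, identity)
second = cache.get(view, identity)  # clean frame: same object, version, camera
view.version = 2                    # dirty frame: the version moved on
third = cache.get(view, identity)
```

A camera change invalidates the entry in the same way, since the affine tuple
is part of the key.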

Camera is **not** baked into the published verts (those stay camera-free, so a
later camera pan rebuilds one affine, not N item rows -- Decision B); it is
applied here at submit, through the same per-frame mechanism the legacy path
uses.
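
A small sketch of why this keeps a pan cheap, assuming a rotation-free camera
in the ``(a, b, c, d, tx, ty)`` layout used by this module. ``camera_affine``
and ``to_screen`` are hypothetical helpers for the example only; the
``screen_space`` switch stands in for the ``SCREEN_SPACE`` item flag.

```python
def camera_affine(zoom, pan_x, pan_y):
    # Hypothetical constructor for the 6-tuple (a, b, c, d, tx, ty).
    return (zoom, 0.0, 0.0, zoom, pan_x, pan_y)


def to_screen(verts, cam, screen_space=False):
    # Positions only: uv and colour pass through untouched.
    if screen_space:
        return verts
    a, b, c, d, tx, ty = cam
    return [(a * x + b * y + tx, c * x + d * y + ty, *rest) for x, y, *rest in verts]


# Published verts are camera-free and never rewritten by a pan.
published = ((4.0, 3.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0),)

before = to_screen(published, camera_affine(2.0, 0.0, 0.0))
after = to_screen(published, camera_affine(2.0, -5.0, 1.0))
hud = to_screen(published, camera_affine(2.0, -5.0, 1.0), screen_space=True)
```

Only the six camera numbers differ between ``before`` and ``after``; the
published tuple is the same object in both calls.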

Text (GLYPH) renders natively as of P3a: the op-adapter bridge runs the one 2D
text layout (kerned MSDF quads) for ``draw_text``, so GLYPH items carry real
indexed glyph geometry and draw through the existing (non-bindless) TEXT pipeline
-- the placeholder is gone. A ``SCREEN_SPACE``-flagged GLYPH item (a screen-
pinned Text2D) skips the camera affine. Bindless co-batching of glyph runs with
sprites is P3b.
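
For contrast with the adjacent coalescer, the P3b co-batch grouping can be
sketched as follows. This is a hedged illustration, not the code in
:func:`build_bindless_geometry`: ``cobatch_runs`` and the 4-tuple item rows are
assumptions for the example. It follows the rule stated in that function:
consecutive items sharing ``(topology, clip_scope, blend)`` merge even when
their textures differ, and a line run ignores blend.

```python
def cobatch_runs(items):
    # items: hypothetical (is_line, clip_scope, blend, texture) rows in
    # published draw order. Texture is NOT part of the run key because it
    # travels per vertex in the bindless path.
    runs = []
    for is_line, clip, blend, texture in items:
        key = (is_line, clip, None if is_line else blend)
        if runs and runs[-1][0] == key:
            runs[-1][1].append(texture)
        else:
            runs.append((key, [texture]))
    return runs


items = [
    (False, 0, "alpha", 3),   # sprite, own texture slot
    (False, 0, "alpha", 7),   # glyph, atlas slot
    (False, 0, "alpha", -1),  # fill, no texture
    (True, 0, "alpha", -1),   # line: different topology breaks the run
    (False, 0, "add", 4),     # sprite with a different blend
]
runs = cobatch_runs(items)
textures_per_draw = [textures for _, textures in runs]
```

The first three items collapse into one draw despite three different texture
ids, which the adjacent coalescer would have split into three.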
"""

from __future__ import annotations

from typing import TYPE_CHECKING, Any

import numpy as np

from ..draw2d_ops import Op, OpKind
from ..draw2d_vertex import UI2D_VERTEX_DTYPE
from .item_list import BlendMode, ItemFlags, PipelineKind

if TYPE_CHECKING:
    from .publish import PublishedItemView

__all__ = [
    "ItemSubmitter",
    "BindlessItemSubmitter",
    "CameraAffine",
    "build_item_ops",
    "build_bindless_geometry",
    "item_in_hdr_lane",
]

# A 2D affine matching draw2d's ``_xf``: x' = a*x + b*y + tx, y' = c*x + d*y + ty.
CameraAffine = tuple[float, float, float, float, float, float]
_IDENTITY_CAMERA: CameraAffine = (1.0, 0.0, 0.0, 1.0, 0.0, 0.0)

# PipelineKind -> legacy OpKind (the integer codes already agree; this is the
# rename TEXTURED->TEX / GLYPH->TEXT made explicit for the op tuple).
_PIPELINE_TO_OPKIND = {
    int(PipelineKind.FILL): OpKind.FILL,
    int(PipelineKind.LINE): OpKind.LINE,
    int(PipelineKind.GLYPH): OpKind.TEXT,
    int(PipelineKind.TEXTURED): OpKind.TEX,
}

_MODE_TO_BLEND = {
    int(BlendMode.ALPHA): "alpha",
    int(BlendMode.ADD): "add",
    int(BlendMode.MULTIPLY): "multiply",
}


def _apply_affine(verts: list[tuple], cam: CameraAffine) -> list[tuple]:
    """Apply the camera affine to an op's verts (pos only; uv + colour pass through).

    Mirrors the legacy ``Draw2D._xf_pt`` bake: each vertex is an 8-float tuple
    ``(x, y, u, v, r, g, b, a)``; only ``(x, y)`` are transformed.
    """
    a, b, c, d, tx, ty = cam
    out = []
    for v in verts:
        x, y = v[0], v[1]
        out.append((a * x + b * y + tx, c * x + d * y + ty, *v[2:]))
    return out


def build_item_ops(
    view: PublishedItemView,
    *,
    camera: CameraAffine = _IDENTITY_CAMERA,
) -> list[Op]:
    """Build the ordered legacy-``Op`` list for a published view (design §2.6).

    Walks ``view.order`` (the published back-to-front draw order), resolves each
    item's captured local geometry, applies the camera affine to the vertex
    positions, and emits one :class:`Op` per item with its scissor (read straight
    off the published clip-scope table), blend mode, and texture slot. The
    resulting list is exactly the shape the legacy ``Draw2DPass`` adjacent
    coalescer consumes, so the same draws result.

    GLYPH (text) items now carry real kerned MSDF glyph geometry (P3a native
    emission) and render through the existing TEXT pipeline (the bindless co-batch
    is P3b). A ``SCREEN_SPACE``-flagged item (a screen-pinned Text2D) is exempt
    from the camera affine, mirroring the deleted overlay's camera-free text.
    """
    if view.count == 0:
        return []
    cols = view.columns
    pipeline = cols["pipeline"]
    clip_scope = cols["clip_scope"]
    blend = cols["blend"]
    texture = cols["texture"]
    flags = cols["flags"]
    geometry = view.geometry
    clips = view.clips
    has_camera = camera != _IDENTITY_CAMERA

    ops: list[Op] = []
    for row in view.order:
        i = int(row)
        kind = _PIPELINE_TO_OPKIND[int(pipeline[i])]
        geom = geometry[int(cols["geometry"][i])]
        verts = geom.verts
        if not verts:
            continue
        screen_space = bool(int(flags[i]) & int(ItemFlags.SCREEN_SPACE))
        if has_camera and not screen_space:
            verts = _apply_affine(verts, camera)
        scissor = clips.scissor(int(clip_scope[i]))
        ops.append(
            Op(
                kind,
                scissor,
                verts,
                geom.indices,
                int(texture[i]),
                _MODE_TO_BLEND.get(int(blend[i]), "alpha"),
            )
        )
    return ops


class ItemSubmitter:
    """Render-thread-owned submit of a :class:`PublishedItemView` (design §2.6, §4).

    Holds the version-keyed op cache and delegates the GPU work to a
    :class:`~simvx.graphics.renderer.draw2d_pass.Draw2DPass` (its pipelines,
    buffers, and adjacent coalescer). One submitter per draw target (the main
    framebuffer; a SubViewport gets its own, mirroring how SRUs snapshot a
    per-target view).

    The submitter is the seam where the published, immutable item columns meet
    the existing 2D GPU machinery: it never touches the live game-thread store
    (only the frozen view), and it caches the built ops by the published
    ``version`` (plus the view identity and the camera the ops were built under)
    so a clean frame -- same view object, same camera -- does zero per-item CPU
    work (the §4 "clean frame uploads nothing" fast path at the submit boundary).
    """

    __slots__ = ("_draw2d_pass", "_cache_key", "_cached_ops", "_build_count", "_reuse_count")

    def __init__(self, draw2d_pass: Any) -> None:
        self._draw2d_pass = draw2d_pass
        self._cache_key: tuple | None = None
        self._cached_ops: list[Op] | None = None
        self._build_count = 0
        self._reuse_count = 0

    def _ops_for(self, view: PublishedItemView, camera: CameraAffine) -> list[Op]:
        # Version-keyed reuse: a clean frame republishes the same view object
        # (same version), so an identical (view, version, camera) key reuses the
        # built ops.
        key = (id(view), view.version, camera)
        if self._cache_key == key and self._cached_ops is not None:
            self._reuse_count += 1
            return self._cached_ops
        ops = build_item_ops(view, camera=camera)
        self._cache_key = key
        self._cached_ops = ops
        self._build_count += 1
        return ops

    def render(
        self,
        cmd: Any,
        view: PublishedItemView,
        width: int,
        height: int,
        *,
        ui_width: int = 0,
        ui_height: int = 0,
        camera: CameraAffine = _IDENTITY_CAMERA,
    ) -> None:
        """Submit ``view`` to ``cmd`` through the 2D pipelines (design §2.6).

        Builds (or reuses) the ordered op list for the view + camera, then hands
        it to :meth:`Draw2DPass.render`, which coalesces adjacent ops and records
        the draws with its existing per-blend pipelines, host-visible buffers, and
        scissor discipline. ``width``/``height`` are the framebuffer extent;
        ``ui_width``/``ui_height`` the UI coordinate space (HiDPI), exactly as the
        legacy pass takes them.
        """
        ops = self._ops_for(view, camera)
        self._draw2d_pass.render(cmd, width, height, ui_width, ui_height, ops=ops)

    @property
    def last_frame_draw_count(self) -> int:
        """Draws issued by the underlying pass last frame (post-coalesce)."""
        return int(self._draw2d_pass.last_frame_draw_count)

    @property
    def build_count(self) -> int:
        """Frames that rebuilt the op list (dirty/changed view or camera)."""
        return self._build_count

    @property
    def reuse_count(self) -> int:
        """Frames that reused the cached op list (clean: same version + camera)."""
        return self._reuse_count


def camera_affine_from_tree(tree: Any) -> CameraAffine:
    """Return the active Camera2D's submit affine for ``tree`` (design §2.3 / §7.3).

    Delegates to the one Camera2D mapping, :meth:`Camera2D.canvas_transform`, which
    ``SceneTree.render`` bakes and ``world_to_screen`` inverts -- so the item-pipeline
    view, the legacy bake, and hit-testing are provably the same matrix. With no
    active Camera2D, identity (the legacy walk pushes nothing).
    """
    cam = getattr(tree, "_current_camera_2d", None)
    if cam is None:
        return _IDENTITY_CAMERA
    return cam.canvas_transform(tree._screen_size)
# ---------------------------------------------------------------------------
# Bindless co-batcher (design §3 Decision D, P3b)
#
# This is the PERF increment: instead of emitting one legacy Op per item and
# letting the adjacent coalescer break a run on every texture/pipeline change,
# the co-batcher groups consecutive items in published draw order that share only
# (topology, clip_scope, blend) into ONE draw -- ACROSS different textures and
# across sprite + glyph + fill -- because texture_id and is_msdf now travel PER
# VERTEX (the ABI change). Glyph (IS_MSDF) items reference the MSDF atlas's
# bindless slot; sprites their own slot; fills tex_id = -1. Painter order is
# preserved: items stay in (layer, seq) order and a run only ever MERGES adjacent
# compatible items -- it never reorders, so overlapping translucent items keep
# their relative order.
# ---------------------------------------------------------------------------
_FLAG_IS_MSDF = 1 # matches ui2d.frag FLAG_IS_MSDF / BindlessDraw2DPass.FLAG_IS_MSDF
# N1 (2D-in-HDR) submit lanes. ``"all"`` is the legacy single-pass behaviour (every
# item, used when post-processing is off so everything composites to the swapchain).
# With post on the view is submitted twice: ``"hdr"`` draws the world-space lane into
# the HDR target before tonemap (so it gets exposure/tonemap/bloom), ``"ldr"`` draws
# the screen-space lane onto the swapchain after the tonemap blit (authored LDR).
_LANE_ALL = "all"
_LANE_HDR = "hdr"
_LANE_LDR = "ldr"


def item_in_hdr_lane(item_flags: int) -> bool:
    """Return whether an item belongs to the HDR (world) lane (N1, design §13 N1).

    By role: a world-space item (``SCREEN_SPACE`` clear) is HDR-eligible; a
    screen-space item (HUD/UI) is not. The per-node ``hdr`` override forces it
    either way: ``HDR_OPT_IN`` -> always HDR, ``HDR_OPT_OUT`` -> always LDR. The
    override wins over the screen-space role; opt-in wins over opt-out (a node
    cannot meaningfully request both).
    """
    if item_flags & int(ItemFlags.HDR_OPT_IN):
        return True
    if item_flags & int(ItemFlags.HDR_OPT_OUT):
        return False
    return not (item_flags & int(ItemFlags.SCREEN_SPACE))


def build_bindless_geometry(
    view: PublishedItemView,
    *,
    camera: CameraAffine = _IDENTITY_CAMERA,
    atlas_slot: int = -1,
    lane: str = _LANE_ALL,
    only_band: int | None = None,
    exclude_bands: frozenset[int] | None = None,
    min_band: int | None = None,
    max_band: int | None = None,
) -> tuple[np.ndarray, np.ndarray, np.ndarray, list]:
    """Build the co-batched geometry + batch list for a published view (design §3 D).

    Returns ``(tri_verts, tri_indices, line_verts, batches)`` where ``tri_verts``
    and ``line_verts`` are :data:`UI2D_VERTEX_DTYPE` arrays (40-byte vertices with
    per-vertex ``tex_id`` + ``flags``, camera already applied), ``tri_indices`` a
    ``uint32`` index stream, and ``batches`` a list of
    :class:`~simvx.graphics.renderer.bindless_draw2d_pass.BindlessBatch` runs in
    draw order.

    Grouping rule (the co-batch): consecutive items sharing
    ``(topology, clip_scope, blend)`` merge into one batch even when their
    textures differ and even when sprite and glyph alternate. A LINE-topology item
    breaks the run (different pipeline) and emits a line batch; consecutive line
    items merge on clip alone. ``atlas_slot`` is the MSDF atlas's bindless slot;
    an IS_MSDF item's per-vertex ``tex_id`` is set to it (its captured geometry
    already carries atlas UVs).

    Band filter (per-CanvasLayer post, design §5.6): ``only_band`` keeps ONLY items
    whose ``layer`` column equals it (a single post-processed CanvasLayer band);
    ``exclude_bands`` drops items in any of those bands (the global swapchain path
    skips post-processed bands, which are composited by their own chain);
    ``min_band``/``max_band`` (inclusive) keep only items in a band interval (the
    global path draws each plain-band SEGMENT between two post bands so the
    composites interleave at the right z-slot). Like the lane filter, banding only
    ever SHORTENS a co-batch run -- it never reorders -- so painter order within
    the kept set is intact. All default ``None`` (no band filter) so the global
    path is byte-identical when the feature is unused.
    """
    from ..renderer.bindless_draw2d_pass import BindlessBatch

    if view.count == 0:
        return (
            np.empty(0, dtype=UI2D_VERTEX_DTYPE),
            np.empty(0, dtype=np.uint32),
            np.empty(0, dtype=UI2D_VERTEX_DTYPE),
            [],
        )
    cols = view.columns
    pipeline = cols["pipeline"]
    clip_scope = cols["clip_scope"]
    blend = cols["blend"]
    texture = cols["texture"]
    flags = cols["flags"]
    layer = cols["layer"]
    geometry = view.geometry
    clips = view.clips
    a, b, c, d, tx, ty = camera
    has_camera = camera != _IDENTITY_CAMERA

    # Triangle stream (indexed) + line stream (non-indexed), with per-batch runs.
    tri_v: list = []  # list of per-item vertex arrays (structured)
    tri_i: list = []  # list of per-item index arrays (uint32, run-relative)
    line_v: list = []
    batches: list = []
    tri_vert_cursor = 0  # next free vertex row in the triangle buffer
    tri_idx_cursor = 0  # next free index in the index buffer
    line_vert_cursor = 0

    sentinel = object()
    run_line = False
    run_clip: Any = sentinel
    run_blend = -1
    run_vert_start = 0  # base vertex of the current run (vertexOffset)
    run_idx_start = 0
    run_count = 0  # idx count (tri) or vert count (line)

    def _flush() -> None:
        nonlocal run_count
        if run_count == 0:
            return
        scissor = clips.scissor(int(run_clip)) if run_clip is not None else None
        batches.append(
            BindlessBatch(
                clip=scissor,
                blend=_MODE_TO_BLEND.get(int(run_blend), "alpha"),
                vert_offset=run_vert_start,
                idx_offset=run_idx_start,
                count=run_count,
                line=run_line,
            )
        )
        run_count = 0

    for row in view.order:
        i = int(row)
        geom = geometry[int(cols["geometry"][i])]
        verts = geom.verts
        if not verts:
            continue
        kind = int(pipeline[i])
        is_line = kind == int(PipelineKind.LINE)
        item_clip = int(clip_scope[i])
        item_blend = int(blend[i])
        item_flags = int(flags[i])
        # N1 lane filter: skip items not in the requested lane. Filtering preserves
        # draw order within the lane (items stay in published (layer, seq) order);
        # it can only shorten a co-batch run when the two lanes interleave, never
        # reorder, so painter order is intact.
        if lane != _LANE_ALL and item_in_hdr_lane(item_flags) != (lane == _LANE_HDR):
            continue
        # Per-band post filter (design §5.6): the post-band chain keeps only its own
        # band; the global swapchain path drops every post-processed band. Same
        # order-preserving "shorten a run, never reorder" property as the lane filter.
        if only_band is not None and int(layer[i]) != only_band:
            continue
        if exclude_bands is not None and int(layer[i]) in exclude_bands:
            continue
        if min_band is not None and int(layer[i]) < min_band:
            continue
        if max_band is not None and int(layer[i]) > max_band:
            continue
        is_msdf = bool(item_flags & int(ItemFlags.IS_MSDF))
        screen_space = bool(item_flags & int(ItemFlags.SCREEN_SPACE))

        # Per-vertex tex_id + flags (the ABI): MSDF glyphs sample the atlas slot,
        # textured sprites their own slot, fills/lines -1.
        if is_msdf:
            tex_id = atlas_slot
            vflags = _FLAG_IS_MSDF
        else:
            tex_id = int(texture[i])
            vflags = 0

        # Pack the item's verts into the structured UI2D vertex (camera applied
        # to position only, unless screen-space).
        n = len(verts)
        arr = np.asarray(verts, dtype=np.float32).reshape(n, 8)
        struct = np.empty(n, dtype=UI2D_VERTEX_DTYPE)
        px = arr[:, 0]
        py = arr[:, 1]
        if has_camera and not screen_space:
            struct["position"][:, 0] = a * px + b * py + tx
            struct["position"][:, 1] = c * px + d * py + ty
        else:
            struct["position"] = arr[:, :2]
        struct["uv"] = arr[:, 2:4]
        struct["colour"] = arr[:, 4:8]
        struct["tex_id"] = tex_id
        struct["flags"] = vflags

        # Run compatibility: same topology + clip + blend (blend is ignored for
        # line runs). Lines (different pipeline) never merge with triangles; a
        # glyph and a sprite DO merge.
        same_run = (
            run_count != 0
            and is_line == run_line
            and item_clip == run_clip
            and (is_line or item_blend == run_blend)
        )
        if not same_run:
            _flush()
            run_line = is_line
            run_clip = item_clip
            run_blend = item_blend
            if is_line:
                run_vert_start = line_vert_cursor
            else:
                run_vert_start = tri_vert_cursor
                run_idx_start = tri_idx_cursor

        if is_line:
            line_v.append(struct)
            line_vert_cursor += n
            run_count += n
        else:
            # Offset the op-local indices by this item's base within the run so
            # the concatenated index stream is valid relative to run_vert_start
            # (which becomes vkCmdDrawIndexed's vertexOffset).
            idx = np.asarray(geom.indices, dtype=np.uint32)
            local_base = tri_vert_cursor - run_vert_start
            tri_i.append(idx + local_base)
            tri_v.append(struct)
            tri_vert_cursor += n
            tri_idx_cursor += len(idx)
            run_count += len(idx)
    _flush()

    tri_verts = np.concatenate(tri_v) if tri_v else np.empty(0, dtype=UI2D_VERTEX_DTYPE)
    tri_indices = np.concatenate(tri_i) if tri_i else np.empty(0, dtype=np.uint32)
    line_verts = np.concatenate(line_v) if line_v else np.empty(0, dtype=UI2D_VERTEX_DTYPE)
    return tri_verts, tri_indices, line_verts, batches


class BindlessItemSubmitter:
    """Render-thread-owned bindless co-batched submit of a published view (design §3 D).

    The P3b counterpart of :class:`ItemSubmitter`: it builds the co-batched
    geometry (one draw per ``(topology, clip, blend)`` run, across textures and
    sprite/glyph) and hands it to a
    :class:`~simvx.graphics.renderer.bindless_draw2d_pass.BindlessDraw2DPass`. The
    MSDF atlas is registered into the bindless array by the pass; the submitter
    asks the pass for the slot each frame (it is stable, refreshed only when the
    atlas re-uploads).

    Version-keyed reuse: a clean frame republishes the same view object (same
    version + camera + atlas slot) so the built geometry is reused verbatim (zero
    rebuild) -- the §4 "clean frame uploads nothing" fast path at the bindless
    submit boundary.
    """

    __slots__ = ("_pass", "_cache_key", "_cached", "_build_count", "_reuse_count")

    def __init__(self, bindless_pass: Any) -> None:
        self._pass = bindless_pass
        self._cache_key: tuple | None = None
        self._cached: tuple | None = None
        self._build_count = 0
        self._reuse_count = 0

    def _geometry_for(
        self,
        view: PublishedItemView,
        camera: CameraAffine,
        atlas_slot: int,
        lane: str,
        only_band: int | None,
        exclude_bands: frozenset[int] | None,
        min_band: int | None,
        max_band: int | None,
    ):
        key = (id(view), view.version, camera, atlas_slot, lane, only_band, exclude_bands, min_band, max_band)
        if self._cache_key == key and self._cached is not None:
            self._reuse_count += 1
            return self._cached
        built = build_bindless_geometry(
            view, camera=camera, atlas_slot=atlas_slot, lane=lane,
            only_band=only_band, exclude_bands=exclude_bands,
            min_band=min_band, max_band=max_band,
        )
        self._cache_key = key
        self._cached = built
        self._build_count += 1
        return built

    def render(
        self,
        cmd: Any,
        view: PublishedItemView,
        width: int,
        height: int,
        *,
        ui_width: int = 0,
        ui_height: int = 0,
        camera: CameraAffine = _IDENTITY_CAMERA,
        lane: str = _LANE_ALL,
        only_band: int | None = None,
        exclude_bands: frozenset[int] | None = None,
        min_band: int | None = None,
        max_band: int | None = None,
    ) -> None:
        """Submit ``view`` to ``cmd`` through the bindless co-batched pipeline.

        The MSDF atlas bindless slot is registered OUTSIDE the render pass (the
        renderer calls :meth:`BindlessDraw2DPass.sync_atlas_slot` in ``pre_render``,
        because ``register_texture`` must not run during command recording). Here
        we only READ the already-synced slot.

        ``lane`` (N1) selects which items to draw: ``"all"`` (post off, everything
        to the swapchain), ``"hdr"`` (world lane into the HDR target before tonemap),
        or ``"ldr"`` (screen lane onto the swapchain after tonemap). One submitter
        instance is bound to one render pass, so the HDR and LDR lanes use distinct
        submitters; the cache key carries ``lane`` so each reuses its own geometry.

        ``only_band`` / ``exclude_bands`` / ``min_band`` / ``max_band``
        (per-CanvasLayer post) filter by the ``layer`` band: the post chain submits
        ``only_band``; the global path passes ``exclude_bands`` (or a
        ``min_band``/``max_band`` segment) to interleave with the composites. All
        default ``None``.
        """
        atlas_slot = self._pass.atlas_slot
        tri_verts, tri_indices, line_verts, batches = self._geometry_for(
            view, camera, atlas_slot, lane, only_band, exclude_bands, min_band, max_band,
        )
        self._pass.render(
            cmd, width, height, ui_width, ui_height,
            verts=tri_verts, indices=tri_indices, line_verts=line_verts, batches=batches,
        )

    @property
    def last_frame_draw_count(self) -> int:
        """Draws issued by the underlying bindless pass last frame (post co-batch)."""
        return int(self._pass.last_frame_draw_count)

    @property
    def build_count(self) -> int:
        """Submits that rebuilt the co-batched geometry (cache miss)."""
        return self._build_count

    @property
    def reuse_count(self) -> int:
        """Submits that reused the cached geometry (cache hit: identical key)."""
        return self._reuse_count