simvx.graphics.gpu.multi_device¶
Explicit multi-adapter (multi-GPU) foundation: D8 workload-split offload.
This is the foundation wave of design decision D8. It builds the gated
plumbing for an explicit-multi-adapter renderer (one independent VkDevice
per physical GPU) where whole SubViewport / offscreen scene-render-units (SRUs)
are offloaded to secondary GPUs and composited on the primary (GPU 0). It is
deliberately off by default: a single-GPU box, and any multi-GPU box that
does not opt in, runs today’s single-device path byte-identical (no extra
device, no transfer work, no behaviour change). The active multi-device path is
verified on a 4x Arc Pro B70 rig, not on this single-GPU dev box.
Three coherent, unit-testable pieces live here:
- MultiDeviceManager enumerates physical devices and, only when physical_device_count > 1 and the caller opted in, creates an independent logical device per physical GPU (reusing simvx.graphics.gpu.device.create_logical_device). When the count is 1 or the opt-in is off it holds exactly the existing single (primary) device and is a transparent passthrough.
- assign_srus is the pure device-assignment policy (no Vulkan): given the ordered SRUs and a device count it decides which render on GPU 0 and which offload to GPU 1+. It is TAA-safe by construction (each SRU keeps its temporal history on its one assigned device, so there is no cross-device reprojection).
- CrossDeviceTransfer selects how a finished offscreen colour image moves from a secondary device to the primary for compositing. The staging-copy path (secondary -> host-visible staging -> primary) is the guaranteed floor and is the only path implemented end-to-end; the dma-buf / VK_KHR_external_memory_fd zero-copy path is the rig optimisation, gated behind a capability and currently raising a clear, actionable error.
REALITY CHECK (honest scope). Rendering an SRU on a second VkDevice
requires that device to own its full set of rendering resources: its own
pipelines, descriptor pools, transform/material SSBOs, and mesh+texture
residency. The current simvx.graphics.renderer.forward.Renderer is
built around exactly one device. Duplicating it per device is a large,
GPU-bound refactor that cannot be functionally verified on this single-GPU box.
So this module stops at a clean seam: the device manager, the assignment
policy, and the transfer interface are real and tested; the per-device renderer
construction + the actual offload-record-and-composite loop are the documented
rig-side completion (see MultiDeviceManager.attach_renderer /
DeviceSlot.renderer).
Module Contents¶
Classes¶
DeviceSlot | One physical+logical GPU participating in the multi-device renderer.
MultiDeviceManager | Owns the per-physical-GPU logical devices for the D8 offload renderer.
SRUAssignment | Which device renders one SRU and whether its result must be transferred.
TransferMethod | Selectable cross-device image-transfer strategies (enum-like constants).
CrossDeviceTransfer | Moves a finished offscreen colour image from a secondary to the primary.
OffloadRoute | The per-SRU offload decision the recording path consults.
SRUOffloadCoordinator | Decides + drives where each SubViewport SRU renders across the devices.
Functions¶
assign_srus | Decide which device renders each SRU (pure, unit-testable, no Vulkan).
select_transfer_method | Pick the transfer strategy for one SRU assignment (pure, no Vulkan).
Data¶
API¶
- simvx.graphics.gpu.multi_device.log¶
‘getLogger(…)’
- simvx.graphics.gpu.multi_device.__all__¶
[‘DeviceSlot’, ‘MultiDeviceManager’, ‘SRUAssignment’, ‘assign_srus’, ‘CrossDeviceTransfer’, ‘Transfe…
- class simvx.graphics.gpu.multi_device.DeviceSlot[source]¶
One physical+logical GPU participating in the multi-device renderer.
index 0 is always the primary (compositing) GPU; 1+ are secondaries that render offloaded SRUs. On a single-GPU run there is exactly one slot (index == 0) wrapping the engine’s existing device, and nothing else is created.
renderer is the per-device simvx.graphics.renderer.forward.Renderer duplicate. It is populated for the primary slot from the engine’s existing renderer and, on the rig, for each secondary by MultiDeviceManager.attach_renderer. It stays None for secondaries until that rig-side per-device renderer construction lands (see the module docstring): the foundation never fabricates a broken renderer to look complete.
- index: int¶
None
- physical_device: Any¶
None
- queue_families: simvx.graphics.gpu.device.QueueFamilies¶
None
- device: Any¶
None
- graphics_queue: Any¶
None
- present_queue: Any¶
None
- compute_queue: Any¶
None
- transfer_queue: Any¶
None
- name: str = <Multiline-String>¶
- renderer: Any¶
None
- class simvx.graphics.gpu.multi_device.MultiDeviceManager(*, primary_physical_device: Any, primary_queue_families: simvx.graphics.gpu.device.QueueFamilies, primary_device: Any, primary_graphics_queue: Any, primary_present_queue: Any, physical_devices: list[Any], enabled: bool, capabilities: simvx.graphics.gpu.capabilities.RenderCapabilities | None = None, primary_compute_queue: Any = None, primary_transfer_queue: Any = None, find_queue_families: Any = None)[source]¶
Owns the per-physical-GPU logical devices for the D8 offload renderer.
Construct with the primary device’s already-created handles (the engine’s existing single device, so the primary slot is never re-created) plus the enumerated physical devices and the opt-in flag. When enabled is true and more than one physical device is present, a secondary DeviceSlot is created per additional physical GPU with its own independent VkDevice via create_logical_device. Otherwise the manager holds exactly the single primary slot and multi_gpu is False (today’s path, unchanged).
The manager does NOT own the primary device’s lifetime (the engine created and destroys it); destroy only tears down the secondary devices it created itself.
Initialization
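The gating rule described above can be illustrated without any Vulkan calls. The following is a minimal sketch, not the manager's actual code: Slot, build_slots and multi_gpu are hypothetical stand-ins, and treating every enumerated device other than the primary as a secondary is an assumption.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Slot:
    """Hypothetical stand-in for DeviceSlot (only the fields used here)."""
    index: int
    physical_device: Any


def build_slots(primary: Any, physical_devices: list[Any], enabled: bool) -> list[Slot]:
    """Return the slots a manager would hold for the given opt-in state."""
    # Slot 0 always wraps the engine's existing primary device.
    slots = [Slot(index=0, physical_device=primary)]
    # Secondaries exist only when the caller opted in AND more than one GPU is present.
    if enabled and len(physical_devices) > 1:
        # Assumption: every enumerated device other than the primary becomes a secondary.
        others = [pd for pd in physical_devices if pd != primary]
        for i, pd in enumerate(others, start=1):
            slots.append(Slot(index=i, physical_device=pd))
    return slots


def multi_gpu(slots: list[Slot]) -> bool:
    # Mirrors the documented property: active only with two or more slots.
    return len(slots) >= 2
```

With the opt-in off, or with a single enumerated device, the sketch returns exactly one slot, which corresponds to the passthrough case.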
- property multi_gpu: bool[source]¶
True only when an opted-in multi-device renderer is active (>= 2 slots).
- property device_count: int[source]¶
Number of logical devices the manager owns (1 on the single-GPU path).
- property primary: simvx.graphics.gpu.multi_device.DeviceSlot[source]¶
- property secondaries: list[simvx.graphics.gpu.multi_device.DeviceSlot][source]¶
- property slots: list[simvx.graphics.gpu.multi_device.DeviceSlot][source]¶
- slot(index: int) simvx.graphics.gpu.multi_device.DeviceSlot[source]¶
- attach_renderer(index: int, renderer: Any) None[source]¶
Bind a per-device Renderer to slot index (rig-side).
The primary renderer is the engine’s existing one. Each secondary needs its OWN renderer (its device’s pipelines / descriptor pools / SSBOs / residency); constructing that is the documented rig-side completion. This setter is the seam where the rig hands the constructed per-device renderer back to the manager so the offload loop can record into it.
- register_teardown(hook: Any) None[source]¶
Register a callback to free per-secondary GPU resources before device destroy.
The offload coordinator registers its SRUOffloadCoordinator.destroy here so its secondary facades / targets / staging buffers are freed while the secondary VkDevices are still alive (they own those resources). Invoked first by destroy.
- destroy() None[source]¶
Destroy only the SECONDARY logical devices this manager created.
The primary device is owned by the engine and left untouched. Safe to call on the single-GPU path (no secondaries => no-op). Registered teardown hooks run FIRST so per-secondary GPU resources are freed before the devices.
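The ordering guarantee (hooks first, then secondary devices, primary untouched) can be sketched with a recorder in place of real device handles. TeardownOrder is a hypothetical class for illustration only; clearing the hooks after they run is an assumption, not something the documentation states.

```python
class TeardownOrder:
    """Hypothetical model of the documented destroy() ordering (no Vulkan)."""

    def __init__(self, secondary_names):
        self._secondaries = list(secondary_names)
        self._hooks = []
        self.events = []  # records what happened, in order

    def register_teardown(self, hook):
        self._hooks.append(hook)

    def destroy(self):
        # 1. Registered hooks run first, while the secondary devices are still alive.
        for hook in self._hooks:
            hook()
        self._hooks.clear()  # assumption: hooks are one-shot
        # 2. Only the secondaries are destroyed; the primary is never touched here.
        for name in self._secondaries:
            self.events.append(f"destroy device {name}")
        self._secondaries.clear()


order = TeardownOrder(["gpu1"])
order.register_teardown(lambda: order.events.append("free staging"))
order.destroy()
```

After the call, order.events lists the hook's work before the device destruction, and a manager built with no secondaries records nothing at all.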
- class simvx.graphics.gpu.multi_device.SRUAssignment[source]¶
Which device renders one SRU and whether its result must be transferred.
device_index 0 means the SRU renders on the primary and is already resident for compositing (needs_transfer is False). A non-zero index means the SRU is offloaded to that secondary and its finished colour image must be moved to the primary before compositing (needs_transfer is True).
- sru_id: int¶
None
- device_index: int¶
None
- simvx.graphics.gpu.multi_device.assign_srus(srus: list[Any], device_count: int, *, sru_id: Any = None, cost: Any = None) list[simvx.graphics.gpu.multi_device.SRUAssignment][source]¶
Decide which device renders each SRU (pure, unit-testable, no Vulkan).
Policy (TAA-safe by construction: an SRU is assigned to exactly one device, so its temporal history never crosses devices):
- device_count <= 1: EVERY SRU stays on GPU 0. This is the single-GPU / unopted path and produces the same per-SRU work as today, byte-identical (no transfer is ever flagged).
- device_count >= 2: the main scene is implicitly GPU 0 (it is not an SRU and is not in this list). Among the SRUs, the cheapest stay on GPU 0 (composited locally) and the heaviest independent SRUs are offloaded to the secondaries round-robin. Concretely: sort SRUs by descending cost and walk them, sending the next heaviest to the least-loaded secondary while that keeps the primary from being the bottleneck; ties and the remainder stay on GPU 0.
The returned list preserves the INPUT order of srus (the P1 producer-before-consumer topological order), so a consumer SRU still follows the producer it samples; only the device choice is decided here.
Args:
srus: Ordered SRU plans (SubViewportSRU or any object exposing the cost inputs). Order is preserved in the result.
device_count: Number of devices available (MultiDeviceManager.device_count).
sru_id: Optional accessor sru -> int for the stable id; defaults to reading sru.sru_id.
cost: Optional accessor sru -> int overriding the default instance-count heuristic (handy for tests).
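The invariants of this policy (everything on GPU 0 for one device, one device per SRU, input order preserved, transfer flagged only off the primary) can be shown with a deliberately simplified sketch. It is not the function's real balancing rule: here the heaviest device_count - 1 SRUs go one per secondary and everything else stays on GPU 0. Assignment and assign_sketch are hypothetical names, SRUs are modelled as (sru_id, cost) tuples, and ids are assumed unique.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Assignment:
    """Hypothetical stand-in for SRUAssignment."""
    sru_id: int
    device_index: int

    @property
    def needs_transfer(self) -> bool:
        # Only SRUs rendered off the primary must be moved before compositing.
        return self.device_index != 0


def assign_sketch(srus: list[tuple[int, int]], device_count: int) -> list[Assignment]:
    """Simplified policy: srus is an ordered list of (sru_id, cost) pairs."""
    # Default: every SRU stays on GPU 0 (the whole result when device_count <= 1).
    device_of = {sid: 0 for sid, _ in srus}
    if device_count >= 2:
        # Simplification (assumption): one heaviest SRU per secondary, no load balancing.
        heaviest_first = sorted(srus, key=lambda s: s[1], reverse=True)
        for device_index, (sid, _) in enumerate(heaviest_first[: device_count - 1], start=1):
            device_of[sid] = device_index
    # Emit in INPUT order so producer-before-consumer ordering survives.
    return [Assignment(sid, device_of[sid]) for sid, _ in srus]


result = assign_sketch([(10, 5), (11, 1), (12, 3), (13, 4)], device_count=3)
```

In this run the SRUs with cost 5 and 4 land on devices 1 and 2, the other two stay on GPU 0, and the ids come back in the order they were given.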
- class simvx.graphics.gpu.multi_device.TransferMethod[source]¶
Selectable cross-device image-transfer strategies (enum-like constants).
- NONE¶
‘none’
SRU already lives on the primary device: compositing samples it directly.
- STAGING_COPY¶
‘staging_copy’
Secondary image -> host-visible staging buffer -> primary image. The guaranteed floor, works on any pair of devices (the colour RenderTarget already carries TRANSFER_SRC|TRANSFER_DST).
- DMABUF¶
‘dmabuf’
Zero-copy via VK_KHR_external_memory_fd (dma-buf import/export). The rig optimisation; requires the external-memory extensions to be enabled on both devices. Gated and currently raises until the rig path is implemented.
- simvx.graphics.gpu.multi_device.select_transfer_method(assignment: simvx.graphics.gpu.multi_device.SRUAssignment, capabilities: simvx.graphics.gpu.capabilities.RenderCapabilities | None, *, prefer_dmabuf: bool = True) str[source]¶
Pick the transfer strategy for one SRU assignment (pure, no Vulkan).
An SRU on the primary needs no transfer -> TransferMethod.NONE.
An offloaded SRU uses TransferMethod.DMABUF when the caller prefers it AND the capability snapshot reports the external-memory-fd path enabled; otherwise the always-available TransferMethod.STAGING_COPY.
The capability gate is RenderCapabilities.external_memory_fd_enabled: the extension must be enabled at device creation, not merely probed available. A probed-but-not-enabled device cannot export/import fds, so DMABUF must not be selected for it (that would later raise with no working path). On this dev box the single-GPU path never enables it, so the field is False, the staging-copy floor is always chosen, and the dma-buf raise is never reached.
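The selection rule reduces to three branches. The sketch below mirrors the description rather than the module's source: Caps is a hypothetical stand-in for RenderCapabilities, select_sketch takes a bare device index instead of an SRUAssignment, and the string constants are the values listed for TransferMethod.

```python
from dataclasses import dataclass
from typing import Optional

# Values as documented for TransferMethod.
NONE, STAGING_COPY, DMABUF = "none", "staging_copy", "dmabuf"


@dataclass
class Caps:
    """Hypothetical stand-in for RenderCapabilities (one field only)."""
    external_memory_fd_enabled: bool = False


def select_sketch(device_index: int, caps: Optional[Caps], *, prefer_dmabuf: bool = True) -> str:
    # An SRU resident on the primary never needs a transfer.
    if device_index == 0:
        return NONE
    # dma-buf only when preferred AND the extension was enabled at device creation.
    if prefer_dmabuf and caps is not None and caps.external_memory_fd_enabled:
        return DMABUF
    # Everything else falls back to the always-available floor.
    return STAGING_COPY
```

Passing caps=None or a snapshot with the field left False yields the staging-copy floor for any offloaded SRU, which matches the single-GPU dev-box behaviour described above.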
- class simvx.graphics.gpu.multi_device.CrossDeviceTransfer[source]¶
Moves a finished offscreen colour image from a secondary to the primary.
Holds the source (secondary) and destination (primary) DeviceSlot plus the chosen TransferMethod. Cross-device synchronisation is explicit: each device signals a fence/timeline when its half of the copy is done, and the primary’s composite waits on the import being complete. The sync handles are carried on this object so the rig wiring is a single seam.
Implemented:
- run_staging_copy defines the staging-copy sequence (secondary GPU copy-to-buffer -> host-visible staging -> primary copy-to-image). The CPU-roundtrip floor: correct on any device pair, slower than dma-buf. The actual vkCmd* recording is the rig-side completion because it needs both devices’ live command pools + a host-visible staging allocation per device, which cannot be exercised on this single-GPU box; the method documents and gates that precisely rather than emitting unverifiable copy code.
Gated (rig):
- run_dmabuf raises a clear, actionable error until VK_KHR_external_memory_fd is enabled and the import/export is wired on the rig.
- method: str¶
None
- src_done: Any¶
None
- dst_ready: Any¶
None
- width: int¶
0
- height: int¶
0
- bytes_per_pixel: int¶
8
- extra: dict¶
‘field(…)’
- run(*, src_image: Any = None, dst_image: Any = None) None[source]¶
Execute the selected transfer. Dispatches by method.
TransferMethod.NONE is a no-op (single-GPU / primary-resident SRU, the byte-identical path). The other two dispatch to their handlers.
- run_staging_copy(*, src_image: Any = None, dst_image: Any = None) None[source]¶
Staging-copy floor: secondary image -> host staging -> primary image.
The CPU-roundtrip floor: correct on any device pair, slower than dma-buf.
src_image is the finished SRU colour image on the SECONDARY device (left in SHADER_READ_ONLY_OPTIMAL by the offscreen pass; the RenderTarget carries TRANSFER_SRC_BIT). dst_image is the primary-device image the main scene samples for this SubViewport feed (carries TRANSFER_DST_BIT).
Exact sequence + every wait (the contract recorded here, verified on the rig). All staging buffers are host-visible|host-coherent and pre-built once per (transfer, size) into extra by ensure_staging:
1. SOURCE (secondary) device, one-shot command buffer on the secondary command pool:
a. barrier src_image: SHADER_READ_ONLY_OPTIMAL -> TRANSFER_SRC_OPTIMAL (src stage FRAGMENT_SHADER, dst stage TRANSFER; src access SHADER_READ, dst access TRANSFER_READ).
b. vkCmdCopyImageToBuffer src_image -> src_staging (tightly packed, bufferRowLength=0, bufferImageHeight=0).
c. barrier src_image: TRANSFER_SRC_OPTIMAL -> SHADER_READ_ONLY_OPTIMAL so the secondary can render into / sample it again next frame.
d. submit on src.graphics_queue signalling src_done (a fence on the secondary device).
WAIT: the host read in step 2 blocks on src_done (vkWaitForFences); the GPU copy must complete before the bytes are mapped. The offscreen SRU render fence is itself waited on before this submit by the caller (_record_offloaded_sru), so the colour image is fully written first.
2. HOST: map src_staging (secondary device), memmove the height * row_bytes bytes into dst_staging (primary device), unmap both. Both are host-coherent so no explicit flush/invalidate is needed.
WAIT: gated on src_done from 1d before the read; the write into dst_staging happens-before the primary GPU read in 3 because step 3 is submitted only after this memmove returns on the same (host) thread.
3. DESTINATION (primary) device, one-shot command buffer on the primary command pool:
a. barrier dst_image: SHADER_READ_ONLY_OPTIMAL -> TRANSFER_DST_OPTIMAL.
b. vkCmdCopyBufferToImage dst_staging -> dst_image.
c. barrier dst_image: TRANSFER_DST_OPTIMAL -> SHADER_READ_ONLY_OPTIMAL so the main composite pass samples it.
d. submit on dst.graphics_queue signalling dst_ready (a fence on the primary device).
WAIT: the main composite pass that samples dst_image runs after dst_ready; the caller waits on it before recording the frame’s main pass (or, in the same-frame in-cmd model, this transfer is submitted + waited before the main pass is recorded).
The implementation records exactly this. It needs both devices’ command pools + a host-visible staging buffer per device, supplied via ensure_staging. On the single-GPU path the method is NONE and this is never reached; it is exercised end-to-end only on the multi-GPU rig.
- ensure_staging(src_command_pool: Any, dst_command_pool: Any) dict[source]¶
Allocate (once) the per-device host-visible staging buffers + cache pools.
Builds one host-visible|host-coherent buffer on each device, sized width * height * bytes_per_pixel (the raw colour image bytes). Cached on extra['staging'] so a steady-state per-frame transfer reuses them; reallocated only when the size changes (a SubViewport resize). Returns the staging dict. Rig-side GPU allocation; never reached on this box.
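The sizing and reuse rule can be sketched with plain byte buffers in place of GPU allocations. For scale, a 1920 x 1080 target at the default 8 bytes per pixel needs 1920 * 1080 * 8 = 16,588,800 bytes per staging buffer. In the sketch below, ensure_staging_sketch, the allocate callback and the "size" / "src" / "dst" keys are all hypothetical; only the extra['staging'] cache key and the size formula come from the description above.

```python
def ensure_staging_sketch(extra, width, height, bytes_per_pixel, allocate):
    """Return cached staging buffers, reallocating only when the byte size changes."""
    size = width * height * bytes_per_pixel  # raw colour image bytes
    cached = extra.get("staging")
    if cached is None or cached["size"] != size:
        # One buffer per device: source (secondary) and destination (primary).
        cached = {"size": size, "src": allocate(size), "dst": allocate(size)}
        extra["staging"] = cached
    return cached


extra = {}
first = ensure_staging_sketch(extra, 64, 32, 8, bytearray)    # allocates
second = ensure_staging_sketch(extra, 64, 32, 8, bytearray)   # reuses the cache
resized = ensure_staging_sketch(extra, 128, 32, 8, bytearray) # size changed, reallocates
```

The second call returns the same dict object as the first, while the resize produces a fresh one, which is the steady-state behaviour the method describes.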
- abstractmethod run_dmabuf(*, src_image: Any = None, dst_image: Any = None) None[source]¶
Zero-copy dma-buf transfer (VK_KHR_external_memory_fd). Rig optimisation.
Export the secondary colour image’s memory as an opaque fd / dma-buf, import it on the primary as an external image, and composite directly with a single cross-device semaphore wait (no host roundtrip). Requires the external-memory-fd extension enabled on BOTH devices at init.
Gated: raises until the rig path is implemented and the external_memory_fd capability is enabled.
- class simvx.graphics.gpu.multi_device.OffloadRoute[source]¶
The per-SRU offload decision the recording path consults.
Computed once per frame by SRUOffloadCoordinator.plan from the ordered SRU list. device_index 0 means “render this SRU on the primary, exactly as today” (offloaded is False, transfer is TransferMethod.NONE); a non-zero index means the SRU renders on that secondary’s renderer and its colour image is moved to the primary via transfer before the main pass samples it.
- sru_id: int¶
None
- device_index: int¶
None
- transfer: str¶
None
- class simvx.graphics.gpu.multi_device.SRUOffloadCoordinator(manager: simvx.graphics.gpu.multi_device.MultiDeviceManager, capabilities: simvx.graphics.gpu.capabilities.RenderCapabilities | None = None, *, prefer_dmabuf: bool = True, content_scale: tuple[float, float] = (1.0, 1.0), secondary_renderer_factory: Any = None)[source]¶
Decides + drives where each SubViewport SRU renders across the devices.
Glue between the (already-tested) assign_srus policy, the select_transfer_method selector, and the MultiDeviceManager device topology. The recording path (SceneAdapter.render_sru_from_plan / render_to_target) consults a coordinator, when one is present, to decide per SRU whether to take today’s primary-device path or route the SRU to a secondary device and transfer the result back.
Constructed only when the manager is actively multi-GPU (opted in AND >= 2 devices). On the single-GPU / unopted path the engine builds no coordinator, so the recording path’s coordinator is None branch is taken and the frame is byte-identical to today.
The decision logic (plan / route_for) is pure and unit-tested with no Vulkan. render_offloaded is the seam where a secondary-assigned SRU would be recorded on its device’s renderer and transferred back; that needs a per-device renderer (DeviceSlot.renderer), which is None on every box without the rig-side per-device-renderer construction, so it raises a clear, capability-gated error rather than emitting unverifiable cross-device code.
Initialization
- property active: bool[source]¶
True only when the backing manager is an opted-in multi-device renderer.
- plan(srus: list[Any], *, sru_id: Any = None, cost: Any = None) list[simvx.graphics.gpu.multi_device.OffloadRoute][source]¶
Compute + cache this frame’s per-SRU routes from the ordered SRU list.
Returns one OffloadRoute per SRU in INPUT order (the P1 producer-before-consumer topological order is preserved). When the coordinator is inactive (single-GPU / unopted) every route stays on GPU 0 with TransferMethod.NONE, so the caller takes today’s path.
sru_id / cost are forwarded to assign_srus so a caller whose ordered items are not SubViewportSRU (e.g. the synchronous path’s live nodes) can supply the id + offload-cost accessors.
- route_for(sru_id: int) simvx.graphics.gpu.multi_device.OffloadRoute | None[source]¶
The cached route for sru_id from the last plan, or None.
None means “no decision recorded” (the SRU was not in the planned list); callers treat that as “render on the primary”, today’s path.
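How a caller might consume the cached routes, including the None case, is sketched below. Route and RouteCache are hypothetical stand-ins for OffloadRoute and the coordinator's cache, and renders_on_primary is an illustrative helper rather than part of the module.

```python
from typing import NamedTuple, Optional


class Route(NamedTuple):
    """Hypothetical stand-in for OffloadRoute."""
    sru_id: int
    device_index: int
    transfer: str

    @property
    def offloaded(self) -> bool:
        return self.device_index != 0


class RouteCache:
    """Hypothetical model of the plan / route_for cache."""

    def __init__(self) -> None:
        self._routes: dict[int, Route] = {}

    def plan(self, routes: list[Route]) -> list[Route]:
        # Each plan replaces the previous frame's cache wholesale.
        self._routes = {r.sru_id: r for r in routes}
        return routes

    def route_for(self, sru_id: int) -> Optional[Route]:
        return self._routes.get(sru_id)


def renders_on_primary(route: Optional[Route]) -> bool:
    # None means "no decision recorded", which callers treat as the primary path.
    return route is None or not route.offloaded


cache = RouteCache()
cache.plan([Route(1, 0, "none"), Route(2, 1, "staging_copy")])
```

An SRU that was never planned (for example id 99 here) resolves to None and therefore renders on the primary, the same as an SRU explicitly routed to device 0.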
- transfer_for(route: simvx.graphics.gpu.multi_device.OffloadRoute, *, width: int = 0, height: int = 0) simvx.graphics.gpu.multi_device.CrossDeviceTransfer[source]¶
Build the CrossDeviceTransfer for an offloaded route.
Pairs the secondary (route.device_index) and primary (0) device slots with the chosen transfer method so the rig wiring is a single seam.
- render_offloaded(route: simvx.graphics.gpu.multi_device.OffloadRoute) Any[source]¶
Return the per-device renderer for an offloaded SRU, building it if needed.
Resolution order:
1. An explicitly attached renderer (MultiDeviceManager.attach_renderer) is returned as-is (the rig may pre-build + bind one; also the unit-test seam).
2. Otherwise, if a _secondary_renderer_factory was injected (rig), lazily build the secondary SecondaryRenderContext facade, run the factory to construct + setup() a Renderer(facade) on the secondary device, bind it to the slot, and return it.
3. With neither (this single-GPU box: no factory is ever injected because no secondary device exists), raise the clear, capability-gated rig-completion error. NEVER reached on the single-GPU / unopted path: that path has no coordinator at all, so the seam’s coordinator is None branch keeps it byte-identical.
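The resolution order can be condensed into a few lines. This sketch omits the facade construction and setup() call, uses SimpleNamespace objects as hypothetical slots, and raises RuntimeError as an assumed exception type; the documentation does not say which exception the real method raises.

```python
from types import SimpleNamespace


def resolve_renderer(slot, factory=None):
    """Return the slot's renderer: attached first, then factory-built, else raise."""
    # 1. An explicitly attached renderer wins and is returned as-is.
    if slot.renderer is not None:
        return slot.renderer
    # 2. Otherwise build lazily via the injected factory and bind it to the slot.
    if factory is not None:
        slot.renderer = factory(slot)
        return slot.renderer
    # 3. With neither, fail loudly (exception type is an assumption).
    raise RuntimeError(
        f"no renderer for device slot {slot.index}: attach one or inject a factory"
    )


attached = SimpleNamespace(index=1, renderer="prebuilt")
lazy = SimpleNamespace(index=2, renderer=None)
built = resolve_renderer(lazy, factory=lambda s: f"renderer-for-{s.index}")
```

Because the factory result is bound to the slot, a second call for the same slot takes the first branch and does not invoke the factory again.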
- render_sru_offloaded(sru: Any, primary_dst_image: Any) bool[source]¶
Render one SRU on its secondary device and transfer it to the primary image.
End-to-end multi-GPU SubViewport offload for one SRU (the simvx.graphics.renderer.render_packet.SubViewportSRU plan):
1. Resolve the SRU’s route (must be offloaded; else returns False so the caller takes the primary path).
2. Ensure the secondary render context (facade + Renderer + residency).
3. Mirror the SRU’s mesh geometry (and, when textured-residency is wired, its sampled textures) onto the secondary device via SecondaryResidency.
4. Size / create the secondary offscreen RenderTarget for this SRU.
5. Record + submit the SRU’s draws into that target on the secondary device (one submit on the secondary graphics queue; the offscreen RenderTarget leaves the colour image in SHADER_READ_ONLY_OPTIMAL).
6. Run the CrossDeviceTransfer (staging-copy floor) to move the secondary colour image into primary_dst_image (the SubViewport’s primary-device bindless image the main scene samples).
Returns True when the SRU was handled on a secondary, False when it was not offloaded (caller renders it on the primary as today). The actual per-device GPU record + submit (step 5) is delegated to the injected secondary renderer; this method owns the orchestration + the cross-device transfer. Exercised on the 4x Arc Pro B70 rig; never reached on the single-GPU path.