LLM NPC Flavour

A night guard whose barks come from a low-frequency LLM brain.

Tags: ai llm

The classical, authoritative game logic runs every frame and is the source of truth: hp, ammo, and an alert flag drift / toggle in on_update and are written onto the agent’s blackboard. A child AgentNode runs an LLMBrain that, a few seconds apart and entirely off the frame thread, turns that authoritative slice into a single in-character line. A bottom-anchored HUD renders the latest bark every frame.

The LLM is a stage, not the NPC: while a call is in flight, or if it is late / dropped / fails, the HUD keeps the last good line and the game never stalls.
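The "keep the last good line" behaviour can be sketched without SimVX at all. The block below is a minimal illustration in plain asyncio, not the engine's actual AsyncSlot or LLMBrain code: `BarkSlot` and `start_background_loop` are hypothetical names introduced here. It shows the three properties the paragraph above describes: the frame loop only polls, a second request is coalesced while one is in flight, and a failed call leaves the previous line on screen.

```python
from __future__ import annotations

import asyncio
import threading
from concurrent.futures import Future


class BarkSlot:
    """Hypothetical helper: holds the last good line, one call in flight at most."""

    def __init__(self, loop: asyncio.AbstractEventLoop, placeholder: str = "...") -> None:
        self._loop = loop
        self._pending: Future | None = None
        self.text = placeholder  # what the HUD would draw every frame

    def request(self, make_line) -> bool:
        """Schedule a call on the background loop unless one is already in flight."""
        if self._pending is not None:
            return False  # coalesce: never queue a second call behind the first
        self._pending = asyncio.run_coroutine_threadsafe(make_line(), self._loop)
        return True

    def poll(self) -> str:
        """Called from the frame loop; never blocks."""
        fut = self._pending
        if fut is not None and fut.done():
            self._pending = None
            # Adopt the result only if the call succeeded; otherwise keep the old line.
            if not fut.cancelled() and fut.exception() is None:
                self.text = fut.result()
        return self.text


def start_background_loop() -> asyncio.AbstractEventLoop:
    """Run an event loop on a daemon thread so awaits never touch the frame thread."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop
```

In a frame loop, `request()` would be called on a timer (every few seconds) and `poll()` every frame; neither waits on the model, so a slow or dead endpoint only means the displayed text stops changing.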

How to run

OFFLINE (default, no LLM, no network, no setup): a scripted fake client picks a canned bark for the situation, after a trivial awaitable that proves the off-thread path. This is what the screenshot walker and web export use.

uv run python examples/features/ai/llm_npc_flavour.py

LIVE: point it at any OpenAI-compatible chat endpoint (vLLM, llama.cpp, Ollama, hosted, etc.) by setting these environment variables, then pass --live. The client is built by OpenAICompatibleClient.from_env():

SIMVX_LLM_BASE_URL   required, e.g. http://host:8000/v1
SIMVX_LLM_MODEL      required, the model name the endpoint serves
SIMVX_LLM_API_KEY    optional, only if your endpoint needs a key

SIMVX_LLM_BASE_URL=http://host:8000/v1 SIMVX_LLM_MODEL=your-model \
    uv run python examples/features/ai/llm_npc_flavour.py --live
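Before launching with --live it can help to confirm that the required variables are actually set. The check below is a small sketch, assuming only the variable names listed above; `missing_live_env` is a hypothetical helper, and it makes no claim about how `OpenAICompatibleClient.from_env()` itself validates or reports missing values.

```python
from __future__ import annotations

import os

# Names taken from the list above; the API key is optional and so not checked.
REQUIRED = ("SIMVX_LLM_BASE_URL", "SIMVX_LLM_MODEL")


def missing_live_env(env=None) -> list[str]:
    """Return the required variable names that are unset or blank."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED if not env.get(name, "").strip()]


if __name__ == "__main__":
    missing = missing_live_env()
    if missing:
        print("Set these before passing --live: " + ", ".join(missing))
    else:
        print("Live environment looks complete.")
```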

Live runs record responses to a local cache (CACHE_DIR) so a re-run is deterministic and free.
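The record/replay idea can be illustrated with a few lines of standard-library Python. This is a sketch of the general technique, not SimVX's `CachingClient`: the `RecordReplay` class, its on-disk layout, and the choice of hashing the serialised messages as the cache key are all assumptions made for the example.

```python
from __future__ import annotations

import asyncio
import hashlib
import json
from pathlib import Path


class RecordReplay:
    """Hypothetical wrapper: replay a stored reply if one exists, else call and record."""

    def __init__(self, complete, cache_dir) -> None:
        self._complete = complete  # async callable: messages -> str
        self._dir = Path(cache_dir)
        self._dir.mkdir(parents=True, exist_ok=True)

    def _path(self, messages) -> Path:
        # Identical requests serialise identically, so they map to the same file.
        blob = json.dumps(messages, sort_keys=True).encode("utf-8")
        return self._dir / (hashlib.sha256(blob).hexdigest() + ".json")

    async def complete(self, messages) -> str:
        path = self._path(messages)
        if path.exists():
            # Replay: no network call, same text as the recorded run.
            return json.loads(path.read_text(encoding="utf-8"))["text"]
        text = await self._complete(messages)
        # Record: the next run with the same messages will hit the branch above.
        path.write_text(json.dumps({"text": text}), encoding="utf-8")
        return text
```

Because the key is derived from the full request, a change in the authoritative state (for example the alert flag flipping) produces a different key and therefore a fresh call. The file reads and writes here are synchronous for brevity; that is acceptable off the frame thread but would be worth revisiting in a real client.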

Controls: A toggles the alert state (watch the next bark change), Esc quits.

Source

"""LLM NPC Flavour: a night guard whose barks come from a low-frequency LLM brain.

The classical, authoritative game logic runs every frame and is the source of
truth: hp, ammo, and an alert flag drift / toggle in ``on_update`` and are written
onto the agent's blackboard. A child ``AgentNode`` runs an ``LLMBrain`` that, a few
seconds apart and entirely off the frame thread, turns that authoritative slice
into a single in-character line. A bottom-anchored HUD renders the latest bark
every frame.

The LLM is a stage, not the NPC: while a call is in flight, or if it is late /
dropped / fails, the HUD keeps the last good line and the game never stalls.

## How to run

OFFLINE (default, no LLM, no network, no setup): a scripted fake client picks a
canned bark for the situation, after a trivial awaitable that proves the
off-thread path. This is what the screenshot walker and web export use.

    uv run python examples/features/ai/llm_npc_flavour.py

LIVE: point it at any OpenAI-compatible chat endpoint (vLLM, llama.cpp, Ollama,
hosted, etc.) by setting these environment variables, then pass --live. The
client is built by `OpenAICompatibleClient.from_env()`:

    SIMVX_LLM_BASE_URL   required, e.g. http://host:8000/v1
    SIMVX_LLM_MODEL      required, the model name the endpoint serves
    SIMVX_LLM_API_KEY    optional, only if your endpoint needs a key

    SIMVX_LLM_BASE_URL=http://host:8000/v1 SIMVX_LLM_MODEL=your-model \
        uv run python examples/features/ai/llm_npc_flavour.py --live

Live runs record responses to a local cache (CACHE_DIR) so a re-run is
deterministic and free.

Controls: A toggles the alert state (watch the next bark change), Esc quits.

# /// simvx
# tags = ["ai", "llm"]
# ///
"""

from __future__ import annotations

import asyncio
import math
import random
import re
import sys

from simvx.ai import BARK_KEY, CachingClient, LLMBrain, OpenAICompatibleClient
from simvx.ai.client import LLMClient, LLMResponse
from simvx.core import AnchorPreset, Input, InputMap, Key, Label, Node2D
from simvx.core.ai import AgentNode
from simvx.graphics import App

CACHE_DIR = "/tmp/simvx_llm_npc_cache"


class ScriptedGuardClient(LLMClient):
    """An offline fake: a trivial awaitable, then a canned bark for the situation.

    This stands in for a real model so the demo runs with no network. It still
    exercises the full async path (the ``await`` runs on the AsyncSlot loop, never
    the frame), so the non-blocking / coalescing / degrade behaviour is identical.
    """

    CALM = ["All quiet on the wall.", "Another slow shift.", "Nothing moving out there.", "Cold night. Stay sharp."]
    ALERT = ["Movement, north side!", "I heard something. Eyes up.", "Stay down, company.", "That's not the wind."]
    LOW_AMMO = ["Running low on rounds.", "Down to my last clip.", "Need a resupply soon."]

    async def complete(self, messages, **kwargs) -> LLMResponse:
        await asyncio.sleep(0.08)  # simulate model latency, off the frame thread
        user = messages[-1]["content"].lower()
        # Read the exact ammo value (a substring check would match "ammo": 1 inside 12).
        ammo_match = re.search(r'"ammo":\s*(\d+)', user)
        ammo = int(ammo_match.group(1)) if ammo_match else 99
        if '"alert": true' in user:
            pool = self.ALERT
        elif ammo <= 2:
            pool = self.LOW_AMMO
        else:
            pool = self.CALM
        return LLMResponse(text=random.choice(pool))


class NightGuard(Node2D):
    """Authoritative classical state every frame; an LLMBrain only colours it."""

    def __init__(self, client: LLMClient | None = None, **kwargs) -> None:
        super().__init__(**kwargs)
        # Default (no-arg) construction runs offline, so the screenshot walker and
        # web export (both of which instantiate the root with no args) get the
        # canned barks; main() passes a real client for --live.
        self._client = client if client is not None else ScriptedGuardClient()
        self.hp = 100.0
        self.ammo = 12
        self.alert = False
        self._t = 0.0
        self.agent: AgentNode | None = None
        self.hud: Label | None = None
        self.status: Label | None = None

    def on_ready(self) -> None:
        InputMap.add_action("toggle_alert", [Key.A])
        InputMap.add_action("quit", [Key.ESCAPE])

        # The child agent runs the LLM brain low-frequency (every 4s).
        self.agent = AgentNode(
            brain=LLMBrain(
                self._client,
                persona="a terse, tired night-watch guard",
                facts=["hp", "ammo", "alert"],
                period=4.0,
            ),
            name="GuardBrain",
        )
        self.add_child(self.agent)

        # Bottom-anchored HUD (anchors + margins, never absolute position).
        hud = Label("...", name="Bark")
        hud.set_anchor_preset(AnchorPreset.BOTTOM_WIDE)
        hud.margin_left = 20
        hud.margin_right = 20
        hud.margin_top = -64
        hud.margin_bottom = -16
        hud.font_size = 28.0
        hud.alignment = "center"
        self.add_child(hud)
        self.hud = hud

        title = Label("Night Guard  -  A: toggle alert   Esc: quit", name="Title")
        title.set_anchor_preset(AnchorPreset.CENTER_TOP)
        title.margin_left = -260
        title.margin_right = 260
        title.margin_top = 16
        title.margin_bottom = 40
        title.font_size = 18.0
        title.alignment = "center"
        self.add_child(title)

        # Authoritative classical state, centred, updated every frame (the LLM never
        # writes this: it only reads it to colour the bark below).
        status = Label("", name="Status")
        status.set_anchor_preset(AnchorPreset.CENTER)
        status.margin_left = -260
        status.margin_right = 260
        status.margin_top = -20
        status.margin_bottom = 20
        status.font_size = 24.0
        status.alignment = "center"
        self.add_child(status)
        self.status = status

    def on_update(self, dt: float) -> None:
        # Classical authoritative simulation, every single frame.
        self._t += dt
        self.hp = 60.0 + 40.0 * (0.5 + 0.5 * math.sin(self._t * 0.7))
        if self._t % 2.0 < dt:
            # Drain a round every couple of seconds, then resupply once empty, so the
            # demo cycles through calm / low-ammo states (and barks) instead of draining flat.
            self.ammo = self.ammo - 1 if self.ammo > 0 else 12
        if Input.is_action_just_pressed("toggle_alert"):
            self.alert = not self.alert
        if Input.is_action_just_pressed("quit"):
            self.app.quit()

        # Publish the authoritative slice onto the agent's blackboard each frame.
        board = self.agent.blackboard if self.agent else None
        if board is not None:
            board.set("hp", round(self.hp))
            board.set("ammo", self.ammo)
            board.set("alert", self.alert)

        # Render the authoritative classical state (updates every frame) and the latest
        # LLM bark (last good line if a call is in flight / failed).
        if self.status is not None:
            flag = "ALERT" if self.alert else "calm"
            self.status.text = f"hp {round(self.hp)}    ammo {self.ammo}    {flag}"
        if self.hud is not None and board is not None:
            self.hud.text = str(board.get(BARK_KEY, "..."))


def _build_client(live: bool) -> LLMClient:
    if not live:
        return ScriptedGuardClient()
    # Record/replay so a re-run is deterministic and free.
    return CachingClient(OpenAICompatibleClient.from_env(), CACHE_DIR, mode="auto")


def main() -> None:
    live = "--live" in sys.argv
    app = App(title="SimVX LLM NPC Flavour", width=900, height=600)
    app.run(NightGuard(_build_client(live), name="NightGuard"))


if __name__ == "__main__":
    main()