# LLM NPC Flavour

A night guard whose barks come from a low-frequency LLM brain.

```{raw} html
▶ Run in browser
```

**Tags:** `ai` `llm`

The classical, authoritative game logic runs every frame and is the source of truth: hp, ammo, and an alert flag drift / toggle in `on_update` and are written onto the agent's blackboard. A child `AgentNode` runs an `LLMBrain` that, a few seconds apart and entirely off the frame thread, turns that authoritative slice into a single in-character line. A bottom-anchored HUD renders the latest bark every frame.

The LLM is a stage, not the NPC: while a call is in flight, or if it is late / dropped / fails, the HUD keeps the last good line and the game never stalls.

## How to run

OFFLINE (default, no LLM, no network, no setup): a scripted fake client picks a canned bark for the situation, after a trivial awaitable that proves the off-thread path. This is what the screenshot walker and web export use.

    uv run python examples/features/ai/llm_npc_flavour.py

LIVE: point it at any OpenAI-compatible chat endpoint (vLLM, llama.cpp, Ollama, hosted, etc.) by setting these environment variables, then pass `--live`. The client is built by `OpenAICompatibleClient.from_env()`:

    SIMVX_LLM_BASE_URL   required, e.g. http://host:8000/v1
    SIMVX_LLM_MODEL      required, the model name the endpoint serves
    SIMVX_LLM_API_KEY    optional, only if your endpoint needs a key

    SIMVX_LLM_BASE_URL=http://host:8000/v1 SIMVX_LLM_MODEL=your-model uv run python examples/features/ai/llm_npc_flavour.py --live

Live runs record responses to a local cache (`CACHE_DIR`) so a re-run is deterministic and free.

Controls: A toggles the alert state (watch the next bark change), Esc quits.

## Source

```{literalinclude} ../../examples/features/ai/llm_npc_flavour.py
:language: python
:linenos:
```
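
## Sketch: keeping the last good line

The "LLM is a stage, not the NPC" behaviour described at the top of this page can be illustrated without the engine. The following is a minimal, standard-library-only sketch, not the example's actual implementation: `BarkStage` and `fake_client` are hypothetical names introduced here, and a `ThreadPoolExecutor` stands in for whatever off-thread mechanism `LLMBrain` really uses. It shows a per-frame `tick` that never blocks, starts at most one call per interval, and falls back to the last good line when a call is still in flight or has failed.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, Optional


def fake_client(state: dict) -> str:
    """Hypothetical stand-in for an LLM call: picks a canned bark."""
    time.sleep(0.05)  # simulated latency, spent off the frame thread
    if state["alert"]:
        return "Who's there? Show yourself!"
    if state["hp"] < 30:
        return "I need a medic..."
    return "Quiet night."


class BarkStage:
    """Hypothetical helper (not the SimVX API) that keeps the last good bark."""

    def __init__(self, client: Callable[[dict], str], interval: float,
                 fallback: str = "...") -> None:
        self._client = client
        self._interval = interval  # minimum seconds between call starts
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending: Optional[Future] = None
        self._next_call = 0.0
        self.last_bark = fallback

    def tick(self, now: float, state: dict) -> str:
        """Call once per frame; returns immediately with the line to draw."""
        # Harvest a finished call without ever waiting on it.
        if self._pending is not None and self._pending.done():
            try:
                self.last_bark = self._pending.result()
            except Exception:
                pass  # failed or dropped call: keep the last good line
            self._pending = None
        # Start a new call only if none is in flight and the interval elapsed.
        if self._pending is None and now >= self._next_call:
            # Snapshot the state so the worker never sees later mutations.
            self._pending = self._pool.submit(self._client, dict(state))
            self._next_call = now + self._interval
        return self.last_bark

    def close(self) -> None:
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    stage = BarkStage(fake_client, interval=0.1)
    state = {"hp": 100, "ammo": 12, "alert": False}  # authoritative game state
    shown = None
    for frame in range(30):
        if frame == 15:
            state["alert"] = True  # game logic changes; the bark follows later
        bark = stage.tick(time.monotonic(), state)
        if bark != shown:
            print(f"frame {frame}: {bark}")
            shown = bark
        time.sleep(0.01)  # stand-in for the rest of the frame
    stage.close()
```

The design choice worth noting is that `tick` only ever reads a future that is already done, so a slow or failing client changes what the HUD shows but never how long a frame takes.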