# LLM NPC Flavour

A night guard whose barks come from a low-frequency LLM brain.

```{raw} html
<a class="example-run-button" href="../demos/features_ai_llm_npc_flavour.html">▶ Run in browser</a>

```

**Tags:** `ai` `llm`

The classical, authoritative game logic runs every frame and is the source of
truth: hp, ammo, and an alert flag drift / toggle in ``on_update`` and are written
onto the agent's blackboard. A child ``AgentNode`` runs an ``LLMBrain`` that, a few
seconds apart and entirely off the frame thread, turns that authoritative slice
into a single in-character line. A bottom-anchored HUD renders the latest bark
every frame.

The LLM is a stage, not the NPC: while a call is in flight, or if it is late /
dropped / fails, the HUD keeps the last good line and the game never stalls.
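
The keep-the-last-good-line behaviour can be sketched without any engine code. What follows is a minimal, hypothetical illustration in plain Python: `BarkSlot`, `tick`, and the single-worker thread pool are names and choices assumed for this sketch, not the example's `AgentNode` / `LLMBrain` API, and the real example may schedule its calls differently.

```python
from concurrent.futures import ThreadPoolExecutor


class BarkSlot:
    """Hypothetical helper: holds the last good bark and never blocks the frame."""

    def __init__(self, generate, interval=3.0, initial="..."):
        self._generate = generate      # callable(state) -> str; may be slow or raise
        self._interval = interval      # minimum seconds between requests
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending = None           # Future of the call in flight, if any
        self._next_due = 0.0
        self.line = initial            # what the HUD draws every frame

    def tick(self, state, now):
        """Call once per frame with the authoritative state and the current time."""
        if self._pending is not None and self._pending.done():
            try:
                self.line = self._pending.result()
            except Exception:
                pass                   # failed call: keep the last good line
            self._pending = None
        if self._pending is None and now >= self._next_due:
            # Copy the state so the worker sees a stable snapshot.
            self._pending = self._pool.submit(self._generate, dict(state))
            self._next_due = now + self._interval
        return self.line
```

The frame loop only ever reads `line` and polls a future, so a slow or failing `generate` costs nothing on the frame thread; the worst case is that the HUD shows a slightly stale bark.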

## How to run

OFFLINE (default, no LLM, no network, no setup): a scripted fake client picks a
canned bark for the situation, after a trivial awaitable that proves the
off-thread path. This is what the screenshot walker and web export use.

    uv run python examples/features/ai/llm_npc_flavour.py

LIVE: point it at any OpenAI-compatible chat endpoint (vLLM, llama.cpp, Ollama,
hosted, etc.) by setting these environment variables, then pass `--live`. The
client is built by `OpenAICompatibleClient.from_env()`:

    SIMVX_LLM_BASE_URL   required, e.g. http://host:8000/v1
    SIMVX_LLM_MODEL      required, the model name the endpoint serves
    SIMVX_LLM_API_KEY    optional, only if your endpoint needs a key

    SIMVX_LLM_BASE_URL=http://host:8000/v1 SIMVX_LLM_MODEL=your-model \
        uv run python examples/features/ai/llm_npc_flavour.py --live
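
Before a live run it can help to check that the required variables are actually set. The snippet below is a small pre-flight sketch: `read_llm_env` is a hypothetical helper written for this page and does not reflect how `OpenAICompatibleClient.from_env()` is implemented. Only the three variable names come from the example.

```python
import os


def read_llm_env(env=None):
    """Hypothetical pre-flight check for the SIMVX_LLM_* variables."""
    env = os.environ if env is None else env
    required = ("SIMVX_LLM_BASE_URL", "SIMVX_LLM_MODEL")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise RuntimeError("missing required variables: " + ", ".join(missing))
    return {
        "base_url": env["SIMVX_LLM_BASE_URL"],
        "model": env["SIMVX_LLM_MODEL"],
        # Optional: an unset or empty key is treated as "no key needed".
        "api_key": env.get("SIMVX_LLM_API_KEY") or None,
    }
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.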

Live runs record responses to a local cache (`CACHE_DIR`) so a re-run is
deterministic and free.
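
One common way to get this record-and-replay behaviour is to key each response by a hash of the request. The sketch below illustrates that general idea only: `cached_call`, the SHA-256 key, and the one-JSON-file-per-request layout are assumptions made here, not the cache format the example actually writes.

```python
import hashlib
import json
from pathlib import Path


def cached_call(cache_dir, request, call):
    """Return call(request), replaying from disk if this request was seen before."""
    # Sort keys so logically equal requests hash to the same file name.
    payload = json.dumps(request, sort_keys=True).encode("utf-8")
    path = Path(cache_dir) / (hashlib.sha256(payload).hexdigest() + ".json")
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    response = call(request)  # only reached on a cache miss
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(response), encoding="utf-8")
    return response
```

With a scheme like this, a second run that sends the same requests never reaches the endpoint, which is what makes it repeatable and free.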

Controls: A toggles the alert state (watch the next bark change), Esc quits.

## Source

```{literalinclude} ../../examples/features/ai/llm_npc_flavour.py
:language: python
:linenos:
```