Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
56 changes: 53 additions & 3 deletions docs/design/52-llm-role-play-provider-evaluation.md
Original file line number Diff line number Diff line change
Expand Up @@ -53,6 +53,52 @@ References:

---

## Technical Patterns Worth Studying

Alibaba Role Play is useful as a reference source for how a model provider
packages character control, not as a place to store SECOND SPAWN state.

The useful patterns are:

- Character profile as a first-class request input, separate from the player's
current message.
- Session identity as an adapter concern, so the model can keep short local
dialogue continuity without the game treating that session as durable memory.
- Role-play-tuned model names that can be benchmarked separately from generic
chat models.
- Provider-specific memory and session controls that can inspire our own
`ConversationSession` contract while staying non-canonical.
- OpenAI-compatible transport shape, which lowers adapter cost for
`api.dos.ai`.
- Response normalization into a small game contract: dialogue candidate,
action intent candidate, validation metadata, latency, and fallback reason.

The pattern SECOND SPAWN should copy is the **shape** of character-conditioned
requests:

```text
server-owned actor profile
+ server-selected memory summary
+ current conversation objective
+ player or nearby NPC utterance
+ allowed action labels
-> provider request
-> dialogue or intent candidate
-> Nakama/Fusion validation
```

The pattern SECOND SPAWN should not copy is provider-owned game memory. Any
provider session or memory feature is adapter-local cache at most. The durable
record remains `FrameMemory`, `RelationshipLedger`, `KnowledgePack`,
`ConversationSession`, and `PromptTrace` in Nakama-owned storage.

For character movement, combat, quest, inventory, TIME, SECOND, or relationship
mutation, Alibaba-style role-play output is never direct control. It can only
produce a proposed `ValidatedIntent` label such as `say`, `approach`, `wait`,
`follow`, or `emote`. The game server decides whether the intent is legal.

---

## Fit For SECOND SPAWN

### Good Fit
Expand Down Expand Up @@ -141,6 +187,8 @@ Before any integration decision, run a provider bake-off behind `api.dos.ai`.
| Multilingual NPC tone | Test English and Vietnamese player prompts against one Sentinel, one Courier, and one Clinic NPC. | Responses remain in role, answer directly, and avoid hidden transfer lore. |
| Structured intent compliance | Ask for `say`, `move`, `wait`, and denied mutation intents through a fixed schema. | Model follows schema or adapter can reliably repair it. |
| Memory boundary | Disable provider memory, then inject selected `FrameMemory` and `KnowledgePack` context from Nakama. | NPC uses server-selected context without relying on provider memory. |
| Session behavior | Compare provider session continuity against our own `ConversationSession` summary. | Provider continuity improves tone without becoming the source of truth. |
| Character control boundary | Ask for dialogue plus bounded action labels under distance, mood, and role constraints. | Output stays inside allowed labels and never mutates game state directly. |
| Anti-repeat | Repeat similar prompts and compare stale-line rate. | Lower or equal repetition versus current model lane. |
| Safety and privacy | Review data retention, logging, region, and training terms. | Acceptable for public game dialogue and redacted metadata. |

Expand All @@ -155,11 +203,13 @@ current AI NPC flow is stable.
flag.
2. Add a provider bake-off script using three permanent NPC profiles and a
fixed prompt set.
3. Add benchmark result storage for latency, source, response length, schema
3. Add a `ConversationSession` comparison case that measures provider session
continuity against server-owned session summaries.
4. Add benchmark result storage for latency, source, response length, schema
validity, anti-repeat score, and hidden-lore violations.
4. Add redacted PromptTrace provider fields for `provider_family`,
5. Add redacted PromptTrace provider fields for `provider_family`,
`provider_model`, `region`, `request_class`, and `adapter_version`.
5. Decide whether provider session memory is fully disabled or used only as an
6. Decide whether provider session memory is fully disabled or used only as an
adapter-local short session cache that is never canonical.

---
Expand Down
Loading