From ff1736cdf7a36974c73202b2add56708bc219201 Mon Sep 17 00:00:00 2001
From: JOY
Date: Tue, 26 May 2026 07:43:51 +0700
Subject: [PATCH] docs: expand role-play provider evaluation

---
 .../52-llm-role-play-provider-evaluation.md | 56 ++++++++++++++++++-
 1 file changed, 53 insertions(+), 3 deletions(-)

diff --git a/docs/design/52-llm-role-play-provider-evaluation.md b/docs/design/52-llm-role-play-provider-evaluation.md
index a6c6fcff..a9ed7692 100644
--- a/docs/design/52-llm-role-play-provider-evaluation.md
+++ b/docs/design/52-llm-role-play-provider-evaluation.md
@@ -53,6 +53,52 @@ References:
 
 ---
 
+## Technical Patterns Worth Studying
+
+Alibaba Role Play is useful as a reference source for how a model provider
+packages character control, not as a place to store SECOND SPAWN state.
+
+The useful patterns are:
+
+- Character profile as a first-class request input, separate from the player's
+  current message.
+- Session identity as an adapter concern, so the model can keep short local
+  dialogue continuity without the game treating that session as durable memory.
+- Role-play-tuned model names that can be benchmarked separately from generic
+  chat models.
+- Provider-specific memory and session controls that can inspire our own
+  `ConversationSession` contract while staying non-canonical.
+- OpenAI-compatible transport shape, which lowers adapter cost for
+  `api.dos.ai`.
+- Response normalization into a small game contract: dialogue candidate,
+  action intent candidate, validation metadata, latency, and fallback reason.
+
+The pattern SECOND SPAWN should copy is the **shape** of character-conditioned
+requests:
+
+```text
+server-owned actor profile
++ server-selected memory summary
++ current conversation objective
++ player or nearby NPC utterance
++ allowed action labels
+-> provider request
+-> dialogue or intent candidate
+-> Nakama/Fusion validation
+```
+
+The pattern SECOND SPAWN should not copy is provider-owned game memory. Any
+provider session or memory feature is adapter-local cache at most. The durable
+record remains `FrameMemory`, `RelationshipLedger`, `KnowledgePack`,
+`ConversationSession`, and `PromptTrace` in Nakama-owned storage.
+
+For character movement, combat, quest, inventory, TIME, SECOND, or relationship
+mutation, Alibaba-style role-play output is never direct control. It can only
+produce a proposed `ValidatedIntent` label such as `say`, `approach`, `wait`,
+`follow`, or `emote`. The game server decides whether the intent is legal.
+
+---
+
 ## Fit For SECOND SPAWN
 
 ### Good Fit
@@ -141,6 +187,8 @@ Before any integration decision, run a provider bake-off behind `api.dos.ai`.
 | Multilingual NPC tone | Test English and Vietnamese player prompts against one Sentinel, one Courier, and one Clinic NPC. | Responses remain in role, answer directly, and avoid hidden transfer lore. |
 | Structured intent compliance | Ask for `say`, `move`, `wait`, and denied mutation intents through a fixed schema. | Model follows schema or adapter can reliably repair it. |
 | Memory boundary | Disable provider memory, then inject selected `FrameMemory` and `KnowledgePack` context from Nakama. | NPC uses server-selected context without relying on provider memory. |
+| Session behavior | Compare provider session continuity against our own `ConversationSession` summary. | Provider continuity improves tone without becoming the source of truth. |
+| Character control boundary | Ask for dialogue plus bounded action labels under distance, mood, and role constraints. | Output stays inside allowed labels and never mutates game state directly. |
 | Anti-repeat | Repeat similar prompts and compare stale-line rate. | Lower or equal repetition versus current model lane. |
 | Safety and privacy | Review data retention, logging, region, and training terms. | Acceptable for public game dialogue and redacted metadata. |
 
@@ -155,11 +203,13 @@ current AI NPC flow is stable.
    flag.
 2. Add a provider bake-off script using three permanent NPC profiles and a fixed
    prompt set.
-3. Add benchmark result storage for latency, source, response length, schema
+3. Add a `ConversationSession` comparison case that measures provider session
+   continuity against server-owned session summaries.
+4. Add benchmark result storage for latency, source, response length, schema
    validity, anti-repeat score, and hidden-lore violations.
-4. Add redacted PromptTrace provider fields for `provider_family`,
+5. Add redacted PromptTrace provider fields for `provider_family`,
    `provider_model`, `region`, `request_class`, and `adapter_version`.
-5. Decide whether provider session memory is fully disabled or used only as an
+6. Decide whether provider session memory is fully disabled or used only as an
    adapter-local short session cache that is never canonical.
 
 ---
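
The character-conditioned request shape and the control boundary described in this patch could be sketched roughly as below. This is a minimal illustration, not the project's adapter: `ProviderRequest`, `Candidate`, `build_request`, and `normalize` are hypothetical names, the raw response keys `dialogue` and `intent` are assumed, and the allow-list reuses only the example labels from the doc. No real provider API is modeled, and the Nakama/Fusion legality check still happens downstream of this step.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Example labels quoted from the design doc; the real allow-list is server-defined.
ALLOWED_INTENTS = frozenset({"say", "approach", "wait", "follow", "emote"})


@dataclass(frozen=True)
class ProviderRequest:
    """Hypothetical request envelope: every field is selected by the server."""
    actor_profile: str
    memory_summary: str
    objective: str
    utterance: str
    allowed_labels: Tuple[str, ...]


@dataclass(frozen=True)
class Candidate:
    """Hypothetical normalized output: a proposal only, never a state mutation."""
    dialogue: str
    intent: Optional[str]
    fallback_reason: Optional[str] = None


def build_request(actor_profile, memory_summary, objective, utterance,
                  allowed=ALLOWED_INTENTS):
    # Sort the labels so the request is deterministic and easy to diff in traces.
    return ProviderRequest(actor_profile, memory_summary, objective, utterance,
                           tuple(sorted(allowed)))


def normalize(raw, request):
    """Map an assumed raw provider dict onto the small game contract."""
    dialogue = str(raw.get("dialogue") or "").strip()
    intent = str(raw.get("intent") or "").strip().lower()
    if intent not in request.allowed_labels:
        # Keep the dialogue, drop the intent, and record why it was dropped.
        reason = "intent_not_allowed:" + (intent or "missing")
        return Candidate(dialogue, None, reason)
    return Candidate(dialogue, intent)
```

Dropping a disallowed label while keeping the dialogue is one possible fallback policy, chosen here only to keep the sketch small. Whatever `normalize` returns is still a candidate: the game server decides whether the intent is legal.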