From ff1736cdf7a36974c73202b2add56708bc219201 Mon Sep 17 00:00:00 2001
From: JOY <joy@joy.vn>
Date: Tue, 26 May 2026 07:43:51 +0700
Subject: [PATCH] docs: expand role-play provider evaluation

---
 .../52-llm-role-play-provider-evaluation.md   | 56 ++++++++++++++++++-
 1 file changed, 53 insertions(+), 3 deletions(-)

diff --git a/docs/design/52-llm-role-play-provider-evaluation.md b/docs/design/52-llm-role-play-provider-evaluation.md
index a6c6fcff..a9ed7692 100644
--- a/docs/design/52-llm-role-play-provider-evaluation.md
+++ b/docs/design/52-llm-role-play-provider-evaluation.md
@@ -53,6 +53,52 @@ References:
 
 ---
 
+## Technical Patterns Worth Studying
+
+Alibaba Role Play is useful as a reference source for how a model provider
+packages character control, not as a place to store SECOND SPAWN state.
+
+The useful patterns are:
+
+- Character profile as a first-class request input, separate from the player's
+  current message.
+- Session identity as an adapter concern, so the model can keep short local
+  dialogue continuity without the game treating that session as durable memory.
+- Role-play-tuned model names that can be benchmarked separately from generic
+  chat models.
+- Provider-specific memory and session controls that can inspire our own
+  `ConversationSession` contract while staying non-canonical.
+- OpenAI-compatible transport shape, which lowers adapter cost for
+  `api.dos.ai`.
+- Response normalization into a small game contract: dialogue candidate,
+  action intent candidate, validation metadata, latency, and fallback reason.
+
+The pattern SECOND SPAWN should copy is the **shape** of character-conditioned
+requests:
+
+```text
+server-owned actor profile
++ server-selected memory summary
++ current conversation objective
++ player or nearby NPC utterance
++ allowed action labels
+-> provider request
+-> dialogue or intent candidate
+-> Nakama/Fusion validation
+```
+
+The pattern SECOND SPAWN should not copy is provider-owned game memory. Any
+provider session or memory feature is adapter-local cache at most. The durable
+record remains `FrameMemory`, `RelationshipLedger`, `KnowledgePack`,
+`ConversationSession`, and `PromptTrace` in Nakama-owned storage.
+
+For character movement, combat, quest, inventory, TIME, SECOND, or relationship
+mutation, Alibaba-style role-play output is never direct control. It can only
+produce a proposed `ValidatedIntent` label such as `say`, `approach`, `wait`,
+`follow`, or `emote`. The game server decides whether the intent is legal.
+
+---
+
 ## Fit For SECOND SPAWN
 
 ### Good Fit
@@ -141,6 +187,8 @@ Before any integration decision, run a provider bake-off behind `api.dos.ai`.
 | Multilingual NPC tone | Test English and Vietnamese player prompts against one Sentinel, one Courier, and one Clinic NPC. | Responses remain in role, answer directly, and avoid hidden transfer lore. |
 | Structured intent compliance | Ask for `say`, `move`, `wait`, and denied mutation intents through a fixed schema. | Model follows schema or adapter can reliably repair it. |
 | Memory boundary | Disable provider memory, then inject selected `FrameMemory` and `KnowledgePack` context from Nakama. | NPC uses server-selected context without relying on provider memory. |
+| Session behavior | Compare provider session continuity against our own `ConversationSession` summary. | Provider continuity improves tone without becoming the source of truth. |
+| Character control boundary | Ask for dialogue plus bounded action labels under distance, mood, and role constraints. | Output stays inside allowed labels and never mutates game state directly. |
 | Anti-repeat | Repeat similar prompts and compare stale-line rate. | Lower or equal repetition versus current model lane. |
 | Safety and privacy | Review data retention, logging, region, and training terms. | Acceptable for public game dialogue and redacted metadata. |
 
@@ -155,11 +203,13 @@ current AI NPC flow is stable.
    flag.
 2. Add a provider bake-off script using three permanent NPC profiles and a
    fixed prompt set.
-3. Add benchmark result storage for latency, source, response length, schema
+3. Add a `ConversationSession` comparison case that measures provider session
+   continuity against server-owned session summaries.
+4. Add benchmark result storage for latency, source, response length, schema
    validity, anti-repeat score, and hidden-lore violations.
-4. Add redacted PromptTrace provider fields for `provider_family`,
+5. Add redacted PromptTrace provider fields for `provider_family`,
    `provider_model`, `region`, `request_class`, and `adapter_version`.
-5. Decide whether provider session memory is fully disabled or used only as an
+6. Decide whether provider session memory is fully disabled or used only as an
    adapter-local short session cache that is never canonical.
 
 ---