From 5cf8a0bb6e9622249b00499d9b6d7b933248a353 Mon Sep 17 00:00:00 2001 From: JOY Date: Tue, 26 May 2026 09:43:13 +0700 Subject: [PATCH] docs: design focused NPC portrait dialogue --- ROADMAP.md | 10 +- docs/SUMMARY.md | 1 + docs/design/03-systems-index.md | 6 +- docs/design/30-alpha-ux-flow.md | 4 + .../37-ai-npc-backend-client-roadmap.md | 6 + ...ed-npc-dialogue-portrait-lipsync-design.md | 358 ++++++++++++++++++ docs/setup/play-mode-smoke-checklist.md | 7 +- 7 files changed, 386 insertions(+), 6 deletions(-) create mode 100644 docs/design/56-focused-npc-dialogue-portrait-lipsync-design.md diff --git a/ROADMAP.md b/ROADMAP.md index 41019260..53912f02 100644 --- a/ROADMAP.md +++ b/ROADMAP.md @@ -361,8 +361,10 @@ remember, build relationships, join Gate missions, and change after outcomes. TIME, injury, and relationship hints. Track in [#138](https://github.com/DOS/Second-Spawn/issues/138). - [ ] C2: Stabilize focused RPG-style NPC dialogue with locked facing, typing - isolation, non-combat talk idle, and clear exit behavior. Track in - [#139](https://github.com/DOS/Second-Spawn/issues/139). + isolation, NPC-side portrait, presentation-only speaking animation, + non-combat talk idle, and clear exit behavior. Track in + [#139](https://github.com/DOS/Second-Spawn/issues/139) and + `docs/design/56-focused-npc-dialogue-portrait-lipsync-design.md`. - [ ] C3: Improve ambient NPC society presentation with readable bubbles, nameplates, debug-only source labels, and tendency-driven idle behavior. Track in [#139](https://github.com/DOS/Second-Spawn/issues/139). @@ -593,6 +595,10 @@ prototype. - [ ] Add NPC TTS playback in Unity through scoped `api.dos.ai` voice sessions: Nakama owns authorization, `api.dos.ai` owns provider routing, and Unity only receives short-lived playback/session material. Track in #262. 
+- [ ] Keep focused NPC dialogue portrait and text-timed speaking animation + independent from full TTS/Convai import, while leaving hook points for future + voice and provider viseme data. Track in #139 and + `docs/design/56-focused-npc-dialogue-portrait-lipsync-design.md`. - [ ] Decide whether Convai remains a phase 1 spike for one boss or hub NPC. The default long-term backbone stays Nakama plus Fusion plus `api.dos.ai`. Track in #262. diff --git a/docs/SUMMARY.md b/docs/SUMMARY.md index fb287578..b7f702a0 100644 --- a/docs/SUMMARY.md +++ b/docs/SUMMARY.md @@ -21,6 +21,7 @@ - [Alpha Questline and Encounter Design](design/29-alpha-questline-and-encounter-design.md) - [Ash Underpass Gate Gameplay Design](design/54-ash-underpass-gate-gameplay-design.md) - [Alpha UX Flow](design/30-alpha-ux-flow.md) +- [Focused NPC Dialogue Portrait and Lip Sync Design](design/56-focused-npc-dialogue-portrait-lipsync-design.md) - [Alpha Economy, Balance, and Playtest Plan](design/31-alpha-economy-balance-and-playtest-plan.md) - [Alpha Art, Audio, and Content Direction](design/32-alpha-art-audio-and-content-direction.md) - [Alpha Production Backlog](design/33-alpha-production-backlog.md) diff --git a/docs/design/03-systems-index.md b/docs/design/03-systems-index.md index 9eb9ca9c..3682abb7 100644 --- a/docs/design/03-systems-index.md +++ b/docs/design/03-systems-index.md @@ -52,7 +52,9 @@ The first alpha execution spine is: for the first questline, encounter, Tollkeeper Shell, rewards, and failure states. 5. [30-alpha-ux-flow.md](30-alpha-ux-flow.md) for the first alpha UX flow and - required player-facing surfaces. + required player-facing surfaces, with + [56-focused-npc-dialogue-portrait-lipsync-design.md](56-focused-npc-dialogue-portrait-lipsync-design.md) + defining the focused NPC portrait and speaking animation contract. 6. 
[31-alpha-economy-balance-and-playtest-plan.md](31-alpha-economy-balance-and-playtest-plan.md) for alpha tuning defaults, reward rules, playtest script, and balance guardrails. @@ -137,7 +139,7 @@ blockout tasks for the alpha. | 22 | Auth (Nakama + DOS Chain wallet, Supabase sidecar if useful) | Persistence | MVP | Prototype | (TDD pending - reuse DOS.Me pattern as identity bridge reference) | Nakama, thirdweb | | 23 | HUD (combat, level/stats, TIME) | UI | VS | Prototype | [30-alpha-ux-flow.md](30-alpha-ux-flow.md) | Combat, Profile | | 24 | Inventory UI | UI | VS | Design | [45-inventory-and-equipment-system.md](45-inventory-and-equipment-system.md), [30-alpha-ux-flow.md](30-alpha-ux-flow.md) | Inventory persistence | -| 25 | NPC dialogue UI | UI | VS | Not started | [35-alpha-content-and-copy-pack.md](35-alpha-content-and-copy-pack.md), [30-alpha-ux-flow.md](30-alpha-ux-flow.md) | NPC dialogue | +| 25 | NPC dialogue UI | UI | VS | Not started | [35-alpha-content-and-copy-pack.md](35-alpha-content-and-copy-pack.md), [30-alpha-ux-flow.md](30-alpha-ux-flow.md), [56-focused-npc-dialogue-portrait-lipsync-design.md](56-focused-npc-dialogue-portrait-lipsync-design.md) | NPC dialogue | | 26 | Quest tracker UI | UI | VS | Not started | [30-alpha-ux-flow.md](30-alpha-ux-flow.md) | Quest system | | 27 | Reincarnation UI | UI | VS | Not started | [30-alpha-ux-flow.md](30-alpha-ux-flow.md) | Reincarnation flow | | 28 | AI agent activity log UI | UI | VS | Not started | [30-alpha-ux-flow.md](30-alpha-ux-flow.md) | AI agent | diff --git a/docs/design/30-alpha-ux-flow.md b/docs/design/30-alpha-ux-flow.md index 1f413ae0..15491ca9 100644 --- a/docs/design/30-alpha-ux-flow.md +++ b/docs/design/30-alpha-ux-flow.md @@ -116,6 +116,8 @@ Layout: - Status line below title: answering, listening, retrying, failed. - Message area with speaker bubbles. - Player input row at bottom. +- Focused NPC portrait or bust beside the NPC-side message column. 
See + [56-focused-npc-dialogue-portrait-lipsync-design.md](56-focused-npc-dialogue-portrait-lipsync-design.md). Message alignment: @@ -123,6 +125,8 @@ Message alignment: - Player messages on the right. - Each bubble width should fit content up to a max width, not fill the whole panel by default. +- NPC speaking animation starts when the NPC answer begins revealing or playing, + and stops when the line completes, fails, is skipped, or dialogue exits. Text size: diff --git a/docs/design/37-ai-npc-backend-client-roadmap.md b/docs/design/37-ai-npc-backend-client-roadmap.md index d3aced6f..84a1eeeb 100644 --- a/docs/design/37-ai-npc-backend-client-roadmap.md +++ b/docs/design/37-ai-npc-backend-client-roadmap.md @@ -196,8 +196,14 @@ Client features: - Keep player and NPC locked into dialogue state until exit. - Use bottom RPG-style dialogue panel for 1:1 conversations. - Align player lines and NPC lines clearly. +- Show the focused NPC portrait or bust beside the NPC-side message column. +- Add a presentation-only speaking animation state that can use text-timed, + audio-amplitude, or provider viseme-driven lip sync tiers. - Disable normal movement input while typing. - Use non-combat talk or neutral idle animation during dialogue. +- Use + [56-focused-npc-dialogue-portrait-lipsync-design.md](56-focused-npc-dialogue-portrait-lipsync-design.md) + as the implementation packet for issue #139. ### C3: Ambient NPC Society Presentation diff --git a/docs/design/56-focused-npc-dialogue-portrait-lipsync-design.md b/docs/design/56-focused-npc-dialogue-portrait-lipsync-design.md new file mode 100644 index 00000000..cacfaddd --- /dev/null +++ b/docs/design/56-focused-npc-dialogue-portrait-lipsync-design.md @@ -0,0 +1,358 @@ +# Focused NPC Dialogue Portrait and Lip Sync Design + +*Status: Alpha implementation design* +*Created: 2026-05-26* +*Source of truth level: Player-facing NPC dialogue presentation contract for +issue #139. 
This doc does not approve broad voice-NPC scope beyond the focused +dialogue presentation slice.* + +--- + +## 1. Purpose + +Focused NPC dialogue should feel like an RPG conversation with a visible person, +not a debug chat log. When the player speaks to one NPC, the UI must show that +NPC's portrait or bust beside the NPC-side chat text, and the NPC should visibly +perform a speaking animation while the response is being delivered. + +The alpha goal is: + +```text +Player focuses one NPC -> dialogue panel opens -> NPC portrait appears on the +NPC text side -> NPC answer types in -> portrait or local NPC model animates as +speaking -> speaking state stops when the line completes. +``` + +This feature supports immersion, NPC identity, screenshot readability, and +later Convai-style facial animation without making Convai or voice a hard +dependency for the first implementation pass. + +--- + +## 2. Player-Facing Contract + +Required alpha behavior: + +- NPC lines appear on the NPC side of the focused dialogue panel. +- The focused NPC portrait, bust, or render texture appears next to the NPC + message column, not floating as an unrelated HUD decoration. +- Player lines remain on the opposite side and do not show the NPC portrait. +- The portrait area shows the focused NPC display name, role, and compact + response state when useful. +- While the NPC answer is being revealed or played, the portrait or in-world NPC + enters a speaking state. +- The speaking state ends promptly when the NPC line finishes, fails, is + skipped, or the player exits dialogue. +- Fallback, timeout, or backoff states remain honest. The UI must not fake a + model-backed answer when the backend returned a degraded path. + +The portrait should make identity legible even before final character art is +ready. A good alpha fallback is a framed render texture, bust crop, generated +portrait, or authored 2D portrait keyed by actor profile. + +--- + +## 3. 
Layout Rules + +Default focused dialogue layout: + +| Area | Rule | +| ---- | ---- | +| NPC column | Left side by default. Contains NPC message bubbles and portrait or bust. | +| Player column | Right side by default. Contains player message bubbles only. | +| Portrait anchor | Attached to NPC column, close enough that the player reads it as the current speaker. | +| Text input | Bottom row, full width or player-side aligned. | +| Status | Compact text near the NPC title or portrait, not repeated in every bubble. | + +Rules: + +- Do not cover the active objective tracker, TIME, HP, or critical combat HUD. +- Do not use a giant modal that hides the entire world unless a later cinematic + dialogue mode is explicitly designed. +- Hide or reduce the focused NPC overhead nameplate while the dialogue panel is + open, because the panel now owns focused identity. +- Keep ambient speech bubbles separate from focused dialogue. Ambient bubbles do + not need portraits in alpha. +- On ultrawide and QHD monitors, keep the dialogue panel readable without + stretching NPC bubbles across the whole screen. +- On small windows, the portrait may reduce to a compact square avatar, but it + must stay associated with NPC-side text. + +--- + +## 4. Portrait Source Priority + +Use this order when implementing assets: + +1. **Actor-profile portrait key**: stable portrait id stored or derived from + the NPC actor profile. +2. **Authored 2D portrait**: preferred for important permanent NPCs when art is + available. +3. **Render texture bust**: acceptable for alpha if the current NPC model, + camera angle, lighting, and performance are stable. +4. **Generated placeholder portrait**: acceptable only if clearly kept in the + prototype art folder and replaced before public marketing. +5. **Initials or silhouette fallback**: allowed for missing portraits, with + role and display name still visible. + +The portrait key is presentation data. 
It must not decide gameplay authority, +inventory, relationship, TIME, SECOND, quest state, or combat state. + +Recommended actor presentation fields: + +```text +portrait_key +portrait_variant +dialogue_bust_prefab_key +voice_profile +lip_sync_profile +speaking_animation_profile +``` + +These fields can live in presentation metadata that maps from existing actor +profile ids. They do not need to be complete backend schema for the first UI +pass if a local catalog is safer. + +--- + +## 5. Speaking Animation Tiers + +The design supports three implementation tiers. Dev should ship tier 1 first if +Convai, voice, or facial rig setup is not ready. + +### Tier 1: Text-Timed Speaking Fallback + +Use when the NPC response is text-only or when no viseme stream exists. + +Behavior: + +- Start a speaking state when NPC text reveal begins. +- Drive simple mouth-open, jaw, portrait frame, or talk-idle animation based on + text reveal progress, punctuation, and estimated syllable density. +- Pause or reduce motion at commas, periods, and line breaks. +- Stop the animation when the line finishes or is skipped. + +This is not true lip sync. It is a readable alpha fallback that makes the NPC +feel alive before voice and viseme data are ready. + +### Tier 2: Audio-Amplitude Speaking + +Use when the NPC response has TTS audio but no full viseme/blendshape stream. + +Behavior: + +- Start the speaking state when playback starts. +- Use audio amplitude or envelope to drive mouth-open intensity. +- Stop when playback ends or is interrupted. +- Keep text reveal aligned to audio playback when possible. + +This is acceptable for stylized alpha characters and lower-detail portraits. + +### Tier 3: Viseme Or Blendshape Lip Sync + +Use when Convai or another provider returns viseme, blendshape, or facial +animation frames and the character rig supports the required targets. + +Behavior: + +- Map provider facial data into the character-specific lip sync profile. 
+- Apply blendshape or bone effectors to jaw, mouth, tongue, and expression + targets. +- Preserve a safe fallback when a body has no compatible face rig. +- Treat provider animation data as presentation only. + +Convai's Unity documentation describes a lip-sync path where backend facial data +is parsed by the SDK and applied through a Convai LipSync component using +blendshape and bone effectors. It also supports preset or custom effector lists. +This matches our tier 3 direction, but it should remain behind the isolated +Convai import lane and not be required for the first focused dialogue UI pass. + +Reference: + +- [Convai Unity: Adding Lip-Sync to your Character](https://docs.convai.com/api-docs/plugins-and-integrations/unity-plugin/adding-lip-sync-to-your-character) + +--- + +## 6. Conversation State Machine + +```text +idle + -> focus_requested + -> focused_listening + -> player_line_sent + -> npc_pending + -> npc_speaking + -> focused_listening + -> exit_requested + -> idle +``` + +Failure branches: + +```text +npc_pending -> npc_fallback_speaking -> focused_listening +npc_pending -> npc_timeout_visible -> focused_listening +npc_speaking -> skipped -> focused_listening +npc_speaking -> exit_requested -> idle +``` + +State rules: + +- `npc_speaking` is a presentation state. It does not grant rewards, complete + quests, update relationship, or write memory by itself. +- Backend-authoritative dialogue, memory, and relationship writes still happen + through Nakama-owned paths. +- If the NPC reply is stale, rejected, or superseded, the speaking state must + stop and show the honest reason. +- If a new focused NPC is selected, the old portrait and speaking animation + clear before the new conversation starts. + +--- + +## 7. 
Data And Authority Boundaries + +Unity owns: + +- portrait placement +- portrait or bust animation playback +- typewriter reveal +- local speaking animation state +- local audio playback state +- debug display of source, fallback, timeout, or backoff + +Nakama owns: + +- player auth +- focused dialogue request acceptance +- response provenance +- relationship and memory writes +- quest clue validation +- rate limits and cooldown state + +`api.dos.ai` owns: + +- model provider routing +- future voice provider routing +- prompt safety +- optional TTS or facial-data provider response shaping + +Convai, if used, owns: + +- provider-specific dialogue or avatar animation transport for the isolated + Convai lane only +- optional viseme, blendshape, or facial animation data + +Convai or any provider must not own canonical NPC memory, relationship, quest, +TIME, SECOND, inventory, combat, or body lifecycle state. + +--- + +## 8. Implementation Packets + +### D1: Focused Dialogue Portrait UI + +Issue: #139 + +Build: + +- Add a portrait anchor to the focused dialogue panel. +- Resolve portrait presentation from the focused actor profile or local + presentation catalog. +- Show missing-portrait fallback without breaking layout. +- Hide or reduce focused overhead nameplate while the panel is active. + +Evidence: + +- Screenshot with NPC line, player line, portrait, NPC name, and status visible. +- Screenshot or note for missing-portrait fallback. + +### D2: Speaking State And Text-Timed Mouth Fallback + +Issue: #139 + +Build: + +- Add a presentation-only `npc_speaking` state. +- Start speaking animation while NPC text is being revealed. +- Stop on line complete, skip, exit, timeout, or stale reply. +- Keep movement input disabled only while focused dialogue rules require it. + +Evidence: + +- Short capture or test note showing speaking starts and stops with one NPC + answer. 
+- Fallback path note showing timeout or deterministic reply does not leave the + portrait stuck talking. + +### D3: Voice And Viseme Readiness Hook + +Issues: #139, #262 + +Build: + +- Add hook points for future TTS playback and provider viseme frames. +- Do not require Convai to be imported for D1 or D2. +- When Convai is imported later, map compatible bodies through a + character-specific lip sync profile instead of one global face assumption. + +Evidence: + +- Inspector or debug note showing which NPC presentation profile has text-only, + audio-amplitude, or viseme-capable mode. + +### D4: Play Mode Smoke Update + +Issues: #139, #140 + +Build: + +- Extend the NPC Dialogue smoke checklist with portrait and speaking-state + checks. +- Record one successful model-backed or scripted-smoke reply and one degraded + fallback case. + +Evidence: + +- Smoke checklist notes include focused NPC, portrait source, speaking tier, and + degraded-state behavior. + +--- + +## 9. Acceptance Criteria + +The feature is ready for alpha when: + +- Focused NPC dialogue shows the focused NPC portrait beside NPC-side text. +- NPC and player message directions are visually distinct. +- The NPC speaking animation starts when the NPC answer begins revealing or + playing. +- The speaking animation stops reliably when the line completes, fails, is + skipped, or the player exits. +- A missing portrait uses a stable fallback without layout breakage. +- Ambient NPC speech remains lightweight and does not require portraits. +- Debug status can show source, fallback, timeout, or backoff without dominating + the player-facing UI. +- LLM/provider output still cannot mutate authoritative game state. +- Play Mode smoke captures focused dialogue with portrait, speaking animation, + input capture, facing lock, and degraded-state honesty. + +--- + +## 10. Cut Rules + +If implementation time is tight: + +- Keep portrait + text-timed speaking fallback. +- Cut render texture busts. 
+- Cut full facial blendshape mapping. +- Cut voice playback. +- Cut emotion-specific expression blending. +- Keep the provider hook documented but disabled. + +Do not cut: + +- honest fallback/timeout display +- speaking-state cleanup on exit or failure +- server-authority boundary +- readable focused dialogue layout + diff --git a/docs/setup/play-mode-smoke-checklist.md b/docs/setup/play-mode-smoke-checklist.md index c9f24508..b4a8e2a0 100644 --- a/docs/setup/play-mode-smoke-checklist.md +++ b/docs/setup/play-mode-smoke-checklist.md @@ -54,6 +54,8 @@ too early can mix stale cleanup asserts into the next run. | Input capture | Typing in the chat box does not move, attack, jump, or interact the player. | | Reply source | NPC status shows `AI DOS.AI` for model replies, or an honest fallback/backoff/timeout label when degraded. | | Bubble | The NPC reply appears in the focused chat panel and as a readable world bubble without overlapping identity text. | +| Portrait | The focused NPC portrait or bust appears beside the NPC-side text and clears when dialogue exits. | +| Speaking animation | The NPC portrait, bust, or in-world model enters speaking state while the NPC line reveals or plays, then stops when the line completes, fails, is skipped, or exits. | | Dialogue stance | Player and NPC face each other and remain in dialogue stance while chat mode is active. | | Exit | Press `Esc`; player movement resumes and dialogue stance clears promptly. | @@ -152,5 +154,6 @@ For a PR or issue update, record: - Note if the only remaining console warning is the Visual Studio Unity messaging UDP port warning. That warning is editor integration noise and does not block gameplay smoke. -- Screenshot or short note for focused chat, one ambient NPC line, quest - tracker, debug surface, and outcome report when that flow exists. 
+- Screenshot or short note for focused chat with portrait, speaking-animation + tier, one ambient NPC line, quest tracker, debug surface, and outcome report + when that flow exists.
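
The conversation state machine in section 6 of the new design doc, together with the cleanup rule that the speaking state must stop on complete, skip, exit, timeout, or stale reply, can be sketched as a transition table. This is a minimal Python sketch for illustration, not project code: the state names are taken from the doc, while `ALLOWED`, `SPEAKING_STATES`, `FocusedDialogue`, and `mouth_animating` are hypothetical names, and treating `npc_fallback_speaking` as an animated state is an assumption.

```python
# Illustrative sketch only. State names come from section 6 of the design doc;
# the table, class, and field names are assumptions, not project code.
ALLOWED = {
    "idle": {"focus_requested"},
    "focus_requested": {"focused_listening"},
    "focused_listening": {"player_line_sent", "exit_requested"},
    "player_line_sent": {"npc_pending"},
    "npc_pending": {"npc_speaking", "npc_fallback_speaking", "npc_timeout_visible"},
    "npc_speaking": {"focused_listening", "skipped", "exit_requested"},
    "npc_fallback_speaking": {"focused_listening"},
    "npc_timeout_visible": {"focused_listening"},
    "skipped": {"focused_listening"},
    "exit_requested": {"idle"},
}

# Assumption: the fallback reply also animates the portrait.
SPEAKING_STATES = {"npc_speaking", "npc_fallback_speaking"}


class FocusedDialogue:
    """Presentation-only state holder; it never writes gameplay state."""

    def __init__(self):
        self.state = "idle"
        self.mouth_animating = False

    def transition(self, new_state):
        if new_state not in ALLOWED[self.state]:
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        # Recompute the animation flag on every transition so the portrait
        # cannot stay stuck talking after skip, exit, or timeout.
        self.mouth_animating = new_state in SPEAKING_STATES
```

Deriving the animation flag from the current state, instead of toggling it with separate start and stop calls, is one way to satisfy the "do not cut" rule on speaking-state cleanup: any exit from a speaking state clears the flag automatically.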