Add next-word prediction core to the candidate list #43

Open
mkpoli wants to merge 1 commit into master from feat/next-word-prediction-core

@mkpoli mkpoli commented Jul 1, 2026

Owner

Why

The IME's autocompletion feels like it "only completes one word" — because it literally does. CandidateList::build only completes the current partial word; the moment a word is committed the suggestion engine goes dormant (key_event_sink.rs:100 gates on a non-empty buffer, candidates.rs:46 returns an empty list for an empty word). It never predicts the next word, even though the n-gram engine already can (Suggestions::predict_scores, next_words, default_words).

This is the first of two PRs adding next-word prediction (azooKey-style predictive continuation). Note: for Ainu, the "conversion" problem that drives Japanese IMEs (kanji homophones) barely applies because output is kana or Latin script, so the real win is predictive continuation, which is exactly what's missing.

What

Adds CandidateList::predictions(prev2, prev1, suggest, max):

  • Ranks candidates by the blended trigram+bigram context scores (predict_scores), best first, alphabetical tie-break for a stable order.
  • Fills remaining slots with the globally most-frequent words so the popup is useful at a cold start (unknown/sparse context).
  • Duplicate-free, capped at max. Unlike build, there is no "typed word" at index 0 — every entry is a real predicted word.
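The three rules above can be sketched roughly as follows. This is a minimal illustration, not the code in src/candidates.rs: the `Suggestions` fields, the `predict_scores` and `default_words` signatures, the `items`/`selected` field names, and the `sample_suggestions` helper with its placeholder words are all assumptions made so the example compiles on its own.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the n-gram engine. Only the two calls named in
/// the PR description are modelled, and their signatures are assumed.
pub struct Suggestions {
    /// Blended context scores: (prev2, prev1) -> word -> score.
    context: HashMap<(String, String), HashMap<String, f64>>,
    /// Words ordered by global frequency, most frequent first.
    by_frequency: Vec<String>,
}

impl Suggestions {
    /// Assumed signature: scores for words that may follow the context.
    pub fn predict_scores(&self, prev2: &str, prev1: &str) -> HashMap<String, f64> {
        self.context
            .get(&(prev2.to_string(), prev1.to_string()))
            .cloned()
            .unwrap_or_default()
    }

    /// Assumed signature: up to `n` globally most frequent words.
    pub fn default_words(&self, n: usize) -> Vec<String> {
        self.by_frequency.iter().take(n).cloned().collect()
    }
}

/// Field names are illustrative, not taken from the PR.
pub struct CandidateList {
    pub items: Vec<String>,
    pub selected: usize,
}

impl CandidateList {
    pub fn predictions(prev2: &str, prev1: &str, suggest: &Suggestions, max: usize) -> Self {
        let mut scored: Vec<(String, f64)> =
            suggest.predict_scores(prev2, prev1).into_iter().collect();
        // Best score first; equal scores fall back to alphabetical order so
        // the result does not depend on HashMap iteration order.
        scored.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));

        let fallback = suggest.default_words(max);
        let mut seen = HashSet::new();
        let mut items = Vec::new();
        // Context predictions first, then frequency fallback, skipping repeats.
        for word in scored.into_iter().map(|(w, _)| w).chain(fallback) {
            if items.len() >= max {
                break;
            }
            if seen.insert(word.clone()) {
                items.push(word);
            }
        }
        CandidateList { items, selected: 0 }
    }
}

/// Placeholder data for the example; not real corpus statistics.
pub fn sample_suggestions() -> Suggestions {
    let scores = HashMap::from([
        ("cake".to_string(), 0.9),
        ("berry".to_string(), 0.5),
        ("apple".to_string(), 0.5),
    ]);
    Suggestions {
        context: HashMap::from([(("i".to_string(), "eat".to_string()), scores)]),
        by_frequency: ["the", "apple", "of", "to"].iter().map(|w| w.to_string()).collect(),
    }
}
```

With the sample data, `CandidateList::predictions("i", "eat", &sample_suggestions(), 5)` yields `cake, apple, berry, the, of`: the tie between `apple` and `berry` is broken alphabetically, and the second `apple` from the frequency list is dropped. An unknown context falls straight through to the frequency list.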

Pure, host-independent logic. Four new unit tests (context-first ranking, trigram-beats-bigram, cold-start frequency fallback, max/dedup) — all green, and cargo clippy -D warnings clean on the MSVC target.

Scope

This PR is logic only — nothing yet consumes predictions(). The TSF integration (a no-composition predictive popup, accept-and-insert, and the key gating that surfaces it after a commit) lands in a follow-up PR, which will need Windows verification since TSF behavior can't be exercised on Linux.

Summary by CodeRabbit

  • New Features
    • Added smarter next-word suggestions that prioritize context when available, fall back to common words when not, and avoid duplicate entries.
    • Suggestions are now consistently ordered, with the most relevant matches shown first and ties broken predictably.
  • Tests
    • Added coverage for context-aware ranking, cold-start suggestions, result limits, and duplicate-free behavior.

`CandidateList::build` only ever *completes the current word*, so once a
word is committed the suggestion engine goes dormant until the user starts
typing the next partial word. The n-gram engine already predicts the
following word (`Suggestions::predict_scores` / `default_words`), but
nothing surfaces it.

Add `CandidateList::predictions(prev2, prev1, suggest, max)`: a next-word
list ranked by the blended trigram+bigram context scores, with the most
frequent words filling any remaining slots for a useful cold start. Unlike
`build`, there is no "typed word" at index 0 — every entry is a real
predicted word.
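For readers unfamiliar with blended n-gram scoring, the sketch below shows one common way to combine the two context lengths: linear interpolation of relative frequencies. It is an assumption for illustration only. The source does not say how `predict_scores` blends trigram and bigram evidence, and the weights, the `NgramCounts` type, `blended_scores`, and `sample_counts` are all hypothetical.

```rust
use std::collections::HashMap;

// Hypothetical interpolation weights; the real engine's values are not
// stated in this PR.
const TRIGRAM_WEIGHT: f64 = 0.7;
const BIGRAM_WEIGHT: f64 = 0.3;

/// Hypothetical count tables, not the engine's actual data structure.
pub struct NgramCounts {
    /// prev1 -> next word -> count
    bigrams: HashMap<String, HashMap<String, u32>>,
    /// (prev2, prev1) -> next word -> count
    trigrams: HashMap<(String, String), HashMap<String, u32>>,
}

/// Turn raw counts into relative frequencies; empty if the context is unseen.
fn relative(counts: Option<&HashMap<String, u32>>) -> HashMap<String, f64> {
    let Some(counts) = counts else {
        return HashMap::new();
    };
    let total: u32 = counts.values().sum();
    if total == 0 {
        return HashMap::new();
    }
    counts
        .iter()
        .map(|(w, c)| (w.clone(), f64::from(*c) / f64::from(total)))
        .collect()
}

impl NgramCounts {
    /// Weighted sum of trigram and bigram relative frequencies per word.
    pub fn blended_scores(&self, prev2: &str, prev1: &str) -> HashMap<String, f64> {
        let tri = relative(self.trigrams.get(&(prev2.to_string(), prev1.to_string())));
        let bi = relative(self.bigrams.get(prev1));
        let mut out = HashMap::new();
        for (w, p) in tri {
            *out.entry(w).or_insert(0.0) += TRIGRAM_WEIGHT * p;
        }
        for (w, p) in bi {
            *out.entry(w).or_insert(0.0) += BIGRAM_WEIGHT * p;
        }
        out
    }
}

/// Placeholder counts: the bigram table favours "apple" after "eat", while
/// the trigram table has only seen "cake" after "i eat".
pub fn sample_counts() -> NgramCounts {
    NgramCounts {
        bigrams: HashMap::from([(
            "eat".to_string(),
            HashMap::from([("apple".to_string(), 3), ("cake".to_string(), 1)]),
        )]),
        trigrams: HashMap::from([(
            ("i".to_string(), "eat".to_string()),
            HashMap::from([("cake".to_string(), 1)]),
        )]),
    }
}
```

Under these assumed weights, the context `("i", "eat")` ranks `cake` above `apple` even though the bigram counts alone prefer `apple`, which is the behaviour a trigram-beats-bigram test would check. With an unseen `prev2`, only the bigram term contributes and `apple` wins.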

Pure, host-independent logic with unit tests (context-first ranking,
trigram-beats-bigram, cold-start frequency fallback, max/dedup). The TSF
key/window wiring that consumes this lands in a follow-up PR.

coderabbitai Bot commented Jul 1, 2026


No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 0e475de9-7128-4e61-abd1-caa08ef3c310

📥 Commits

Reviewing files that changed from the base of the PR and between 26561ac and 01678f6.

📒 Files selected for processing (1)
  • src/candidates.rs

Walkthrough

Adds a new CandidateList::predictions constructor in src/candidates.rs that builds a duplicate-free, capped list of next-word predictions, ranking context-based scores first and falling back to default frequency-based words. Adds corresponding unit tests.

Changes

Next-word Prediction Candidate List

| Layer / File(s) | Summary |
|---|---|
| Predictions constructor and tests (src/candidates.rs) | Adds CandidateList::predictions(prev2, prev1, suggest, max), which ranks context-predicted words first using suggest.predict_scores, sorts by descending score with alphabetical tie-break, fills remaining slots with suggest.default_words(max) deduplicated via HashSet, caps at max, and initializes selection index to 0; adds tests for context-first ranking, trigram-over-bigram precedence, cold-start frequency fallback, and max/dedup enforcement. |

Estimated code review effort: 2 (Simple) | ~10 minutes

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding next-word prediction support to the candidate list. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |