Add next-word prediction core to the candidate list by mkpoli · Pull Request #43 · mkpoli/ainuKey

mkpoli · 2026-07-01T17:02:26Z

Why

The IME's autocompletion feels like it "only completes one word" because it literally does. `CandidateList::build` only completes the current partial word; the moment a word is committed, the suggestion engine goes dormant (`key_event_sink.rs:100` gates on a non-empty buffer, and `candidates.rs:46` returns an empty list for an empty word). It never predicts the next word, even though the n-gram engine already can (`Suggestions::predict_scores`, `next_words`, `default_words`).

This is the first of two PRs adding next-word prediction (azooKey-style predictive continuation). Note: for Ainu, the "conversion" problem that drives Japanese IMEs (kanji homophones) barely applies because the output is kana/Latin, so the real win is predictive continuation, which is exactly what's missing.

What

Adds `CandidateList::predictions(prev2, prev1, suggest, max)`:

- Ranks candidates by the blended trigram+bigram context scores (`predict_scores`), best first, with an alphabetical tie-break for a stable order.
- Fills remaining slots with the globally most-frequent words so the popup is useful at a cold start (unknown/sparse context).
- Duplicate-free, capped at `max`. Unlike `build`, there is no "typed word" at index 0; every entry is a real predicted word.
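
The rank-then-fill behavior described above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/candidates.rs`: `ToySuggestions` is a hypothetical stand-in for the real `Suggestions` type, the signatures of its `predict_scores` and `default_words` methods are assumptions, and the free function `predictions` returns a plain `Vec<String>` rather than a `CandidateList`.

```rust
use std::collections::{HashMap, HashSet};

/// Hypothetical stand-in for the engine's `Suggestions` type (assumption).
pub struct ToySuggestions {
    /// Context scores keyed by (prev2, prev1); mimics what `predict_scores` might return.
    pub context: HashMap<(String, String), Vec<(String, f64)>>,
    /// Words in descending global frequency; mimics `default_words`.
    pub frequent: Vec<String>,
}

impl ToySuggestions {
    /// Assumed shape: (word, score) pairs for the given two-word context.
    pub fn predict_scores(&self, prev2: &str, prev1: &str) -> Vec<(String, f64)> {
        self.context
            .get(&(prev2.to_string(), prev1.to_string()))
            .cloned()
            .unwrap_or_default()
    }

    /// Assumed shape: the `max` most frequent words overall.
    pub fn default_words(&self, max: usize) -> Vec<String> {
        self.frequent.iter().take(max).cloned().collect()
    }
}

/// Context-ranked words first, then frequency fallback; duplicate-free and capped at `max`.
pub fn predictions(prev2: &str, prev1: &str, suggest: &ToySuggestions, max: usize) -> Vec<String> {
    let mut scored = suggest.predict_scores(prev2, prev1);
    // Highest score first; equal scores fall back to alphabetical order for stability.
    scored.sort_by(|a, b| b.1.total_cmp(&a.1).then_with(|| a.0.cmp(&b.0)));

    let fallback = suggest.default_words(max);
    let mut seen = HashSet::new();
    let mut out = Vec::with_capacity(max);
    for word in scored.into_iter().map(|(w, _)| w).chain(fallback) {
        if out.len() >= max {
            break;
        }
        // `insert` returns false for a word already emitted, which skips duplicates.
        if seen.insert(word.clone()) {
            out.push(word);
        }
    }
    out
}
```

With an unknown context, `predict_scores` yields nothing in this sketch, so the result is simply the first `max` frequent words, which matches the cold-start behavior described above.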

Pure, host-independent logic. Four new unit tests (context-first ranking, trigram-beats-bigram, cold-start frequency fallback, max/dedup) are all green, and `cargo clippy -- -D warnings` is clean on the MSVC target.
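
For readers unfamiliar with what "trigram-beats-bigram" means in a blended score, here is one common way to combine the two orders: linear interpolation with a heavier trigram weight. This PR does not show how `predict_scores` actually blends them, so the function `blend_scores` and both weight constants below are purely hypothetical.

```rust
use std::collections::HashMap;

/// Hypothetical interpolation weights; the engine's real values are not shown in this PR.
const TRIGRAM_WEIGHT: f64 = 0.7;
const BIGRAM_WEIGHT: f64 = 0.3;

/// Blend per-word trigram and bigram probabilities into a single score table.
/// A word seen under both contexts receives the weighted sum of the two.
pub fn blend_scores(
    trigram: &HashMap<String, f64>,
    bigram: &HashMap<String, f64>,
) -> HashMap<String, f64> {
    let mut blended: HashMap<String, f64> = HashMap::new();
    for (word, p) in trigram {
        *blended.entry(word.clone()).or_insert(0.0) += TRIGRAM_WEIGHT * p;
    }
    for (word, p) in bigram {
        *blended.entry(word.clone()).or_insert(0.0) += BIGRAM_WEIGHT * p;
    }
    blended
}
```

Under these assumed weights, a word with moderate trigram support can outrank a word that has strong bigram support but no trigram evidence, which is the kind of ordering a trigram-beats-bigram test would check.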

Scope

This PR is logic only — nothing yet consumes predictions(). The TSF integration (a no-composition predictive popup, accept-and-insert, and the key gating that surfaces it after a commit) lands in a follow-up PR, which will need Windows verification since TSF behavior can't be exercised on Linux.

Summary by CodeRabbit

New Features
- Added smarter next-word suggestions that prioritize context when available, fall back to common words when not, and avoid duplicate entries.
- Suggestions are now consistently ordered, with the most relevant matches shown first and ties broken predictably.
Tests
- Added coverage for context-aware ranking, cold-start suggestions, result limits, and duplicate-free behavior.

`CandidateList::build` only ever *completes the current word*, so once a word is committed the suggestion engine goes dormant until the user starts typing the next partial word. The n-gram engine already predicts the following word (`Suggestions::predict_scores` / `default_words`), but nothing surfaces it. Add `CandidateList::predictions(prev2, prev1, suggest, max)`: a next-word list ranked by the blended trigram+bigram context scores, with the most frequent words filling any remaining slots for a useful cold start. Unlike `build`, there is no "typed word" at index 0 — every entry is a real predicted word. Pure, host-independent logic with unit tests (context-first ranking, trigram-beats-bigram, cold-start frequency fallback, max/dedup). The TSF key/window wiring that consumes this lands in a follow-up PR.

coderabbitai · 2026-07-01T17:02:45Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

📥 Commits

Reviewing files that changed from the base of the PR and between 26561ac and 01678f6.

📒 Files selected for processing (1)

src/candidates.rs

📝 Walkthrough

Adds a new CandidateList::predictions constructor in src/candidates.rs that builds a duplicate-free, capped list of next-word predictions, ranking context-based scores first and falling back to default frequency-based words. Adds corresponding unit tests.

Changes

Next-word Prediction Candidate List

| Layer / File(s) | Summary |
| --- | --- |
| Predictions constructor and tests: `src/candidates.rs` | Adds `CandidateList::predictions(prev2, prev1, suggest, max)`, which ranks context-predicted words first using `suggest.predict_scores`, sorts by descending score with alphabetical tie-break, fills remaining slots with `suggest.default_words(max)` deduplicated via `HashSet`, caps at `max`, and initializes selection index to 0; adds tests for context-first ranking, trigram-over-bigram precedence, cold-start frequency fallback, and max/dedup enforcement. |

Estimated code review effort: 2 (Simple) | ~10 minutes

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly summarizes the main change: adding next-word prediction support to the candidate list. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00%, which is sufficient. The required threshold is 80.00%. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

mkpoli mentioned this pull request Jul 1, 2026

Surface next-word predictions after a commit #44

Open

mkpoli wants to merge 1 commit into master from feat/next-word-prediction-core
