thaime

Thai Input Method Editor | THA-IME | ไทยมี

THAIME is a Latin-to-Thai input method editor: you type Thai using romanized Latin keystrokes on a standard QWERTY keyboard. The engine takes the sequence of Latin characters and predicts the intended Thai text using classical NLP techniques (no ML yet). Think of it as Chinese Pinyin or Japanese Romaji input, but for Thai.

How It Works

You type Latin characters (e.g., sawasdee)
The engine runs prefix search on a trie-based dictionary at every position in the input
A lattice of possible Thai words is built from all possible word spans, then scored using the Viterbi algorithm with n-gram context
Ranked Thai candidates are presented (e.g., สวัสดี)
You select a candidate, which is committed as Thai text and feeds back into the context window
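
The steps above can be sketched in a few dozen lines of Rust. This is a minimal illustration, not the engine's actual code: it assumes a plain hash-map trie instead of the double-array trie, keeps only the single best path instead of k-best, and leaves out the n-gram context term, so each edge costs only a per-word cost plus a fixed segmentation penalty. The names `TrieNode`, `Edge`, `build_lattice`, and `best_path` are hypothetical. The costs for `sawatdee` and `mai` are copied from the Freq column of the CLI sample in this README; the `ma` entry is made up to show a dead-end edge.

```rust
use std::collections::HashMap;

/// Hash-map trie node (a stand-in for the real double-array trie).
#[derive(Default)]
struct TrieNode {
    children: HashMap<u8, TrieNode>,
    /// (Thai word, cost) pairs whose romanization ends at this node.
    words: Vec<(&'static str, f64)>,
}

impl TrieNode {
    fn insert(&mut self, key: &str, thai: &'static str, cost: f64) {
        let mut node = self;
        for b in key.bytes() {
            node = node.children.entry(b).or_default();
        }
        node.words.push((thai, cost));
    }
}

/// One lattice edge: a dictionary word covering input[start..end].
struct Edge {
    start: usize,
    end: usize,
    thai: &'static str,
    cost: f64, // lower is better
}

/// Illustrative dictionary only.
fn toy_trie() -> TrieNode {
    let mut root = TrieNode::default();
    root.insert("sawatdee", "สวัสดี", 5.81);
    root.insert("mai", "ไม่", 4.34);
    root.insert("mai", "ไหม", 5.30);
    root.insert("mai", "ใหม่", 5.52);
    root.insert("ma", "มา", 4.00); // made-up cost
    root
}

/// Walk the trie from every start position and emit an edge for every match.
fn build_lattice(root: &TrieNode, input: &str) -> Vec<Edge> {
    let bytes = input.as_bytes();
    let mut edges = Vec::new();
    for start in 0..bytes.len() {
        let mut node = root;
        for (offset, b) in bytes[start..].iter().enumerate() {
            match node.children.get(b) {
                Some(next) => node = next,
                None => break,
            }
            for &(thai, cost) in &node.words {
                edges.push(Edge { start, end: start + offset + 1, thai, cost });
            }
        }
    }
    edges
}

/// 1-best Viterbi over the lattice; returns None if no path covers the input.
fn best_path(len: usize, edges: &[Edge], seg_penalty: f64) -> Option<(Vec<&'static str>, f64)> {
    // best[i] = (cheapest cost to reach position i, index of the last edge used)
    let mut best: Vec<Option<(f64, Option<usize>)>> = vec![None; len + 1];
    best[0] = Some((0.0, None));
    for pos in 0..len {
        let Some((cost_here, _)) = best[pos] else { continue };
        for (idx, e) in edges.iter().enumerate().filter(|(_, e)| e.start == pos) {
            let cand = cost_here + e.cost + seg_penalty;
            if best[e.end].map_or(true, |(c, _)| cand < c) {
                best[e.end] = Some((cand, Some(idx)));
            }
        }
    }
    let (total, _) = best[len]?;
    // Follow back-pointers from the end of the input to position 0.
    let mut words = Vec::new();
    let mut pos = len;
    while let Some((_, Some(idx))) = best[pos] {
        words.push(edges[idx].thai);
        pos = edges[idx].start;
    }
    words.reverse();
    Some((words, total))
}
```

Calling `build_lattice(&toy_trie(), "sawatdeemai")` and passing the result to `best_path` with a segmentation penalty of `1.0` selects สวัสดี followed by ไม่. The `ma` edge is in the lattice but cannot be extended to the end of the input, so it never appears in a complete path.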

Web Demo

Try THAIME in your browser: Web Demo

The web demo runs the full engine client-side via WebAssembly - no servers required.

Installation

THAIME is still under active development; no user-friendly installation methods are provided yet.

User-friendly releases are planned for Q3-Q4 2026 and beyond. See the Roadmap section below.

Developers can use the CLI or TUI, or try the web demo. See the Installation Guide for details.

Project Structure

thaime/
├── crates/
│   ├── thaime_engine/     Core library (lib + cdylib + staticlib)
│   │   ├── lib.rs         ThaiMeEngine struct + C ABI exports
│   │   ├── trie.rs        Double-array trie dictionary (yada)
│   │   ├── ranking.rs     Viterbi k-best candidate ranking
│   │   ├── ngram.rs       N-gram language model (Stupid Backoff)
│   │   ├── context.rs     Input session state machine
│   │   ├── config.rs      Tunable parameters and constants
│   │   ├── keymap.rs      Latin → Thai mapping (planned)
│   │   └── validate.rs    Thai sequence validation (planned)
│   ├── thaime_cli/        Interactive CLI test harness
│   ├── thaime_tui/        Ratatui-based visual debugger
│   ├── thaime_dictgen/    Dictionary compiler (JSON → binary trie)
│   └── thaime_wasm/       wasm-bindgen wrapper for browser
├── web/                   React + TypeScript web demo
├── frontends/
│   └── ibus/              IBus engine frontend (planned)
├── data/
│   ├── dict/              Compiled dictionary binaries (versioned)
│   └── input/             Source JSON + n-gram binaries
├── build.sh               Full build pipeline script
└── tests/                 Regression test data (TOML)

Dictionary and n-gram data are generated from the companion thaime-nlp repository, which handles NLP research, corpus processing, and romanization variant generation.
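
The tree above lists ngram.rs as a Stupid Backoff language model. For readers unfamiliar with the scheme, here is a minimal bigram sketch: if the bigram was observed, the score is its relative frequency; otherwise it backs off to the unigram relative frequency multiplied by a fixed factor. Everything here is an assumption made for illustration: the `BigramModel` type, the in-memory counting, and the tiny corpus are hypothetical, and the real model is loaded from the binaries produced by thaime-nlp. The back-off factor 0.4 is the value suggested in the original Stupid Backoff paper; it may or may not correspond to the engine's tunable parameters.

```rust
use std::collections::HashMap;

/// Toy bigram model scored with Stupid Backoff.
struct BigramModel {
    unigrams: HashMap<String, u64>,
    bigrams: HashMap<String, HashMap<String, u64>>,
    total_tokens: u64,
    alpha: f64, // back-off factor
}

impl BigramModel {
    /// Count unigrams and bigrams from pre-segmented sentences.
    fn from_corpus(sentences: &[&[&str]], alpha: f64) -> Self {
        let mut unigrams: HashMap<String, u64> = HashMap::new();
        let mut bigrams: HashMap<String, HashMap<String, u64>> = HashMap::new();
        let mut total_tokens: u64 = 0;
        for sent in sentences {
            for (i, w) in sent.iter().enumerate() {
                *unigrams.entry(w.to_string()).or_insert(0) += 1;
                total_tokens += 1;
                if i > 0 {
                    *bigrams
                        .entry(sent[i - 1].to_string())
                        .or_default()
                        .entry(w.to_string())
                        .or_insert(0) += 1;
                }
            }
        }
        Self { unigrams, bigrams, total_tokens, alpha }
    }

    /// S(word | prev): relative bigram frequency if seen, else alpha * unigram frequency.
    /// Scores are not normalized probabilities.
    fn score(&self, prev: &str, word: &str) -> f64 {
        let bigram = self
            .bigrams
            .get(prev)
            .and_then(|m| m.get(word))
            .copied()
            .unwrap_or(0);
        if bigram > 0 {
            // prev must be in unigrams, because it occurred in at least one bigram.
            bigram as f64 / self.unigrams[prev] as f64
        } else {
            let unigram = self.unigrams.get(word).copied().unwrap_or(0);
            self.alpha * unigram as f64 / self.total_tokens as f64
        }
    }
}
```

Because the scores are relative frequencies rather than true probabilities, no smoothing or normalization pass is needed, which keeps lookup cheap. How the engine turns such a score into the Ngram column shown by the CLI is not covered by this sketch.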

Building

Requires the stable Rust toolchain. Install via rustup if needed.

# Quick start (dictionary binaries are committed to the repo)
cargo build --workspace
cargo test --workspace

# Full pipeline: regenerate dict, build workspace, WASM, and web demo
./build.sh

See the Build Guide for prerequisites, feature flags, WASM setup, and CI details.

CLI

cargo run -p thaime_cli

The CLI is an interactive REPL for testing the engine:

THAIME CLI v0.5.0
Commands: :q quit, :r reset, :b backspace, :cc clear context

<BOS> > sawatdee
   #  Thai              Total    Freq   Ngram  SegPen  Words
   1  สวัสดี              6.81    5.81    0.00    1.00      1

<BOS> > mai
   #  Thai              Total    Freq   Ngram  SegPen  Words
   1  ไม่                5.34    4.34    0.00    1.00      1
   2  ไหม                6.30    5.30    0.00    1.00      1
   3  ใหม่               6.52    5.52    0.00    1.00      1
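
In the sample output above, each Total equals Freq + Ngram + SegPen (for example 4.34 + 0.00 + 1.00 = 5.34), and rows are listed in ascending Total, which suggests the columns are costs where lower is better. The sketch below reproduces that reading of the table; the `Candidate` type and `rank` function are hypothetical names, not the engine's API.

```rust
/// One row of the candidate table, mirroring the CLI columns.
#[derive(Debug, Clone, PartialEq)]
struct Candidate {
    thai: &'static str,
    freq: f64,
    ngram: f64,
    seg_pen: f64,
}

impl Candidate {
    /// Total as it appears in the sample output: the sum of the three components.
    fn total(&self) -> f64 {
        self.freq + self.ngram + self.seg_pen
    }
}

/// Sort candidates by ascending total cost (assumed: lower is better).
fn rank(mut candidates: Vec<Candidate>) -> Vec<Candidate> {
    candidates.sort_by(|a, b| a.total().total_cmp(&b.total()));
    candidates
}
```

Feeding the three `mai` rows to `rank` in any order returns them in the order shown by the CLI: ไม่, ไหม, ใหม่.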

Type Latin characters (a-z) to build input and see candidates
Enter a number (1-9) to commit that candidate
Press Enter to commit the top candidate
:b backspace, :r reset, :cc clear context, :q quit
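
The commands above imply a small amount of session state: a buffer of pending Latin keystrokes and a context of committed Thai words. The following is a rough sketch of such a state machine, written under assumptions this README does not confirm: that `:r` clears only the pending buffer, that `:cc` clears only the committed context, and that the prompt shows `<BOS>` when the context is empty. `Session` and `Action` are hypothetical names and do not describe context.rs.

```rust
/// Hypothetical input session: pending keystrokes plus committed context.
#[derive(Debug, Default)]
struct Session {
    buffer: String,       // Latin characters typed so far
    context: Vec<String>, // committed Thai words; empty means beginning of sentence
}

enum Action {
    Type(char),
    Backspace,      // :b
    Reset,          // :r  (assumed: clears the buffer only)
    ClearContext,   // :cc (assumed: clears the committed context only)
    Commit(String), // Enter or 1-9: commit the chosen Thai candidate
}

impl Session {
    fn apply(&mut self, action: Action) {
        match action {
            // Only a-z builds input; other characters are ignored here.
            Action::Type(c) if c.is_ascii_lowercase() => self.buffer.push(c),
            Action::Type(_) => {}
            Action::Backspace => {
                self.buffer.pop();
            }
            Action::Reset => self.buffer.clear(),
            Action::ClearContext => self.context.clear(),
            Action::Commit(thai) => {
                // The committed word becomes context for the next input.
                self.context.push(thai);
                self.buffer.clear();
            }
        }
    }

    /// Render a prompt in the style of the CLI sample.
    fn prompt(&self) -> String {
        let ctx = if self.context.is_empty() {
            "<BOS>".to_string()
        } else {
            self.context.join(" ")
        };
        format!("{ctx} > {}", self.buffer)
    }
}
```

Keeping the buffer and the context separate means a backspace or reset never disturbs already committed text, while a commit both extends the context and clears the buffer in one step.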

TUI

cargo run -p thaime_tui

The TUI is a ratatui-based visual debugger with four modes:

Main - Live candidate exploration with score decomposition and real-time parameter tuning (lambda, ngram_weight, alpha, k, min_freq)
Lattice - View all word lattice edges for the current input
Inspector - Trie explorer with optional "why not?" target word diagnosis
Regression - Run and view regression test results from TOML test files

Documentation

Document	Description
Architecture	High-level architecture, module map, data flow, two-repo design
Algorithm	Engine internals: trie, lattice, Viterbi, n-gram, scoring formulas
Build Guide	Prerequisites, `build.sh`, dict generation, WASM, CI/CD
Installation	Installation options and planned frontend support

Roadmap

THAIME is in pre-alpha - the core engine works, but packaging and platform support are still in progress.

Now - Pre-alpha development

Core engine: trie dictionary, Viterbi ranking, n-gram context scoring
NLP systems improvements - see thaime-nlp repo for research progress
Interactive CLI and TUI debugger for development
Web demo via WebAssembly (client-side, no server)

Q3 2026 - Linux community package alpha

First packaged releases targeting early adopters on Linux:

Fedora via COPR
Ubuntu via Launchpad PPA
IBus frontend with basic compose/commit workflow
Feedback-driven iteration on dictionary coverage and ranking quality

Q4 2026 - Wider Linux beta

Broader distribution packaging and more frontend support:

Fedora / RHEL via dnf
Ubuntu / Debian via apt
Arch via AUR
Other major distros as demand warrants
Wider frontend support (Fcitx5 and more)
Open for community contributions

2027 and beyond - Windows / macOS public release

Windows IME (TSF) and macOS input method frontends
Cross-platform installer / package manager support
Continued improvements to dictionary, ranking, and language model

This roadmap is tentative and subject to change.

Contributing

This is still a solo-developer project, but I aim to open it up for contributions by the end of 2026.

License

MPL-2.0
