High-performance, local text embeddings for Dart and Flutter. A Dart wrapper around model2vec-rs using Rust FFI and Native Assets. Model2Vec creates small, fast, and effective text embeddings by distilling knowledge from large language models into a simple vocabulary-based look-up table.
- Extreme Performance: Built on top of a highly optimized Rust engine. Up to ~1.7x faster than the official Python implementation, generating embeddings in microseconds.
- Compact & Quantized: Models range from roughly 25MB to 500MB depending on the variant. The smaller ones are well suited to edge computing.
- Massive Streaming: Built-in `generateEmbeddingStream` for processing millions of rows without blocking the event loop or overflowing RAM.
- Hugging Face Integration: Automatically downloads and caches models directly from the Hugging Face Hub.
- Zero-Stutter Async: Transparently runs heavy tokenization and math in background Dart Isolates using `Async` methods.
- Vector Utilities: Ships with high-performance mathematical tools (`cosineSimilarity`, `quantizeToInt8`, `similaritySearch`, etc.).
Model2Vec provides a variety of pre-trained models optimized for different use cases. These can be loaded directly via their Hugging Face model ID.
| Model ID | Language | Distilled From | Params | Dimension | Size |
|---|---|---|---|---|---|
| `minishlab/potion-base-32M` | English | bge-base-en-v1.5 | 32.3M | 512 | ~150MB |
| `minishlab/potion-multilingual-128M` | Multi | bge-m3 | 128M | 768 | ~500MB |
| `minishlab/potion-retrieval-32M` | English | bge-base-en-v1.5 | 32.3M | 512 | ~150MB |
| `minishlab/potion-code-16M` | Code | CodeRankEmbed | 16M | 384 | ~80MB |
| `minishlab/potion-base-8M` | English | bge-base-en-v1.5 | 7.5M | 256 | ~50MB |
| `minishlab/potion-base-4M` | English | bge-base-en-v1.5 | 3.7M | 128 | ~30MB |
| `minishlab/potion-base-2M` | English | bge-base-en-v1.5 | 1.8M | 64 | ~25MB |
Add model2vec to your pubspec.yaml:
```yaml
dependencies:
  model2vec: any
```

Or add it using the command line:

```bash
dart pub add model2vec
```

Requires Dart SDK 3.10.0+ and Rust toolchain 1.86.0+ (to build the native library via Native Assets).
```dart
import 'package:model2vec/model2vec.dart';

void main() {
  final m2v = Model2Vec.instance;

  // Initialize with a model from Hugging Face
  m2v.initEmbedder('minishlab/potion-base-2M');

  // Generate an embedding
  final embedding = m2v.generateEmbedding('Dart FFI is blazingly fast 🚀');

  print('Vector dimension: ${m2v.embeddingDimension}');
  print('Vocabulary size: ${m2v.vocabularySize}');
}
```

Process multiple strings at once for maximum hardware utilization. You can control sequence truncation and batch sizes.
```dart
final texts = ['Dart', 'Rust', 'Flutter'];

final embeddings = m2v.generateBatchEmbeddings(
  texts,
  maxLength: 256, // Truncate strings longer than 256 tokens
  batchSize: 1024, // Internal chunks sent to the FFI layer
);
```

When reading gigabytes of text from files or databases, loading everything into memory will crash the app. Use the Streaming API to handle data in chunks automatically.
```dart
import 'dart:convert';
import 'dart:io';

import 'package:model2vec/model2vec.dart';

Future<void> processHugeFile() async {
  // Assumes initEmbedder(...) has already been called on this instance.
  final m2v = Model2Vec.instance;

  final fileStream = File('massive_dataset.txt')
      .openRead()
      .transform(utf8.decoder)
      .transform(const LineSplitter());

  // Converts a Stream<String> into a Stream<Float32List>
  final embeddingStream = m2v.generateEmbeddingStream(
    fileStream,
    batchSize: 500, // Process 500 strings at a time
    useIsolate: true, // Run math in a background isolate
  );

  await for (final embedding in embeddingStream) {
    saveToDb(embedding); // saveToDb is a placeholder for your own storage code
  }
}
```

Never block the main thread. If you are building a Flutter app, always use the Async variants to perform generation in a background Isolate.
```dart
final embedding = await m2v.generateEmbeddingAsync('A very long text...');
final batch = await m2v.generateBatchEmbeddingsAsync(['A', 'B', 'C']);
```

The library ships with `Model2VecUtils`, a suite of math operations tuned for embeddings.
```dart
final query = m2v.generateEmbedding('cat');
final candidates = [
  m2v.generateEmbedding('dog'),
  m2v.generateEmbedding('space'),
];

// 1. Semantic Similarity (Cosine)
final sim = Model2VecUtils.cosineSimilarity(query, candidates[0]);

// 2. Threshold Searching (find all matches with similarity above 0.8)
final matches = Model2VecUtils.similaritySearchWithThreshold(
  query, candidates, threshold: 0.8,
);

// 3. Scalar Quantization (compress Float32 to Int8, using 4x less RAM)
final compressed = Model2VecUtils.quantizeToInt8(query);

// 4. Mean Pooling (average multiple vectors into one)
final sentenceVector = Model2VecUtils.meanPooling(candidates);

// 5. DB Serialization
final base64String = Model2VecUtils.toBase64(query);
```

| Method / Property | Description |
|---|---|
| `initEmbedder(path)` | Initializes the model from a Hugging Face repo ID or local path. |
| `initEmbedderAdvanced(...)` | Advanced initialization with custom `cacheDirectory`, `hfToken`, or `normalize` overrides. |
| `initEmbedderFromBytes(...)` | Initializes the model directly from raw `Uint8List` bytes (`model.safetensors`, `tokenizer.json`, etc.). |
| `getRecommendedModels()` | Returns a list of officially supported models. |
| `tokenize(text)` | Runs the internal BPE tokenizer and returns a `List<String>`. |
| `generateEmbedding(text)` | Synchronously generates a `Float32List` embedding vector. |
| `generateBatchEmbeddings(texts)` | Synchronously generates embeddings for a `List<String>` using Rust SIMD. |
| `generateEmbeddingAsync(text)` | Asynchronously generates an embedding in a background Isolate. |
| `generateEmbeddingStream(stream)` | Processes a huge `Stream<String>` into a `Stream<Float32List>` in batches. |
| `embeddingDimension` | Property returning the vector size (e.g., 256, 384, 512). |
| `vocabularySize` | Property returning the number of tokens in the model's vocabulary. |
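
To make the batching idea behind `generateEmbeddingStream` concrete, here is a minimal pure-Dart sketch of grouping a `Stream<String>` into fixed-size chunks. It is not the library's implementation: `batched` is a hypothetical helper written only for illustration, and it does no embedding work itself.

```dart
/// Groups items from [source] into lists of at most [batchSize] elements.
/// The final batch may be shorter. Hypothetical helper, not part of model2vec.
Stream<List<T>> batched<T>(Stream<T> source, int batchSize) async* {
  if (batchSize <= 0) {
    throw ArgumentError.value(batchSize, 'batchSize', 'must be positive');
  }
  var buffer = <T>[];
  await for (final item in source) {
    buffer.add(item);
    if (buffer.length == batchSize) {
      yield buffer;
      buffer = <T>[]; // Start a fresh list so emitted batches are never mutated.
    }
  }
  if (buffer.isNotEmpty) yield buffer; // Flush the trailing partial batch.
}

Future<void> main() async {
  final lines = Stream.fromIterable(['a', 'b', 'c', 'd', 'e']);
  await for (final batch in batched(lines, 2)) {
    print(batch); // Each batch holds at most 2 strings.
  }
}
```

Because the generator only buffers up to `batchSize` items before yielding, memory use stays bounded by the batch size rather than by the size of the input, which is the property the streaming API relies on.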
| Method | Description |
|---|---|
| `cosineSimilarity(a, b)` | Calculates cosine similarity (-1.0 to 1.0) between two vectors. |
| `cosineDistance(a, b)` | Calculates cosine distance (0.0 to 2.0). |
| `euclideanDistance(a, b)` | Calculates Euclidean (L2) distance. |
| `similaritySearch(query, docs)` | Returns the indices of the Top-K most similar vectors in a database. |
| `similaritySearchWithThreshold` | Returns all indices with similarity above a given threshold. |
| `quantizeToInt8(vector)` | Compresses a `Float32List` into an `Int8List` (4x memory savings). |
| `normalize(vector)` | Applies L2 normalization to a vector. |
| `meanPooling(vectors)` | Averages multiple vectors into a single vector. |
| `toBase64` / `fromBase64` | Serializes/Deserializes a vector to/from a Base64 string for DB storage. |
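
For readers who want to see the math behind a few of these utilities, the sketch below shows one common way to compute cosine similarity, a brute-force top-K search, and symmetric int8 quantization in pure Dart. The function names (`cosine`, `topK`, `quantize`) are hypothetical and the code is an illustration under those assumptions, not the `Model2VecUtils` source. In particular, the quantization scheme the library actually uses may differ.

```dart
import 'dart:math' as math;
import 'dart:typed_data';

/// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
double cosine(Float32List a, Float32List b) {
  if (a.length != b.length) {
    throw ArgumentError('Vectors must have the same length');
  }
  var dot = 0.0, normA = 0.0, normB = 0.0;
  for (var i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  final denom = math.sqrt(normA) * math.sqrt(normB);
  return denom == 0 ? 0.0 : dot / denom;
}

/// Indices of the [k] documents most similar to [query], best first.
List<int> topK(Float32List query, List<Float32List> docs, int k) {
  final scores = [for (final d in docs) cosine(query, d)];
  final order = List<int>.generate(docs.length, (i) => i)
    ..sort((x, y) => scores[y].compareTo(scores[x]));
  return order.take(k).toList();
}

/// Symmetric scalar quantization: maps [-maxAbs, maxAbs] onto [-127, 127].
/// The returned scale recovers approximate floats (value ≈ q * scale).
(Int8List, double) quantize(Float32List v) {
  var maxAbs = 0.0;
  for (final x in v) {
    maxAbs = math.max(maxAbs, x.abs());
  }
  final scale = maxAbs == 0 ? 1.0 : maxAbs / 127.0;
  final q = Int8List(v.length);
  for (var i = 0; i < v.length; i++) {
    final r = (v[i] / scale).round();
    q[i] = r < -127 ? -127 : (r > 127 ? 127 : r); // Defensive clamp.
  }
  return (q, scale);
}

void main() {
  final query = Float32List.fromList([1.0, 0.0]);
  final docs = [
    Float32List.fromList([0.0, 1.0]), // orthogonal to the query
    Float32List.fromList([2.0, 0.0]), // same direction as the query
    Float32List.fromList([-1.0, 0.0]), // opposite direction
  ];
  print(topK(query, docs, 2)); // [1, 0]

  final (q, scale) = quantize(Float32List.fromList([1.0, -1.0, 0.0]));
  print('quantized: $q, scale: $scale');
}
```

Cosine distance is conventionally defined as `1 - cosineSimilarity`, which is why its range in the table is 0.0 to 2.0. The 4x saving from quantization follows from storing one byte per component instead of four.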
model2vec uses highly optimized FFI bindings. For mathematical operations on embeddings, Dart handles single-vector math natively, with no FFI call overhead, while batch generation leverages Rust's SIMD (auto-vectorization) capabilities.
Here is a performance benchmark run on a typical machine (AOT compiled):
| Model | Load Time (Cache) | Single Embedding | Batch (32) |
|---|---|---|---|
| `minishlab/potion-base-2M` | ~40 ms | 372.9 μs | 3.85 ms |
| `minishlab/potion-base-4M` | ~40 ms | 363.7 μs | 4.19 ms |
| `minishlab/potion-base-8M` | ~40 ms | 382.1 μs | 5.60 ms |
| `minishlab/potion-base-32M` | ~120 ms | 452.6 μs | 6.79 ms |
| `minishlab/potion-multilingual-128M` | ~1050 ms | 416.1 μs | 5.38 ms |
Note: Initial load times may vary slightly based on disk speed. In the table above, a single embedding takes a few hundred microseconds, and batching 32 strings brings the cost down to roughly 120 to 210 μs per string.

`similaritySearch` over 100,000 vectors takes <100 ms in pure Dart.
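
If you want to sanity-check a brute-force search figure like this on your own hardware, a rough timing harness might look like the sketch below. It scans synthetic random vectors with a hand-written dot product. `bestMatch` and `randomVectors` are hypothetical helpers, the 64-dimension size is an arbitrary choice, and none of this is the library's `similaritySearch` code, so treat the result as a ballpark only.

```dart
import 'dart:math';
import 'dart:typed_data';

/// Index of the row in [docs] with the largest dot product against [query].
/// For L2-normalized vectors the dot product equals cosine similarity.
int bestMatch(Float32List query, List<Float32List> docs) {
  var best = -1;
  var bestScore = double.negativeInfinity;
  for (var i = 0; i < docs.length; i++) {
    final d = docs[i];
    var dot = 0.0;
    for (var j = 0; j < query.length; j++) {
      dot += query[j] * d[j];
    }
    if (dot > bestScore) {
      bestScore = dot;
      best = i;
    }
  }
  return best;
}

/// Builds [count] random vectors of size [dim] as stand-ins for embeddings.
List<Float32List> randomVectors(int count, int dim, Random rng) => [
      for (var i = 0; i < count; i++)
        Float32List.fromList(
          [for (var j = 0; j < dim; j++) rng.nextDouble() * 2 - 1],
        ),
    ];

void main() {
  final rng = Random(42);
  final docs = randomVectors(100000, 64, rng);
  final query = randomVectors(1, 64, rng).single;

  final sw = Stopwatch()..start();
  final best = bestMatch(query, docs);
  sw.stop();

  print('best index: $best, scan took ${sw.elapsedMilliseconds} ms');
}
```

Since the benchmark above is AOT compiled, building the harness with `dart compile exe` gives a fairer comparison than running it in JIT mode.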
The library uses Dart Native Assets, meaning cargo build is invoked automatically when running Dart code.
To manually re-build bindings if you modify the Rust C-API (`native/src/lib.rs`):

```bash
dart run ffigen
```

To run the test suite:

```bash
dart test
```

This project is licensed under the MIT License - see the LICENSE file for details.