pro100andrey/model2vec

model2vec


High-performance, local text embeddings for Dart and Flutter. A Dart wrapper around model2vec-rs using Rust FFI and Native Assets. Model2Vec creates small, fast, and effective text embeddings by distilling knowledge from large language models into a simple vocabulary-based look-up table.

Key Features

  • Extreme Performance: Built on an optimized Rust engine (model2vec-rs). Up to ~1.7x faster than the official Python implementation, generating a single embedding in a few hundred microseconds (see Performance).
  • Compact & Quantized: The recommended models range from about 25 MB to about 500 MB. The smaller ones are well suited to edge computing.
  • Massive Streaming: Built-in generateEmbeddingStream processes millions of rows in batches without blocking the event loop or holding the whole dataset in RAM.
  • Hugging Face Integration: Automatically downloads and caches models directly from the Hugging Face Hub.
  • Zero-Stutter Async: The Async method variants run tokenization and embedding math in background Dart isolates.
  • Vector Utilities: Ships with mathematical tools for embeddings (cosineSimilarity, quantizeToInt8, similaritySearch, etc.).

Recommended Models

Model2Vec provides a variety of pre-trained models optimized for different use cases. These can be loaded directly via their Hugging Face model ID.

| Model ID | Language | Distilled From | Params | Dimension | Size |
|---|---|---|---|---|---|
| minishlab/potion-base-32M | English | bge-base-en-v1.5 | 32.3M | 512 | ~150 MB |
| minishlab/potion-multilingual-128M | Multi | bge-m3 | 128M | 768 | ~500 MB |
| minishlab/potion-retrieval-32M | English | bge-base-en-v1.5 | 32.3M | 512 | ~150 MB |
| minishlab/potion-code-16M | Code | CodeRankEmbed | 16M | 384 | ~80 MB |
| minishlab/potion-base-8M | English | bge-base-en-v1.5 | 7.5M | 256 | ~50 MB |
| minishlab/potion-base-4M | English | bge-base-en-v1.5 | 3.7M | 128 | ~30 MB |
| minishlab/potion-base-2M | English | bge-base-en-v1.5 | 1.8M | 64 | ~25 MB |

Installation

Add model2vec to your pubspec.yaml:

dependencies:
  model2vec: any

Or add it using the command line:

dart pub add model2vec

Requires Dart SDK 3.10.0+ and Rust toolchain 1.86.0+ (Rust is needed to build the native library via Native Assets).

Quick Start

import 'package:model2vec/model2vec.dart';

void main() {
  final m2v = Model2Vec.instance;
  
  // Initialize with a model from Hugging Face
  m2v.initEmbedder('minishlab/potion-base-2M');
  
  // Generate an embedding
  final embedding = m2v.generateEmbedding('Dart FFI is blazingly fast 🚀');
  
  print('Vector dimension: ${m2v.embeddingDimension}');
  print('Vocabulary size: ${m2v.vocabularySize}');
}

Recipes & Patterns

1. Advanced Batch Processing

Process multiple strings at once for maximum hardware utilization. You can control sequence truncation and batch sizes.

final texts = ['Dart', 'Rust', 'Flutter'];

final embeddings = m2v.generateBatchEmbeddings(
  texts,
  maxLength: 256,   // Truncate strings longer than 256 tokens
  batchSize: 1024,  // Internal chunks sent to the FFI layer
);

2. Massive Data Streaming

When reading gigabytes of text from files or databases, loading everything into memory at once can exhaust RAM and crash the app. Use the Streaming API to process the data in batches instead.

import 'dart:convert';
import 'dart:io';

Future<void> processHugeFile() async {
  final fileStream = File('massive_dataset.txt')
      .openRead()
      .transform(utf8.decoder)
      .transform(const LineSplitter());

  // Converts a Stream<String> into a Stream<Float32List>
  final embeddingStream = m2v.generateEmbeddingStream(
    fileStream,
    batchSize: 500, // Process 500 strings at a time
    useIsolate: true, // Run the work in a background isolate
  );

  await for (final embedding in embeddingStream) {
    saveToDb(embedding); // saveToDb is a placeholder for your own storage code
  }
}
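
To illustrate the batching idea behind a streaming API, here is a minimal pure-Dart sketch that groups a Stream into fixed-size chunks. It is not the package's implementation; the helper name chunked is hypothetical and introduced only for this example.

```dart
import 'dart:async';

/// Groups items from [source] into lists of at most [size] elements.
/// Hypothetical helper for illustration; not part of model2vec.
Stream<List<T>> chunked<T>(Stream<T> source, int size) async* {
  if (size <= 0) {
    throw ArgumentError.value(size, 'size', 'must be positive');
  }
  var buffer = <T>[];
  await for (final item in source) {
    buffer.add(item);
    if (buffer.length == size) {
      yield buffer;
      buffer = <T>[]; // Start a fresh list so the yielded batch is not mutated.
    }
  }
  if (buffer.isNotEmpty) {
    yield buffer; // Flush the final, possibly smaller, batch.
  }
}

Future<void> main() async {
  final lines = Stream.fromIterable(['a', 'b', 'c', 'd', 'e']);
  await for (final batch in chunked(lines, 2)) {
    print(batch); // [a, b], then [c, d], then [e]
  }
}
```

Because the generator only pulls the next item after the consumer has handled the previous batch, at most one batch is buffered at a time.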

3. Asynchronous Isolate Execution

Never block the main thread. In a Flutter app, synchronous generation runs on the UI isolate and can cause dropped frames, so prefer the Async variants, which perform generation in a background Isolate.

final embedding = await m2v.generateEmbeddingAsync('A very long text...');
final batch = await m2v.generateBatchEmbeddingsAsync(['A', 'B', 'C']);
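
For readers unfamiliar with isolates, the following sketch shows the general pattern of offloading synchronous work with Isolate.run from dart:isolate. Whether model2vec uses this exact call internally is an assumption; heavyWork is a hypothetical stand-in for an expensive computation.

```dart
import 'dart:isolate';
import 'dart:typed_data';

/// Hypothetical stand-in for an expensive, synchronous computation.
Float32List heavyWork(int length) {
  final out = Float32List(length);
  for (var i = 0; i < length; i++) {
    out[i] = i * 0.5;
  }
  return out;
}

Future<void> main() async {
  // Isolate.run spawns a short-lived isolate, runs the closure there,
  // and completes with its result on the calling isolate.
  final result = await Isolate.run(() => heavyWork(4));
  print(result.length); // 4
}
```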

4. Vector Math & Quantization

The library ships with Model2VecUtils, a set of math utilities for working with embeddings.

final query = m2v.generateEmbedding('cat');
final candidates = [
  m2v.generateEmbedding('dog'),
  m2v.generateEmbedding('space'),
];

// 1. Semantic Similarity (Cosine)
final sim = Model2VecUtils.cosineSimilarity(query, candidates[0]);

// 2. Threshold Searching (find all matches with cosine similarity above 0.8)
final matches = Model2VecUtils.similaritySearchWithThreshold(
  query, candidates, threshold: 0.8,
);

// 3. Scalar Quantization (compress Float32 to Int8, using 4x less memory)
final compressed = Model2VecUtils.quantizeToInt8(query);

// 4. Mean Pooling (Average multiple vectors into one)
final sentenceVector = Model2VecUtils.meanPooling(candidates);

// 5. DB Serialization
final base64String = Model2VecUtils.toBase64(query);

API Reference

Core Methods (Model2Vec class)

| Method / Property | Description |
|---|---|
| initEmbedder(path) | Initializes the model from a Hugging Face repo ID or local path. |
| initEmbedderAdvanced(...) | Advanced initialization with custom cacheDirectory, hfToken, or normalize overrides. |
| initEmbedderFromBytes(...) | Initializes the model directly from raw Uint8List bytes (model.safetensors, tokenizer.json, etc). |
| getRecommendedModels() | Returns a list of officially supported models. |
| tokenize(text) | Runs the model's tokenizer and returns a List<String>. |
| generateEmbedding(text) | Synchronously generates a Float32List embedding vector. |
| generateBatchEmbeddings(texts) | Synchronously generates embeddings for a List<String> using Rust SIMD. |
| generateEmbeddingAsync(text) | Asynchronously generates an embedding in a background Isolate. |
| generateEmbeddingStream(stream) | Processes a huge Stream<String> into a Stream<Float32List> in batches. |
| embeddingDimension | Property returning the vector size (e.g., 256, 384, 512). |
| vocabularySize | Property returning the number of tokens in the model's vocabulary. |

Math Utilities (Model2VecUtils class)

| Method | Description |
|---|---|
| cosineSimilarity(a, b) | Calculates cosine similarity (-1.0 to 1.0) between two vectors. |
| cosineDistance(a, b) | Calculates cosine distance (0.0 to 2.0). |
| euclideanDistance(a, b) | Calculates Euclidean (L2) distance. |
| similaritySearch(query, docs) | Returns the indices of the Top-K most similar vectors in a database. |
| similaritySearchWithThreshold | Returns all indices with similarity above a given threshold. |
| quantizeToInt8(vector) | Compresses a Float32List into an Int8List (4x memory savings). |
| normalize(vector) | Applies L2 normalization to a vector. |
| meanPooling(vectors) | Averages multiple vectors into a single vector. |
| toBase64 / fromBase64 | Serializes/Deserializes a vector to/from a Base64 string for DB storage. |
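
To make the 4x memory saving of Int8 quantization concrete, here is a sketch of one common scheme, symmetric scalar quantization with a per-vector scale. The scheme actually used by quantizeToInt8 is not documented here, so treat this as an assumption; the names quantize and dequantize are hypothetical.

```dart
import 'dart:typed_data';

/// Symmetric scalar quantization: maps [-maxAbs, maxAbs] onto [-127, 127].
/// Returns the Int8 codes together with the scale needed to dequantize.
(Int8List, double) quantize(Float32List v) {
  var maxAbs = 0.0;
  for (final x in v) {
    if (x.abs() > maxAbs) maxAbs = x.abs();
  }
  // Avoid division by zero for an all-zero vector.
  final scale = maxAbs == 0.0 ? 1.0 : maxAbs / 127.0;
  final codes = Int8List(v.length);
  for (var i = 0; i < v.length; i++) {
    codes[i] = (v[i] / scale).round().clamp(-127, 127).toInt();
  }
  return (codes, scale);
}

/// Approximate inverse of [quantize]; the error per element is at most scale / 2.
Float32List dequantize(Int8List codes, double scale) {
  final out = Float32List(codes.length);
  for (var i = 0; i < codes.length; i++) {
    out[i] = codes[i] * scale;
  }
  return out;
}

void main() {
  final v = Float32List.fromList([4.0, -1.0, 0.0]);
  final (codes, scale) = quantize(v);
  print(codes); // [127, -32, 0]
  print('${v.lengthInBytes} bytes -> ${codes.lengthInBytes} bytes'); // 12 bytes -> 3 bytes
  print(dequantize(codes, scale)); // Close to the original, not exact.
}
```

Each element shrinks from 4 bytes to 1, at the cost of a small rounding error and one extra scale value per vector.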

Performance

model2vec calls the Rust engine through FFI bindings. Single-vector math on embeddings runs natively in Dart, so it avoids FFI call overhead, while batch generation leverages Rust's SIMD (auto-vectorization) capabilities.

Here is a performance benchmark run on a typical machine (AOT compiled):

| Model | Load Time (Cache) | Single Embedding | Batch (32) |
|---|---|---|---|
| minishlab/potion-base-2M | ~40 ms | 372.9 μs | 3.85 ms |
| minishlab/potion-base-4M | ~40 ms | 363.7 μs | 4.19 ms |
| minishlab/potion-base-8M | ~40 ms | 382.1 μs | 5.60 ms |
| minishlab/potion-base-32M | ~120 ms | 452.6 μs | 6.79 ms |
| minishlab/potion-multilingual-128M | ~1050 ms | 416.1 μs | 5.38 ms |

Note: Initial load times may vary slightly based on disk speed. In the table above, a single embedding takes roughly 360 to 450 μs, and a batch of 32 works out to roughly 120 to 210 μs per string.

  • similaritySearch over 100,000 vectors takes <100ms in pure Dart.
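
To get a feel for brute-force search cost on your own hardware, the sketch below times a pure-Dart top-K search over 100,000 random unit vectors. It does not use Model2VecUtils and does not reproduce the measurement above; the helper names, the vector dimension of 64, and the sort-based selection are all assumptions of this example, and timings will vary by machine and compilation mode.

```dart
import 'dart:math' as math;
import 'dart:typed_data';

/// Dot product; equals cosine similarity when both inputs are L2-normalized.
double dot(Float32List a, Float32List b) {
  var sum = 0.0;
  for (var i = 0; i < a.length; i++) {
    sum += a[i] * b[i];
  }
  return sum;
}

/// Returns the indices of the [k] highest-scoring vectors, best first.
List<int> topK(Float32List query, List<Float32List> docs, int k) {
  final scores = [for (final d in docs) dot(query, d)];
  final indices = List<int>.generate(docs.length, (i) => i);
  indices.sort((a, b) => scores[b].compareTo(scores[a]));
  return indices.take(k).toList();
}

/// Random vector scaled to unit L2 length.
Float32List randomUnitVector(math.Random rng, int dim) {
  final v = Float32List(dim);
  var norm = 0.0;
  for (var i = 0; i < dim; i++) {
    v[i] = rng.nextDouble() * 2 - 1;
    norm += v[i] * v[i];
  }
  norm = math.sqrt(norm);
  for (var i = 0; i < dim; i++) {
    v[i] /= norm;
  }
  return v;
}

void main() {
  final rng = math.Random(42);
  const dim = 64, count = 100000;
  final docs = List.generate(count, (_) => randomUnitVector(rng, dim));
  final query = randomUnitVector(rng, dim);

  final sw = Stopwatch()..start();
  final best = topK(query, docs, 5);
  sw.stop();
  print('Top 5: $best in ${sw.elapsedMilliseconds} ms');
}
```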

Development & Contributing

The library uses Dart Native Assets, meaning cargo build is invoked automatically when running Dart code.

If you modify the Rust C API (native/src/lib.rs), regenerate the Dart bindings manually:

dart run ffigen

To run the test suite:

dart test

License

This project is licensed under the MIT License - see the LICENSE file for details.
