I'm Thomas van Dongen. I'm head of AI engineering at Springer Nature and co-founder of Minish, an open-source NLP lab working on efficient models and packages.

| Project | Description |
|---|---|
| semble | A code-search MCP/CLI tool for AI agents that drastically reduces token consumption |
| model2vec | Distill sentence transformers into static embeddings that are orders of magnitude faster |
| semhash | Multimodal semantic deduplication, outlier detection, and representative filtering |
| pyversity | Diversify search & retrieval results to reduce redundancy and improve coverage |
| vicinity | Fast, lightweight nearest neighbor search with pluggable backends |
| model2vec-rs | A Rust port of Model2Vec |
| tokenlearn | Pre-train static embedding models |
| agentcheck | A Go CLI that audits what an AI agent can access before you run it |