# AbstractVoice
> Modular Python voice I/O for AI apps (Piper TTS + faster‑whisper STT by default), with optional heavy TTS/cloning engines (AudioDiT / OmniVoice) and voice cloning backends (F5‑TTS / Chroma).

Start with `README.md` for the primary entry flow and treat `docs/api.md` as the supported integrator contract. Requires Python `>=3.10` (see `pyproject.toml`).

Positioning:
- AbstractVoice is a **voice I/O library** (TTS/STT + optional cloning). It does not run an agent loop or an LLM server.
- In the AbstractFramework ecosystem, AbstractVoice is meant to be used with **AbstractCore** (via the capability plugin entry point).
- The REPL is a **demonstrator/smoke-test harness** and includes a minimal OpenAI-compatible LLM HTTP client (`abstractvoice/examples/llm_provider.py`).

Policy: the REPL runs offline‑first (`allow_downloads=False`), so download models explicitly beforehand, for example with the `abstractvoice-prefetch` CLI (see `docs/installation.md` and `docs/model-management.md`).
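
To make the offline‑first policy concrete, here is a minimal, stdlib-only sketch of how such a gate can behave. It is an illustration only and is not AbstractVoice's code: `resolve_model`, `ModelNotCachedError`, and the cache layout are hypothetical names, and only the `allow_downloads` flag name comes from this document.

```python
from pathlib import Path


class ModelNotCachedError(RuntimeError):
    """Raised when a model is missing locally and downloads are disabled."""


def resolve_model(name: str, cache_dir: Path, allow_downloads: bool = False) -> Path:
    """Return the local path of a cached model, honoring an offline-first policy.

    Hypothetical helper for illustration; not part of AbstractVoice.
    """
    path = cache_dir / name
    if path.exists():
        return path  # cache hit: no network access needed
    if not allow_downloads:
        # Offline-first: fail loudly instead of silently fetching.
        raise ModelNotCachedError(
            f"{name!r} is not cached in {cache_dir}; prefetch it explicitly first."
        )
    # Stand-in for a real download step: this sketch only creates a placeholder file.
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(b"")
    return path
```

With the default `allow_downloads=False`, a missing model raises an error that points to explicit prefetching instead of triggering a network fetch at an unexpected time.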

Format: follows the [llms.txt spec](https://llmstxt.org/) (`## Optional` is skippable when context is tight).

## Start here
- [README](README.md): install + smoke tests + ecosystem diagram
- [Docs index](docs/README.md): map of user-facing vs internal docs
- [Getting started](docs/getting-started.md): recommended setup + first smoke tests
- [API (integrator contract)](docs/api.md): supported surface (incl. AbstractFramework integrations)
- [FAQ](docs/faq.md): common install/runtime issues
- [Installation](docs/installation.md): platform notes + optional extras
- [REPL guide](docs/repl_guide.md): end-to-end validation + commands

## Ecosystem
- [AbstractFramework](https://github.com/lpalbou/AbstractFramework): umbrella ecosystem
- [AbstractCore](https://github.com/lpalbou/abstractcore): capabilities/plugins
- [AbstractRuntime](https://github.com/lpalbou/abstractruntime): runtime + ArtifactStore

## Architecture & decisions
- [Architecture](docs/architecture.md): implementation map + diagrams
- [Acronyms](docs/acronyms.md)
- [ADR 0001](docs/adr/0001-local_assistant_out_of_box.md): out-of-box local assistant
- [ADR 0002](docs/adr/0002_barge_in_interruption.md): barge-in modes + stop phrase + optional AEC
- [ADR 0003](docs/adr/0003_cloning_reference_text_fallback.md): cloning `reference_text` auto-fallback
- [ADR 0004](docs/adr/0004_streaming_and_cancellation_for_cloned_tts.md): streaming + cancellation for cloned TTS
- [ADR 0005](docs/adr/0005_torch_device_and_dtype_policy.md): torch device + dtype selection policy

## Core code map
- [VoiceManager façade](abstractvoice/voice_manager.py): public import target
- [VoiceManager wiring](abstractvoice/vm/manager.py): constructor + engine selection
- [Voice-mode behavior](abstractvoice/vm/core.py): listening behavior during TTS playback
- [TTS orchestration](abstractvoice/vm/tts_mixin.py): `speak*()` + cloning orchestration
- [TTS delivery mode](abstractvoice/tts/delivery_mode.py): normalize `"buffered"` vs `"streamed"`
- [Text chunking](abstractvoice/tts/text_chunking.py): `split_text_batches(...)` + `TextStreamChunker`
- [Text→audio streaming bridge](abstractvoice/tts/text_to_speech_stream.py): `TextToSpeechStream` (LLM streaming → TTS)
- [Voice profiles abstraction](abstractvoice/voice_profiles.py): cross-engine `VoiceProfile` ids + metadata
- [Audio chunk smoothing](abstractvoice/audio/fade.py): edge fades + headroom scaling
- [STT/listening orchestration](abstractvoice/vm/stt_mixin.py): `listen()` + `transcribe_*()`
- [Piper TTS adapter](abstractvoice/adapters/tts_piper.py): model caching + synthesis
- [AudioDiT TTS adapter (optional)](abstractvoice/adapters/tts_audiodit.py): LongCat-AudioDiT engine (extra)
- [OmniVoice runtime (optional)](abstractvoice/omnivoice/runtime.py): offline-first load + device/dtype policy glue
- [OmniVoice TTS adapter (optional)](abstractvoice/adapters/tts_omnivoice.py): omnilingual TTS engine (extra)
- [OmniVoice cloning engine (optional)](abstractvoice/cloning/engine_omnivoice.py): reference-audio cloning (extra)
- [faster-whisper STT adapter](abstractvoice/adapters/stt_faster_whisper.py): STT backend
- [Mic/VAD/STT loop](abstractvoice/recognition.py): listening thread + stop phrase handling
- [Stop phrase matching](abstractvoice/stop_phrase.py): normalization/matching (dependency-light)
- [Explicit prefetch CLI](abstractvoice/prefetch.py): `abstractvoice-prefetch ...`

## Integrations (AbstractFramework ecosystem)
- [AbstractCore capability plugin](abstractvoice/integrations/abstractcore_plugin.py): registers voice/audio backends
- [AbstractCore tool helpers](abstractvoice/integrations/abstractcore.py): `make_voice_tools(...)`
- [Artifact store adapter](abstractvoice/artifacts.py): AbstractRuntime-like ArtifactStore adapter (duck-typed)

## Project + safety
- [Contributing](CONTRIBUTING.md)
- [Security policy](SECURITY.md)
- [Acknowledgments](ACKNOWLEDGMENTS.md)
- [Changelog](CHANGELOG.md)
- [License](LICENSE)
- [Third-party licenses (vendored code)](third_party_licenses/longcat_audiodit_license.txt)

## Optional
- [Full index](llms-full.txt): expanded repo map + workflows
- [Tests](tests/): run `python -m pytest -q`
- [Backlog](docs/backlog/): internal planning (not an API contract)
- [Reports](docs/reports/): historical snapshots
- [Voice cloning research notes](docs/voice_cloning_2026.md): non-contract research notes