Technical Insights on RVC Fluency: 320-sample Alignment for HuBERT and RMVPE Stability in Rust

Hi there,
First of all, thank you for your work on obs-rvc. Implementing RVC in Rust is a challenging path, as your README mentions, but I believe it is a vital step toward high-performance real-time voice conversion.
I have been developing a separate Rust-based RVC engine and wanted to share some technical findings on stability and "fluency." My knowledge of the RVC architecture's internals is still limited, and I am eager to learn from more experienced developers like you. I would truly appreciate any guidance or feedback you could offer.
Here are the specific techniques I have implemented:
1. The "320-sample Alignment" for HuBERT (ONNX)
I found that the HuBERT ONNX model (16 kHz input) effectively works in units of 320 samples (20 ms): its convolutional feature encoder has a total stride of 320 samples, so it emits one feature frame per 320 input samples. If the input length (including padding) is not an exact multiple of 320, ONNX Runtime often throws dimension-mismatch errors in ReduceSum or Where nodes, or the output quality degrades. Strictly aligning the source window to 320 × n samples significantly improved stability.
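A minimal sketch of the alignment step, assuming the window is a mono `f32` buffer at 16 kHz and that trailing zero-padding is acceptable (front or reflect padding may suit a given pipeline better). The names `HUBERT_HOP`, `aligned_len`, and `pad_to_hop` are hypothetical and not taken from obs-rvc or any library:

```rust
/// Frame hop assumed for the HuBERT input: 320 samples = 20 ms at 16 kHz.
const HUBERT_HOP: usize = 320;

/// Smallest multiple of `HUBERT_HOP` that is greater than or equal to `len`.
fn aligned_len(len: usize) -> usize {
    (len + HUBERT_HOP - 1) / HUBERT_HOP * HUBERT_HOP
}

/// Returns a copy of `samples` with zeros appended so that the length
/// becomes an exact multiple of 320. An empty input stays empty.
fn pad_to_hop(samples: &[f32]) -> Vec<f32> {
    let mut out = samples.to_vec();
    out.resize(aligned_len(samples.len()), 0.0);
    out
}
```

With this helper, a 500-sample window would be padded to 640 samples (2 × 320) before it is handed to the ONNX session, and the extra frames can be trimmed from the feature output afterwards if needed.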
2. Multi-stage RMVPE Fallback Logic
To handle near-silence or low-gain inputs, I implemented a recursive fallback strategy. If decoding at the initial threshold fails, the decoder retries with progressively lower thresholds (down to 0.01). This reduces "pitch flickering" while still detecting pitch in quiet segments.
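The fallback could be sketched roughly as follows. Several details here are assumptions rather than the actual implementation: the threshold ladder (only the 0.01 floor comes from the description above), the definition of "fails" as "no voiced frame was found", and the input layout of one candidate pitch plus one confidence value per frame. A loop is used in place of recursion, which should behave the same for a fixed ladder. All names are hypothetical:

```rust
/// Hypothetical threshold ladder, tried from strictest to most lenient.
/// Only the 0.01 floor is taken from the text; the rest is assumed.
const THRESHOLDS: [f32; 3] = [0.03, 0.02, 0.01];

/// Keeps the candidate pitch (Hz) where confidence reaches `threshold`
/// and marks every other frame as unvoiced (0.0).
fn decode_with_threshold(f0: &[f32], confidence: &[f32], threshold: f32) -> Vec<f32> {
    f0.iter()
        .zip(confidence)
        .map(|(&hz, &c)| if c >= threshold { hz } else { 0.0 })
        .collect()
}

/// Tries each threshold in turn and returns the first decoding that
/// contains at least one voiced frame, together with the threshold used.
/// If every threshold fails, the result is fully unvoiced.
fn decode_with_fallback(f0: &[f32], confidence: &[f32]) -> (Vec<f32>, f32) {
    for &threshold in &THRESHOLDS {
        let decoded = decode_with_threshold(f0, confidence, threshold);
        if decoded.iter().any(|&hz| hz > 0.0) {
            return (decoded, threshold);
        }
    }
    (vec![0.0; f0.len()], THRESHOLDS[THRESHOLDS.len() - 1])
}
```

Returning the threshold that succeeded makes it easy to log how often the fallback fires, which may help when tuning the ladder against flickering.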
3. F0 Stabilization (Despiking & Interpolation)
I noticed that the "metallic" noise often stems from 1-frame pitch spikes or very short unvoiced gaps. Removing 1-frame voiced islands (despiking) and linearly interpolating across short gaps (≤ 2 frames) made the voice output much smoother.
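A minimal sketch of both passes, assuming the F0 track is a slice of `f32` values in Hz with 0.0 marking unvoiced frames. The function names, the treatment of the track edges as unvoiced, and the order of the two passes (despike first, then fill) are assumptions for illustration:

```rust
/// Removes voiced "islands" that are exactly one frame long, i.e. a voiced
/// frame whose neighbours on both sides are unvoiced or out of range.
fn despike(f0: &mut [f32]) {
    let n = f0.len();
    // Decide from a snapshot so that one removal cannot trigger another.
    let voiced: Vec<bool> = f0.iter().map(|&hz| hz > 0.0).collect();
    for i in 0..n {
        let prev = i > 0 && voiced[i - 1];
        let next = i + 1 < n && voiced[i + 1];
        if voiced[i] && !prev && !next {
            f0[i] = 0.0;
        }
    }
}

/// Linearly interpolates unvoiced gaps of at most `max_gap` frames that are
/// bounded by voiced frames on both sides. Longer gaps and gaps touching
/// either end of the track are left unvoiced.
fn fill_short_gaps(f0: &mut [f32], max_gap: usize) {
    let n = f0.len();
    let mut i = 0;
    while i < n {
        if f0[i] > 0.0 {
            i += 1;
            continue;
        }
        let start = i;
        while i < n && f0[i] <= 0.0 {
            i += 1;
        }
        let len = i - start;
        if start > 0 && i < n && len <= max_gap {
            let (left, right) = (f0[start - 1], f0[i]);
            for k in 0..len {
                let t = (k + 1) as f32 / (len + 1) as f32;
                f0[start + k] = left + (right - left) * t;
            }
        }
    }
}
```

Calling `despike(&mut f0)` followed by `fill_short_gaps(&mut f0, 2)` matches the ≤ 2 frame limit described above. Interpolating in Hz is the simplest choice; interpolating in log-frequency may sound more natural across larger pitch jumps.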
I am still refining the output quality (currently fighting some scaling and gain issues), but I wanted to share these findings because they seem to be common bottlenecks in Rust implementations. I would be honored to discuss these points further and draw on your experience to make Rust-based RVC more practical.
Best regards,
