# 🎙️ ONNX VC - Standalone Real-Time Voice Changer

A high-performance, low-latency, real-time AI voice conversion system powered by **ONNX Runtime** and **Retrieval-based Voice Conversion (RVC)**. It includes a premium dashboard built with **Next.js App Router**, **TypeScript**, and **Tailwind CSS**, with full internationalization support.

---

## ✨ Key Features

* **🚀 WebSocket Audio Pipeline:** Streaming audio transfer over binary WebSocket connections (raw PCM float32) for minimal overhead.
* **⚡ Multi-Backend ONNX Acceleration:** Supports execution providers including NVIDIA `CUDA`, AMD/Intel `DirectML`, and fallback `CPU`.
* **🌐 Universal Localisation:** Fully translatable interface supporting English, Indonesian, Japanese, Chinese, and Spanish.
* **🎨 Premium Dashboard:** Fully responsive workspace built using React 19, Radix UI, Framer Motion, and Tailwind CSS.
* **🎼 High-Fidelity DSP Pipeline:**
  * **Low-Cut Filter:** First-order Butterworth high-pass filter at 80 Hz to attenuate AC hum and rumble.
  * **Noise Gate:** Threshold-based noise suppression that bypasses inference during silence (saving CPU/GPU cycles).
  * **Gain Controls:** Independent input/output digital gain staging.
* **🧠 Advanced Pitch Extraction:** Optimized 16 kHz pitch prediction using the RMVPE (Robust Model for Vocal Pitch Estimation) model.
* **🌐 Dual Routing Architecture:** Supports routing audio via the web browser (Web Audio API) or directly through the server's local audio hardware (using `sounddevice`).

---

## 🛠️ System Architecture

```mermaid
graph TD
    A[Microphone / Web Browser] -->|Web Audio API| B(WebSocket Connection)
    B -->|Raw Float32 PCM Chunk| C[server.py Backend]
    C -->|1. High-Pass Filter 80Hz| D[DSP Stage]
    D -->|2. Gain & Noise Gate| D
    D -->|3. Resample to 16kHz| E[Hubert/ContentVec ONNX]
    D -->|4. Pitch Estimation RMVPE| F[Pitch Predictor]
    E --> G[RVC ONNX Model Inference]
    F --> G
    G -->|Target Audio Chunks| H(WebSocket Connection)
    H -->|Play audio| I[Browser Speakers / Audio Device]
```

---

## 📁 Repository Structure

* [server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py): The main WebSocket backend and static HTTP server managing connection loops, audio resampling, and model execution.
* [start.bat](file:///M:/Users/ahmad/project/onnx-voice-changer/start.bat): Windows launcher batch file that automatically resolves the Python virtual environment and executes the server.
* [requirements.txt](file:///M:/Users/ahmad/project/onnx-voice-changer/requirements.txt): Python dependencies list.
* [frontend-next/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend-next): The development workspace for the frontend client (Next.js, TypeScript).
* [frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend): Statically exported and optimized assets served by the [server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py) backend.
* [lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib): Core package containing inference models, ONNX conversion scripts, and prediction tools.
* [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights): Directory for character voice model weights (e.g. `weights/HuTao/`).
* [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained): Directory containing base pre-trained models.

---

## 🚀 Installation & Setup

### 📋 Prerequisites

* **Python 3.10+**
* **FFmpeg** installed and added to the system PATH (required for audio processing utilities).
* **Node.js 18+** & **npm** (only required if you want to modify and compile the frontend workspace).
* (Optional) **NVIDIA CUDA Toolkit** (v11.x/12.x) and **cuDNN** for GPU execution acceleration.

---

### 📦 1. Python Backend Installation

1. Clone this repository to your local directory.
2. Initialize and activate a virtual environment:
   ```bash
   python -m venv venv

   # On Windows:
   .\venv\Scripts\activate

   # On Linux/macOS:
   source venv/bin/activate
   ```
3. Install the required dependencies:
   ```bash
   pip install -r requirements.txt
   ```

---

### 📥 2. Download Pre-trained ContentVec (Required)

The voice model requires a ContentVec base model to extract content features from voice chunks.

1. Download the `vec-768-layer-12.onnx` model from Hugging Face:
   👉 **[Download vec-768-layer-12.onnx](https://huggingface.co/DogManTC/test-rvc-onnx/blob/main/vec-768-layer-12.onnx)**
2. Save the downloaded file inside the [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) directory:
   ```
   pretrained/
   └── vec-768-layer-12.onnx
   ```

---

### 🔄 3. Setup & Export RVC Models to ONNX

To run character models on ONNX Runtime, place your standard PyTorch RVC models (`.pth`) under the [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) directory and convert them.

1. Create a sub-folder under `weights/` named after your character (e.g. `HuTao`):
   ```
   weights/
   └── HuTao/
       └── HuTao.pth
   ```
2. Run the ONNX conversion script, passing the folder name of the model:
   ```bash
   python lib/export_onnx.py --model_name HuTao
   ```
3. The script automatically searches for the `.pth` file inside `weights/HuTao/` and exports a corresponding `HuTao.onnx` file into the same directory:
   ```
   weights/
   └── HuTao/
       ├── HuTao.pth
       └── HuTao.onnx
   ```

---

### 🖥️ 4. Running the Frontend Client

Since the Python backend operates purely as a WebSocket API service, you must run the Next.js frontend client separately.

#### Option A: Development Server (Quick & Recommended)

1. Navigate to the frontend directory:
   ```bash
   cd frontend-next
   ```
2. Install npm dependencies:
   ```bash
   npm install
   ```
3. Spin up the dev server:
   ```bash
   npm run dev
   ```

Open your browser and navigate to **`http://localhost:3000`**.

#### Option B: Compiled Static Production Web Server

1. Navigate to `frontend-next` and build the application:
   ```bash
   cd frontend-next
   npm install
   npm run build
   ```
   *Note: This will compile static pages and copy them into the root `/frontend` folder.*
2. Serve the compiled output using a static file server of your choice:
   - Using Node: `npx serve ../frontend -p 3000`
   - Using Python: `python -m http.server 3000 --directory ../frontend`

Open **`http://localhost:3000`** in your browser.

---

## 🏃 Running the Voice Changer

### Step 1: Start the Python WebSocket Backend

Run the server from your terminal (defaults to port `8765`):

```bash
python server.py --host 127.0.0.1 --port 8765 --device cuda
```

#### ⚙️ Command-Line Arguments

| Argument | Description | Default |
|---|---|---|
| `--host` | The address the WebSocket server binds to. | `127.0.0.1` |
| `--port` | WebSocket communication port. | `8765` |
| `--device` | The ONNX Runtime execution device (`cpu`, `cuda`, `dml`). | `cuda` |
| `--model` | Target folder name in `weights/` to load directly upon startup. | `None` |

### Step 2: Open the Frontend Dashboard

Make sure your frontend client is running (via `npm run dev` or a static server on `http://localhost:3000`), then open it in your browser. It will automatically connect to the WebSocket API backend.

---

## 🔊 Audio DSP Details

To achieve low latency without output artifacts, the audio processing uses:

1. **Sliding Window Context Buffer:** Keeps a short historical buffer of the audio to feed the model the required context frames while minimizing output audio delay.
2. **Convolution Padding Fadeout:** 120 ms of trailing silent padding is temporarily appended to input segments to avoid edge-fading anomalies inherent to RVC convolutional steps.
3. **Linear Resampling:** Low-overhead linear interpolation for quick sample rate adaptation.
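
The context buffer and padding steps listed above can be sketched roughly as follows. This is a minimal illustration, not the code in `server.py`: the `ChunkProcessor` class, the `context_ms` value, and the stand-in `convert` callable are all hypothetical, and only the 120 ms padding figure comes from this README. It also assumes the model returns one output sample per input sample, which may not hold for a real RVC model whose output rate differs from its input rate.

```python
from typing import Callable, List

Samples = List[float]


class ChunkProcessor:
    """Hypothetical helper: wraps each chunk with history and tail padding."""

    def __init__(self, convert: Callable[[Samples], Samples], sample_rate: int,
                 context_ms: int = 200, pad_ms: int = 120) -> None:
        self.convert = convert  # stand-in for the model call (assumption)
        self.context_len = sample_rate * context_ms // 1000
        self.pad_len = sample_rate * pad_ms // 1000
        self.history: Samples = []

    def process(self, chunk: Samples) -> Samples:
        # Prepend recent input so the model sees context before the new chunk.
        context = self.history[-self.context_len:] if self.context_len else []
        # Append trailing silence so the chunk's real tail is not at the edge.
        padded = context + chunk + [0.0] * self.pad_len
        converted = self.convert(padded)
        # Keep only the output span that corresponds to the new chunk.
        start = len(context)
        out = converted[start:start + len(chunk)]
        # Remember the newest input samples for the next call.
        if self.context_len:
            self.history = (self.history + chunk)[-self.context_len:]
        return out
```

Discarding the context and padding regions after conversion is what keeps the added latency bounded by the chunk size rather than by the context length.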
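
For the linear resampling step, a dependency-free sketch is shown below. The function name `resample_linear` is hypothetical and the backend may implement this differently (for example with NumPy). The sketch only illustrates the interpolation idea: each output sample is a weighted blend of its two nearest input samples.

```python
from typing import List, Sequence


def resample_linear(samples: Sequence[float], src_rate: int,
                    dst_rate: int) -> List[float]:
    """Resample a mono float sequence by linear interpolation."""
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    n_in = len(samples)
    if n_in == 0:
        return []
    n_out = max(1, round(n_in * dst_rate / src_rate))
    step = src_rate / dst_rate  # input samples advanced per output sample
    out: List[float] = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        if left >= n_in - 1:
            # Past the last interpolation interval: hold the final sample.
            out.append(float(samples[-1]))
            continue
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[left + 1] * frac)
    return out


# Example: bring a 48 kHz chunk down to the 16 kHz rate used for pitch prediction.
chunk_48k = [float(i) for i in range(480)]
chunk_16k = resample_linear(chunk_48k, 48000, 16000)
```

Linear interpolation applies no anti-aliasing filter when downsampling, so it trades some fidelity for speed.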
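
The low-cut filter and noise gate listed under Key Features can be illustrated in the same spirit. The sketch below derives a first-order Butterworth high-pass from the bilinear transform and uses a chunk-level RMS check as the gate. The names `HighPass1`, `rms_dbfs`, and `gate_open`, and the -50 dBFS default threshold, are assumptions made for illustration; only the 80 Hz cutoff and the idea of skipping inference during silence come from this README.

```python
import math
from typing import List, Sequence


class HighPass1:
    """First-order Butterworth high-pass that keeps state across chunks."""

    def __init__(self, cutoff_hz: float, sample_rate: float) -> None:
        k = math.tan(math.pi * cutoff_hz / sample_rate)  # pre-warped cutoff
        self.b0 = 1.0 / (1.0 + k)
        self.a1 = (k - 1.0) / (k + 1.0)
        self.x_prev = 0.0
        self.y_prev = 0.0

    def process(self, chunk: Sequence[float]) -> List[float]:
        out: List[float] = []
        for x in chunk:
            # Difference equation: y[n] = b0 * (x[n] - x[n-1]) - a1 * y[n-1]
            y = self.b0 * (x - self.x_prev) - self.a1 * self.y_prev
            self.x_prev, self.y_prev = x, y
            out.append(y)
        return out


def rms_dbfs(chunk: Sequence[float]) -> float:
    """RMS level of a chunk in dB relative to full scale (1.0)."""
    if not chunk:
        return float("-inf")
    rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
    return 20.0 * math.log10(rms) if rms > 0.0 else float("-inf")


def gate_open(chunk: Sequence[float], threshold_db: float = -50.0) -> bool:
    """True when the chunk is loud enough to be worth running inference on."""
    return rms_dbfs(chunk) >= threshold_db
```

A caller would typically filter each incoming chunk first and then run the model only when `gate_open` returns `True`, emitting silence otherwise.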