diff --git a/README.md b/README.md
index b06482f..37aaf26 100644
--- a/README.md
+++ b/README.md
@@ -1,31 +1,100 @@
-# Standalone ONNX Voice Changer Service
+# 🎙️ Standalone ONNX Real-Time Voice Changer Service
 
-Layanan pengubah suara real-time berbasis AI berlatensi rendah menggunakan akselerasi ONNX Runtime dan model RVC (Retrieval-based Voice Conversion).
+A high-performance, low-latency, real-time voice conversion system powered by **ONNX Runtime** and **Retrieval-based Voice Conversion (RVC)**. It converts speech from a microphone or browser source into the voice of a selected target character model.
 
-## Struktur Proyek
-- `server.py`: WebSocket server utama yang memproses streaming audio dan menyajikan static HTTP frontend.
-- `frontend/`: File UI web client (HTML, CSS, JS).
-- `lib/`: Modul inferensi ONNX RVC.
-- `weights/`: Tempat penyimpanan model suara (folder per model berisi file `.onnx` dan opsional file `.pth`).
-- `pretrained/`: Model pra-latih dasar (seperti `vec-768-layer-12.onnx`).
-- `rmvpe.pt` & `rmvpe.py`: Untuk ekstraksi pitch suara fidelitas tinggi.
+---
 
-## Cara Menjalankan
+## ✨ Key Features
+* **🚀 WebSocket Audio Pipeline:** Streaming audio transfer using binary WebSocket connections (raw PCM float32) for minimal overhead.
+* **⚡ Multi-Backend ONNX Acceleration:** Supports execution providers including NVIDIA `CUDA`, AMD/Intel `DirectML`, and fallback `CPU`.
+* **🎼 High-Fidelity DSP Pipeline:**
+  * **Low-Cut Filter:** First-order Butterworth high-pass filter at 80 Hz to eliminate AC hum and rumble.
+  * **Noise Gate:** Threshold-based noise suppression that bypasses inference during silence (saving CPU/GPU cycles).
+  * **Gain Controls:** Independent input/output digital gain staging.
+* **🧠 Advanced Pitch Extraction:** Optimized 16 kHz pitch prediction using the RMVPE (Robust Model for Vocal Pitch Estimation) model.
+* **🌐 Dual Routing Architecture:** Supports routing audio via the web browser (Web Audio API) or directly through the server's local audio hardware (using `sounddevice`).
 
-### Persyaratan Sistem
-Pastikan Python 3.10+ sudah terinstal di sistem Anda beserta library yang dibutuhkan di `requirements.txt`.
+---
 
-### Menjalankan Server
-Jalankan server menggunakan Python dari environment Anda:
-```bash
-python server.py --host 127.0.0.1 --port 8765 --http_port 8000
+## 🛠️ System Architecture
+
+```mermaid
+graph TD
+    A[Microphone / Web Browser] -->|Web Audio API| B(WebSocket Connection)
+    B -->|Raw Float32 PCM Chunk| C[server.py Backend]
+    C -->|1. High-Pass Filter 80Hz| D[DSP Stage]
+    D -->|2. Gain & Noise Gate| D
+    D -->|3. Resample to 16kHz| E[Hubert/ContentVec ONNX]
+    D -->|4. Pitch Estimation RMVPE| F[Pitch Predictor]
+    E --> G[RVC ONNX Model Inference]
+    F --> G
+    G -->|Target Audio Chunks| H(WebSocket Connection)
+    H -->|Play audio| I[Browser Speakers / Audio Device]
 ```
 
-Parameter opsional:
-- `--host`: Alamat host WebSocket server (default: `127.0.0.1`).
-- `--port`: Port WebSocket server (default: `8765`).
-- `--http_port`: Port HTTP server untuk UI web client (default: `8000`).
-- `--device`: Execution Provider (`cpu`, `cuda`, atau `dml` - default: `cuda`).
-- `--model`: Nama folder model suara di dalam `weights/` yang ingin dimuat langsung saat start.
+---
 
-Setelah server berjalan, Web UI akan otomatis terbuka di browser Anda pada alamat `http://localhost:8000`.
+## 📁 Repository Structure
+* [server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py): The main WebSocket backend and static HTTP server managing connection loops, audio resampling, and model execution.
+* [start.bat](file:///M:/Users/ahmad/project/onnx-voice-changer/start.bat): Windows launcher batch file that automatically resolves the Python virtual environment and executes the server.
+* [requirements.txt](file:///M:/Users/ahmad/project/onnx-voice-changer/requirements.txt): Python dependency list.
+* [frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend): Contains the client-side Web UI files.
+  * [frontend/index.html](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend/index.html): Control interface layout.
+  * [frontend/app.js](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend/app.js): WebSocket communication and client-side audio rendering.
+  * [frontend/styles.css](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend/styles.css): Custom dashboard styling.
+* [lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib): Core package containing inference models and prediction tools.
+* [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights): Directory for voice model weights. Place your custom `.onnx` and `.pth` model sub-directories here.
+* [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained): Directory containing base pre-trained models such as `vec-768-layer-12.onnx` or `vec-256-layer-12.onnx`.
+
+---
+
+## 🚀 Getting Started
+
+### 📋 Prerequisites
+* **Python 3.10+** (recommended)
+* **FFmpeg** installed and added to the system PATH (required for audio processing utilities).
+* (Optional) **NVIDIA CUDA Toolkit** (v11.x/12.x) and **cuDNN** for GPU acceleration.
+
+### 📦 Installation
+1. Clone this repository to a local directory.
+2. Create and activate a virtual environment (optional but recommended):
+   ```bash
+   python -m venv venv
+   .\venv\Scripts\activate
+   ```
+3. Install the required dependencies:
+   ```bash
+   pip install -r requirements.txt
+   ```
+4. Place your ContentVec base model (`vec-768-layer-12.onnx` or `vec-256-layer-12.onnx`) inside the [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) directory.
+5. Place your character models in [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights), one folder per model (e.g., `weights/HuTao/` containing `HuTao.onnx` and `HuTao.pth`).
+
+### 🏃 Running the Server
+
+#### Option A: Quick Launch (Windows)
+Double-click the [start.bat](file:///M:/Users/ahmad/project/onnx-voice-changer/start.bat) file. It automatically detects Python, sets up the directory paths, and launches the service.
+
+#### Option B: Manual CLI Execution
+Run the server from your terminal:
+```bash
+python server.py --host 127.0.0.1 --port 8765 --http_port 8000 --device cuda
+```
+
+### ⚙️ Command-Line Arguments
+| Argument | Description | Default |
+|---|---|---|
+| `--host` | The address the WebSocket server binds to. | `127.0.0.1` |
+| `--port` | WebSocket communication port. | `8765` |
+| `--http_port` | Port serving the static frontend Web UI. | `8000` |
+| `--device` | The ONNX Runtime execution device (`cpu`, `cuda`, `dml`). | `cuda` |
+| `--model` | Target folder name in `weights/` to load directly upon startup. | `None` |
+
+Once the server starts, the Web UI should open automatically at `http://localhost:8000`.
+
+---
+
+## 🔊 Audio DSP Details
+To achieve low latency without output artifacts, the audio processing uses:
+1. **Sliding Window Context Buffer:** Keeps a short historical buffer of the audio to feed the model the required context frames while minimizing output audio delay.
+2. **Convolution Padding Fadeout:** 120 ms of trailing silence is temporarily appended to input segments to avoid the edge-fading artifacts inherent to RVC convolutional steps.
+3. **Linear Resampling:** Low-overhead linear interpolation for fast sample-rate conversion.
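
Items 2 and 3 of the Audio DSP Details list added by this patch can be illustrated with a minimal, dependency-free Python sketch. It is not taken from `server.py`: the function names, the plain-list buffers, and the 48 kHz input rate are assumptions made for illustration, and a real implementation would more likely operate on NumPy arrays.

```python
# Illustrative sketch only. Function names, list-based buffers, and the
# 48 kHz capture rate are assumptions, not code from server.py.


def pad_trailing_silence(samples, sample_rate, pad_ms=120):
    """Return a copy of `samples` with `pad_ms` of zeros appended,
    plus the number of padding samples that were added."""
    pad_len = sample_rate * pad_ms // 1000
    return list(samples) + [0.0] * pad_len, pad_len


def linear_resample(samples, src_rate, dst_rate):
    """Resample by linear interpolation between neighbouring input samples."""
    if not samples:
        return []
    if src_rate == dst_rate:
        return list(samples)
    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate          # input samples per output sample
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step                  # fractional position in the input
        left = min(int(pos), last)
        right = min(left + 1, last)
        # Past the final input sample there is nothing to interpolate towards,
        # so the last value is held.
        frac = pos - left if left < last else 0.0
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


# Example: a 10 ms ramp captured at an assumed 48 kHz, padded with 120 ms of
# silence and brought down to the 16 kHz rate used for pitch prediction.
chunk = [float(i) for i in range(480)]
padded, pad_len = pad_trailing_silence(chunk, 48000)
resampled = linear_resample(padded, 48000, 16000)
```

Since the padding is described as temporary, the samples corresponding to `pad_len` would presumably be trimmed from the model output after inference; that step is omitted here because it depends on the model's output rate.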