docs: update README.md with download links and setup tutorials
# 🎙️ ONNX VC - Standalone Real-Time Voice Changer
A high-performance, low-latency, real-time AI voice conversion system powered by **ONNX Runtime** and **Retrieval-based Voice Conversion (RVC)**. It features a premium dashboard built with the **Next.js App Router**, **TypeScript**, and **Tailwind CSS**, with full internationalization support.

---

## ✨ Key Features
* **🚀 WebSocket Audio Pipeline:** Streaming audio transfer over binary WebSocket connections (raw PCM float32) for minimal overhead.
* **⚡ Multi-Backend ONNX Acceleration:** Supports the NVIDIA `CUDA`, AMD/Intel `DirectML`, and fallback `CPU` execution providers.
* **🌐 Universal Localisation:** Fully translatable interface supporting English, Indonesian, Japanese, Chinese, and Spanish.
* **🎨 Premium Dashboard:** Fully responsive workspace built with React 19, Radix UI, Framer Motion, and Tailwind CSS.
* **🎼 High-Fidelity DSP Pipeline:**
  * **Low-Cut Filter:** 1st-order Butterworth high-pass filter at 80 Hz to attenuate AC hum and low-frequency rumble.
  * **Noise Gate:** Threshold-based noise suppression that bypasses inference during silence, saving CPU/GPU cycles.
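The WebSocket Audio Pipeline sends audio as raw PCM float32 in binary messages, so each message is simply the samples packed back to back. The sketch below makes three assumptions that this README does not specify: mono audio, little-endian byte order, and no per-message header. Under those assumptions, the hypothetical `encode_frame` and `decode_frame` helpers pack and unpack such payloads with the standard `struct` module. With NumPy, `numpy.frombuffer(payload, dtype="<f4")` performs the same decode.

```python
import struct
from typing import List, Sequence

# Hypothetical helpers. Assumption: each WebSocket binary message is a bare run
# of little-endian float32 samples (mono, no header).
SAMPLE_BYTES = 4  # size of one float32 sample


def encode_frame(samples: Sequence[float]) -> bytes:
    """Pack samples into a raw little-endian float32 payload."""
    return struct.pack(f"<{len(samples)}f", *samples)


def decode_frame(payload: bytes) -> List[float]:
    """Unpack a raw little-endian float32 payload into Python floats."""
    if len(payload) % SAMPLE_BYTES != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    count = len(payload) // SAMPLE_BYTES
    return list(struct.unpack(f"<{count}f", payload))


if __name__ == "__main__":
    chunk = [0.0, 0.5, -0.25, 1.0]  # values exactly representable in float32
    payload = encode_frame(chunk)
    print(len(payload))           # 16 (4 samples x 4 bytes)
    print(decode_frame(payload))  # [0.0, 0.5, -0.25, 1.0]
```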
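The Low-Cut Filter reduces to a single difference equation. The sketch below derives the 1st-order Butterworth high-pass coefficients with the bilinear transform, pre-warping the cutoff so the -3 dB point sits at 80 Hz. It also carries the filter state between chunks, so filtering a stream chunk by chunk gives the same output as filtering it in one pass. The `HighPass` class and the 48 kHz sample rate are assumptions for illustration, not the project's code. If SciPy is available, `scipy.signal.butter(1, 80, btype="highpass", fs=48000)` should produce matching coefficients.

```python
import math
from typing import List, Sequence


class HighPass:
    """1st-order Butterworth high-pass filter with state carried across chunks.

    Coefficients come from the bilinear transform of H(s) = s / (s + wc), with
    the cutoff pre-warped so the -3 dB point lands exactly on cutoff_hz.
    The 48 kHz default sample rate is an assumption.
    """

    def __init__(self, cutoff_hz: float = 80.0, sample_rate: float = 48_000.0):
        k = math.tan(math.pi * cutoff_hz / sample_rate)  # pre-warped cutoff
        self.b0 = 1.0 / (1.0 + k)
        self.b1 = -self.b0
        self.a1 = (k - 1.0) / (k + 1.0)
        self.x_prev = 0.0  # last input sample of the previous chunk
        self.y_prev = 0.0  # last output sample of the previous chunk

    def process(self, chunk: Sequence[float]) -> List[float]:
        out = []
        for x in chunk:
            # y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1]
            y = self.b0 * x + self.b1 * self.x_prev - self.a1 * self.y_prev
            self.x_prev, self.y_prev = x, y
            out.append(y)
        return out


if __name__ == "__main__":
    hp = HighPass()
    dc = hp.process([1.0] * 48_000)  # one second of a constant (DC) signal
    print(abs(dc[-1]) < 1e-6)        # True: DC is removed
```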
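A threshold-based noise gate can be as simple as measuring each chunk's RMS level and skipping the model when that level falls below a threshold. In this sketch, several details are hypothetical: the -50 dBFS threshold, the `convert` callback standing in for ONNX inference, and both function names. The project's actual gate may use a different level measure, hysteresis, or hold time.

```python
import math
from typing import Callable, List, Sequence


def rms_dbfs(chunk: Sequence[float]) -> float:
    """Level of a float32-range chunk in dB relative to full scale (1.0)."""
    if not chunk:
        return float("-inf")
    rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
    return 20.0 * math.log10(max(rms, 1e-12))  # clamp to avoid log10(0)


def gated_convert(
    chunk: Sequence[float],
    convert: Callable[[Sequence[float]], List[float]],
    threshold_db: float = -50.0,  # hypothetical default threshold
) -> List[float]:
    """Return silence without calling the model when the chunk is below threshold."""
    if rms_dbfs(chunk) < threshold_db:
        return [0.0] * len(chunk)
    return convert(chunk)


if __name__ == "__main__":
    calls = []

    def fake_model(c):  # stand-in for the ONNX inference call
        calls.append(len(c))
        return list(c)

    quiet = [1e-4] * 480        # about -80 dBFS
    loud = [0.5, -0.5] * 240    # about -6 dBFS
    gated_convert(quiet, fake_model)
    gated_convert(loud, fake_model)
    print(calls)  # [480]: only the loud chunk reached the model
```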
@@ -38,44 +40,101 @@ graph TD
* [server.py](server.py): The main WebSocket backend and static HTTP server, managing connection loops, audio resampling, and model execution.
* [start.bat](start.bat): Windows launcher batch file that automatically resolves the Python virtual environment and runs the server.
* [requirements.txt](requirements.txt): Python dependency list.
* [frontend-next/](frontend-next): Development workspace for the frontend client (Next.js, TypeScript).
* [frontend/](frontend): Statically exported, optimized assets served by the [server.py](server.py) backend.
* [lib/](lib): Core package containing inference models, ONNX conversion scripts, and prediction tools.
* [weights/](weights): Directory for character voice model weights (e.g. `weights/HuTao/`).
* [pretrained/](pretrained): Directory containing base pre-trained models.
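`server.py` is described as handling audio resampling. RVC pipelines feed 16 kHz audio to the ContentVec/HuBERT feature extractor, while browsers often capture at 44.1 or 48 kHz, so a rate conversion has to happen somewhere in the path. The hypothetical `resample_linear` function below illustrates the idea with plain linear interpolation. It applies no anti-aliasing filter, so a real pipeline would use a windowed-sinc or polyphase resampler instead. Nothing here reflects how `server.py` actually resamples.

```python
from typing import List, Sequence


def resample_linear(samples: Sequence[float], src_rate: int, dst_rate: int) -> List[float]:
    """Resample by linear interpolation between neighbouring input samples.

    Illustrative only: there is no anti-aliasing filter, so downsampling will
    fold content above dst_rate / 2 back into the audible band.
    """
    if src_rate <= 0 or dst_rate <= 0:
        raise ValueError("sample rates must be positive")
    n_in = len(samples)
    if n_in == 0:
        return []
    n_out = (n_in * dst_rate) // src_rate
    step = src_rate / dst_rate  # input samples advanced per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        frac = pos - left
        right = min(left + 1, n_in - 1)  # clamp at the last input sample
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


if __name__ == "__main__":
    # One second at an assumed 48 kHz capture rate becomes 16000 samples.
    print(len(resample_linear([0.0] * 48_000, 48_000, 16_000)))  # 16000
```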
---

## 🚀 Installation & Setup
### 📋 Prerequisites

* **Python 3.10+**
* **FFmpeg** installed and added to the system PATH (required for audio processing utilities).
* **Node.js 18+** and **npm** (only required if you want to modify and rebuild the frontend workspace).
* (Optional) **NVIDIA CUDA Toolkit** (v11.x/12.x) and **cuDNN** for GPU-accelerated execution.

---

### 📦 1. Python Backend Installation
1. Clone this repository to your local directory.
2. Initialize and activate a virtual environment:

```bash
python -m venv venv
# On Windows:
.\venv\Scripts\activate
# On Linux/macOS:
source venv/bin/activate
```

3. Install the required dependencies:

```bash
pip install -r requirements.txt
```
---

### 📥 2. Download Pre-trained ContentVec (Required)
RVC inference requires a ContentVec base model, which extracts content (speech) features from each audio chunk for the character model to resynthesize in the target voice.

1. Download the `vec-768-layer-12.onnx` model from Hugging Face:

   👉 **[Download vec-768-layer-12.onnx](https://huggingface.co/DogManTC/test-rvc-onnx/blob/main/vec-768-layer-12.onnx)**

2. Save the downloaded file inside the [pretrained/](pretrained) directory:

```
pretrained/
└── vec-768-layer-12.onnx
```
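The link above opens the file's Hugging Face web page (`/blob/`). For scripted setup, Hugging Face serves raw files under `/resolve/<revision>/` URLs. The sketch below uses only the standard library to build that URL for the same repository and save the model into `pretrained/`. The function names are hypothetical. If the `huggingface_hub` package is installed, `huggingface_hub.hf_hub_download` is an alternative. Call `download_contentvec()` from the repository root.

```python
import urllib.request
from pathlib import Path

REPO_ID = "DogManTC/test-rvc-onnx"  # repository from the link above
FILENAME = "vec-768-layer-12.onnx"


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Direct-download URL for a file in a Hugging Face model repository."""
    return f"https://huggingface.co/{repo_id}/resolve/{revision}/{filename}"


def download_contentvec(dest_dir: str = "pretrained") -> Path:
    """Download the ContentVec model into dest_dir unless it is already there."""
    target = Path(dest_dir) / FILENAME
    if target.exists():
        return target  # nothing to do, no network access
    target.parent.mkdir(parents=True, exist_ok=True)
    tmp = target.with_suffix(".part")
    urllib.request.urlretrieve(resolve_url(REPO_ID, FILENAME), tmp)
    tmp.replace(target)  # expose the file only once the download has completed
    return target
```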
---

### 🔄 3. Setup & Export RVC Models to ONNX

To run character models on ONNX Runtime, place your standard PyTorch RVC models (`.pth`) under the [weights/](weights) directory and convert them to ONNX.
1. Create a sub-folder under `weights/` named after your character (e.g. `HuTao`):

```
weights/
└── HuTao/
    └── HuTao.pth
```
2. Run the ONNX conversion script by passing the folder name of the model:

```bash
python lib/export_onnx.py --model_name HuTao
```
3. The script will automatically search for the `.pth` file inside `weights/HuTao/` and export a corresponding `HuTao.onnx` file inside the same directory:

```
weights/
└── HuTao/
    ├── HuTao.pth
    └── HuTao.onnx
```
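Because `--model` takes a folder name under `weights/`, it helps to check which folders already contain an exported model. The hypothetical `find_models` helper below assumes the naming convention shown above, where `weights/<Name>/<Name>.pth` is exported to `<Name>.onnx`, and reports the state of each folder. Run it from the repository root.

```python
from pathlib import Path
from typing import Dict


def find_models(weights_dir: str = "weights") -> Dict[str, str]:
    """Map each model folder to 'exported', 'needs export', or 'no model files'.

    Assumes the weights/<Name>/<Name>.pth -> <Name>.onnx naming convention.
    """
    status: Dict[str, str] = {}
    root = Path(weights_dir)
    if not root.is_dir():
        return status
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        name = folder.name
        if (folder / f"{name}.onnx").is_file():
            status[name] = "exported"
        elif (folder / f"{name}.pth").is_file():
            status[name] = "needs export"  # run lib/export_onnx.py --model_name <Name>
        else:
            status[name] = "no model files"
    return status


if __name__ == "__main__":
    for name, state in find_models().items():
        print(f"{name}: {state}")
```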
---

### 🖥️ 4. Build the Frontend (Optional)

*Note: Pre-built static files are already included under `frontend/`. You only need this step if you have modified the Next.js workspace source files in `frontend-next/`.*
1. Navigate to the frontend directory:

```bash
cd frontend-next
```

2. Install npm dependencies:

```bash
npm install
```

3. Build and export the static assets. This compiles the workspace and copies the output to the `frontend/` directory via `copy-build.js`:

```bash
npm run build
```
---

## 🏃 Running the Voice Changer

### Option A: Quick Launch (Windows)

Double-click the [start.bat](start.bat) file in the repository root. It automatically activates the Python virtual environment and launches the server.

### Option B: Manual CLI Execution

Run the server from your terminal:
```bash
python server.py --host 127.0.0.1 --port 8765 --http_port 8000 --device cuda
```
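The `--device` value has to be translated into ONNX Runtime execution providers. One plausible mapping is sketched below. `CUDAExecutionProvider`, `DmlExecutionProvider`, and `CPUExecutionProvider` are ONNX Runtime's provider names, and `onnxruntime.get_available_providers()` lists the ones your installed build supports. CUDA requires the `onnxruntime-gpu` package and DirectML requires `onnxruntime-directml`. The `select_providers` name and the silent fall-back to CPU are assumptions, not necessarily what `server.py` does.

```python
from typing import Iterable, List

# ONNX Runtime execution-provider names for each --device value.
PROVIDERS = {
    "cuda": "CUDAExecutionProvider",
    "dml": "DmlExecutionProvider",
    "cpu": "CPUExecutionProvider",
}


def select_providers(device: str, available: Iterable[str]) -> List[str]:
    """Return a provider priority list for the requested device.

    The requested provider comes first when the installed onnxruntime build
    offers it; CPU is always appended as the fallback.
    """
    device = device.lower()
    if device not in PROVIDERS:
        raise ValueError(f"unknown device {device!r}; expected one of {sorted(PROVIDERS)}")
    offered = set(available)
    wanted = PROVIDERS[device]
    chosen = []
    if wanted in offered and wanted != PROVIDERS["cpu"]:
        chosen.append(wanted)
    chosen.append(PROVIDERS["cpu"])
    return chosen


# Example with onnxruntime installed (model path is illustrative):
#   import onnxruntime as ort
#   providers = select_providers("cuda", ort.get_available_providers())
#   session = ort.InferenceSession("weights/HuTao/HuTao.onnx", providers=providers)
```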
@@ -89,7 +148,7 @@ python server.py --host 127.0.0.1 --port 8765 --http_port 8000 --device cuda
| `--device` | The ONNX Runtime execution device (`cpu`, `cuda`, `dml`). | `cuda` |
| `--model` | Target folder name in `weights/` to load directly upon startup. | `None` |
Once started, the application will automatically launch your browser and direct you to the premium audio dashboard at `http://localhost:8000`.

---