From a387dfa46bd4b4ff69699bc09ccbd07e00750d87 Mon Sep 17 00:00:00 2001
From: akukanara
Date: Sun, 31 May 2026 17:00:57 +0700
Subject: [PATCH] docs: add multi-language readme and update paths

---
 README.id.md | 165 +++++++++++++++++++++++++++++++++++++++++++++++++++
 README.md    |  36 +++++------
 2 files changed, 180 insertions(+), 21 deletions(-)
 create mode 100644 README.id.md

diff --git a/README.id.md b/README.id.md
new file mode 100644
index 0000000..82d8dd2
--- /dev/null
+++ b/README.id.md
@@ -0,0 +1,165 @@
+# 🎙️ ONNX VC - Standalone Real-Time Voice Changer
+
+🌐 **Bahasa:** [English](README.md) | [Bahasa Indonesia](README.id.md)
+
+Sistem konversi suara AI real-time berkinerja tinggi dan latensi rendah yang ditenagai oleh **ONNX Runtime** dan **Retrieval-based Voice Conversion (RVC)**. Dilengkapi dengan dashboard premium yang dibuat menggunakan **Next.js App Router**, **TypeScript**, dan **Tailwind CSS**, serta mendukung internasionalisasi penuh.
+
+---
+
+## ✨ Fitur Utama
+* **🚀 WebSocket Audio Pipeline:** Pengiriman audio streaming menggunakan koneksi WebSocket biner (raw PCM float32) untuk overhead minimal.
+* **⚡ Akselerasi ONNX Multi-Backend:** Mendukung execution providers termasuk NVIDIA `CUDA`, AMD/Intel `DirectML`, dan fallback `CPU`.
+* **🌐 Universal Localisation:** Antarmuka yang dapat diterjemahkan sepenuhnya, mendukung Bahasa Inggris, Indonesia, Jepang, Mandarin, dan Spanyol.
+* **🎨 Dashboard Premium**: Halaman kerja responsif yang dibangun menggunakan React 19, Radix UI, Framer Motion, dan Tailwind CSS.
+* **🎼 DSP Pipeline dengan Kualitas Tinggi:**
+  * **Low-Cut Filter:** Butterworth high-pass filter orde pertama aktif pada frekuensi 80Hz untuk menghilangkan hum AC dan gemuruh.
+  * **Noise Gate:** Penekanan derau berbasis ambang batas (threshold) untuk melewati inferensi saat hening (menghemat siklus CPU/GPU).
+  * **Gain Controls:** Pengaturan gain digital input/output independen.
+* **🧠 Ekstraksi Pitch Canggih:** Prediksi pitch 16kHz yang dioptimalkan menggunakan model RMVPE (Robust Model for Vocal Pitch Estimation).
+* **🌐 Arsitektur Dual Routing:** Mendukung perutean audio melalui browser web (Web Audio API) atau langsung melalui perangkat keras audio lokal server (menggunakan `sounddevice`).
+
+---
+
+## 🛠️ Arsitektur Sistem
+
+```mermaid
+graph TD
+    A[Mikrofon / Browser Web] -->|Web Audio API| B(Koneksi WebSocket)
+    B -->|Chunk PCM Float32 Mentah| C[Backend server.py]
+    C -->|1. High-Pass Filter 80Hz| D[Tahap DSP]
+    D -->|2. Gain & Noise Gate| D
+    D -->|3. Resample ke 16kHz| E[Hubert/ContentVec ONNX]
+    D -->|4. Estimasi Pitch RMVPE| F[Prediktor Pitch]
+    E --> G[Inferensi Model ONNX RVC]
+    F --> G
+    G -->|Chunk Audio Target| H(Koneksi WebSocket)
+    H -->|Putar Audio| I[Speaker Browser / Perangkat Audio]
+```
+
+---
+
+## 📁 Struktur Repositori
+* [server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py) — Server backend WebSocket utama yang mengelola loop koneksi, resampling audio, dan eksekusi model.
+* [start.bat](file:///M:/Users/ahmad/project/onnx-voice-changer/start.bat) — File batch peluncur Windows yang secara otomatis menyiapkan environment virtual Python dan menjalankan server.
+* [requirements.txt](file:///M:/Users/ahmad/project/onnx-voice-changer/requirements.txt) — Daftar dependensi Python.
+* [frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend) — Ruang kerja klien frontend yang dibangun dengan Next.js (TypeScript, Tailwind CSS).
+* [frontend-deprecated/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend-deprecated) — Kode frontend lama yang sudah usang.
+* [lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib) — Paket inti yang berisi model inferensi, skrip konversi ONNX, dan alat prediksi.
+* [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) — Direktori untuk bobot model suara karakter (contoh: `weights/HuTao/`).
+* [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) — Direktori yang berisi model dasar pra-terlatih.
+
+---
+
+## 🚀 Instalasi & Pengaturan
+
+### 📋 Prasyarat
+* **Python 3.10+**
+* **FFmpeg** terinstal dan ditambahkan ke PATH sistem (Diperlukan untuk pemrosesan audio).
+* **Node.js 18+** & **npm** (Diperlukan untuk menjalankan klien frontend Next.js).
+* (Opsional) **NVIDIA CUDA Toolkit** (v11.x/12.x) dan **cuDNN** untuk akselerasi eksekusi GPU.
+
+---
+
+### 📦 1. Instalasi Backend Python
+1. Klon repositori ini ke direktori lokal Anda.
+2. Inisialisasi dan aktifkan virtual environment:
+   ```bash
+   python -m venv venv
+   # Di Windows:
+   .\venv\Scripts\activate
+   # Di Linux/macOS:
+   source venv/bin/activate
+   ```
+3. Instal dependensi yang diperlukan:
+   ```bash
+   pip install -r requirements.txt
+   ```
+
+---
+
+### 📥 2. Unduh Pre-trained ContentVec (Diperlukan)
+Model ini memerlukan model dasar ContentVec untuk menghasilkan fitur pembicara dari potongan suara.
+1. Unduh model `vec-768-layer-12.onnx` dari Hugging Face:
+   👉 **[Unduh vec-768-layer-12.onnx](https://huggingface.co/DogManTC/test-rvc-onnx/blob/main/vec-768-layer-12.onnx)**
+2. Simpan file yang diunduh di dalam direktori `pretrained/`:
+   ```
+   pretrained/
+   └── vec-768-layer-12.onnx
+   ```
+
+---
+
+### 🔄 3. Siapkan & Ekspor Model RVC ke ONNX
+Untuk menjalankan model karakter pada ONNX Runtime, Anda harus menempatkan model RVC PyTorch standar Anda (`.pth`) di bawah direktori `weights/` dan mengonversinya.
+
+1. Buat sub-folder di bawah `weights/` yang dinamai sesuai karakter Anda (contoh: `HuTao`):
+   ```
+   weights/
+   └── HuTao/
+       └── HuTao.pth
+   ```
+2. Jalankan skrip konversi ONNX dengan memasukkan nama folder model:
+   ```bash
+   python lib/export_onnx.py --model_name HuTao
+   ```
+3. Skrip akan secara otomatis mencari file `.pth` di dalam `weights/HuTao/` dan mengekspor file `HuTao.onnx` yang sesuai di dalam direktori yang sama:
+   ```
+   weights/
+   └── HuTao/
+       ├── HuTao.pth
+       └── HuTao.onnx
+   ```
+
+---
+
+### 🖥️ 4. Menjalankan Klien Frontend
+Klien frontend berjalan sebagai server pengembangan Next.js mandiri atau server produksi yang telah di-build.
+
+1. Navigasi ke direktori frontend:
+   ```bash
+   cd frontend
+   ```
+2. Instal dependensi npm:
+   ```bash
+   npm install
+   ```
+3. Jalankan server pengembangan:
+   ```bash
+   npm run dev
+   ```
+   Buka browser Anda dan arahkan ke **`http://localhost:3000`**.
+
+Atau, untuk membuat build dan menjalankan server produksi:
+```bash
+npm run build
+npm run start
+```
+
+---
+
+## 🏃 Menjalankan Pengubah Suara
+
+### Langkah 1: Mulai Backend WebSocket Python
+Jalankan server menggunakan terminal Anda (default ke port `8765`):
+```bash
+python server.py --host 127.0.0.1 --port 8765 --device cuda
+```
+
+#### ⚙️ Argumen Baris Perintah
+| Argumen | Deskripsi | Default |
+|---|---|---|
+| `--host` | Alamat yang diikat oleh server WebSocket. | `127.0.0.1` |
+| `--port` | Port komunikasi WebSocket. | `8765` |
+| `--device` | Perangkat eksekusi ONNX Runtime (`cpu`, `cuda`, `dml`). | `cuda` |
+| `--model` | Nama folder target di `weights/` untuk dimuat langsung saat memulai. | `None` |
+
+### Langkah 2: Buka Dashboard Frontend
+Pastikan klien frontend Anda berjalan (via `npm run dev` atau `npm run start` pada `http://localhost:3000`), buka di browser Anda, dan klien akan terhubung secara otomatis ke backend WebSocket API.
+
+---
+
+## 🔊 Detail DSP Audio
+Untuk mencapai latensi rendah tanpa artefak output, pemrosesan audio menggunakan:
+1. **Buffer Konteks Sliding Window:** Mempertahankan buffer historis pendek dari audio untuk memberikan frame konteks yang diperlukan ke model sambil meminimalkan penundaan audio output.
+2. **Fadeout Padding Konvolusi:** Padding senyap trailing sebesar 120ms ditambahkan sementara ke segmen input untuk menghindari anomali memudar di tepi yang melekat pada langkah konvolusional RVC.
+3. **Resampling Linear:** Menggunakan resampling linear dengan overhead rendah untuk adaptasi laju sampel yang cepat.
diff --git a/README.md b/README.md
index 9974a41..6254213 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,7 @@
 # 🎙️ ONNX VC - Standalone Real-Time Voice Changer
 
+🌐 **Languages:** [English](README.md) | [Bahasa Indonesia](README.id.md)
+
 A high-performance, low-latency, real-time AI voice conversion system powered by **ONNX Runtime** and **Retrieval-based Voice Conversion (RVC)**. Features a premium dashboard built with **Next.js App Router**, **TypeScript**, and **Tailwind CSS**, supporting full internationalization.
 
 ---
@@ -37,11 +39,11 @@ graph TD
 ---
 
 ## 📁 Repository Structure
-* [server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py) — The main WebSocket backend and static HTTP server managing connection loops, audio resampling, and model execution.
+* [server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py) — The main WebSocket backend server managing connection loops, audio resampling, and model execution.
 * [start.bat](file:///M:/Users/ahmad/project/onnx-voice-changer/start.bat) — Windows launcher batch file that automatically resolves the Python virtual environment and executes the server.
 * [requirements.txt](file:///M:/Users/ahmad/project/onnx-voice-changer/requirements.txt) — Python dependencies list.
-* [frontend-next/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend-next) — The development workspace for the frontend client (Next.js, TypeScript).
-* [frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend) — Statically exported and optimized assets served by [server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py) backend.
+* [frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend) — The frontend client workspace built with Next.js (TypeScript, Tailwind CSS).
+* [frontend-deprecated/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend-deprecated) — The old deprecated frontend code.
 * [lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib) — Core package containing inference models, ONNX conversion scripts, and prediction tools.
 * [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) — Directory for character voice model weights (e.g. `weights/HuTao/`).
 * [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) — Directory containing base pre-trained models.
@@ -53,7 +55,7 @@ graph TD
 ### 📋 Prerequisites
 * **Python 3.10+**
 * **FFmpeg** installed and added to the system PATH (Required for audio processing utilities).
-* **Node.js 18+** & **npm** (Only required if you want to modify and compile the frontend workspace).
+* **Node.js 18+** & **npm** (Required to run the Next.js frontend client).
 * (Optional) **NVIDIA CUDA Toolkit** (v11.x/12.x) and **cuDNN** for GPU execution acceleration.
 
 ---
@@ -111,35 +113,27 @@ To run character models on ONNX Runtime, you must place your standard PyTorch RV
 ---
 
 ### 🖥️ 4. Running the Frontend Client
-Since the Python backend operates purely as a WebSocket API service, you must run the Next.js frontend client separately.
+The frontend client runs as a standalone Next.js development server or built production server.
 
-#### Option A: Development Server (Quick & Recommended)
 1. Navigate to the frontend directory:
    ```bash
-   cd frontend-next
+   cd frontend
    ```
 2. Install npm dependencies:
    ```bash
    npm install
    ```
-3. Spin up the dev server:
+3. Start the development server:
    ```bash
    npm run dev
    ```
    Open your browser and navigate to **`http://localhost:3000`**.
 
-#### Option B: Compiled Static Production Web Server
-1. Navigate to `frontend-next` and build the application:
-   ```bash
-   cd frontend-next
-   npm install
-   npm run build
-   ```
-   *Note: This will compile static pages and copy them into the root `/frontend` folder.*
-2. Serve the compiled output using a static file server of your choice:
-   - Using Node: `npx serve ../frontend -p 3000`
-   - Using Python: `python -m http.server 3000 --directory ../frontend`
-   Open **`http://localhost:3000`** in your browser.
+Alternatively, to build and run the production server:
+```bash
+npm run build
+npm run start
+```
 
 ---
 
@@ -160,7 +154,7 @@ python server.py --host 127.0.0.1 --port 8765 --device cuda
 | `--model` | Target folder name in `weights/` to load directly upon startup. | `None` |
 
 ### Step 2: Open the Frontend Dashboard
-Make sure your frontend client is running (via `npm run dev` or a static server on `http://localhost:3000`), open it in your browser, and it will automatically connect to the WebSocket API backend.
+Make sure your frontend client is running (via `npm run dev` or `npm run start` on `http://localhost:3000`), open it in your browser, and it will automatically connect to the WebSocket API backend.
 
 ---
 
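Review note: the README added by this patch describes a threshold-based noise gate that skips inference during silence, independent input gain, and low-overhead linear resampling to 16 kHz. The sketch below illustrates those three steps in plain Python so the wording can be checked against something concrete. It is not taken from `server.py`: the function names, the `-50 dBFS` default threshold, and the list-based sample representation are all assumptions made for illustration, and the 80 Hz high-pass stage is left out.

```python
import math
from typing import List, Optional, Sequence

TARGET_RATE = 16_000  # Hz; the README's pipeline resamples to 16 kHz


def rms(chunk: Sequence[float]) -> float:
    """Root-mean-square level of a chunk of float samples."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(s * s for s in chunk) / len(chunk))


def gate_is_open(chunk: Sequence[float], threshold_db: float = -50.0) -> bool:
    """Return True when the chunk is loud enough to run inference on.

    threshold_db is a hypothetical default in dBFS, not a value from server.py.
    """
    level = rms(chunk)
    if level <= 0.0:
        return False  # digital silence: log10 is undefined, keep the gate closed
    return 20.0 * math.log10(level) >= threshold_db


def resample_linear(chunk: Sequence[float], src_rate: int,
                    dst_rate: int = TARGET_RATE) -> List[float]:
    """Resample by linear interpolation between neighbouring samples."""
    if not chunk or src_rate == dst_rate:
        return list(chunk)
    out_len = max(1, round(len(chunk) * dst_rate / src_rate))
    step = src_rate / dst_rate  # source samples advanced per output sample
    last = len(chunk) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        lo = min(int(pos), last)
        hi = min(lo + 1, last)  # clamp at the end of the chunk
        frac = pos - lo
        out.append(chunk[lo] * (1.0 - frac) + chunk[hi] * frac)
    return out


def preprocess(chunk: Sequence[float], src_rate: int, in_gain: float = 1.0,
               threshold_db: float = -50.0) -> Optional[List[float]]:
    """Apply input gain, then the gate, then resampling.

    Returns None when the gate is closed so the caller can skip inference.
    """
    scaled = [s * in_gain for s in chunk]
    if not gate_is_open(scaled, threshold_db):
        return None
    return resample_linear(scaled, src_rate)
```

Under these assumptions, a 10 ms chunk captured at 48 kHz (480 samples) comes back as 160 samples at 16 kHz, and an all-zero chunk returns `None`, which is the point where a caller would skip the model call. Linear interpolation is cheap but does not band-limit the signal, so it trades some aliasing for speed.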