Compare commits


15 Commits

50 changed files with 11592 additions and 1654 deletions
+5
@@ -29,3 +29,8 @@ Thumbs.db
*.index
weights/
pretrained/
# Next.js workspace exclusions
frontend/node_modules/
frontend/.next/
frontend/out/
+78
@@ -0,0 +1,78 @@
# 🤖 ONNX VC - Agent Guidelines & Architecture Map
This file serves as a guide for AI agents (Gemini, Claude, Cursor, etc.) working on the ONNX Voice Changer repository. It explains the project architecture, directory structure, core conventions, and how to maintain the codebase.
---
## 🛠️ Technology Stack
1. **Backend:** Python 3.10+, WebSocket (using `websockets`), ONNX Runtime, NumPy, PyTorch (only for RVC export).
2. **Frontend:** Next.js 15 (App Router), React 19, TypeScript, Tailwind CSS, Lucide React, Framer Motion.
3. **Voice Conversion:** Retrieval-based Voice Conversion (RVC) models accelerated via ONNX Runtime (CPU, CUDA, DirectML).
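As a minimal sketch of how the `cpu`, `cuda`, and `dml` device options could map onto ONNX Runtime, the helper below returns an ordered execution provider list with CPU as the final fallback. The function name `providers_for` and the lookup table are illustrative assumptions, not code from this repository; the provider strings are the standard ONNX Runtime names. A list like this is the kind of value passed as `providers=` when creating an `onnxruntime.InferenceSession`.

```python
# Hypothetical helper (not from the repository): maps the device names used
# in this project to ONNX Runtime execution provider names.
PROVIDERS_BY_DEVICE = {
    "cpu": ["CPUExecutionProvider"],
    "cuda": ["CUDAExecutionProvider", "CPUExecutionProvider"],
    "dml": ["DmlExecutionProvider", "CPUExecutionProvider"],
}


def providers_for(device: str) -> list[str]:
    """Return an ordered provider list that ends with the CPU fallback."""
    try:
        return list(PROVIDERS_BY_DEVICE[device.lower()])
    except KeyError:
        raise ValueError(f"unsupported device: {device!r}") from None
```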
---
## 📁 Repository Structure
* [/server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py) — The main WebSocket API server.
* [/frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend) — Next.js 15 client dashboard app.
* [/frontend-deprecated/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend-deprecated) — Legacy static single-page web files (HTML/CSS/JS). Do not modify.
* [/docs/](file:///M:/Users/ahmad/project/onnx-voice-changer/docs) — Holds localized README documentation files (Indonesian, Spanish, Japanese, Chinese).
* [/lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib) — RVC models and export scripts (e.g. `export_onnx.py`).
* [/weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) — Character voice models (e.g., `weights/HuTao/HuTao.onnx`).
* [/pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) — Holds the pre-trained `vec-768-layer-12.onnx` ContentVec model.
---
## ⚙️ Core Architecture & Conventions
### 1. Pure API Backend (No Static Hosting)
* **Rule:** The Python backend (`server.py`) operates **strictly as a WebSocket API**.
* **Do NOT** configure Python to serve frontend static pages, build files, or index HTML.
* The Next.js frontend client runs independently (via `npm run dev` or a separate production server).
### 2. WebSocket Audio Pipeline
* Audio chunks are sent to and from the server as **binary WebSocket messages** containing raw `Float32` PCM audio data.
* Configuration changes, telemetry, and status controls are handled using **JSON WebSocket messages** sent over the same connection.
* Always check the message payload type (binary vs. string JSON text) in `server.py`.
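The payload check described above can be sketched as follows. This is an illustration under stated assumptions, not the code in `server.py`: the function name `classify_message` is hypothetical, the standard library `array` module stands in for NumPy so the example runs on its own, and the audio bytes are assumed to use the machine's native float32 byte order. It relies on the `websockets` convention that binary frames arrive as `bytes` and text frames as `str`.

```python
import json
from array import array


def classify_message(message):
    """Split one WebSocket message into ("audio", samples) or ("control", dict).

    Hypothetical helper: binary frames are treated as raw float32 PCM,
    text frames as JSON control messages.
    """
    if isinstance(message, (bytes, bytearray)):
        if len(message) % 4 != 0:
            raise ValueError("binary payload is not a whole number of float32 samples")
        samples = array("f")  # native-endian 32-bit floats (assumption)
        samples.frombytes(bytes(message))
        return "audio", samples
    if isinstance(message, str):
        return "control", json.loads(message)
    raise TypeError(f"unexpected message type: {type(message).__name__}")
```

Rejecting payloads whose length is not a multiple of four bytes catches truncated audio frames before they reach the DSP stage.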
### 3. Digital Signal Processing (DSP) Staging
* Audio preprocessing is handled on the server side:
1. **Low-Cut Filter:** First-order Butterworth high-pass filter at 80 Hz to attenuate AC hum and low-frequency rumble.
2. **Noise Gate:** Threshold-based silence gate to bypass inference when the user is silent.
3. **Gain Controls:** Input and output gain staging before and after inference.
* Ensure all DSP math is optimized using `numpy` arrays to maintain low latency.
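The three stages above can be sketched in a few lines. This is a standalone illustration, not the repository's implementation: it uses plain Python lists so it runs without dependencies (the guideline above calls for NumPy in real code), the function names and the `0.01` gate threshold are hypothetical, and the filter state is reset on every call, whereas a streaming server would carry it across chunks.

```python
import math


def high_pass_first_order(samples, sample_rate, cutoff_hz=80.0):
    """First-order Butterworth high-pass derived with the bilinear transform."""
    k = math.tan(math.pi * cutoff_hz / sample_rate)
    b0 = 1.0 / (1.0 + k)
    a1 = (k - 1.0) / (k + 1.0)
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        # Difference equation: y[n] = b0 * (x[n] - x[n-1]) - a1 * y[n-1]
        y = b0 * (x - prev_x) - a1 * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out


def is_silent(samples, threshold=0.01):
    """Noise gate test: True when the chunk's RMS level is below the threshold."""
    if not samples:
        return True
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return rms < threshold


def preprocess(chunk, sample_rate, input_gain=1.0, gate_threshold=0.01):
    """Return the conditioned chunk, or None when inference can be skipped."""
    filtered = high_pass_first_order(chunk, sample_rate)
    boosted = [s * input_gain for s in filtered]
    if is_silent(boosted, gate_threshold):
        return None
    return boosted
```

Returning `None` for gated chunks lets the caller bypass the model entirely, which is where the gate saves compute.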
### 4. RVC ONNX Export
* PyTorch RVC models (`.pth`) must be converted to ONNX (`.onnx`) before inference.
* Always use [/lib/export_onnx.py](file:///M:/Users/ahmad/project/onnx-voice-changer/lib/export_onnx.py) for conversion:
```bash
python lib/export_onnx.py --model_name <CharacterFolder>
```
---
## 🎨 Frontend Design Guidelines
* **Responsive Layout:** Must support mobile and desktop views, utilizing a collapsible sidebar.
* **Themes & Accent Colors:** Supports dark/light mode toggling, with a custom accent color system (Purple, Blue, Emerald, Rose, Amber) stored in state.
* **i18n Translation:** Do not hardcode English/Indonesian strings. Ensure all labels, warnings, and messages are registered in [translations.ts](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend/src/utils/translations.ts).
---
## 🏃 Useful Development Commands
### Running Backend
```bash
python server.py --host 127.0.0.1 --port 8765 --device cuda
```
### Running Frontend Dev Server
```bash
cd frontend
npm run dev
```
### Building Frontend Production Server
```bash
cd frontend
npm run build
npm run start
```
+201
@@ -0,0 +1,201 @@
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
1. Definitions.
"License" shall mean the terms and conditions for use, reproduction,
and distribution as defined by Sections 1 through 9 of this document.
"Licensor" shall mean the copyright owner or entity authorized by
the copyright owner that is granting the License.
"Legal Entity" shall mean the union of the acting entity and all
other entities that control, are controlled by, or are under common
control with that entity. For the purposes of this definition,
"control" means (i) the power, direct or indirect, to cause the
direction or management of such entity, whether by contract or
otherwise, or (ii) ownership of fifty percent (50%) or more of the
outstanding shares, or (iii) beneficial ownership of such entity.
"You" (or "Your") shall mean an individual or Legal Entity
exercising permissions granted by this License.
"Source" form shall mean the preferred form for making modifications,
including but not limited to software source code, documentation
source, and configuration files.
"Object" form shall mean any form resulting from mechanical
transformation or translation of a Source form, including but
not limited to compiled object code, generated documentation,
and conversions to other media types.
"Work" shall mean the work of authorship, whether in Source or
Object form, made available under the License, as indicated by a
copyright notice that is included in or attached to the work
(an example is provided in the Appendix below).
"Derivative Works" shall mean any work, whether in Source or Object
form, that is based on (or derived from) the Work and for which the
editorial revisions, annotations, elaborations, or other modifications
represent, as a whole, an original work of authorship. For the purposes
of this License, Derivative Works shall not include works that remain
separable from, or merely link (or bind by name) to the interfaces of,
the Work and Derivative Works thereof.
"Contribution" shall mean any work of authorship, including
the original version of the Work and any modifications or additions
to that Work or Derivative Works thereof, that is intentionally
submitted to Licensor for inclusion in the Work by the copyright owner
or by an individual or Legal Entity authorized to submit on behalf
of the copyright owner. For the purposes of this definition, "submitted"
means any form of electronic, verbal, or written communication sent
to the Licensor or its representatives, including but not limited to
communication on electronic mailing lists, source code control systems,
and issue tracking systems that are managed by, or on behalf of, the
Licensor for the purpose of discussing and improving the Work, but
excluding communication that is conspicuously marked or otherwise
designated in writing by the copyright owner as "Not a Contribution."
"Contributor" shall mean Licensor and any individual or Legal Entity
on behalf of whom a Contribution has been received by Licensor and
subsequently incorporated within the Work.
2. Grant of Copyright License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
copyright license to reproduce, prepare Derivative Works of,
publicly display, publicly perform, sublicense, and distribute the
Work and such Derivative Works in Source or Object form.
3. Grant of Patent License. Subject to the terms and conditions of
this License, each Contributor hereby grants to You a perpetual,
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
(except as stated in this section) patent license to make, have made,
use, offer to sell, sell, import, and otherwise transfer the Work,
where such license applies only to those patent claims licensable
by such Contributor that are necessarily infringed by their
Contribution(s) alone or by combination of their Contribution(s)
with the Work to which such Contribution(s) was submitted. If You
institute patent litigation against any entity (including a
cross-claim or counterclaim in a lawsuit) alleging that the Work
or a Contribution incorporated within the Work constitutes direct
or contributory patent infringement, then any patent licenses
granted to You under this License for that Work shall terminate
as of the date such litigation is filed.
4. Redistribution. You may reproduce and distribute copies of the
Work or Derivative Works thereof in any medium, with or without
modifications, and in Source or Object form, provided that You
meet the following conditions:
(a) You must give any other recipients of the Work or
Derivative Works a copy of this License; and
(b) You must cause any modified files to carry prominent notices
stating that You changed the files; and
(c) You must retain, in the Source form of any Derivative Works
that You distribute, all copyright, patent, trademark, and
attribution notices from the Source form of the Work,
excluding those notices that do not pertain to any part of
the Derivative Works; and
(d) If the Work includes a "NOTICE" text file as part of its
distribution, then any Derivative Works that You distribute must
include a readable copy of the attribution notices contained
within such NOTICE file, excluding those notices that do not
pertain to any part of the Derivative Works, in at least one
of the following places: within a NOTICE text file distributed
as part of the Derivative Works; within the Source form or
documentation, if distributed along with the Derivative Works; or,
within a display generated by the Derivative Works, if and
wherever such third-party notices normally appear. The contents
of the NOTICE file are for informational purposes only and
do not modify the License. You may add Your own attribution
notices within Derivative Works that You distribute, alongside
or as an addendum to the Work text from the License, provided
that such additional attribution notices cannot be construed
as modifying the License.
You may add Your own copyright statement to Your modifications and
may provide additional or different license terms and conditions
for use, reproduction, or distribution of Your modifications, or
for any such Derivative Works as a whole, provided Your use,
reproduction, and distribution of the Work otherwise complies with
the conditions stated in this License.
5. Submission of Contributions. Unless You explicitly state otherwise,
any Contribution intentionally submitted for inclusion in the Work
by You to the Licensor shall be under the terms and conditions of
this License, without any additional terms or conditions.
Notwithstanding the above, nothing herein shall supersede or modify
the terms of any separate license agreement you may have executed
with Licensor regarding such Contributions.
6. Trademarks. This License does not grant permission to use the trade
names, trademarks, service marks, or product names of the Licensor,
except as required for reasonable and customary use in describing the
origin of the Work and reproducing the content of the NOTICE file.
7. Disclaimer of Warranty. Unless required by applicable law or
agreed to in writing, Licensor provides the Work (and each
Contributor provides its Contributions) on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
implied, including, without limitation, any warranties or conditions
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
PARTICULAR PURPOSE. You are solely responsible for determining the
appropriateness of using or redistributing the Work and assume any
risks associated with Your exercise of permissions under this License.
8. Limitation of Liability. In no event and under no legal theory,
whether in tort (including negligence), contract, or otherwise,
unless required by applicable law (such as deliberate and grossly
negligent acts) or agreed to in writing, shall any Contributor be
liable to You for damages, including any direct, indirect, special,
incidental, or consequential damages of any character arising as a
result of this License or out of the use or inability to use the
Work (including but not limited to damages for loss of goodwill,
work stoppage, computer failure or malfunction, or any and all
other commercial damages or losses), even if such Contributor
has been advised of the possibility of such damages.
9. Accepting Warranty or Additional Liability. While redistributing
the Work or Derivative Works thereof, You may choose to offer,
and charge a fee for, acceptance of support, warranty, indemnity,
or other liability obligations and/or rights consistent with this
License. However, in accepting such obligations, You may act only
on Your own behalf and on Your sole responsibility, not on behalf
of any other Contributor, and only if You agree to indemnify,
defend, and hold each Contributor harmless for any liability
incurred by, or claims asserted against, such Contributor by reason
of your accepting any such warranty or additional liability.
END OF TERMS AND CONDITIONS
APPENDIX: How to apply the Apache License to your work.
To apply the Apache License to your work, attach the following
boilerplate notice, with the fields enclosed by brackets "[]"
replaced with your own identifying information. (Don't include
the brackets!) The text should be enclosed in the appropriate
comment syntax for the file format. We also recommend that a
file or class name and description of the purpose be included on the
same "printed page" as the copyright notice for easier
identification within third-party archives.
Copyright 2026 Kanara Technology
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
+101 -25
@@ -1,12 +1,16 @@
# 🎙️ Standalone ONNX Real-Time Voice Changer Service
# 🎙️ ONNX VC - Standalone Real-Time Voice Changer
A high-performance, low-latency, real-time voice conversion system powered by **ONNX Runtime** and **Retrieval-based Voice Conversion (RVC)**. This application enables real-time voice conversion from a microphone/browser source to a designated target character model with minimal processing latency.
🌐 **Languages:** [English](README.md) | [Bahasa Indonesia](docs/README.id.md) | [Español](docs/README.es.md) | [日本語](docs/README.ja.md) | [简体中文](docs/README.zh.md)
A high-performance, low-latency, real-time AI voice conversion system powered by **ONNX Runtime** and **Retrieval-based Voice Conversion (RVC)**. Features a premium dashboard built with **Next.js App Router**, **TypeScript**, and **Tailwind CSS**, supporting full internationalization.
---
## ✨ Key Features
* **🚀 WebSocket Audio Pipeline:** Streaming audio transfer using binary WebSocket connections (raw PCM float32) for minimal overhead.
* **⚡ Multi-Backend ONNX Acceleration:** Supports execution providers including NVIDIA `CUDA`, AMD/Intel `DirectML`, and fallback `CPU`.
* **🌐 Universal Localisation:** Fully translatable interface supporting English, Indonesian, Japanese, Chinese, and Spanish.
* **🎨 Premium Dashboard:** Fully responsive workspace built using React 19, Radix UI, Framer Motion, and Tailwind CSS.
* **🎼 High-Fidelity DSP Pipeline:**
* **Low-Cut Filter:** First-order Butterworth high-pass filter at 80 Hz to attenuate AC hum and rumble.
* **Noise Gate:** Threshold-based noise suppression to bypass inference during silence (saving CPU/GPU cycles).
@@ -35,61 +39,122 @@ graph TD
---
## 📁 Repository Structure
* [server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py) — The main WebSocket backend and static HTTP server managing connection loops, audio resampling, and model execution.
* [server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py) — The main WebSocket backend server managing connection loops, audio resampling, and model execution.
* [start.bat](file:///M:/Users/ahmad/project/onnx-voice-changer/start.bat) — Windows launcher batch file that automatically resolves the Python virtual environment and executes the server.
* [requirements.txt](file:///M:/Users/ahmad/project/onnx-voice-changer/requirements.txt) — Python dependencies list.
* [frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend) — Contains client-side Web UI files:
* [frontend/index.html](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend/index.html) — Control interface layout.
* [frontend/app.js](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend/app.js) — WebSocket communication and client-side audio rendering.
* [frontend/styles.css](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend/styles.css) — Custom dashboard styling.
* [lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib) — core package containing inference models and prediction tools.
* [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) — Directory for voice model weights. Place your custom `.onnx` and `.pth` model sub-directories here.
* [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) — Directory containing base pre-trained models such as `vec-768-layer-12.onnx` or `vec-256-layer-12.onnx`.
* [frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend) — The frontend client workspace built with Next.js (TypeScript, Tailwind CSS).
* [frontend-deprecated/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend-deprecated) — The old deprecated frontend code.
* [lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib) — Core package containing inference models, ONNX conversion scripts, and prediction tools.
* [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) — Directory for character voice model weights (e.g. `weights/HuTao/`).
* [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) — Directory containing base pre-trained models.
---
## 🚀 Getting Started
## 🚀 Installation & Setup
### 📋 Prerequisites
* **Python 3.10+** (Recommended)
* **Python 3.10+**
* **FFmpeg** installed and added to the system PATH (Required for audio processing utilities).
* **Node.js 18+** & **npm** (Required to run the Next.js frontend client).
* (Optional) **NVIDIA CUDA Toolkit** (v11.x/12.x) and **cuDNN** for GPU execution acceleration.
### 📦 Installation
---
### 📦 1. Python Backend Installation
1. Clone this repository to your local directory.
2. Initialize and activate a virtual environment (optional but recommended):
2. Initialize and activate a virtual environment:
```bash
python -m venv venv
# On Windows:
.\venv\Scripts\activate
# On Linux/macOS:
source venv/bin/activate
```
3. Install the required dependencies:
```bash
pip install -r requirements.txt
```
4. Place your ContentVec base model (`vec-768-layer-12.onnx` or `vec-256-layer-12.onnx`) inside the [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) directory.
5. Place your character models in [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) in structured folders (e.g., `weights/HuTao/` containing `HuTao.onnx` and `HuTao.pth`).
### 🏃 Running the Server
---
#### Option A: Quick Launch (Windows)
Simply double-click the [start.bat](file:///M:/Users/ahmad/project/onnx-voice-changer/start.bat) file. It will automatically detect Python, set up the directory paths, and launch the service.
### 📥 2. Download Pre-trained ContentVec (Required)
The pipeline requires a ContentVec base model to extract content features from incoming voice chunks.
1. Download the `vec-768-layer-12.onnx` model from Hugging Face:
👉 **[Download vec-768-layer-12.onnx](https://huggingface.co/DogManTC/test-rvc-onnx/blob/main/vec-768-layer-12.onnx)**
2. Save the downloaded file inside the [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) directory:
```
pretrained/
└── vec-768-layer-12.onnx
```
#### Option B: Manual CLI execution
Execute the server using your terminal:
---
### 🔄 3. Setup & Export RVC Models to ONNX
To run character models on ONNX Runtime, you must place your standard PyTorch RVC models (`.pth`) under the [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) directory and convert them.
1. Create a sub-folder under `weights/` named after your character (e.g. `HuTao`):
```
weights/
└── HuTao/
└── HuTao.pth
```
2. Run the ONNX conversion script by passing the folder name of the model:
```bash
python lib/export_onnx.py --model_name HuTao
```
3. The script will automatically search for the `.pth` file inside `weights/HuTao/` and export a corresponding `HuTao.onnx` file inside the same directory:
```
weights/
└── HuTao/
├── HuTao.pth
└── HuTao.onnx
```
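To see which character folders still need the export step, a small check along these lines may help. It is a hypothetical helper, not part of `lib/export_onnx.py`: the function name `models_needing_export` is an assumption, and it only looks for `.pth` and `.onnx` files following the folder layout shown above.

```python
from pathlib import Path


def models_needing_export(weights_dir):
    """List character folders that hold a .pth checkpoint but no .onnx file."""
    pending = []
    for folder in sorted(Path(weights_dir).iterdir()):
        if not folder.is_dir():
            continue
        has_pth = any(folder.glob("*.pth"))
        has_onnx = any(folder.glob("*.onnx"))
        if has_pth and not has_onnx:
            pending.append(folder.name)
    return pending
```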
---
### 🖥️ 4. Running the Frontend Client
The frontend client runs as a standalone Next.js development server or built production server.
1. Navigate to the frontend directory:
```bash
cd frontend
```
2. Install npm dependencies:
```bash
npm install
```
3. Start the development server:
```bash
npm run dev
```
Open your browser and navigate to **`http://localhost:3000`**.
Alternatively, to build and run the production server:
```bash
python server.py --host 127.0.0.1 --port 8765 --http_port 8000 --device cuda
npm run build
npm run start
```
### ⚙️ Command-Line Arguments
---
## 🏃 Running the Voice Changer
### Step 1: Start the Python WebSocket Backend
Run the server using your terminal (defaults to port `8765`):
```bash
python server.py --host 127.0.0.1 --port 8765 --device cuda
```
#### ⚙️ Command-Line Arguments
| Argument | Description | Default |
|---|---|---|
| `--host` | The address the WebSocket server binds to. | `127.0.0.1` |
| `--port` | WebSocket communication port. | `8765` |
| `--http_port`| Port serving the static frontend Web UI. | `8000` |
| `--device` | The ONNX Runtime execution device (`cpu`, `cuda`, `dml`). | `cuda` |
| `--model` | Target folder name in `weights/` to load directly upon startup. | `None` |
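For illustration, the arguments could be declared with `argparse` as below. This is a sketch under the assumption that `server.py` uses `argparse`, which the diff does not show; the function name `build_parser` and the help strings are hypothetical. It covers the four flags used by the updated command and the localized tables and leaves out `--http_port`, which the updated run command no longer passes.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="ONNX VC WebSocket backend (sketch)")
    parser.add_argument("--host", default="127.0.0.1",
                        help="address the WebSocket server binds to")
    parser.add_argument("--port", type=int, default=8765,
                        help="WebSocket communication port")
    parser.add_argument("--device", choices=["cpu", "cuda", "dml"], default="cuda",
                        help="ONNX Runtime execution device")
    parser.add_argument("--model", default=None,
                        help="folder name in weights/ to load at startup")
    return parser
```

Declaring `choices` for `--device` makes a mistyped device name fail at startup instead of at model load time.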
Once the server begins execution, it will spin up the local server, and your Web UI should open automatically at `http://localhost:8000`.
### Step 2: Open the Frontend Dashboard
Make sure the frontend client is running (via `npm run dev` or `npm run start`), then open `http://localhost:3000` in your browser. The dashboard connects to the WebSocket API backend automatically.
---
@@ -98,3 +163,14 @@ To achieve low latency without output artifacts, the audio processing utilizes:
1. **Sliding Window Context Buffer:** Keeps a short historical buffer of the audio to feed the model the required context frames while minimizing output audio delay.
2. **Convolution Padding Fadeout:** 120ms of trailing silent padding is temporarily appended to input segments to avoid edge-fading anomalies inherent to RVC convolutional steps.
3. **Linear Resampling:** Low-overhead linear interpolation for quick sample rate adaptation.
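Items 2 and 3 can be sketched as follows. This is an illustration under stated assumptions rather than the server's code: the function names are hypothetical, plain Python lists are used so the example runs without NumPy, and the resampler does no anti-alias filtering, so downsampling content near the new Nyquist limit may alias.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation between neighbouring input samples."""
    if not samples or src_rate == dst_rate:
        return list(samples)
    n_out = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # input samples advanced per output sample
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = min(i * step, last)  # clamp so the tail holds the final sample
        lo = int(pos)
        hi = min(lo + 1, last)
        frac = pos - lo
        out.append(samples[lo] * (1.0 - frac) + samples[hi] * frac)
    return out


def pad_tail_silence(samples, sample_rate, pad_ms=120):
    """Append trailing silence (120 ms by default) before inference."""
    return list(samples) + [0.0] * (sample_rate * pad_ms // 1000)
```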
---
## 🤝 Credits & Acknowledgements
* **Made with ❤️ by [Kanara Technology](https://github.com/kanaratechnologyindonesia)** (Mirror: [git.kanara.tech](https://git.kanara.tech/kanara))
* Powered by [ONNX Runtime](https://onnxruntime.ai/) and [Retrieval-based Voice Conversion (RVC)](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)
---
## 📄 License
This project is licensed under the Apache License 2.0. See the [LICENSE](file:///M:/Users/ahmad/project/onnx-voice-changer/LICENSE) file for details.
+176
@@ -0,0 +1,176 @@
# 🎙️ ONNX VC - Standalone Real-Time Voice Changer
🌐 **Idiomas:** [English](../README.md) | [Bahasa Indonesia](README.id.md) | [Español](README.es.md) | [日本語](README.ja.md) | [简体中文](README.zh.md)
Un sistema de conversión de voz por IA en tiempo real, de alto rendimiento y baja latencia, impulsado por **ONNX Runtime** y **Retrieval-based Voice Conversion (RVC)**. Cuenta con un panel de control premium desarrollado con **Next.js App Router**, **TypeScript** y **Tailwind CSS**, con soporte para internacionalización completa.
---
## ✨ Características Clave
* **🚀 Canal de Audio WebSocket (Audio Pipeline):** Transferencia de audio en tiempo real mediante conexiones WebSocket binarias (PCM float32 sin procesar) para una latencia mínima.
* **⚡ Aceleración ONNX Multi-Backend:** Soporta proveedores de ejecución que incluyen NVIDIA `CUDA`, AMD/Intel `DirectML` y CPU como respaldo.
* **🌐 Localización Universal:** Interfaz completamente traducible que soporta inglés, indonesio, japonés, chino y español.
* **🎨 Panel de Control Premium:** Entorno de trabajo responsivo construido con React 19, Radix UI, Framer Motion y Tailwind CSS.
* **🎼 Canal DSP de Alta Fidelidad:**
* **Filtro de Corte Bajo (Low-Cut Filter):** Filtro de paso alto Butterworth activo de primer orden a 80 Hz para eliminar el zumbido de CA y los ruidos de fondo.
* **Puerta de Ruido (Noise Gate):** Supresión de ruido basada en umbral para omitir la inferencia durante el silencio (ahorrando ciclos de CPU/GPU).
* **Controles de Ganancia:** Control de ganancia digital independiente de entrada y salida.
* **🧠 Extracción Avanzada de Tono:** Predicción de tono optimizada a 16 kHz utilizando el modelo RMVPE (Robust Model for Vocal Pitch Estimation).
* **🌐 Arquitectura de Enrutamiento Dual:** Permite enrutar el audio a través del navegador web (Web Audio API) o directamente mediante el hardware de audio local del servidor (utilizando `sounddevice`).
---
## 🛠️ Arquitectura del Sistema
```mermaid
graph TD
A[Micrófono / Navegador Web] -->|Web Audio API| B(Conexión WebSocket)
B -->|Fragmento de PCM Float32 sin procesar| C[Backend de server.py]
C -->|1. Filtro de paso alto 80Hz| D[Etapa DSP]
D -->|2. Ganancia y puerta de ruido| D
D -->|3. Remuestreo a 16kHz| E[Hubert/ContentVec ONNX]
D -->|4. Estimación de tono RMVPE| F[Predictor de tono]
E --> G[Inferencia del modelo RVC ONNX]
F --> G
G -->|Fragmentos de audio de destino| H(Conexión WebSocket)
H -->|Reproducir audio| I[Altavoces del navegador / Dispositivo de audio]
```
---
## 📁 Estructura del Repositorio
* [server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py) — El servidor backend WebSocket principal que gestiona los bucles de conexión, el remuestreo de audio y la ejecución del modelo.
* [start.bat](file:///M:/Users/ahmad/project/onnx-voice-changer/start.bat) — Archivo por lotes ejecutable para Windows que inicializa automáticamente el entorno virtual de Python y ejecuta el servidor.
* [requirements.txt](file:///M:/Users/ahmad/project/onnx-voice-changer/requirements.txt) — Lista de dependencias de Python.
* [frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend) — El espacio de trabajo del cliente frontend construido con Next.js (TypeScript, Tailwind CSS).
* [frontend-deprecated/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend-deprecated) — Código frontend antiguo ya en desuso.
* [lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib) — Paquete central que contiene los modelos de inferencia, scripts de conversión a ONNX y herramientas de predicción.
* [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) — Directorio para los archivos de peso de los modelos de voz de personajes (ej. `weights/HuTao/`).
* [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) — Directorio que contiene los modelos base preentrenados.
---
## 🚀 Instalación y Configuración
### 📋 Requisitos Previos
* **Python 3.10+**
* **FFmpeg** instalado y agregado al PATH del sistema (necesario para las utilidades de procesamiento de audio).
* **Node.js 18+** y **npm** (necesario para ejecutar el cliente frontend Next.js).
* (Opcional) **NVIDIA CUDA Toolkit** (v11.x/12.x) y **cuDNN** para aceleración de ejecución en GPU.
---
### 📦 1. Instalación del Backend de Python
1. Clone este repositorio en su directorio local.
2. Inicialice y active un entorno virtual:
```bash
python -m venv venv
# En Windows:
.\venv\Scripts\activate
# En Linux/macOS:
source venv/bin/activate
```
3. Instale las dependencias requeridas:
```bash
pip install -r requirements.txt
```
---
### 📥 2. Descargar ContentVec Preentrenado (Requerido)
El sistema requiere un modelo base ContentVec para extraer las características de contenido a partir de los fragmentos de voz.
1. Descargue el modelo `vec-768-layer-12.onnx` desde Hugging Face:
👉 **[Descargar vec-768-layer-12.onnx](https://huggingface.co/DogManTC/test-rvc-onnx/blob/main/vec-768-layer-12.onnx)**
2. Guarde el archivo descargado dentro del directorio `pretrained/`:
```
pretrained/
└── vec-768-layer-12.onnx
```
---
### 🔄 3. Configuración y Exportación de Modelos RVC a ONNX
Para ejecutar modelos de personajes en ONNX Runtime, debe colocar sus modelos RVC estándar de PyTorch (`.pth`) bajo el directorio `weights/` y convertirlos.
1. Cree una subcarpeta bajo `weights/` con el nombre de su personaje (ej. `HuTao`):
```
weights/
└── HuTao/
└── HuTao.pth
```
2. Ejecute el script de conversión a ONNX pasando el nombre de la carpeta del modelo:
```bash
python lib/export_onnx.py --model_name HuTao
```
3. El script buscará automáticamente el archivo `.pth` dentro de `weights/HuTao/` y exportará el archivo `HuTao.onnx` correspondiente en el mismo directorio:
```
weights/
└── HuTao/
├── HuTao.pth
└── HuTao.onnx
```
---
### 🖥️ 4. Ejecución del Cliente Frontend
El cliente frontend se ejecuta como un servidor de desarrollo independiente de Next.js o como un servidor de producción compilado.
1. Navegue al directorio frontend:
```bash
cd frontend
```
2. Instale las dependencias de npm:
```bash
npm install
```
3. Inicie el servidor de desarrollo:
```bash
npm run dev
```
Abra su navegador y acceda a **`http://localhost:3000`**.
Alternativamente, para compilar y ejecutar el servidor de producción:
```bash
npm run build
npm run start
```
---
## 🏃 Funcionamiento del Cambiador de Voz
### Paso 1: Iniciar el Backend WebSocket de Python
Ejecute el servidor desde su terminal (puerto predeterminado `8765`):
```bash
python server.py --host 127.0.0.1 --port 8765 --device cuda
```
#### ⚙️ Argumentos de Línea de Comandos
| Argumento | Descripción | Predeterminado |
|---|---|---|
| `--host` | La dirección a la que se enlaza el servidor WebSocket. | `127.0.0.1` |
| `--port` | Puerto de comunicación WebSocket. | `8765` |
| `--device` | El dispositivo de ejecución de ONNX Runtime (`cpu`, `cuda`, `dml`). | `cuda` |
| `--model` | Nombre de la carpeta de destino en `weights/` para cargar directamente al inicio. | `None` |
### Paso 2: Abrir el Panel de Control Frontend
Asegúrese de que su cliente frontend esté ejecutándose (a través de `npm run dev` o `npm run start` en `http://localhost:3000`), ábralo en su navegador y este se conectará automáticamente al backend de la API WebSocket.
---
## 🔊 Detalles de DSP de Audio
Para lograr una latencia baja sin artefactos de salida, el procesamiento de audio utiliza:
1. **Búfer de Contexto de Ventana Deslizante:** Mantiene un búfer histórico corto de audio para alimentar al modelo con los fragmentos de contexto necesarios, minimizando el retraso de la salida de audio.
2. **Desvanecimiento por Relleno de Convolución:** Se añade temporalmente un relleno silencioso de 120 ms al final de los segmentos de entrada para evitar anomalías de desvanecimiento en los bordes, inherentes a los pasos de convolución RVC.
3. **Remuestreo Lineal:** Interpolación lineal de bajo costo computacional para una rápida adaptación de la frecuencia de muestreo.
---
## 🤝 Credits & Acknowledgements
* **Made with ❤️ by [Kanara Technology](https://github.com/kanaratechnologyindonesia)** (Mirror: [git.kanara.tech](https://git.kanara.tech/kanara))
* Powered by [ONNX Runtime](https://onnxruntime.ai/) and [Retrieval-based Voice Conversion (RVC)](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)
---
## 📄 License
This project is licensed under the Apache License 2.0. See the [LICENSE](file:///M:/Users/ahmad/project/onnx-voice-changer/LICENSE) file for details.
+176
View File
@@ -0,0 +1,176 @@
# 🎙️ ONNX VC - Standalone Real-Time Voice Changer
🌐 **Language:** [English](../README.md) | [Bahasa Indonesia](README.id.md) | [Español](README.es.md) | [日本語](README.ja.md) | [简体中文](README.zh.md)
A high-performance, low-latency real-time AI voice conversion system powered by **ONNX Runtime** and **Retrieval-based Voice Conversion (RVC)**. It ships with a premium dashboard built with **Next.js App Router**, **TypeScript**, and **Tailwind CSS**, with full internationalization support.
---
## ✨ Key Features
* **🚀 WebSocket Audio Pipeline:** Streams audio over a binary WebSocket connection (raw float32 PCM) for minimal overhead.
* **⚡ Multi-Backend ONNX Acceleration:** Supports execution providers including NVIDIA `CUDA`, AMD/Intel `DirectML`, and a `CPU` fallback.
* **🌐 Universal Localization:** A fully translatable interface supporting English, Indonesian, Japanese, Mandarin, and Spanish.
* **🎨 Premium Dashboard:** A responsive workspace built with React 19, Radix UI, Framer Motion, and Tailwind CSS.
* **🎼 High-Quality DSP Pipeline:**
    * **Low-Cut Filter:** An active first-order Butterworth high-pass filter at 80 Hz that removes AC hum and rumble.
    * **Noise Gate:** Threshold-based noise suppression that skips inference during silence (saving CPU/GPU cycles).
    * **Gain Controls:** Independent digital input/output gain settings.
* **🧠 Advanced Pitch Extraction:** Optimized 16 kHz pitch prediction using the RMVPE (Robust Model for Vocal Pitch Estimation) model.
* **🌐 Dual Routing Architecture:** Supports routing audio through the web browser (Web Audio API) or directly through the server's local audio hardware (using `sounddevice`).
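The low-cut filter in the list above can be sketched as a first-order Butterworth high-pass designed with the bilinear transform. This is an illustrative, dependency-free version under stated assumptions; the function name `highpass_first_order` is hypothetical, and the backend's own implementation is not shown in this README.

```python
import math


def highpass_first_order(samples, sample_rate, cutoff_hz=80.0):
    """First-order Butterworth high-pass via the bilinear transform.

    Difference equation: y[n] = b0 * (x[n] - x[n-1]) - a1 * y[n-1]
    """
    k = math.tan(math.pi * cutoff_hz / sample_rate)  # pre-warped cutoff
    b0 = 1.0 / (1.0 + k)
    a1 = (k - 1.0) / (k + 1.0)
    out = []
    prev_x = 0.0
    prev_y = 0.0
    for x in samples:
        y = b0 * (x - prev_x) - a1 * prev_y
        out.append(y)
        prev_x, prev_y = x, y
    return out
```

In a streaming setting, `prev_x` and `prev_y` would need to persist between chunks so the filter does not restart at every chunk boundary.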
---
## 🛠️ System Architecture
```mermaid
graph TD
A[Microphone / Web Browser] -->|Web Audio API| B(WebSocket Connection)
B -->|Raw Float32 PCM Chunks| C[server.py Backend]
C -->|1. High-Pass Filter 80Hz| D[DSP Stage]
D -->|2. Gain & Noise Gate| D
D -->|3. Resample to 16kHz| E[Hubert/ContentVec ONNX]
D -->|4. RMVPE Pitch Estimation| F[Pitch Predictor]
E --> G[RVC ONNX Model Inference]
F --> G
G -->|Target Audio Chunks| H(WebSocket Connection)
H -->|Play Audio| I[Browser Speakers / Audio Device]
```
---
## 📁 Repository Structure
* [server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py): The main WebSocket backend server; manages the connection loop, audio resampling, and model execution.
* [start.bat](file:///M:/Users/ahmad/project/onnx-voice-changer/start.bat): A Windows launcher batch file that automatically sets up the Python virtual environment and runs the server.
* [requirements.txt](file:///M:/Users/ahmad/project/onnx-voice-changer/requirements.txt): Python dependency list.
* [frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend): The frontend client workspace, built with Next.js (TypeScript, Tailwind CSS).
* [frontend-deprecated/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend-deprecated): The deprecated legacy frontend code.
* [lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib): The core package containing the inference models, the ONNX conversion script, and prediction tools.
* [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights): Directory for character voice model weights (for example, `weights/HuTao/`).
* [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained): Directory containing the pre-trained base model.
---
## 🚀 Installation & Setup
### 📋 Prerequisites
* **Python 3.10+**
* **FFmpeg** installed and added to the system PATH (required for audio processing).
* **Node.js 18+** and **npm** (required to run the Next.js frontend client).
* (Optional) **NVIDIA CUDA Toolkit** (v11.x/12.x) and **cuDNN** for GPU-accelerated execution.
---
### 📦 1. Python Backend Installation
1. Clone this repository to a local directory.
2. Create and activate a virtual environment:
```bash
python -m venv venv
# On Windows:
.\venv\Scripts\activate
# On Linux/macOS:
source venv/bin/activate
```
3. Install the required dependencies:
```bash
pip install -r requirements.txt
```
---
### 📥 2. Download the Pre-trained ContentVec Model (Required)
The voice changer needs the ContentVec base model to extract speech features from audio chunks.
1. Download the `vec-768-layer-12.onnx` model from Hugging Face:
👉 **[Download vec-768-layer-12.onnx](https://huggingface.co/DogManTC/test-rvc-onnx/blob/main/vec-768-layer-12.onnx)**
2. Save the downloaded file in the `pretrained/` directory:
```
pretrained/
└── vec-768-layer-12.onnx
```
---
### 🔄 3. Set Up and Export an RVC Model to ONNX
To run a character model on ONNX Runtime, place your standard PyTorch RVC model (`.pth`) under the `weights/` directory and convert it.
1. Create a subfolder under `weights/` named after your character (for example, `HuTao`):
```
weights/
└── HuTao/
    └── HuTao.pth
```
2. Run the ONNX conversion script, passing the model's folder name:
```bash
python lib/export_onnx.py --model_name HuTao
```
3. The script automatically looks for the `.pth` file in `weights/HuTao/` and exports the corresponding `HuTao.onnx` file to the same directory:
```
weights/
└── HuTao/
    ├── HuTao.pth
    └── HuTao.onnx
```
---
### 🖥️ 4. Running the Frontend Client
The frontend client runs either as a standalone Next.js development server or as a built production server.
1. Change to the frontend directory:
```bash
cd frontend
```
2. Install the npm dependencies:
```bash
npm install
```
3. Start the development server:
```bash
npm run dev
```
Open your browser and go to **`http://localhost:3000`**.
Alternatively, to build and run the production server:
```bash
npm run build
npm run start
```
---
## 🏃 Running the Voice Changer
### Step 1: Start the Python WebSocket Backend
Run the server from your terminal (the default port is `8765`):
```bash
python server.py --host 127.0.0.1 --port 8765 --device cuda
```
#### ⚙️ Command-Line Arguments
| Argument | Description | Default |
|---|---|---|
| `--host` | Address the WebSocket server binds to. | `127.0.0.1` |
| `--port` | WebSocket communication port. | `8765` |
| `--device` | ONNX Runtime execution device (`cpu`, `cuda`, `dml`). | `cuda` |
| `--model` | Name of the target folder in `weights/` to load directly at startup. | `None` |
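The argument table can be expressed as an `argparse` parser. The sketch below mirrors only the documented flags and defaults; `build_parser` is a hypothetical name, and `server.py` may define its arguments differently or accept more options.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented flags; a sketch, not the actual server code."""
    parser = argparse.ArgumentParser(description="ONNX VC WebSocket backend (sketch)")
    parser.add_argument("--host", default="127.0.0.1",
                        help="address the WebSocket server binds to")
    parser.add_argument("--port", type=int, default=8765,
                        help="WebSocket communication port")
    parser.add_argument("--device", choices=["cpu", "cuda", "dml"], default="cuda",
                        help="ONNX Runtime execution device")
    parser.add_argument("--model", default=None,
                        help="folder name in weights/ to load at startup")
    return parser
```

For example, `build_parser().parse_args(["--device", "dml", "--model", "HuTao"])` yields a namespace with `device="dml"` and `model="HuTao"`, while the host and port keep their defaults.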
### Step 2: Open the Frontend Dashboard
Make sure the frontend client is running (via `npm run dev` or `npm run start`, at `http://localhost:3000`), then open it in your browser. The client connects to the WebSocket API backend automatically.
---
## 🔊 Audio DSP Details
To achieve low latency without output artifacts, the audio processing uses:
1. **Sliding-Window Context Buffer:** Keeps a short history buffer of audio to feed the model the context frames it needs while minimizing output delay.
2. **Convolution Padding Fade-Out:** Temporarily appends 120 ms of trailing silence to each input segment to avoid the edge-fade anomalies inherent to RVC's convolution steps.
3. **Linear Resampling:** Low-overhead linear resampling for fast sample-rate adaptation.
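The sliding-window context buffer described above might look roughly like the following. The class name `ContextBuffer` and its interface are hypothetical, and the context length the backend actually uses is not stated in this README.

```python
from collections import deque


class ContextBuffer:
    """Keeps the most recent `context_len` samples to prepend to each new chunk."""

    def __init__(self, context_len: int):
        # Start with silence so the very first chunk still gets full-length context.
        self._history = deque([0.0] * context_len, maxlen=context_len)

    def push(self, chunk):
        """Return history + chunk for the model, then slide the window forward."""
        window = list(self._history) + list(chunk)
        self._history.extend(chunk)  # the deque drops the oldest samples automatically
        return window
```

With this layout only the tail of the model output corresponds to newly received audio, so a caller would presumably discard the portion produced from the context samples.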
---
## 🤝 Credits & Acknowledgements
* **Made with ❤️ by [Kanara Technology](https://github.com/kanaratechnologyindonesia)** (Mirror: [git.kanara.tech](https://git.kanara.tech/kanara))
* Powered by [ONNX Runtime](https://onnxruntime.ai/) and [Retrieval-based Voice Conversion (RVC)](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)
---
## 📄 License
This project is licensed under the Apache License 2.0. See the [LICENSE](file:///M:/Users/ahmad/project/onnx-voice-changer/LICENSE) file for more information.
+176
View File
@@ -0,0 +1,176 @@
# 🎙️ ONNX VC - Standalone Real-Time Voice Changer
🌐 **Language:** [English](../README.md) | [Bahasa Indonesia](README.id.md) | [Español](README.es.md) | [日本語](README.ja.md) | [简体中文](README.zh.md)
A high-performance, low-latency real-time AI voice conversion system powered by **ONNX Runtime** and **Retrieval-based Voice Conversion (RVC)**. It includes a premium dashboard built with **Next.js App Router**, **TypeScript**, and **Tailwind CSS**, and supports full multilingual operation.
---
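## ✨ Key Features
* **🚀 WebSocket Audio Pipeline:** Streaming audio transfer over a binary WebSocket connection (raw float32 PCM) to keep latency to a minimum.
* **⚡ Multi-Backend ONNX Acceleration:** Supports execution providers including NVIDIA `CUDA`, AMD/Intel `DirectML`, and a `CPU` fallback.
* **🌐 Universal Localization:** A fully translatable interface supporting English, Indonesian, Japanese, Chinese, and Spanish.
* **🎨 Premium Dashboard:** A fully responsive workspace built with React 19, Radix UI, Framer Motion, and Tailwind CSS.
* **🎼 High-Fidelity DSP Pipeline:**
    * **Low-Cut Filter:** An active first-order Butterworth high-pass filter at 80 Hz that removes AC hum and low-frequency noise.
    * **Noise Gate:** Threshold-based noise suppression that skips inference during silence (saving CPU/GPU cycles).
    * **Gain Controls:** Independent digital input/output gain stages.
* **🧠 Advanced Pitch Extraction:** Optimized 16 kHz pitch prediction using the RMVPE (Robust Model for Vocal Pitch Estimation) model.
* **🌐 Dual Routing Architecture:** Supports audio routing through the web browser (Web Audio API) or through the server's local audio hardware (using `sounddevice`).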
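To illustrate how the noise gate in the list above can save inference cycles, here is a minimal sketch. It assumes the threshold is compared against the linear RMS level of each chunk, which this README does not specify; `rms`, `gate_chunk`, and the `convert` callback are hypothetical names.

```python
import math


def rms(chunk):
    """Root-mean-square level of a chunk of samples."""
    if not chunk:
        return 0.0
    return math.sqrt(sum(x * x for x in chunk) / len(chunk))


def gate_chunk(chunk, threshold, convert):
    """Run `convert` only when the chunk is louder than `threshold`.

    Quiet chunks are replaced with silence, so the (expensive) model
    inference wrapped by `convert` is skipped entirely.
    """
    if rms(chunk) < threshold:
        return [0.0] * len(chunk)
    return convert(chunk)
```

The key design point is that the gate sits in front of the model rather than after it, so silent input costs almost nothing to process.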
---
## 🛠️ System Architecture
```mermaid
graph TD
A[Microphone / Web Browser] -->|Web Audio API| B(WebSocket Connection)
B -->|Raw Float32 PCM Chunks| C[server.py Backend]
C -->|1. High-Pass Filter 80Hz| D[DSP Stage]
D -->|2. Gain & Noise Gate| D
D -->|3. Resample to 16kHz| E[Hubert/ContentVec ONNX]
D -->|4. RMVPE Pitch Estimation| F[Pitch Predictor]
E --> G[RVC ONNX Model Inference]
F --> G
G -->|Target Audio Chunks| H(WebSocket Connection)
H -->|Play Audio| I[Browser Speakers / Audio Device]
```
---
## 📁 Repository Structure
* [server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py): The main WebSocket backend server; manages the connection loop, audio resampling, and model execution.
* [start.bat](file:///M:/Users/ahmad/project/onnx-voice-changer/start.bat): A Windows launcher batch file that automatically resolves the Python virtual environment and runs the server.
* [requirements.txt](file:///M:/Users/ahmad/project/onnx-voice-changer/requirements.txt): Python dependency list.
* [frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend): The frontend client workspace, built with Next.js (TypeScript, Tailwind CSS).
* [frontend-deprecated/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend-deprecated): The deprecated legacy frontend code.
* [lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib): The core package containing the inference models, the ONNX conversion script, and prediction tools.
* [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights): Directory for character voice model weights (for example, `weights/HuTao/`).
* [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained): Directory containing the pre-trained base model.
---
## 🚀 Installation & Setup
### 📋 Prerequisites
* **Python 3.10+**
* **FFmpeg** installed and added to the system PATH environment variable (required by the audio processing utilities).
* **Node.js 18+** and **npm** (required to run the Next.js frontend client).
* (Optional) **NVIDIA CUDA Toolkit** (v11.x/12.x) and **cuDNN** for GPU-accelerated execution.
---
### 📦 1. Python Backend Installation
1. Clone this repository to a local directory.
2. Create and activate a virtual environment:
```bash
python -m venv venv
# On Windows:
.\venv\Scripts\activate
# On Linux/macOS:
source venv/bin/activate
```
3. Install the required dependencies:
```bash
pip install -r requirements.txt
```
---
### 📥 2. Download the Pre-trained ContentVec Model (Required)
The ContentVec base model is required to extract speech features from audio chunks.
1. Download the `vec-768-layer-12.onnx` model from Hugging Face:
👉 **[Download vec-768-layer-12.onnx](https://huggingface.co/DogManTC/test-rvc-onnx/blob/main/vec-768-layer-12.onnx)**
2. Save the downloaded file in the `pretrained/` directory:
```
pretrained/
└── vec-768-layer-12.onnx
```
---
### 🔄 3. Set Up and Export an RVC Model to ONNX
To run a character model on ONNX Runtime, place a standard PyTorch RVC model (`.pth`) in the `weights/` directory and convert it.
1. Create a subfolder under `weights/` named after the character (for example, `HuTao`):
```
weights/
└── HuTao/
    └── HuTao.pth
```
2. Run the ONNX conversion script, specifying the model's folder name:
```bash
python lib/export_onnx.py --model_name HuTao
```
3. The script automatically searches `weights/HuTao/` for the `.pth` file and exports the corresponding `HuTao.onnx` file to the same directory:
```
weights/
└── HuTao/
    ├── HuTao.pth
    └── HuTao.onnx
```
---
### 🖥️ 4. Running the Frontend Client
The frontend client runs either as a standalone Next.js development server or as a pre-built production server.
1. Change to the frontend directory:
```bash
cd frontend
```
2. Install the npm dependencies:
```bash
npm install
```
3. Start the development server:
```bash
npm run dev
```
Open your browser and go to **`http://localhost:3000`**.
Alternatively, to build and run the production server:
```bash
npm run build
npm run start
```
---
## 🏃 Running the Voice Changer
### Step 1: Start the Python WebSocket Backend
Run the server from a terminal (the default port is `8765`):
```bash
python server.py --host 127.0.0.1 --port 8765 --device cuda
```
#### ⚙️ Command-Line Arguments
| Argument | Description | Default |
|---|---|---|
| `--host` | Address the WebSocket server binds to. | `127.0.0.1` |
| `--port` | WebSocket communication port. | `8765` |
| `--device` | ONNX Runtime execution device (`cpu`, `cuda`, `dml`). | `cuda` |
| `--model` | Name of the target folder in `weights/` to load directly at startup. | `None` |
### Step 2: Open the Frontend Dashboard
Make sure the frontend client is running (via `npm run dev` or `npm run start`, at `http://localhost:3000`), then open it in your browser. It connects to the WebSocket API backend automatically.
---
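## 🔊 Audio DSP Details
To achieve low latency without output artifacts, the audio processing uses:
1. **Sliding-Window Context Buffer:** Keeps a short history buffer of audio to feed the model the context frames it needs while minimizing output delay.
2. **Convolution Padding Fade-Out:** Temporarily appends 120 ms of trailing silence to each input segment to avoid the edge-fade anomalies inherent to RVC's convolution steps.
3. **Linear Resampling:** Low-overhead linear interpolation for fast sample-rate adaptation.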
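The 120 ms trailing-silence padding described above can be illustrated as follows. Both helper names are hypothetical, and the trimming step is an assumption: this README says the padding is temporary but does not describe how the backend removes the corresponding output.

```python
def pad_trailing_silence(samples, sample_rate, pad_ms=120):
    """Append `pad_ms` of silence; returns the padded list and the pad length."""
    pad_len = sample_rate * pad_ms // 1000
    return list(samples) + [0.0] * pad_len, pad_len


def trim_padding(output, pad_len_in, in_rate, out_rate):
    """Drop the output samples that correspond to the padded input silence.

    Assumes the model output length scales with out_rate / in_rate.
    """
    pad_len_out = pad_len_in * out_rate // in_rate
    return list(output[: len(output) - pad_len_out]) if pad_len_out else list(output)
```

At a 16 kHz input rate the pad is 1920 samples; if the model emits audio at 40 kHz, the matching tail to discard would be 4800 samples.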
---
## 🤝 Credits & Acknowledgements
* **Made with ❤️ by [Kanara Technology](https://github.com/kanaratechnologyindonesia)** (Mirror: [git.kanara.tech](https://git.kanara.tech/kanara))
* Powered by [ONNX Runtime](https://onnxruntime.ai/) and [Retrieval-based Voice Conversion (RVC)](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)
---
## 📄 License
This project is licensed under the Apache License 2.0. See the [LICENSE](file:///M:/Users/ahmad/project/onnx-voice-changer/LICENSE) file for details.
+176
View File
@@ -0,0 +1,176 @@
# 🎙️ ONNX VC - Standalone Real-Time Voice Changer
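🌐 **Language:** [English](../README.md) | [Bahasa Indonesia](README.id.md) | [Español](README.es.md) | [日本語](README.ja.md) | [简体中文](README.zh.md)
A high-performance, low-latency real-time AI voice changing system built on **ONNX Runtime** and **Retrieval-based Voice Conversion (RVC)**. It comes with a premium dashboard built with **Next.js App Router**, **TypeScript**, and **Tailwind CSS**, with full internationalization support.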
---
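## ✨ Key Features
* **🚀 WebSocket Audio Pipeline:** Streams audio over a binary WebSocket connection (raw float32 PCM) to keep system overhead to a minimum.
* **⚡ Multi-Backend ONNX Acceleration:** Supports multiple execution providers, including NVIDIA `CUDA`, AMD/Intel `DirectML`, and a `CPU` fallback.
* **🌐 Universal Localization:** A fully translatable interface supporting English, Indonesian, Japanese, Chinese, and Spanish.
* **🎨 Premium Dashboard:** A fully responsive workspace built with React 19, Radix UI, Framer Motion, and Tailwind CSS.
* **🎼 High-Fidelity DSP Pipeline:**
    * **Low-Cut Filter:** An active first-order Butterworth high-pass filter at 80 Hz that removes AC hum and rumble.
    * **Noise Gate:** Threshold-based noise suppression that bypasses inference during silence (saving CPU/GPU cycles).
    * **Gain Controls:** Independent digital input/output gain stages.
* **🧠 Advanced Pitch Extraction:** Optimized 16 kHz pitch prediction using the RMVPE (Robust Model for Vocal Pitch Estimation) model.
* **🌐 Dual Routing Architecture:** Supports audio routing through the web browser (Web Audio API) or directly through the server's local audio hardware (using `sounddevice`).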
---
## 🛠️ System Architecture
```mermaid
graph TD
A[Microphone / Web Browser] -->|Web Audio API| B(WebSocket Connection)
B -->|Raw Float32 PCM Chunks| C[server.py Backend]
C -->|1. High-Pass Filter 80Hz| D[DSP Stage]
D -->|2. Gain & Noise Gate| D
D -->|3. Resample to 16kHz| E[Hubert/ContentVec ONNX]
D -->|4. RMVPE Pitch Estimation| F[Pitch Predictor]
E --> G[RVC ONNX Model Inference]
F --> G
G -->|Target Audio Chunks| H(WebSocket Connection)
H -->|Play Audio| I[Browser Speakers / Audio Device]
```
---
## 📁 Repository Structure
* [server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py): The main WebSocket backend server; manages the connection loop, audio resampling, and model execution.
* [start.bat](file:///M:/Users/ahmad/project/onnx-voice-changer/start.bat): A Windows launcher batch file that automatically resolves the Python virtual environment and runs the server.
* [requirements.txt](file:///M:/Users/ahmad/project/onnx-voice-changer/requirements.txt): Python dependency list.
* [frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend): The frontend client workspace, built with Next.js (TypeScript, Tailwind CSS).
* [frontend-deprecated/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend-deprecated): The deprecated legacy frontend code.
* [lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib): The core package containing the inference models, the ONNX conversion script, and prediction tools.
* [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights): Directory for character voice model weights (for example, `weights/HuTao/`).
* [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained): Directory containing the pre-trained base model.
---
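## 🚀 Installation & Setup
### 📋 Prerequisites
* **Python 3.10+**
* **FFmpeg** installed and added to the system PATH (required by the audio processing tools).
* **Node.js 18+** and **npm** (required to run the Next.js frontend client).
* (Optional) **NVIDIA CUDA Toolkit** (v11.x/12.x) and **cuDNN** for GPU-accelerated execution.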
---
### 📦 1. Python Backend Installation
1. Clone this repository to a local directory.
2. Create and activate a virtual environment:
```bash
python -m venv venv
# On Windows:
.\venv\Scripts\activate
# On Linux/macOS:
source venv/bin/activate
```
3. Install the required dependencies:
```bash
pip install -r requirements.txt
```
---
### 📥 2. Download the Pre-trained ContentVec Model (Required)
The voice changer needs the ContentVec base model to extract speech features from audio chunks.
1. Download the `vec-768-layer-12.onnx` model from Hugging Face:
👉 **[Download vec-768-layer-12.onnx](https://huggingface.co/DogManTC/test-rvc-onnx/blob/main/vec-768-layer-12.onnx)**
2. Save the downloaded file in the `pretrained/` directory:
```
pretrained/
└── vec-768-layer-12.onnx
```
---
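### 🔄 3. Set Up and Export an RVC Model to ONNX
To run a character model on ONNX Runtime, place your standard PyTorch RVC model (`.pth`) in the `weights/` directory and convert it.
1. Create a subfolder under `weights/` named after your character (for example, `HuTao`):
```
weights/
└── HuTao/
    └── HuTao.pth
```
2. Run the ONNX conversion script, passing the model's folder name:
```bash
python lib/export_onnx.py --model_name HuTao
```
3. The script automatically searches `weights/HuTao/` for the `.pth` file and exports the corresponding `HuTao.onnx` file to the same directory:
```
weights/
└── HuTao/
    ├── HuTao.pth
    └── HuTao.onnx
```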
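The lookup described in step 3 can be pictured with a small path-resolution sketch. It reproduces only the documented behavior (find a `.pth` in `weights/<model_name>/`, write `<model_name>.onnx` next to it); `resolve_export_paths` is a hypothetical helper, not a function from `lib/export_onnx.py`, and picking the first match alphabetically is an assumption.

```python
from pathlib import Path


def resolve_export_paths(model_name: str, weights_dir: str = "weights") -> tuple[Path, Path]:
    """Return (source .pth path, target .onnx path) for a model folder."""
    model_dir = Path(weights_dir) / model_name
    candidates = sorted(model_dir.glob("*.pth"))
    if not candidates:
        raise FileNotFoundError(f"no .pth file found in {model_dir}")
    pth_path = candidates[0]  # assumption: take the first match alphabetically
    onnx_path = model_dir / f"{model_name}.onnx"
    return pth_path, onnx_path
```

Keeping the exported file beside the checkpoint means one folder name is enough to identify a model, which matches how the `--model` server argument is documented.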
---
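### 🖥️ 4. Running the Frontend Client
The frontend client can run either as a standalone Next.js development server or as a compiled production server.
1. Change to the frontend directory:
```bash
cd frontend
```
2. Install the npm dependencies:
```bash
npm install
```
3. Start the development server:
```bash
npm run dev
```
Open your browser and go to **`http://localhost:3000`**.
Alternatively, build and run the production server:
```bash
npm run build
npm run start
```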
---
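## 🏃 Running the Voice Changer
### Step 1: Start the Python WebSocket Backend
Run the server from a terminal (the default port is `8765`):
```bash
python server.py --host 127.0.0.1 --port 8765 --device cuda
```
#### ⚙️ Command-Line Arguments
| Argument | Description | Default |
|---|---|---|
| `--host` | Address the WebSocket server binds to. | `127.0.0.1` |
| `--port` | WebSocket communication port. | `8765` |
| `--device` | ONNX Runtime execution device (`cpu`, `cuda`, `dml`). | `cuda` |
| `--model` | Name of the target folder in `weights/` to load directly at startup. | `None` |
### Step 2: Open the Frontend Dashboard
Make sure the frontend client is running (via `npm run dev` or `npm run start`, at `http://localhost:3000`), then open it in your browser. It connects to the WebSocket API backend automatically.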
---
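## 🔊 Audio DSP Details
To achieve low latency without output artifacts, the audio processing uses:
1. **Sliding-Window Context Buffer:** Keeps a short history buffer of audio to feed the model the context frames it needs while minimizing output delay.
2. **Convolution Padding Fade-Out:** Temporarily appends 120 ms of trailing silence to each input segment to avoid the edge-fade anomalies inherent to RVC's convolution steps.
3. **Linear Resampling:** Low-overhead linear interpolation for fast sample-rate adaptation.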
---
## 🤝 Credits & Acknowledgements
* **Made with ❤️ by [Kanara Technology](https://github.com/kanaratechnologyindonesia)** (Mirror: [git.kanara.tech](https://git.kanara.tech/kanara))
* Powered by [ONNX Runtime](https://onnxruntime.ai/) and [Retrieval-based Voice Conversion (RVC)](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)
---
## 📄 License
This project is licensed under the Apache License 2.0. See the [LICENSE](file:///M:/Users/ahmad/project/onnx-voice-changer/LICENSE) file for details.
+41
View File
@@ -0,0 +1,41 @@
# ⚠️ Deprecated Frontend
This directory contains the original, legacy single-page static HTML/CSS/JS frontend application for the ONNX Voice Changer.
---
## 🚫 Status: Deprecated
This frontend is **no longer maintained or active**.
### Why?
1. **Upgraded Dashboard:** The voice changer client has been completely refactored and rewritten as a premium **Next.js 15 (TypeScript & Tailwind CSS)** application, which is now located in the main [/frontend](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend) workspace.
2. **Pure API Backend:** The Python backend (`server.py`) has been simplified to run as a pure WebSocket API backend and no longer hosts static files.
---
## 📂 Files Included
* `index.html` — The legacy single-page layout.
* `styles.css` — Legacy stylesheets.
* `app.js` — Legacy Audio DSP processing and WebSocket integration.
---
## 🏃 Running the Deprecated Client (Reference Only)
If you still want to run this legacy frontend for reference, you can serve this directory using any static file server:
### Option A: Using Python
```bash
python -m http.server 3000
```
### Option B: Using Node (npx)
```bash
npx serve . -p 3000
```
After serving, open **`http://localhost:3000`** in your browser. Ensure the Python WebSocket backend (`server.py`) is running on port `8765`.
---
## 🤝 Credits & Acknowledgements
* **Made with ❤️ by [Kanara Technology](https://github.com/kanaratechnologyindonesia)** (Mirror: [git.kanara.tech](https://git.kanara.tech/kanara))
Binary file not shown.
Binary file not shown.
Binary file not shown.
+41
View File
@@ -0,0 +1,41 @@
# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
# dependencies
/node_modules
/.pnp
.pnp.*
.yarn/*
!.yarn/patches
!.yarn/plugins
!.yarn/releases
!.yarn/versions
# testing
/coverage
# next.js
/.next/
/out/
# production
/build
# misc
.DS_Store
*.pem
# debug
npm-debug.log*
yarn-debug.log*
yarn-error.log*
.pnpm-debug.log*
# env files (can opt-in for committing if needed)
.env*
# vercel
.vercel
# typescript
*.tsbuildinfo
next-env.d.ts
+5
View File
@@ -0,0 +1,5 @@
<!-- BEGIN:nextjs-agent-rules -->
# This is NOT the Next.js you know
This version has breaking changes — APIs, conventions, and file structure may all differ from your training data. Read the relevant guide in `node_modules/next/dist/docs/` before writing any code. Heed deprecation notices.
<!-- END:nextjs-agent-rules -->
+1
View File
@@ -0,0 +1 @@
@AGENTS.md
+54
View File
@@ -0,0 +1,54 @@
# 🎨 ONNX VC - Frontend Client Dashboard
The premium user dashboard for the **ONNX VC (Real-Time Voice Changer)**. Built using **Next.js 15 (App Router)**, **React 19**, **TypeScript**, and **Tailwind CSS**, providing a high-fidelity control panel for real-time AI voice conversion.
---
## ✨ Key Features
* **🌐 Complete Internationalization (i18n):** Supports English, Indonesian, Spanish, Japanese, and Chinese.
* **🌓 Dark Mode & Custom Themes:** Seamless dark/light theme switching with custom accent colors (Purple, Blue, Emerald, Rose, Amber).
* **📊 Dual Waveform Visualizer:** Displays real-time input and output audio waveform graphs side-by-side in a single row for compact, effective monitoring.
* **📱 Collapsible Sidebar:** Optimized UI layout with a smooth collapsible sidebar for managing settings.
* **🎛️ Interactive DSP Controls:** Easily adjust input/output gain staging, active 80Hz low-cut filters, noise gates, and pitch transposition.
* **⚙️ Dual Routing Toggle:** Switch between browser-side routing (Web Audio API) and server-side hardware routing (`sounddevice`).
---
## 🚀 Getting Started
### 📋 Prerequisites
* **Node.js 18+**
* **npm**, **yarn**, or **pnpm** package manager
### 📦 Installation
Navigate to this directory and install the project dependencies:
```bash
npm install
```
### 🏃 Running the Development Server
Start the Next.js local development server:
```bash
npm run dev
```
Open **[http://localhost:3000](http://localhost:3000)** in your web browser.
### 🏗️ Building for Production
To build the application for optimized production performance:
```bash
npm run build
npm run start
```
---
## 🔌 WebSocket Connection
The frontend connects to the Python WebSocket backend to stream binary audio chunks and receive converted audio.
* **Default backend URL:** `ws://127.0.0.1:8765`
* Ensure your backend `server.py` is running before starting voice conversion from the dashboard.
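For anyone testing the backend without the dashboard, the binary audio framing can be sketched from the Python side. The project describes the payload as raw float32 PCM; little-endian byte order is an assumption here (it is what a browser `Float32Array` produces on common hardware), and `encode_pcm_f32` / `decode_pcm_f32` are hypothetical helper names.

```python
import struct


def encode_pcm_f32(samples) -> bytes:
    """Pack mono samples as little-endian float32 (4 bytes per sample)."""
    samples = list(samples)
    return struct.pack(f"<{len(samples)}f", *samples)


def decode_pcm_f32(payload: bytes) -> list[float]:
    """Unpack a binary WebSocket message back into a list of float samples."""
    if len(payload) % 4:
        raise ValueError("payload length is not a multiple of 4 bytes")
    return list(struct.unpack(f"<{len(payload) // 4}f", payload))
```

A test client could send the bytes from `encode_pcm_f32` as a binary message to `ws://127.0.0.1:8765` and decode the reply with `decode_pcm_f32`. Any control or configuration messages the server expects beforehand are outside the scope of this sketch.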
---
## 🤝 Credits & Acknowledgements
* **Made with ❤️ by [Kanara Technology](https://github.com/kanaratechnologyindonesia)** (Mirror: [git.kanara.tech](https://git.kanara.tech/kanara))
* Powered by [Next.js](https://nextjs.org/) and [Tailwind CSS](https://tailwindcss.com/)
-744
View File
@@ -1,744 +0,0 @@
/**
* Omni Real-Time Voice Changer - Client App
* High-performance browser-based mic streaming and RVC playback.
*/
// UI Elements
const wsUrlInput = document.getElementById('ws_url');
const connectionStatus = document.getElementById('connection_status');
const connectBtn = document.getElementById('connect_btn');
const streamBtn = document.getElementById('stream_btn');
const playToggleBtn = document.getElementById('play_toggle_btn');
const modelSelect = document.getElementById('model_select');
const deviceSelect = document.getElementById('device_select');
const transposeSlider = document.getElementById('transpose_slider');
const transposeVal = document.getElementById('transpose_val');
const gateSlider = document.getElementById('gate_slider');
const gateVal = document.getElementById('gate_val');
const inputGainSlider = document.getElementById('input_gain_slider');
const inputGainVal = document.getElementById('input_gain_val');
const outputGainSlider = document.getElementById('output_gain_slider');
const outputGainVal = document.getElementById('output_gain_val');
const chunkSelect = document.getElementById('chunk_select');
const noiseCancelCheckbox = document.getElementById('noise_cancel_checkbox');
const routingModeSelect = document.getElementById('routing_mode_select');
const hardwareDevicesPanel = document.getElementById('hardware_devices_panel');
const serverInputSelect = document.getElementById('server_input_select');
const serverOutputSelect = document.getElementById('server_output_select');
const browserNoiseCancelGroup = document.getElementById('browser_noise_cancel_group');
const presetLatencyBtn = document.getElementById('preset_latency_btn');
const presetQualityBtn = document.getElementById('preset_quality_btn');
const inputCanvas = document.getElementById('input_canvas');
const outputCanvas = document.getElementById('output_canvas');
const hudLatency = document.getElementById('hud_latency');
const hudTime = document.getElementById('hud_time');
const hudGateStatus = document.getElementById('hud_gate_status');
const hudSr = document.getElementById('hud_sr');
// Audio Visualizer Contexts
const inputCtx = inputCanvas.getContext('2d');
const outputCtx = outputCanvas.getContext('2d');
// Web Audio State
let audioContext = null;
let micStream = null;
let micSourceNode = null;
let scriptProcessorNode = null;
let micAccumulator = new Float32Array(0); // Accumulates audio for large/custom chunk sizes
// WebSocket State
let socket = null;
let isStreaming = false;
let playOutput = true;
let targetSampleRate = 40000; // RVC Model default, updated dynamically
// Playback Sync State
let nextPlaybackTime = 0;
const safetyDelay = 0.10; // 100ms buffer to absorb network/websocket jitter (increased for perfect smoothness!)
// Latency Tracking Queues
let sentTimestamps = [];
const maxSentLogs = 50;
// --- SMOOTH VISUALIZER (Rolling Display Buffers + RAF loop) ---
// Fixed display buffer size: ~85ms window looks great at all chunk sizes.
const VIS_DISPLAY_SIZE = 4096;
let inputDisplayBuf = new Float32Array(VIS_DISPLAY_SIZE); // rolling input (updated ~85ms)
let outputDisplayBuf = new Float32Array(VIS_DISPLAY_SIZE); // fallback for hardware mode
let rafHandle = null;
// Time-synced output queue: each entry = { data: Float32Array, startTime: number (audioCtx seconds) }
let outputChunkQueue = [];
function pushToDisplayBuf(displayBuf, newSamples) {
if (newSamples.length >= VIS_DISPLAY_SIZE) {
displayBuf.set(newSamples.slice(newSamples.length - VIS_DISPLAY_SIZE));
} else {
displayBuf.copyWithin(0, newSamples.length);
displayBuf.set(newSamples, VIS_DISPLAY_SIZE - newSamples.length);
}
}
// Build a VIS_DISPLAY_SIZE window of output samples ending at audioContext.currentTime
function buildTimeSyncedOutputBuf() {
if (!audioContext || outputChunkQueue.length === 0) return outputDisplayBuf;
const now = audioContext.currentTime;
const windowDuration = VIS_DISPLAY_SIZE / targetSampleRate;
const windowStart = now - windowDuration;
// Drop chunks that ended before our window start
while (outputChunkQueue.length > 0) {
const c = outputChunkQueue[0];
if (c.startTime + c.data.length / targetSampleRate < windowStart) {
outputChunkQueue.shift();
} else break;
}
const out = new Float32Array(VIS_DISPLAY_SIZE);
for (const chunk of outputChunkQueue) {
const chunkEnd = chunk.startTime + chunk.data.length / targetSampleRate;
// Overlap between [windowStart, now] and [chunk.startTime, chunkEnd]
const overlapStart = Math.max(windowStart, chunk.startTime);
const overlapEnd = Math.min(now, chunkEnd);
if (overlapStart >= overlapEnd) continue;
const srcOffset = Math.floor((overlapStart - chunk.startTime) * targetSampleRate);
const destOffset = Math.floor((overlapStart - windowStart) * targetSampleRate);
const count = Math.floor((overlapEnd - overlapStart) * targetSampleRate);
const safeCount = Math.min(count,
chunk.data.length - srcOffset,
VIS_DISPLAY_SIZE - destOffset);
if (safeCount > 0) out.set(chunk.data.subarray(srcOffset, srcOffset + safeCount), destOffset);
}
return out;
}
function startVisualizerLoop() {
if (rafHandle) return;
function frame() {
drawWaveform(inputDisplayBuf, inputCanvas, '#6366f1');
// Time-synced output: scrub through queued chunks using audioContext clock
drawWaveform(buildTimeSyncedOutputBuf(), outputCanvas, '#a855f7');
rafHandle = requestAnimationFrame(frame);
}
rafHandle = requestAnimationFrame(frame);
}
function stopVisualizerLoop() {
if (rafHandle) {
cancelAnimationFrame(rafHandle);
rafHandle = null;
}
outputChunkQueue = [];
}
// Setup Canvas Sizes dynamically
function resizeCanvases() {
inputCanvas.width = inputCanvas.clientWidth * window.devicePixelRatio;
inputCanvas.height = inputCanvas.clientHeight * window.devicePixelRatio;
outputCanvas.width = outputCanvas.clientWidth * window.devicePixelRatio;
outputCanvas.height = outputCanvas.clientHeight * window.devicePixelRatio;
}
resizeCanvases();
window.addEventListener('resize', resizeCanvases);
// Connect / Disconnect WebSocket
connectBtn.addEventListener('click', () => {
if (socket && (socket.readyState === WebSocket.OPEN || socket.readyState === WebSocket.CONNECTING)) {
disconnectServer();
} else {
connectServer();
}
});
function connectServer() {
const url = wsUrlInput.value.trim();
updateConnectionStatus('connecting');
try {
socket = new WebSocket(url);
socket.binaryType = 'arraybuffer';
socket.onopen = () => {
console.log('Connected to RVC Server');
updateConnectionStatus('connected');
sendConfigToServer(); // Send initial configurations
streamBtn.disabled = false;
playToggleBtn.disabled = false;
};
socket.onclose = () => {
console.log('WebSocket Connection Closed');
disconnectServer();
};
socket.onerror = (err) => {
console.error('WebSocket Error:', err);
disconnectServer();
};
socket.onmessage = (event) => {
if (typeof event.data === 'string') {
// Config or control response
try {
const response = JSON.parse(event.data);
if (response.type === 'config_success') {
targetSampleRate = response.target_sr;
console.log('Server configuration synced successfully:', response);
} else if (response.type === 'init_devices') {
populateServerDevices(response.devices, response.default_input, response.default_output);
} else if (response.type === 'visualizer') {
// Feed rolling display buffers — RAF loop handles drawing at 60fps
pushToDisplayBuf(inputDisplayBuf, new Float32Array(response.input));
pushToDisplayBuf(outputDisplayBuf, new Float32Array(response.output));
if (!rafHandle) startVisualizerLoop();
} else if (response.type === 'error') {
alert('Server Error: ' + response.message);
}
} catch (e) {
console.error('Error parsing text message:', e);
}
} else if (event.data instanceof ArrayBuffer) {
// Binary processed PCM audio chunk returned from server (Browser Mode only)
handleServerAudioChunk(event.data);
}
};
} catch (e) {
console.error('Connection failed:', e);
disconnectServer();
}
}
function disconnectServer() {
if (isStreaming) {
stopStreaming();
}
if (socket) {
try {
socket.close();
} catch (e) {}
socket = null;
}
updateConnectionStatus('disconnected');
streamBtn.disabled = true;
playToggleBtn.disabled = true;
}
function updateConnectionStatus(status) {
connectionStatus.className = 'status-badge ' + status;
if (status === 'connected') {
connectionStatus.textContent = 'Terhubung'; // "Connected"
connectBtn.textContent = 'Putuskan Server'; // "Disconnect Server"
connectBtn.className = 'btn btn-primary';
} else if (status === 'connecting') {
connectionStatus.textContent = 'Menghubungkan'; // "Connecting"
connectBtn.textContent = 'Batal'; // "Cancel"
} else {
connectionStatus.textContent = 'Terputus'; // "Disconnected"
connectBtn.textContent = 'Hubungkan Server'; // "Connect Server"
connectBtn.className = 'btn btn-primary';
}
}
// Config synchronization
function sendConfigToServer() {
if (!socket || socket.readyState !== WebSocket.OPEN) return;
const activeF0 = document.querySelector('input[name="f0_method"]:checked').value;
const config = {
type: 'config',
model_name: modelSelect.value,
device: deviceSelect.value,
f0_method: activeF0,
f0_up_key: parseInt(transposeSlider.value, 10),
noise_gate: parseFloat(gateSlider.value),
input_gain: parseFloat(inputGainSlider.value),
output_gain: parseFloat(outputGainSlider.value),
input_sr: audioContext ? audioContext.sampleRate : 44100,
routing_mode: routingModeSelect.value,
input_device: serverInputSelect.value ? parseInt(serverInputSelect.value, 10) : null,
output_device: serverOutputSelect.value ? parseInt(serverOutputSelect.value, 10) : null,
chunk_size: parseInt(chunkSelect.value, 10)
};
socket.send(jsonEncode(config));
console.log('Sent configuration change:', config);
}
// Populate Server Audio Devices dropdowns
function populateServerDevices(devices, defaultInput, defaultOutput) {
serverInputSelect.innerHTML = '';
serverOutputSelect.innerHTML = '';
if (devices.length === 0) {
const optIn = document.createElement('option');
optIn.textContent = 'No microphone detected on the server';
serverInputSelect.appendChild(optIn);
const optOut = document.createElement('option');
optOut.textContent = 'No output device detected on the server';
serverOutputSelect.appendChild(optOut);
return;
}
devices.forEach(device => {
if (device.max_input_channels > 0) {
const opt = document.createElement('option');
opt.value = device.id;
opt.textContent = `[ID ${device.id}] ${device.name}`;
if (device.id === defaultInput) opt.selected = true;
serverInputSelect.appendChild(opt);
}
if (device.max_output_channels > 0) {
const opt = document.createElement('option');
opt.value = device.id;
opt.textContent = `[ID ${device.id}] ${device.name}`;
if (device.id === defaultOutput) opt.selected = true;
serverOutputSelect.appendChild(opt);
}
});
console.log('Successfully populated server hardware devices in UI.');
}
// UI Event Listeners to trigger instant sync
modelSelect.addEventListener('change', sendConfigToServer);
deviceSelect.addEventListener('change', sendConfigToServer);
document.querySelectorAll('input[name="f0_method"]').forEach(radio => {
radio.addEventListener('change', sendConfigToServer);
});
transposeSlider.addEventListener('input', () => {
transposeVal.textContent = (transposeSlider.value >= 0 ? '+' : '') + transposeSlider.value + ' semitone';
});
transposeSlider.addEventListener('change', sendConfigToServer);
gateSlider.addEventListener('input', () => {
gateVal.textContent = gateSlider.value + ' dB';
});
gateSlider.addEventListener('change', sendConfigToServer);
inputGainSlider.addEventListener('input', () => {
inputGainVal.textContent = parseFloat(inputGainSlider.value).toFixed(1) + 'x';
});
inputGainSlider.addEventListener('change', sendConfigToServer);
outputGainSlider.addEventListener('input', () => {
outputGainVal.textContent = parseFloat(outputGainSlider.value).toFixed(1) + 'x';
});
outputGainSlider.addEventListener('change', sendConfigToServer);
chunkSelect.addEventListener('change', () => {
// Reinitialize stream if buffer size is changed during active streaming
if (isStreaming) {
stopStreaming();
startStreaming();
}
});
noiseCancelCheckbox.addEventListener('change', () => {
// Reinitialize microphone with new noise cancellation constraints if streaming
if (isStreaming) {
stopStreaming();
startStreaming();
}
});
// Helper to dynamically adjust UI layout based on Routing Mode
function applyAudioRoutingUI() {
if (routingModeSelect.value === 'hardware') {
hardwareDevicesPanel.style.display = 'block';
playToggleBtn.style.display = 'none'; // Hide the browser-only listen toggle button
browserNoiseCancelGroup.style.display = 'none'; // Hide the browser-only Noise Cancel checkbox
} else {
hardwareDevicesPanel.style.display = 'none';
playToggleBtn.style.display = 'inline-block'; // Show the browser-only listen toggle button
browserNoiseCancelGroup.style.display = 'block'; // Show the browser-only Noise Cancel checkbox
}
}
// Routing Mode Event Listeners
routingModeSelect.addEventListener('change', () => {
applyAudioRoutingUI();
sendConfigToServer();
if (isStreaming) {
stopStreaming();
startStreaming();
}
});
serverInputSelect.addEventListener('change', sendConfigToServer);
serverOutputSelect.addEventListener('change', sendConfigToServer);
// Quick Presets Event Listeners
presetLatencyBtn.addEventListener('click', () => {
const radioPM = document.querySelector('input[name="f0_method"][value="pm"]');
if (radioPM) radioPM.checked = true;
chunkSelect.value = "8192";
console.log("Preset loaded: Latency (PM + 8192)");
sendConfigToServer();
if (isStreaming) {
stopStreaming();
startStreaming();
}
});
presetQualityBtn.addEventListener('click', () => {
const radioRMVPE = document.querySelector('input[name="f0_method"][value="rmvpe"]');
if (radioRMVPE) radioRMVPE.checked = true;
chunkSelect.value = "16384";
console.log("Preset loaded: Quality (RMVPE + 16384)");
sendConfigToServer();
if (isStreaming) {
stopStreaming();
startStreaming();
}
});
// Helper: serialize an object to a JSON string before sending it over the socket
function jsonEncode(obj) {
return JSON.stringify(obj);
}
playToggleBtn.addEventListener('click', () => {
playOutput = !playOutput;
if (playOutput) {
playToggleBtn.textContent = '🔊 Listening: ON';
playToggleBtn.className = 'btn btn-primary';
} else {
playToggleBtn.textContent = '🔇 Listening: MUTED';
playToggleBtn.className = 'btn btn-accent';
}
});
// Stream Toggle
streamBtn.addEventListener('click', () => {
if (isStreaming) {
stopStreaming();
} else {
startStreaming();
}
});
async function startStreaming() {
isStreaming = true;
streamBtn.textContent = 'Stop Voice Changer';
streamBtn.className = 'btn btn-primary';
const isHardwareMode = (routingModeSelect.value === 'hardware');
if (isHardwareMode) {
// --- SERVER HARDWARE ROUTING MODE ---
inputDisplayBuf = new Float32Array(VIS_DISPLAY_SIZE);
outputDisplayBuf = new Float32Array(VIS_DISPLAY_SIZE);
startVisualizerLoop();
sendConfigToServer(); // Sends config with routing_mode: 'hardware' which triggers stream start on server
console.log('Server Hardware Mode initialized.');
return;
}
// --- CLIENT BROWSER MODE ---
// 1. Create AudioContext if not active
if (!audioContext) {
audioContext = new (window.AudioContext || window.webkitAudioContext)({
latencyHint: 'interactive'
});
}
if (audioContext.state === 'suspended') {
await audioContext.resume();
}
hudSr.textContent = audioContext.sampleRate + ' Hz';
sendConfigToServer(); // sync actual input sample rate
// 2. Request user microphone with high-fidelity, lowest possible latency constraints
try {
const useNoiseCancel = noiseCancelCheckbox.checked;
micStream = await navigator.mediaDevices.getUserMedia({
audio: {
echoCancellation: useNoiseCancel,
noiseSuppression: useNoiseCancel,
autoGainControl: useNoiseCancel
}
});
micSourceNode = audioContext.createMediaStreamSource(micStream);
// 3. Create Audio Processing Loop Node (ScriptProcessorNode)
// createScriptProcessor requires a buffer size that is a power of two between 256 and 16384.
// We record with a fixed, widely supported buffer size of 4096 and accumulate samples in memory,
// so the user can select any chunk size, including sizes that are not powers of two or exceed 16384 (e.g. 12288, 24576, 32768).
// Note: ScriptProcessorNode is deprecated in the Web Audio API; AudioWorklet is its replacement.
const recordBufferSize = 4096;
scriptProcessorNode = audioContext.createScriptProcessor(recordBufferSize, 1, 1);
scriptProcessorNode.onaudioprocess = (event) => {
if (!isStreaming) return;
const inputBuffer = event.inputBuffer;
const inputData = inputBuffer.getChannelData(0); // 4096 samples
// Push latest mic samples into the rolling display buffer every callback (~85 ms at 48 kHz, ~93 ms at 44.1 kHz)
pushToDisplayBuf(inputDisplayBuf, inputData);
// Append incoming recorded samples to our accumulator
const temp = new Float32Array(micAccumulator.length + inputData.length);
temp.set(micAccumulator);
temp.set(inputData, micAccumulator.length);
micAccumulator = temp;
const targetChunkSize = parseInt(chunkSelect.value, 10);
// Process and send chunks of the user's selected target size
while (micAccumulator.length >= targetChunkSize) {
const chunkToSend = micAccumulator.slice(0, targetChunkSize);
micAccumulator = micAccumulator.slice(targetChunkSize); // Keep remainder
// Simple peak-level check for the gate status badge (0.005 linear is roughly -46 dBFS)
let maxVal = 0;
for (let i = 0; i < chunkToSend.length; i++) maxVal = Math.max(maxVal, Math.abs(chunkToSend[i]));
if (maxVal > 0.005) {
hudGateStatus.textContent = 'Speaking';
hudGateStatus.className = 'hud-value active-badge';
} else {
hudGateStatus.textContent = 'Silent';
hudGateStatus.className = 'hud-value text-muted';
}
// Send binary PCM Float32 audio chunk of target size to Python Server
if (socket && socket.readyState === WebSocket.OPEN) {
const packetTime = performance.now();
sentTimestamps.push({ id: packetTime, sent: packetTime });
if (sentTimestamps.length > maxSentLogs) {
sentTimestamps.shift();
}
socket.send(chunkToSend.buffer); // Send direct array buffer
}
}
};
micSourceNode.connect(scriptProcessorNode);
scriptProcessorNode.connect(audioContext.destination); // Required to trigger onaudioprocess
// Reset playback sync clock
nextPlaybackTime = 0;
micAccumulator = new Float32Array(0); // Reset accumulator
inputDisplayBuf = new Float32Array(VIS_DISPLAY_SIZE);
outputDisplayBuf = new Float32Array(VIS_DISPLAY_SIZE);
startVisualizerLoop();
console.log('Browser Streaming active. Recording buffer size: 4096 | Target chunk size:', chunkSelect.value);
} catch (e) {
console.error('Failed to access microphone:', e);
alert('Failed to access your microphone: ' + e.message);
stopStreaming();
}
}
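// --- Illustrative sketch (not part of the original app) ---
// The accumulate-then-slice step inside onaudioprocess above can be isolated as a pure
// function, which makes it testable without a microphone or an AudioContext.
// `drainChunks` is a hypothetical helper name introduced here for illustration; it assumes
// mono Float32 samples and a positive chunk size.
function drainChunks(accumulator, incoming, chunkSize) {
    // Append the new samples to whatever was left over from the previous callback
    const merged = new Float32Array(accumulator.length + incoming.length);
    merged.set(accumulator);
    merged.set(incoming, accumulator.length);
    // Cut off as many full chunks as are available
    const chunks = [];
    let offset = 0;
    while (merged.length - offset >= chunkSize) {
        chunks.push(merged.slice(offset, offset + chunkSize));
        offset += chunkSize;
    }
    // Whatever is left becomes the accumulator for the next callback
    return { chunks, remainder: merged.slice(offset) };
}
// With 4096-sample callbacks and a 12288-sample target, the first two calls return no chunk
// and the third returns exactly one, which is why arbitrary chunk sizes work on top of a
// fixed recording buffer.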
function stopStreaming() {
isStreaming = false;
streamBtn.textContent = 'Start Voice Changer';
streamBtn.className = 'btn btn-accent';
playOutput = true;
playToggleBtn.textContent = '🔊 Listening: ON';
playToggleBtn.className = 'btn btn-primary';
const isHardwareMode = (routingModeSelect.value === 'hardware');
if (isHardwareMode) {
// --- SERVER HARDWARE ROUTING MODE ---
if (socket && socket.readyState === WebSocket.OPEN) {
const config = {
type: 'config',
routing_mode: 'browser' // Tells server to stop local hardware stream
};
socket.send(jsonEncode(config));
}
console.log('Server Hardware Mode stopped.');
}
// --- CLIENT BROWSER MODE ---
// Always release browser audio resources, even in hardware mode: if the routing mode was
// switched while streaming, the microphone from the previous browser session is still open.
// Stop microphone stream tracks
if (micStream) {
micStream.getTracks().forEach(track => track.stop());
micStream = null;
}
// Disconnect Web Audio nodes
if (micSourceNode) {
micSourceNode.disconnect();
micSourceNode = null;
}
if (scriptProcessorNode) {
scriptProcessorNode.disconnect();
scriptProcessorNode = null;
}
micAccumulator = new Float32Array(0); // Reset accumulator
// Reset the HUD and visualizers (shared by both modes)
stopVisualizerLoop();
inputDisplayBuf = new Float32Array(VIS_DISPLAY_SIZE);
outputDisplayBuf = new Float32Array(VIS_DISPLAY_SIZE);
hudGateStatus.textContent = 'Silent';
hudGateStatus.className = 'hud-value text-muted';
hudLatency.textContent = '-- ms';
hudTime.textContent = '-- ms';
clearCanvas(inputCanvas);
clearCanvas(outputCanvas);
}
// Seamless Audio Playback Scheduler (Absorbs WebSocket & processing jitter)
function handleServerAudioChunk(arrayBuffer) {
if (!isStreaming) return;
// 1. Measure Round-Trip Time Latency (RTT)
const now = performance.now();
let rtt = 0;
if (sentTimestamps.length > 0) {
const oldestSent = sentTimestamps.shift();
rtt = now - oldestSent.sent;
hudLatency.textContent = Math.round(rtt) + ' ms';
}
// Convert arrayBuffer to Float32 samples
const payload = new Float32Array(arrayBuffer);
const processingTime = payload[0]; // first float32 is the server processing time in ms
const pcmData = payload.subarray(1); // the rest is the audio
// 2. Schedule chunk smoothly inside the AudioContext timeline
const audioBuf = audioContext.createBuffer(1, pcmData.length, targetSampleRate);
audioBuf.getChannelData(0).set(pcmData);
const source = audioContext.createBufferSource();
source.buffer = audioBuf;
if (playOutput) {
source.connect(audioContext.destination);
}
// Calculate precise playback clock scheduling
const currentTime = audioContext.currentTime;
const chunkDuration = audioBuf.duration; // actual chunk duration in seconds
// Adaptive buffer: enough headroom so next chunk always arrives before this one ends.
// 2.5× chunk or 500ms cap — absorbs even 300ms+ processing spikes.
const adaptiveBuf = Math.min(chunkDuration * 2.5, 0.50);
if (nextPlaybackTime < currentTime) {
// Clock behind — first chunk or dropout recovery.
// Use full adaptiveBuf on BOTH cases so recovery fully rebuilds headroom.
// (0.5× recovery was causing cascading dropouts: one late chunk → the next also late)
nextPlaybackTime = currentTime + adaptiveBuf;
} else if (nextPlaybackTime > currentTime + chunkDuration * 5.0) {
// --- ADAPTIVE LATENCY BUSTER ---
// Only snap when queue is >5 chunk-durations ahead (genuine backlog, not normal look-ahead).
// At 8192 samples (~170 ms at 48 kHz): threshold ≈ 850 ms
// At 65536 samples (~1.37 s at 48 kHz): threshold ≈ 6.8 s
const snapTarget = currentTime + adaptiveBuf;
console.log(`Latency Buster: ${Math.round((nextPlaybackTime-currentTime)*1000)}ms → ${Math.round(adaptiveBuf*1000)}ms`);
nextPlaybackTime = snapTarget;
}
// Record schedule start time BEFORE advancing the clock (for time-synced visualizer)
const scheduleStartTime = nextPlaybackTime;
// Schedule play
source.start(nextPlaybackTime);
hudTime.textContent = Math.max(0, Math.round(processingTime)) + ' ms';
// Advance playback sync clock
nextPlaybackTime += audioBuf.duration;
// Push to time-synced output queue for visualizer (keyed by when audio actually plays)
outputChunkQueue.push({ data: pcmData, startTime: scheduleStartTime });
// Keep the queue bounded: drop chunks that finished playing more than 2 seconds ago
while (outputChunkQueue.length > 0) {
const c = outputChunkQueue[0];
if (c.startTime + c.data.length / targetSampleRate < audioContext.currentTime - 2.0) {
outputChunkQueue.shift();
} else break;
}
}
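// --- Illustrative sketch (not part of the original app) ---
// The scheduling rule used in handleServerAudioChunk above, restated as a pure function so
// the three cases can be checked in isolation. `planPlaybackStart` is a hypothetical name
// introduced here; all times are in seconds on the AudioContext clock.
function planPlaybackStart(nextPlaybackTime, currentTime, chunkDuration) {
    // Headroom: 2.5 chunk durations, capped at 500 ms
    const headroom = Math.min(chunkDuration * 2.5, 0.5);
    if (nextPlaybackTime < currentTime) {
        // Clock is behind (first chunk or underrun): rebuild the full headroom
        return currentTime + headroom;
    }
    if (nextPlaybackTime > currentTime + chunkDuration * 5.0) {
        // More than 5 chunk durations queued: snap back to the normal headroom
        return currentTime + headroom;
    }
    // Normal case: play back-to-back with the previously scheduled chunk
    return nextPlaybackTime;
}
// The caller would then schedule the chunk at the returned time and advance its clock by
// chunkDuration, as the original function does with nextPlaybackTime.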
// --- VISUALIZATION / DRAWING ROUTINES ---
function drawWaveform(dataArray, canvas, strokeColor) {
const ctx = canvas.getContext('2d');
const width = canvas.width;
const height = canvas.height;
// Dark transparent redraw for trace/motion-blur effect
ctx.fillStyle = 'rgba(11, 12, 19, 0.4)';
ctx.fillRect(0, 0, width, height);
ctx.lineWidth = 2 * window.devicePixelRatio;
ctx.strokeStyle = strokeColor;
ctx.beginPath();
const sliceWidth = width / dataArray.length;
let x = 0;
for (let i = 0; i < dataArray.length; i++) {
// Center the wave around half-height and scale the amplitude
const v = dataArray[i] * 1.5;
const y = (v * (height / 2)) + (height / 2);
if (i === 0) {
ctx.moveTo(x, y);
} else {
ctx.lineTo(x, y);
}
x += sliceWidth;
}
ctx.lineTo(width, height / 2);
ctx.stroke();
// Draw a subtle center baseline
ctx.strokeStyle = 'rgba(255, 255, 255, 0.05)';
ctx.lineWidth = 1;
ctx.beginPath();
ctx.moveTo(0, height / 2);
ctx.lineTo(width, height / 2);
ctx.stroke();
}
function clearCanvas(canvas) {
const ctx = canvas.getContext('2d');
ctx.fillStyle = '#0b0c13';
ctx.fillRect(0, 0, canvas.width, canvas.height);
}
// Apply initial UI layout on startup
applyAudioRoutingUI();
+18
@@ -0,0 +1,18 @@
import { defineConfig, globalIgnores } from "eslint/config";
import nextVitals from "eslint-config-next/core-web-vitals";
import nextTs from "eslint-config-next/typescript";
const eslintConfig = defineConfig([
...nextVitals,
...nextTs,
// Override default ignores of eslint-config-next.
globalIgnores([
// Default ignores of eslint-config-next:
".next/**",
"out/**",
"build/**",
"next-env.d.ts",
]),
]);
export default eslintConfig;
-243
@@ -1,243 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<meta name="description" content="Omni Real-time Voice Changer - A very low-latency, AI-based real-time voice changer powered by ONNX Runtime.">
<title>🎙️ Omni Real-Time Voice Changer - High-Performance AI Audio</title>
<!-- Modern Typography: Inter & Outfit from Google Fonts -->
<link rel="preconnect" href="https://fonts.googleapis.com">
<link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&family=Outfit:wght@400;600;800&display=swap" rel="stylesheet">
<!-- Link to premium Vanilla CSS -->
<link rel="stylesheet" href="styles.css">
</head>
<body>
<div class="glow-backdrop"></div>
<div class="dashboard-container">
<!-- HEADER -->
<header class="app-header">
<div class="logo-area">
<span class="pulse-indicator active"></span>
<h1>🎙️ OMNI VOICE CHANGER</h1>
</div>
<p class="tagline">Ultra-Low-Latency Real-Time AI Voice Changer with ONNX Runtime Acceleration</p>
</header>
<!-- CONNECTION BAR -->
<div class="connection-bar card glassmorphism">
<div class="form-row">
<div class="input-group">
<label for="ws_url">WebSocket Server URL</label>
<input type="text" id="ws_url" value="ws://127.0.0.1:8765" placeholder="ws://localhost:8765">
</div>
<div class="connection-status-container">
<span id="connection_status" class="status-badge disconnected">Disconnected</span>
</div>
<div class="btn-group-row">
<button id="connect_btn" class="btn btn-primary">Connect Server</button>
<button id="stream_btn" class="btn btn-accent" disabled>Start Voice Changer</button>
<button id="play_toggle_btn" class="btn btn-primary" disabled>🔊 Listening: ON</button>
</div>
</div>
</div>
<!-- MAIN DASHBOARD CONTENT -->
<main class="dashboard-grid">
<!-- MODEL CONFIGURATION -->
<section class="card glassmorphism col-span-1" aria-labelledby="model-config-title">
<h2 id="model-config-title" class="card-title">⚙️ Model &amp; Device Configuration</h2>
<!-- QUICK PRESETS PANEL -->
<div class="control-group">
<label>⚡ Quick Presets (Performance Profiles)</label>
<div class="btn-group-row" style="width: 100%; display: grid; grid-template-columns: repeat(2, 1fr); gap: 0.5rem; height: auto; margin-bottom: 0.75rem;">
<button id="preset_latency_btn" class="btn btn-primary" style="font-size: 0.8rem; padding: 0.65rem 0.5rem;">⚡ Lightning Response (PM)</button>
<button id="preset_quality_btn" class="btn btn-accent" style="font-size: 0.8rem; padding: 0.65rem 0.5rem;">🎙️ High Quality (RMVPE)</button>
</div>
</div>
<div class="control-group">
<label for="model_select">Select Voice Model (RVC ONNX)</label>
<select id="model_select" class="custom-select">
<option value="HuTao">HuTao (Genshin Impact)</option>
<option value="HuoHuo">HuoHuo (Honkai Star Rail)</option>
</select>
</div>
<div class="control-group">
<label for="device_select">Execution Provider (GPU Acceleration)</label>
<select id="device_select" class="custom-select">
<option value="cpu">CPU (Very Stable)</option>
<option value="cuda" selected>CUDA (NVIDIA GPU - Very Fast)</option>
<option value="dml">DirectML (AMD/Intel GPU on Windows)</option>
</select>
</div>
<!-- DUAL AUDIO ROUTING MODE (SERVER VS CLIENT) -->
<div class="control-group" style="border-top: 1px solid rgba(255, 255, 255, 0.05); padding-top: 0.75rem; margin-top: 0.75rem;">
<label for="routing_mode_select">Audio Mode (Routing Mode)</label>
<select id="routing_mode_select" class="custom-select">
<option value="browser" selected>Client Mode (Browser Streaming - Portable)</option>
<option value="hardware">Server Mode (Hardware Direct - Lowest Latency)</option>
</select>
</div>
<div id="hardware_devices_panel" class="control-group" style="display: none; border: 1px solid rgba(99, 102, 241, 0.2); padding: 0.75rem; border-radius: 8px; background: rgba(11, 12, 19, 0.5); box-shadow: 0 0 10px rgba(99, 102, 241, 0.05);">
<div style="margin-bottom: 0.75rem;">
<label for="server_input_select" style="font-size: 0.75rem; margin-bottom: 0.25rem; color: var(--primary); text-transform: uppercase; font-weight: 600;">🎙️ Server Microphone Input</label>
<select id="server_input_select" class="custom-select" style="font-size: 0.8rem; padding: 0.4rem;"></select>
</div>
<div>
<label for="server_output_select" style="font-size: 0.75rem; margin-bottom: 0.25rem; color: var(--accent); text-transform: uppercase; font-weight: 600;">🔊 Server Speaker/Cable Output</label>
<select id="server_output_select" class="custom-select" style="font-size: 0.8rem; padding: 0.4rem;"></select>
</div>
</div>
<div class="control-group">
<label>Pitch Detection Method (Pitch Extraction)</label>
<div class="radio-group-modern">
<label class="radio-tile">
<input type="radio" name="f0_method" value="pm" checked>
<span class="tile-label">PM (Fastest)</span>
</label>
<label class="radio-tile">
<input type="radio" name="f0_method" value="dio">
<span class="tile-label">DIO (Lightweight)</span>
</label>
<label class="radio-tile">
<input type="radio" name="f0_method" value="harvest">
<span class="tile-label">Harvest (Stable)</span>
</label>
<label class="radio-tile">
<input type="radio" name="f0_method" value="rmvpe">
<span class="tile-label">RMVPE (High Fidelity)</span>
</label>
</div>
</div>
<div class="control-group">
<div class="slider-header">
<label for="transpose_slider">Transpose (Pitch Shift)</label>
<span id="transpose_val" class="slider-value">0 semitone</span>
</div>
<input type="range" id="transpose_slider" min="-24" max="24" value="0" step="1" class="custom-slider">
<div class="slider-ticks">
<span>-24 (Deep Male)</span>
<span>0 (Original)</span>
<span>+24 (Female/Anime)</span>
</div>
</div>
</section>
<!-- AUDIO DSP & PROCESSING -->
<section class="card glassmorphism col-span-1" aria-labelledby="dsp-title">
<h2 id="dsp-title" class="card-title">🎛️ Audio Processing (DSP)</h2>
<div class="control-group">
<div class="slider-header">
<label for="gate_slider">Noise Gate (Threshold)</label>
<span id="gate_val" class="slider-value">-40 dB</span>
</div>
<input type="range" id="gate_slider" min="-60" max="-10" value="-40" step="1" class="custom-slider">
<div class="slider-ticks">
<span>-60 dB (Sensitive)</span>
<span>-40 dB (Default)</span>
<span>-10 dB (Strict)</span>
</div>
</div>
<div class="control-group">
<div class="slider-header">
<label for="input_gain_slider">Input Gain (Mic Boost)</label>
<span id="input_gain_val" class="slider-value">1.0x</span>
</div>
<input type="range" id="input_gain_slider" min="0" max="3" value="1" step="0.1" class="custom-slider">
</div>
<div class="control-group">
<div class="slider-header">
<label for="output_gain_slider">Output Gain (Voice Volume)</label>
<span id="output_gain_val" class="slider-value">1.0x</span>
</div>
<input type="range" id="output_gain_slider" min="0" max="3" value="1" step="0.1" class="custom-slider">
</div>
<div id="browser_noise_cancel_group" class="control-group">
<label class="checkbox-container" style="display: flex; align-items: center; gap: 0.5rem; cursor: pointer; user-select: none;">
<input type="checkbox" id="noise_cancel_checkbox" checked style="width: 18px; height: 18px; cursor: pointer; accent-color: var(--primary);">
<span class="checkbox-label" style="font-size: 0.85rem; font-weight: 500; color: var(--text-muted); text-transform: uppercase;">🚫 Noise Suppression (Noise Cancel)</span>
</label>
</div>
<div class="control-group">
<label for="chunk_select">Buffer Size (Chunk Size - Latency vs. Stability)</label>
<select id="chunk_select" class="custom-select">
<option value="8192" selected>8192 samples (~170ms - Recommended, Minimal Distortion)</option>
<option value="12288">12288 samples (~250ms - Very Smooth &amp; Pleasant)</option>
<option value="16384">16384 samples (~340ms - Studio Quality, Very Stable)</option>
<option value="24576">24576 samples (~510ms - Extra Smooth &amp; Robust)</option>
<option value="32768">32768 samples (~680ms - Maximum Fidelity)</option>
<option value="49152">49152 samples (~1.0 s - Ultra Smooth Cinema)</option>
<option value="65536">65536 samples (~1.4 s - Maximum Stability)</option>
<option value="98304">98304 samples (~2.0 s - Broadcast Mode)</option>
</select>
</div>
</section>
<!-- OSCILLOSCOPES / WAVEFORM VISUALIZERS -->
<section class="card glassmorphism col-span-2" aria-labelledby="visualizer-title">
<h2 id="visualizer-title" class="card-title">📊 Live Audio Waveform &amp; Visualizer</h2>
<div class="visualizer-row">
<div class="visualizer-container">
<div class="vis-label">
<span class="dot input-dot"></span>
<span>Microphone Signal (Input)</span>
</div>
<canvas id="input_canvas" class="waveform-canvas"></canvas>
</div>
<div class="visualizer-container">
<div class="vis-label">
<span class="dot output-dot"></span>
<span>AI Voice Result (Output)</span>
</div>
<canvas id="output_canvas" class="waveform-canvas"></canvas>
</div>
</div>
</section>
</main>
<!-- PERFORMANCE HUD FOOTER -->
<footer class="performance-hud card glassmorphism">
<div class="hud-item">
<span class="hud-label">Round-Trip Latency (RTT)</span>
<span id="hud_latency" class="hud-value italic">-- ms</span>
</div>
<div class="hud-separator"></div>
<div class="hud-item">
<span class="hud-label">Processing Time</span>
<span id="hud_time" class="hud-value text-accent">-- ms</span>
</div>
<div class="hud-separator"></div>
<div class="hud-item">
<span class="hud-label">Voice Signal</span>
<span id="hud_gate_status" class="hud-value text-muted">Silent</span>
</div>
<div class="hud-separator"></div>
<div class="hud-item">
<span class="hud-label">Audio Sample Rate</span>
<span id="hud_sr" class="hud-value">44100 Hz</span>
</div>
</footer>
</div>
<!-- Link to premium Javascript logic -->
<script src="app.js"></script>
</body>
</html>
+10
@@ -0,0 +1,10 @@
import type { NextConfig } from "next";
const nextConfig: NextConfig = {
output: 'export',
images: {
unoptimized: true,
},
};
export default nextConfig;
+6829
File diff suppressed because it is too large
+30
@@ -0,0 +1,30 @@
{
"name": "frontend",
"version": "0.1.0",
"private": true,
"scripts": {
"dev": "next dev",
"build": "next build",
"start": "next start",
"lint": "eslint"
},
"dependencies": {
"clsx": "^2.1.1",
"framer-motion": "^12.40.0",
"lucide-react": "^1.17.0",
"next": "16.2.6",
"react": "19.2.4",
"react-dom": "19.2.4",
"tailwind-merge": "^3.6.0"
},
"devDependencies": {
"@tailwindcss/postcss": "^4",
"@types/node": "^20",
"@types/react": "^19",
"@types/react-dom": "^19",
"eslint": "^9",
"eslint-config-next": "16.2.6",
"tailwindcss": "^4",
"typescript": "^5"
}
}
+7
@@ -0,0 +1,7 @@
const config = {
plugins: {
"@tailwindcss/postcss": {},
},
};
export default config;
+1
@@ -0,0 +1 @@
<svg fill="none" viewBox="0 0 16 16" xmlns="http://www.w3.org/2000/svg"><path d="M14.5 13.5V5.41a1 1 0 0 0-.3-.7L9.8.29A1 1 0 0 0 9.08 0H1.5v13.5A2.5 2.5 0 0 0 4 16h8a2.5 2.5 0 0 0 2.5-2.5m-1.5 0v-7H8v-5H3v12a1 1 0 0 0 1 1h8a1 1 0 0 0 1-1M9.5 5V2.12L12.38 5zM5.13 5h-.62v1.25h2.12V5zm-.62 3h7.12v1.25H4.5zm.62 3h-.62v1.25h7.12V11z" clip-rule="evenodd" fill="#666" fill-rule="evenodd"/></svg>

+1
@@ -0,0 +1 @@
<svg fill="none" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"><g clip-path="url(#a)"><path fill-rule="evenodd" clip-rule="evenodd" d="M10.27 14.1a6.5 6.5 0 0 0 3.67-3.45q-1.24.21-2.7.34-.31 1.83-.97 3.1M8 16A8 8 0 1 0 8 0a8 8 0 0 0 0 16m.48-1.52a7 7 0 0 1-.96 0H7.5a4 4 0 0 1-.84-1.32q-.38-.89-.63-2.08a40 40 0 0 0 3.92 0q-.25 1.2-.63 2.08a4 4 0 0 1-.84 1.31zm2.94-4.76q1.66-.15 2.95-.43a7 7 0 0 0 0-2.58q-1.3-.27-2.95-.43a18 18 0 0 1 0 3.44m-1.27-3.54a17 17 0 0 1 0 3.64 39 39 0 0 1-4.3 0 17 17 0 0 1 0-3.64 39 39 0 0 1 4.3 0m1.1-1.17q1.45.13 2.69.34a6.5 6.5 0 0 0-3.67-3.44q.65 1.26.98 3.1M8.48 1.5l.01.02q.41.37.84 1.31.38.89.63 2.08a40 40 0 0 0-3.92 0q.25-1.2.63-2.08a4 4 0 0 1 .85-1.32 7 7 0 0 1 .96 0m-2.75.4a6.5 6.5 0 0 0-3.67 3.44 29 29 0 0 1 2.7-.34q.31-1.83.97-3.1M4.58 6.28q-1.66.16-2.95.43a7 7 0 0 0 0 2.58q1.3.27 2.95.43a18 18 0 0 1 0-3.44m.17 4.71q-1.45-.12-2.69-.34a6.5 6.5 0 0 0 3.67 3.44q-.65-1.27-.98-3.1" fill="#666"/></g><defs><clipPath id="a"><path fill="#fff" d="M0 0h16v16H0z"/></clipPath></defs></svg>

+1
@@ -0,0 +1 @@
<svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 394 80"><path fill="#000" d="M262 0h68.5v12.7h-27.2v66.6h-13.6V12.7H262V0ZM149 0v12.7H94v20.4h44.3v12.6H94v21h55v12.6H80.5V0h68.7zm34.3 0h-17.8l63.8 79.4h17.9l-32-39.7 32-39.6h-17.9l-23 28.6-23-28.6zm18.3 56.7-9-11-27.1 33.7h17.8l18.3-22.7z"/><path fill="#000" d="M81 79.3 17 0H0v79.3h13.6V17l50.2 62.3H81Zm252.6-.4c-1 0-1.8-.4-2.5-1s-1.1-1.6-1.1-2.6.3-1.8 1-2.5 1.6-1 2.6-1 1.8.3 2.5 1a3.4 3.4 0 0 1 .6 4.3 3.7 3.7 0 0 1-3 1.8zm23.2-33.5h6v23.3c0 2.1-.4 4-1.3 5.5a9.1 9.1 0 0 1-3.8 3.5c-1.6.8-3.5 1.3-5.7 1.3-2 0-3.7-.4-5.3-1s-2.8-1.8-3.7-3.2c-.9-1.3-1.4-3-1.4-5h6c.1.8.3 1.6.7 2.2s1 1.2 1.6 1.5c.7.4 1.5.5 2.4.5 1 0 1.8-.2 2.4-.6a4 4 0 0 0 1.6-1.8c.3-.8.5-1.8.5-3V45.5zm30.9 9.1a4.4 4.4 0 0 0-2-3.3 7.5 7.5 0 0 0-4.3-1.1c-1.3 0-2.4.2-3.3.5-.9.4-1.6 1-2 1.6a3.5 3.5 0 0 0-.3 4c.3.5.7.9 1.3 1.2l1.8 1 2 .5 3.2.8c1.3.3 2.5.7 3.7 1.2a13 13 0 0 1 3.2 1.8 8.1 8.1 0 0 1 3 6.5c0 2-.5 3.7-1.5 5.1a10 10 0 0 1-4.4 3.5c-1.8.8-4.1 1.2-6.8 1.2-2.6 0-4.9-.4-6.8-1.2-2-.8-3.4-2-4.5-3.5a10 10 0 0 1-1.7-5.6h6a5 5 0 0 0 3.5 4.6c1 .4 2.2.6 3.4.6 1.3 0 2.5-.2 3.5-.6 1-.4 1.8-1 2.4-1.7a4 4 0 0 0 .8-2.4c0-.9-.2-1.6-.7-2.2a11 11 0 0 0-2.1-1.4l-3.2-1-3.8-1c-2.8-.7-5-1.7-6.6-3.2a7.2 7.2 0 0 1-2.4-5.7 8 8 0 0 1 1.7-5 10 10 0 0 1 4.3-3.5c2-.8 4-1.2 6.4-1.2 2.3 0 4.4.4 6.2 1.2 1.8.8 3.2 2 4.3 3.4 1 1.4 1.5 3 1.5 5h-5.8z"/></svg>

+1
@@ -0,0 +1 @@
<svg fill="none" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1155 1000"><path d="m577.3 0 577.4 1000H0z" fill="#fff"/></svg>

+1
@@ -0,0 +1 @@
<svg fill="none" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"><path fill-rule="evenodd" clip-rule="evenodd" d="M1.5 2.5h13v10a1 1 0 0 1-1 1h-11a1 1 0 0 1-1-1zM0 1h16v11.5a2.5 2.5 0 0 1-2.5 2.5h-11A2.5 2.5 0 0 1 0 12.5zm3.75 4.5a.75.75 0 1 0 0-1.5.75.75 0 0 0 0 1.5M7 4.75a.75.75 0 1 1-1.5 0 .75.75 0 0 1 1.5 0m1.75.75a.75.75 0 1 0 0-1.5.75.75 0 0 0 0 1.5" fill="#666"/></svg>


Binary file not shown.

+82
@@ -0,0 +1,82 @@
@import "tailwindcss";
@custom-variant dark (&:where(.dark, .dark *));
:root {
--track-bg: #e4e4e7;
}
.dark {
--track-bg: #27272a;
}
@theme {
--color-background: #fafafa;
--color-foreground: #18181b;
--color-primary: #84cc16; /* Lime-500 */
--color-hover: #65a30d; /* Lime-600 */
--color-soft-accent: #d9f99d; /* Lime-200 */
--color-success: #10b981; /* Emerald-500 */
--color-text-primary: #18181b;
--color-text-secondary: #52525b;
--radius-lg: 1rem; /* rounded-2xl */
--radius-md: 0.75rem; /* rounded-xl */
--radius-sm: 0.5rem; /* rounded-lg */
}
/* Custom styling presets */
body {
background-color: #fafafa;
color: #18181b;
font-family: 'Inter', system-ui, -apple-system, sans-serif;
overflow-x: hidden;
}
/* Glowing Aura Background */
.glow-backdrop {
position: fixed;
top: 0;
left: 0;
right: 0;
bottom: 0;
z-index: -10;
pointer-events: none;
background-image:
radial-gradient(circle at 10% 20%, rgba(132, 204, 22, 0.04) 0%, transparent 40%),
radial-gradient(circle at 90% 80%, rgba(16, 185, 129, 0.04) 0%, transparent 40%);
}
/* Glassmorphism panel overlay */
.glass-panel {
background: rgba(255, 255, 255, 0.85);
backdrop-filter: blur(12px);
-webkit-backdrop-filter: blur(12px);
border: 1px solid rgba(24, 24, 27, 0.05);
}
/* Custom pulse animations for recording signals */
@keyframes signal-pulse {
0%, 100% {
transform: scale(1);
box-shadow: 0 0 0 0 rgba(132, 204, 22, 0.4);
}
50% {
transform: scale(1.1);
box-shadow: 0 0 10px 4px rgba(132, 204, 22, 0.2);
}
}
.pulse-indicator {
display: inline-block;
width: 10px;
height: 10px;
border-radius: 9999px;
background-color: #84cc16;
}
.pulse-indicator.active {
animation: signal-pulse 2s infinite ease-in-out;
}
+33
View File
@@ -0,0 +1,33 @@
import type { Metadata } from "next";
import { Geist, Geist_Mono } from "next/font/google";
import "./globals.css";
const geistSans = Geist({
variable: "--font-geist-sans",
subsets: ["latin"],
});
const geistMono = Geist_Mono({
variable: "--font-geist-mono",
subsets: ["latin"],
});
export const metadata: Metadata = {
title: "🎙️ ONNX VC - Real-Time AI Voice Changer",
description: "ONNX VC - Ultra-low-latency, AI-powered real-time voice changer built on ONNX Runtime.",
};
export default function RootLayout({
children,
}: Readonly<{
children: React.ReactNode;
}>) {
return (
<html
lang="en"
className={`${geistSans.variable} ${geistMono.variable} h-full antialiased`}
>
<body className="min-h-full flex flex-col">{children}</body>
</html>
);
}
File diff suppressed because it is too large
+32
View File
@@ -0,0 +1,32 @@
import * as React from "react";
import { twMerge } from "tailwind-merge";
export interface BadgeProps extends React.HTMLAttributes<HTMLSpanElement> {
variant?: 'default' | 'secondary' | 'outline' | 'success' | 'warning' | 'danger' | 'info';
}
const Badge = React.forwardRef<HTMLSpanElement, BadgeProps>(
({ className, variant = "default", ...props }, ref) => {
return (
<span
ref={ref}
className={twMerge(
"inline-flex items-center rounded-full px-2.5 py-0.5 text-xs font-semibold select-none border tracking-wide uppercase",
variant === 'default' && "bg-[var(--accent-color)] text-white border-transparent",
variant === 'secondary' && "bg-[var(--accent-soft)] text-[var(--accent-text)] border-transparent",
variant === 'success' && "bg-[#10b981]/10 text-[#059669] border-[#10b981]/20",
variant === 'warning' && "bg-amber-500/10 text-amber-700 border-amber-500/20",
variant === 'danger' && "bg-red-500/10 text-red-700 border-red-500/20",
variant === 'info' && "bg-sky-500/10 text-sky-700 border-sky-500/20",
variant === 'outline' && "text-zinc-600 dark:text-zinc-400 border-zinc-200 dark:border-zinc-800 bg-white dark:bg-zinc-900",
className
)}
{...props}
/>
);
}
);
Badge.displayName = "Badge";
export { Badge };
export default Badge;
+40
View File
@@ -0,0 +1,40 @@
import * as React from "react";
import { twMerge } from "tailwind-merge";
export interface ButtonProps extends React.ButtonHTMLAttributes<HTMLButtonElement> {
variant?: 'primary' | 'secondary' | 'outline' | 'ghost' | 'accent' | 'success' | 'danger';
size?: 'default' | 'sm' | 'lg' | 'icon';
}
const Button = React.forwardRef<HTMLButtonElement, ButtonProps>(
({ className, variant = "primary", size = "default", ...props }, ref) => {
return (
<button
ref={ref}
className={twMerge(
"inline-flex items-center justify-center font-medium rounded-xl transition-all duration-200 focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-lime-500 focus-visible:ring-offset-2 disabled:pointer-events-none disabled:opacity-50 active:scale-[0.98] cursor-pointer",
// Sizing
size === 'default' && "h-11 px-5 py-2.5 text-sm",
size === 'sm' && "h-9 px-3.5 text-xs rounded-lg",
size === 'lg' && "h-12 px-7 py-3 text-base rounded-2xl",
size === 'icon' && "h-10 w-10 rounded-xl",
// Variants
variant === 'primary' && "bg-[var(--accent-color)] hover:bg-[var(--accent-hover)] text-white shadow-sm shadow-lime-500/10 font-semibold",
variant === 'secondary' && "bg-[var(--accent-soft)] hover:bg-[var(--accent-soft)]/80 text-[var(--accent-text)] font-semibold",
variant === 'accent' && "bg-zinc-900 dark:bg-[var(--accent-color)] hover:bg-zinc-800 dark:hover:bg-[var(--accent-hover)] text-white dark:text-zinc-950 shadow-sm font-semibold",
variant === 'outline' && "border border-zinc-200 dark:border-zinc-800 bg-white dark:bg-zinc-900 hover:bg-zinc-50 dark:hover:bg-zinc-800 text-zinc-700 dark:text-zinc-300",
variant === 'ghost' && "hover:bg-zinc-50 dark:hover:bg-zinc-800 text-zinc-700 dark:text-zinc-300",
variant === 'success' && "bg-[#10b981] hover:bg-[#059669] text-white shadow-sm font-semibold",
variant === 'danger' && "bg-red-500 hover:bg-red-600 text-white shadow-sm font-semibold",
className
)}
{...props}
/>
);
}
);
Button.displayName = "Button";
export { Button };
export default Button;
+78
View File
@@ -0,0 +1,78 @@
import * as React from "react";
import { motion, AnimatePresence } from "framer-motion";
import { X } from "lucide-react";
import { twMerge } from "tailwind-merge";
export interface DialogProps {
isOpen: boolean;
onClose: () => void;
title: string;
children: React.ReactNode;
className?: string;
}
export const Dialog: React.FC<DialogProps> = ({
isOpen,
onClose,
title,
children,
className
}) => {
// Close dialog on escape key press
React.useEffect(() => {
if (!isOpen) return;
const handleKeyDown = (e: KeyboardEvent) => {
if (e.key === "Escape") onClose();
};
window.addEventListener("keydown", handleKeyDown);
return () => window.removeEventListener("keydown", handleKeyDown);
}, [isOpen, onClose]);
return (
<AnimatePresence>
{isOpen && (
<div className="fixed inset-0 z-50 flex items-center justify-center p-4">
{/* Backdrop Overlay */}
<motion.div
className="fixed inset-0 bg-zinc-950/40 backdrop-blur-sm"
initial={{ opacity: 0 }}
animate={{ opacity: 1 }}
exit={{ opacity: 0 }}
onClick={onClose}
/>
{/* Modal content */}
<motion.div
className={twMerge(
"relative bg-white dark:bg-zinc-900 w-full max-w-lg rounded-2xl p-6 shadow-xl border border-zinc-200/50 dark:border-zinc-800/80 z-10 overflow-hidden flex flex-col max-h-[85vh] text-zinc-800 dark:text-zinc-100",
className
)}
initial={{ opacity: 0, scale: 0.95, y: 10 }}
animate={{ opacity: 1, scale: 1, y: 0 }}
exit={{ opacity: 0, scale: 0.95, y: 10 }}
transition={{ duration: 0.2, ease: "easeOut" }}
>
{/* Header */}
<div className="flex justify-between items-center mb-4 pb-2 border-b border-zinc-100 dark:border-zinc-800">
<h2 className="text-lg font-bold text-zinc-900 dark:text-zinc-100 leading-none">
{title}
</h2>
<button
type="button"
aria-label="Close"
onClick={onClose}
className="p-1.5 rounded-lg text-zinc-400 dark:text-zinc-500 hover:bg-zinc-50 dark:hover:bg-zinc-800 hover:text-zinc-700 dark:hover:text-zinc-300 transition-colors focus:outline-none focus:ring-2 focus:ring-[var(--accent-color)] cursor-pointer"
>
<X className="w-4 h-4" />
</button>
</div>
{/* Body */}
<div className="overflow-y-auto pr-1 flex-1">
{children}
</div>
</motion.div>
</div>
)}
</AnimatePresence>
);
};
export default Dialog;
+50
View File
@@ -0,0 +1,50 @@
import * as React from "react";
import { twMerge } from "tailwind-merge";
export interface SelectProps extends React.SelectHTMLAttributes<HTMLSelectElement> {
options: { value: string | number; label: string }[];
}
const Select = React.forwardRef<HTMLSelectElement, SelectProps>(
({ className, options, ...props }, ref) => {
return (
<div className="relative w-full">
<select
ref={ref}
className={twMerge(
"w-full h-11 px-4 text-sm bg-white dark:bg-zinc-900 border border-zinc-200 dark:border-zinc-800 rounded-xl focus:outline-none focus:ring-2 focus:ring-[var(--accent-color)] focus:border-[var(--accent-color)] transition-all cursor-pointer appearance-none text-zinc-800 dark:text-zinc-100 pr-10",
className
)}
{...props}
>
{options.map((opt) => (
<option key={opt.value} value={opt.value}>
{opt.label}
</option>
))}
</select>
{/* Custom Arrow Icon */}
<div className="pointer-events-none absolute right-4 top-1/2 -translate-y-1/2 flex items-center justify-center text-zinc-400">
<svg
className="w-4 h-4"
fill="none"
stroke="currentColor"
viewBox="0 0 24 24"
>
<path
strokeLinecap="round"
strokeLinejoin="round"
strokeWidth="2"
d="M19 9l-7 7-7-7"
/>
</svg>
</div>
</div>
);
}
);
Select.displayName = "Select";
export { Select };
export default Select;
+46
View File
@@ -0,0 +1,46 @@
import * as React from "react";
import { twMerge } from "tailwind-merge";
export interface SliderProps extends Omit<React.InputHTMLAttributes<HTMLInputElement>, 'type'> {
value: number;
min: number;
max: number;
step?: number;
onValueChange?: (val: number) => void;
}
const Slider = React.forwardRef<HTMLInputElement, SliderProps>(
({ className, value, min, max, step = 1, onValueChange, ...props }, ref) => {
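// Clamp to 0-100 and avoid dividing by zero when min === max
const range = max - min;
const percentage = range > 0 ? Math.min(100, Math.max(0, ((value - min) / range) * 100)) : 0;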
return (
<div className="relative w-full flex items-center select-none group">
<input
type="range"
ref={ref}
min={min}
max={max}
step={step}
value={value}
onChange={(e) => onValueChange?.(parseFloat(e.target.value))}
className={twMerge(
"w-full h-2 rounded-lg bg-zinc-200 dark:bg-zinc-800 appearance-none cursor-pointer outline-none focus:outline-none",
// custom thumb styles
"[&::-webkit-slider-thumb]:appearance-none [&::-webkit-slider-thumb]:w-5 [&::-webkit-slider-thumb]:h-5 [&::-webkit-slider-thumb]:rounded-full [&::-webkit-slider-thumb]:bg-white [&::-webkit-slider-thumb]:border-2 [&::-webkit-slider-thumb]:border-[var(--accent-color)] [&::-webkit-slider-thumb]:shadow-md [&::-webkit-slider-thumb]:transition-all [&::-webkit-slider-thumb]:active:scale-110",
"[&::-moz-range-thumb]:w-5 [&::-moz-range-thumb]:h-5 [&::-moz-range-thumb]:rounded-full [&::-moz-range-thumb]:bg-white [&::-moz-range-thumb]:border-2 [&::-moz-range-thumb]:border-[var(--accent-color)] [&::-moz-range-thumb]:shadow-md [&::-moz-range-thumb]:transition-all [&::-moz-range-thumb]:active:scale-110",
className
)}
style={{
background: `linear-gradient(to right, var(--accent-color) 0%, var(--accent-color) ${percentage}%, var(--track-bg, #e4e4e7) ${percentage}%, var(--track-bg, #e4e4e7) 100%)`
}}
{...props}
/>
</div>
);
}
);
Slider.displayName = "Slider";
export { Slider };
export default Slider;
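The slider track fill above is driven by a single percentage. A minimal sketch of that calculation as a standalone function follows; `sliderFillPercent` and `trackGradient` are hypothetical names introduced here for illustration, not exports of the component.

```typescript
// Hypothetical helpers, not part of the Slider component above.

// Maps a slider value to the 0-100 fill percentage used by the track gradient.
function sliderFillPercent(value: number, min: number, max: number): number {
  const range = max - min;
  // Degenerate or inverted range: draw an empty track instead of NaN/Infinity
  if (!(range > 0)) return 0;
  const pct = ((value - min) / range) * 100;
  // Clamp so out-of-range values cannot produce an invalid gradient stop
  return Math.min(100, Math.max(0, pct));
}

// Builds the same two-stop gradient string the component assigns to `style.background`.
function trackGradient(pct: number): string {
  return (
    `linear-gradient(to right, var(--accent-color) 0%, var(--accent-color) ${pct}%, ` +
    `var(--track-bg, #e4e4e7) ${pct}%, var(--track-bg, #e4e4e7) 100%)`
  );
}
```

With a transpose range of -24 to +24, a value of 0 fills half the track.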
+52
View File
@@ -0,0 +1,52 @@
import * as React from "react";
import { twMerge } from "tailwind-merge";
export interface SwitchProps extends Omit<React.InputHTMLAttributes<HTMLInputElement>, 'type'> {
checked: boolean;
onCheckedChange?: (checked: boolean) => void;
label?: string;
variant?: 'default' | 'dark';
}
const Switch = React.forwardRef<HTMLInputElement, SwitchProps>(
({ className, checked, onCheckedChange, label, variant = 'default', ...props }, ref) => {
return (
<label className="flex items-center gap-3 cursor-pointer select-none group">
<div className="relative">
<input
type="checkbox"
ref={ref}
checked={checked}
onChange={(e) => onCheckedChange?.(e.target.checked)}
className="sr-only"
{...props}
/>
<div
className={twMerge(
"w-10 h-6 bg-zinc-200 dark:bg-zinc-800 rounded-full transition-all duration-200 group-focus-within:ring-2 group-focus-within:ring-offset-2",
variant === 'dark' ? "group-focus-within:ring-zinc-500" : "group-focus-within:ring-[var(--accent-color)]",
checked && (variant === 'dark' ? "bg-zinc-800 dark:bg-zinc-700 border border-zinc-700/50 dark:border-zinc-600/80" : "bg-[var(--accent-color)]"),
className
)}
/>
<div
className={twMerge(
"absolute left-0.5 top-0.5 w-5 h-5 bg-white dark:bg-zinc-900 rounded-full transition-all duration-200 shadow-sm border border-zinc-200/50 dark:border-zinc-800/80",
checked && "translate-x-4",
checked && (variant === 'dark' ? "border-zinc-500" : "border-[var(--accent-color)]")
)}
/>
</div>
{label && (
<span className="text-sm font-medium text-zinc-700 dark:text-zinc-300 select-none">
{label}
</span>
)}
</label>
);
}
);
Switch.displayName = "Switch";
export { Switch };
export default Switch;
@@ -0,0 +1,126 @@
'use client';
import React, { useEffect } from 'react';
import { motion, AnimatePresence } from 'framer-motion';
import { useWaveformCanvas } from '../../../hooks/useWaveformCanvas';
import { usePictureInPicture } from '../../../hooks/usePictureInPicture';
import { Button } from '../../../components/ui/button';
import { Maximize2, MonitorOff, Activity } from 'lucide-react';
import { translations, Language } from '../../../utils/translations';
interface WaveformPanelProps {
title: string;
buffer: React.MutableRefObject<Float32Array>;
strokeColor: string;
isTalking?: boolean;
lineWidth?: number;
traceFade?: number;
isDark?: boolean;
lang?: Language;
}
export const WaveformPanel: React.FC<WaveformPanelProps> = ({
title,
buffer,
strokeColor,
isTalking = false,
lineWidth = 2,
traceFade = 0.4,
isDark = false,
lang = 'en',
}) => {
const t = translations[lang];
// Background clear color: white for light mode, dark zinc-950 for dark mode
const canvasBgColor = isDark ? `rgba(9, 9, 11, ${traceFade})` : `rgba(255, 255, 255, ${traceFade})`;
const { canvasRef, updateData } = useWaveformCanvas({
strokeColor,
fillColor: canvasBgColor, // dynamic trail alpha blending
scaleAmplitude: 2.0,
lineWidth,
});
const { togglePip, isPipActive, isSupported } = usePictureInPicture();
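// Hand the latest buffer to the canvas hook on every animation frame
useEffect(() => {
let frameId = 0;
const loop = () => {
updateData(buffer.current);
frameId = requestAnimationFrame(loop);
};
loop();
return () => {
// Cancel the pending frame so the loop stops as soon as the panel unmounts
cancelAnimationFrame(frameId);
};
}, [buffer, updateData]);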
return (
<motion.div
className="bg-white dark:bg-zinc-900 border border-zinc-200/50 dark:border-zinc-800/80 shadow-sm rounded-2xl p-5 relative overflow-hidden flex flex-col h-full transition-colors"
initial={{ opacity: 0, y: 15 }}
animate={{ opacity: 1, y: 0 }}
transition={{ duration: 0.4 }}
>
<div className="flex justify-between items-center mb-3">
<div className="flex items-center gap-2">
{isTalking && (
<motion.span
className="w-2.5 h-2.5 rounded-full bg-[var(--accent-color)]"
animate={{ scale: [1, 1.4, 1], opacity: [1, 0.4, 1] }}
transition={{ repeat: Infinity, duration: 1.2 }}
/>
)}
<h3 className="font-bold text-zinc-800 dark:text-zinc-200 text-xs tracking-wider uppercase flex items-center gap-1.5">
<Activity className="w-4 h-4 text-[var(--accent-color)]" />
{title}
</h3>
</div>
{isSupported && (
<Button
variant="outline"
size="sm"
onClick={() => togglePip(canvasRef.current)}
className="text-[10px] h-8 px-2.5 flex items-center gap-1 border-zinc-200 dark:border-zinc-800 hover:bg-[var(--accent-soft)] dark:hover:bg-[var(--accent-soft)]/20 hover:text-[var(--accent-text)] dark:hover:text-[var(--accent-color)] text-zinc-700 dark:text-zinc-300 transition-colors"
>
{isPipActive ? (
<>
<MonitorOff className="w-3 h-3" />
{t.pipClose}
</>
) : (
<>
<Maximize2 className="w-3 h-3" />
{t.pipStream}
</>
)}
</Button>
)}
</div>
<div className="relative flex-1 min-h-[140px] bg-zinc-50 dark:bg-zinc-950 border border-zinc-100 dark:border-zinc-900 rounded-xl overflow-hidden shadow-inner transition-colors">
<canvas
ref={canvasRef}
className="w-full h-full block cursor-pointer"
/>
<AnimatePresence>
{isTalking && (
<motion.div
className="absolute top-2.5 right-2.5 bg-[var(--accent-color)]/90 text-white text-[9px] font-bold px-2 py-0.5 rounded-full shadow-sm uppercase tracking-wider backdrop-blur-sm"
initial={{ opacity: 0, scale: 0.8 }}
animate={{ opacity: 1, scale: 1 }}
exit={{ opacity: 0, scale: 0.8 }}
transition={{ duration: 0.2 }}
>
{t.activeSignal}
</motion.div>
)}
</AnimatePresence>
</div>
</motion.div>
);
};
export default WaveformPanel;
+320
View File
@@ -0,0 +1,320 @@
import { useEffect, useRef, useState, useCallback } from 'react';
import { AudioConfig, ConnectionStatus, HardwareDevice } from '../types/audio';
export const useAudioPipeline = (
wsUrl: string,
config: AudioConfig,
onConfigSync: (sr: number, list: HardwareDevice[]) => void
) => {
const [status, setStatus] = useState<ConnectionStatus>('disconnected');
const [rtt, setRtt] = useState<number | null>(null);
const [processingTime, setProcessingTime] = useState<number | null>(null);
const [isTalking, setIsTalking] = useState<boolean>(false);
const [isStreaming, setIsStreaming] = useState<boolean>(false);
const [playOutput, setPlayOutput] = useState<boolean>(true);
const socketRef = useRef<WebSocket | null>(null);
const audioCtxRef = useRef<AudioContext | null>(null);
const micStreamRef = useRef<MediaStream | null>(null);
const micSourceRef = useRef<MediaStreamAudioSourceNode | null>(null);
const processorRef = useRef<ScriptProcessorNode | null>(null);
const sampleRateRef = useRef<number>(40000);
// High-performance canvas rolling buffers
const inputDisplayBuf = useRef<Float32Array>(new Float32Array(4096));
const outputDisplayBuf = useRef<Float32Array>(new Float32Array(4096));
const micAccumulator = useRef<Float32Array>(new Float32Array(0));
// Playback scheduling & timing
const sentTimestamps = useRef<{ id: number; sent: number }[]>([]);
const nextPlaybackTime = useRef<number>(0);
const outputChunkQueue = useRef<{ data: Float32Array; startTime: number }[]>([]);
// Function to stringify and sync configs
const sendConfig = useCallback(() => {
const socket = socketRef.current;
if (!socket || socket.readyState !== WebSocket.OPEN) return;
socket.send(JSON.stringify({
type: 'config',
model_name: config.model_name,
device: config.device,
f0_method: config.f0_method,
f0_up_key: config.f0_up_key,
noise_gate: config.noise_gate,
input_gain: config.input_gain,
output_gain: config.output_gain,
input_sr: audioCtxRef.current ? audioCtxRef.current.sampleRate : 44100,
routing_mode: config.routing_mode,
input_device: config.input_device,
output_device: config.output_device,
chunk_size: config.chunk_size
}));
}, [config]);
// Decodes array buffers from Python server
const handleServerAudio = useCallback((arrayBuffer: ArrayBuffer) => {
if (!audioCtxRef.current) return;
const now = performance.now();
if (sentTimestamps.current.length > 0) {
const oldest = sentTimestamps.current.shift();
if (oldest) {
setRtt(Math.round(now - oldest.sent));
}
}
const payload = new Float32Array(arrayBuffer);
const procTime = payload[0];
const pcmData = payload.subarray(1);
setProcessingTime(Math.max(0, Math.round(procTime)));
const ctx = audioCtxRef.current;
const audioBuf = ctx.createBuffer(1, pcmData.length, sampleRateRef.current);
audioBuf.getChannelData(0).set(pcmData);
const source = ctx.createBufferSource();
source.buffer = audioBuf;
// Only route node to speaker output if user didn't mute local listening
if (playOutput) {
source.connect(ctx.destination);
}
// Precise schedule timelines
const currentTime = ctx.currentTime;
const duration = audioBuf.duration;
const adaptiveBuf = Math.min(duration * 2.5, 0.50);
if (nextPlaybackTime.current < currentTime) {
nextPlaybackTime.current = currentTime + adaptiveBuf;
} else if (nextPlaybackTime.current > currentTime + duration * 5.0) {
nextPlaybackTime.current = currentTime + adaptiveBuf; // Latency Buster
}
const startSchedule = nextPlaybackTime.current;
source.start(startSchedule);
nextPlaybackTime.current += duration;
// Queue for syncing waveform outputs
outputChunkQueue.current.push({ data: pcmData, startTime: startSchedule });
while (outputChunkQueue.current.length > 0) {
const c = outputChunkQueue.current[0];
if (c.startTime + c.data.length / sampleRateRef.current < ctx.currentTime - 2.0) {
outputChunkQueue.current.shift();
} else break;
}
// Push output PCM samples to rolling display buffers
const size = 4096;
const display = outputDisplayBuf.current;
if (pcmData.length >= size) {
display.set(pcmData.slice(pcmData.length - size));
} else {
display.copyWithin(0, pcmData.length);
display.set(pcmData, size - pcmData.length);
}
}, [playOutput]);
const disconnect = useCallback(() => {
if (socketRef.current) {
try {
socketRef.current.close();
} catch (e) {
// Ignore close errors; the reference is dropped below either way
}
socketRef.current = null;
}
setStatus('disconnected');
}, []);
const connect = useCallback(() => {
disconnect();
setStatus('connecting');
try {
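const ws = new WebSocket(wsUrl);
ws.binaryType = 'arraybuffer';
// Track the socket immediately so stale close/error events can be told apart
socketRef.current = ws;
ws.onopen = () => {
setStatus('connected');
sendConfig();
};
ws.onclose = () => {
// Ignore events from a socket that connect() or disconnect() already replaced
if (socketRef.current !== ws) return;
setStatus('disconnected');
socketRef.current = null;
};
ws.onerror = () => {
if (socketRef.current !== ws) return;
setStatus('disconnected');
socketRef.current = null;
};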
ws.onmessage = (event) => {
if (typeof event.data === 'string') {
try {
const data = JSON.parse(event.data);
if (data.type === 'config_success') {
sampleRateRef.current = data.target_sr;
} else if (data.type === 'init_devices') {
onConfigSync(data.target_sr || 40000, data.devices || []);
} else if (data.type === 'visualizer') {
// Hardware mode visualizer data stream
inputDisplayBuf.current.set(new Float32Array(data.input));
outputDisplayBuf.current.set(new Float32Array(data.output));
}
} catch (e) {
console.error('WS JSON parse error:', e);
}
} else if (event.data instanceof ArrayBuffer) {
handleServerAudio(event.data);
}
};
} catch (e) {
console.error('WS Connection failed:', e);
setStatus('disconnected');
}
}, [wsUrl, sendConfig, handleServerAudio, onConfigSync, disconnect]);
const stopStream = useCallback(() => {
setIsStreaming(false);
setIsTalking(false);
if (config.routing_mode === 'hardware') {
const socket = socketRef.current;
if (socket && socket.readyState === WebSocket.OPEN) {
socket.send(JSON.stringify({
type: 'config',
routing_mode: 'browser' // tells server hardware stream to stop
}));
}
}
if (micStreamRef.current) {
micStreamRef.current.getTracks().forEach(t => t.stop());
micStreamRef.current = null;
}
if (micSourceRef.current) {
micSourceRef.current.disconnect();
micSourceRef.current = null;
}
if (processorRef.current) {
processorRef.current.disconnect();
processorRef.current = null;
}
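micAccumulator.current = new Float32Array(0);
// Drop unmatched send timestamps so they cannot skew RTT in the next session
sentTimestamps.current = [];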
setRtt(null);
setProcessingTime(null);
}, [config.routing_mode]);
const startStream = useCallback(async () => {
if (config.routing_mode === 'hardware') {
setIsStreaming(true);
sendConfig();
return;
}
if (!audioCtxRef.current) {
audioCtxRef.current = new (window.AudioContext || (window as any).webkitAudioContext)({
latencyHint: 'interactive',
});
}
const ctx = audioCtxRef.current;
if (ctx.state === 'suspended') {
await ctx.resume();
}
try {
micStreamRef.current = await navigator.mediaDevices.getUserMedia({
audio: {
echoCancellation: true,
noiseSuppression: true,
autoGainControl: true,
},
});
micSourceRef.current = ctx.createMediaStreamSource(micStreamRef.current);
processorRef.current = ctx.createScriptProcessor(4096, 1, 1);
processorRef.current.onaudioprocess = (e) => {
const inputData = e.inputBuffer.getChannelData(0);
// Update input waveform display buffer
const display = inputDisplayBuf.current;
display.copyWithin(0, inputData.length);
display.set(inputData, display.length - inputData.length);
// Append to local accumulator
const nextAcc = new Float32Array(micAccumulator.current.length + inputData.length);
nextAcc.set(micAccumulator.current);
nextAcc.set(inputData, micAccumulator.current.length);
micAccumulator.current = nextAcc;
const size = config.chunk_size;
while (micAccumulator.current.length >= size) {
const chunk = micAccumulator.current.slice(0, size);
micAccumulator.current = micAccumulator.current.slice(size);
// Simple RMS for Voice Activity Badge
let sum = 0;
for (let i = 0; i < chunk.length; i++) sum += chunk[i] * chunk[i];
const rms = Math.sqrt(sum / chunk.length);
setIsTalking(rms > 0.005);
// Stream raw float PCM bytes
const ws = socketRef.current;
if (ws && ws.readyState === WebSocket.OPEN) {
const time = performance.now();
sentTimestamps.current.push({ id: time, sent: time });
ws.send(chunk.buffer);
}
}
};
micSourceRef.current.connect(processorRef.current);
processorRef.current.connect(ctx.destination);
nextPlaybackTime.current = 0;
setIsStreaming(true);
} catch (e) {
console.error('Failed to start microphone streaming:', e);
alert('Microphone access failed: ' + (e instanceof Error ? e.message : String(e)));
stopStream();
}
}, [config.routing_mode, config.chunk_size, sendConfig, stopStream]);
// Sync config whenever React props config changes
useEffect(() => {
sendConfig();
}, [config, sendConfig]);
// Lifecycle cleanups
useEffect(() => {
return () => {
disconnect();
stopStream();
if (audioCtxRef.current) {
audioCtxRef.current.close().catch(() => {});
// Drop the closed context so startStream() creates a fresh one
audioCtxRef.current = null;
}
};
}, [disconnect, stopStream]);
return {
status,
rtt,
processingTime,
isTalking,
isStreaming,
playOutput,
setPlayOutput,
connect,
disconnect,
startStream,
stopStream,
inputBuffer: inputDisplayBuf,
outputBuffer: outputDisplayBuf
};
};
export default useAudioPipeline;
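The playback scheduling in `handleServerAudio` can be read as a small pure function of three numbers. The sketch below restates that logic outside the Web Audio API so it can be checked in isolation; `scheduleChunk` and `ScheduleResult` are hypothetical names, and the constants (2.5x jitter buffer capped at 0.5 s, reset when more than 5 chunks ahead) are copied from the hook above.

```typescript
// Hypothetical restatement of the hook's scheduling rules; all times are in seconds.
interface ScheduleResult {
  startAt: number;          // when this chunk should start playing
  nextPlaybackTime: number; // where the following chunk should start
}

function scheduleChunk(
  nextPlaybackTime: number, // end of the previously scheduled chunk
  currentTime: number,      // AudioContext.currentTime at arrival
  duration: number          // length of the incoming chunk
): ScheduleResult {
  // Jitter buffer: 2.5 chunk lengths, capped at half a second
  const adaptiveBuf = Math.min(duration * 2.5, 0.5);
  let startAt = nextPlaybackTime;
  if (startAt < currentTime) {
    // Underrun: the queue ran dry, so restart slightly in the future
    startAt = currentTime + adaptiveBuf;
  } else if (startAt > currentTime + duration * 5.0) {
    // Too much queued audio: jump back to bound the added latency
    startAt = currentTime + adaptiveBuf;
  }
  return { startAt, nextPlaybackTime: startAt + duration };
}
```

Chunks that arrive on time are simply appended back to back; only an underrun or a backlog of more than five chunk lengths resets the timeline.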
@@ -0,0 +1,64 @@
import { useEffect, useRef, useCallback } from 'react';
export interface ShortcutBinding {
keys: string; // e.g. "Control+k", " ", "m", "alt+1"
description: string;
action: () => void;
}
export const useKeyboardShortcuts = (bindings: ShortcutBinding[], enabled: boolean = true) => {
const bindingsRef = useRef<ShortcutBinding[]>(bindings);
useEffect(() => {
bindingsRef.current = bindings;
}, [bindings]);
const handleKeyDown = useCallback((e: KeyboardEvent) => {
if (!enabled) return;
// Avoid hijacking keystrokes when editing inputs
const active = document.activeElement;
if (active) {
const name = active.tagName.toLowerCase();
if (name === 'input' || name === 'textarea' || active.getAttribute('contenteditable') === 'true') {
return;
}
}
const pressedKey = e.key.toLowerCase();
const isCtrl = e.ctrlKey || e.metaKey;
const isAlt = e.altKey;
const isShift = e.shiftKey;
for (const binding of bindingsRef.current) {
const keys = binding.keys.toLowerCase().split('+');
const requiresCtrl = keys.includes('control') || keys.includes('ctrl');
const requiresAlt = keys.includes('alt');
const requiresShift = keys.includes('shift');
const baseKey = keys.filter(k => !['control', 'ctrl', 'alt', 'shift'].includes(k))[0];
const matchesCtrl = requiresCtrl ? isCtrl : !isCtrl;
const matchesAlt = requiresAlt ? isAlt : !isAlt;
const matchesShift = requiresShift ? isShift : !isShift;
const normalizedBaseKey = baseKey === 'space' ? ' ' : baseKey;
const matchesBase = pressedKey === normalizedBaseKey;
if (matchesCtrl && matchesAlt && matchesShift && matchesBase) {
e.preventDefault();
binding.action();
break;
}
}
}, [enabled]);
useEffect(() => {
window.addEventListener('keydown', handleKeyDown);
return () => {
window.removeEventListener('keydown', handleKeyDown);
};
}, [handleKeyDown]);
};
export default useKeyboardShortcuts;
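The matching rule inside `handleKeyDown` is strict: every modifier must match exactly, so a binding without `shift` will not fire while Shift is held. A minimal sketch of the same rule as a standalone predicate follows; `matchesBinding` and `KeyPress` are hypothetical names introduced for illustration.

```typescript
// Hypothetical subset of KeyboardEvent, enough to evaluate a binding.
interface KeyPress {
  key: string;
  ctrlKey: boolean;
  metaKey: boolean;
  altKey: boolean;
  shiftKey: boolean;
}

const MODIFIER_NAMES = ['control', 'ctrl', 'alt', 'shift'];

// Returns true when the key press satisfies a binding such as "Control+k" or "space".
function matchesBinding(keys: string, e: KeyPress): boolean {
  const parts = keys.toLowerCase().split('+');
  const requiresCtrl = parts.includes('control') || parts.includes('ctrl');
  const requiresAlt = parts.includes('alt');
  const requiresShift = parts.includes('shift');
  // First non-modifier token is the base key; "space" is an alias for " "
  const baseKey = parts.filter((p) => !MODIFIER_NAMES.includes(p))[0];
  const normalizedBaseKey = baseKey === 'space' ? ' ' : baseKey;
  // Cmd on macOS is treated like Ctrl, as in the hook
  const isCtrl = e.ctrlKey || e.metaKey;
  return (
    requiresCtrl === isCtrl &&
    requiresAlt === e.altKey &&
    requiresShift === e.shiftKey &&
    e.key.toLowerCase() === normalizedBaseKey
  );
}
```

One consequence of the exact-modifier rule: on layouts where `?` is typed with Shift, a binding written as `?` does not match, while `shift+?` does.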
+71
View File
@@ -0,0 +1,71 @@
import { useState, useCallback, useRef, useEffect } from 'react';
export const usePictureInPicture = () => {
const [isPipActive, setIsPipActive] = useState(false);
const videoRef = useRef<HTMLVideoElement | null>(null);
useEffect(() => {
if (typeof window === 'undefined') return;
const video = document.createElement('video');
video.muted = true;
video.playsInline = true;
videoRef.current = video;
const handleLeavePip = () => {
setIsPipActive(false);
};
video.addEventListener('leavepictureinpicture', handleLeavePip);
return () => {
video.removeEventListener('leavepictureinpicture', handleLeavePip);
if (document.pictureInPictureElement === video) {
document.exitPictureInPicture().catch(() => {});
}
};
}, []);
const togglePip = useCallback(async (canvas: HTMLCanvasElement | null) => {
if (!canvas || !videoRef.current) return;
const video = videoRef.current;
try {
if (document.pictureInPictureElement === video) {
await document.exitPictureInPicture();
setIsPipActive(false);
} else {
// Capture a stream of the Canvas at 30 fps
const stream = canvas.captureStream
? canvas.captureStream(30)
: (canvas as any).mozCaptureStream
? (canvas as any).mozCaptureStream(30)
: null;
if (!stream) {
throw new Error("Canvas.captureStream() is not supported on this browser.");
}
video.srcObject = stream;
await new Promise<void>((resolve, reject) => {
video.onloadedmetadata = () => {
// Propagate play() failures so the caller's catch block runs instead of hanging
video.play().then(() => resolve(), reject);
};
});
await video.requestPictureInPicture();
setIsPipActive(true);
}
} catch (error) {
console.error("Picture-in-Picture failed:", error);
alert("Picture-in-Picture error: " + (error instanceof Error ? error.message : String(error)));
}
}, []);
// Only report support when the browser exposes the API and has it enabled
const isSupported = typeof document !== 'undefined' && document.pictureInPictureEnabled === true;
return { togglePip, isPipActive, isSupported };
};
export default usePictureInPicture;
+101
View File
@@ -0,0 +1,101 @@
import { useEffect, useRef, useCallback } from 'react';
interface UseWaveformCanvasOptions {
strokeColor: string;
fillColor?: string;
scaleAmplitude?: number;
lineWidth?: number;
}
export const useWaveformCanvas = (options: UseWaveformCanvasOptions) => {
const canvasRef = useRef<HTMLCanvasElement | null>(null);
const animationFrameRef = useRef<number | null>(null);
const bufferRef = useRef<Float32Array | null>(null);
const updateData = useCallback((data: Float32Array) => {
bufferRef.current = data;
}, []);
useEffect(() => {
const canvas = canvasRef.current;
if (!canvas) return;
const ctx = canvas.getContext('2d');
if (!ctx) return;
// Solid (alpha = 1) variant of the fill color, used for full clears
const baseColor = (options.fillColor || 'rgba(10, 10, 10, 0.4)').replace(/[\d.]+\)$/, '1)');
const handleResize = () => {
const rect = canvas.getBoundingClientRect();
canvas.width = rect.width * window.devicePixelRatio;
canvas.height = rect.height * window.devicePixelRatio;
// Resizing resets the canvas, so repaint the solid background
ctx.fillStyle = baseColor;
ctx.fillRect(0, 0, canvas.width, canvas.height);
};
// Size the canvas and paint the initial solid background
handleResize();
window.addEventListener('resize', handleResize);
const draw = () => {
const width = canvas.width;
const height = canvas.height;
const dataArray = bufferRef.current;
// Semi-transparent fill for trace/fade visual trails
ctx.fillStyle = options.fillColor || 'rgba(10, 10, 10, 0.4)';
ctx.fillRect(0, 0, width, height);
if (dataArray && dataArray.length > 0) {
ctx.lineWidth = (options.lineWidth ?? 2) * window.devicePixelRatio;
ctx.strokeStyle = options.strokeColor;
ctx.lineJoin = 'round';
ctx.beginPath();
const sliceWidth = width / dataArray.length;
let x = 0;
for (let i = 0; i < dataArray.length; i++) {
const v = dataArray[i] * (options.scaleAmplitude ?? 1.5);
const y = (v * (height / 2)) + (height / 2);
if (i === 0) {
ctx.moveTo(x, y);
} else {
ctx.lineTo(x, y);
}
x += sliceWidth;
}
ctx.lineTo(width, height / 2);
ctx.stroke();
}
// Draw subtle zero amplitude baseline
ctx.strokeStyle = options.fillColor?.includes('255') ? 'rgba(0, 0, 0, 0.06)' : 'rgba(255, 255, 255, 0.06)';
ctx.lineWidth = 1;
ctx.beginPath();
ctx.moveTo(0, height / 2);
ctx.lineTo(width, height / 2);
ctx.stroke();
animationFrameRef.current = requestAnimationFrame(draw);
};
animationFrameRef.current = requestAnimationFrame(draw);
return () => {
window.removeEventListener('resize', handleResize);
if (animationFrameRef.current) {
cancelAnimationFrame(animationFrameRef.current);
}
};
}, [options.strokeColor, options.fillColor, options.scaleAmplitude, options.lineWidth]);
return { canvasRef, updateData };
};
export default useWaveformCanvas;
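The inner loop of `draw` maps each sample to a canvas coordinate. The sketch below isolates that mapping without touching a canvas; `waveformPoints` is a hypothetical helper name, and the default `scaleAmplitude` of 1.5 mirrors the fallback used in the hook.

```typescript
// Hypothetical helper: converts PCM samples in [-1, 1] to canvas (x, y) points.
function waveformPoints(
  samples: Float32Array,
  width: number,
  height: number,
  scaleAmplitude: number = 1.5
): Array<[number, number]> {
  const points: Array<[number, number]> = [];
  if (samples.length === 0) return points;
  // Horizontal distance between consecutive samples
  const sliceWidth = width / samples.length;
  for (let i = 0; i < samples.length; i++) {
    const v = samples[i] * scaleAmplitude;
    // Zero amplitude sits on the vertical midline
    const y = v * (height / 2) + height / 2;
    points.push([i * sliceWidth, y]);
  }
  return points;
}
```

Because canvas y grows downward, positive samples land below the midline with this mapping, and values beyond the scaled range are simply drawn off-canvas.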
+23
View File
@@ -0,0 +1,23 @@
export interface AudioConfig {
model_name: string;
device: 'cpu' | 'cuda' | 'dml';
f0_method: 'pm' | 'dio' | 'harvest' | 'rmvpe';
f0_up_key: number;
noise_gate: number;
input_gain: number;
output_gain: number;
input_sr: number;
routing_mode: 'browser' | 'hardware';
input_device: number | null;
output_device: number | null;
chunk_size: number;
}
export interface HardwareDevice {
id: number;
name: string;
max_input_channels: number;
max_output_channels: number;
}
export type ConnectionStatus = 'disconnected' | 'connecting' | 'connected';
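The JSON messages handled in `useAudioPipeline` could be typed as a discriminated union alongside these interfaces. This is a sketch only: the message shapes are inferred from how the hook reads them (`config_success`, `init_devices`, `visualizer`), the server schema itself is not shown here, and `ServerMessage`, `parseServerMessage` and the local `DeviceInfo` copy are hypothetical names.

```typescript
// Local copy of the HardwareDevice shape so the sketch is self-contained.
interface DeviceInfo {
  id: number;
  name: string;
  max_input_channels: number;
  max_output_channels: number;
}

// Assumed message shapes, inferred from the fields the hook reads.
type ServerMessage =
  | { type: 'config_success'; target_sr: number }
  | { type: 'init_devices'; target_sr: number; devices: DeviceInfo[] }
  | { type: 'visualizer'; input: number[]; output: number[] };

// Parses a text frame; returns null for invalid JSON or unknown message types.
function parseServerMessage(raw: string): ServerMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== 'object' || data === null) return null;
  const msg = data as Record<string, unknown>;
  const sr = msg.target_sr;
  switch (msg.type) {
    case 'config_success':
      return typeof sr === 'number' ? { type: 'config_success', target_sr: sr } : null;
    case 'init_devices':
      // Similar defaults to the hook: 40000 Hz and an empty device list
      return {
        type: 'init_devices',
        target_sr: typeof sr === 'number' ? sr : 40000,
        devices: Array.isArray(msg.devices) ? (msg.devices as DeviceInfo[]) : [],
      };
    case 'visualizer':
      return Array.isArray(msg.input) && Array.isArray(msg.output)
        ? { type: 'visualizer', input: msg.input as number[], output: msg.output as number[] }
        : null;
    default:
      return null;
  }
}
```

Narrowing on `type` then lets the compiler check each branch of the `onmessage` handler instead of relying on untyped property access.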
+592
View File
@@ -0,0 +1,592 @@
export type Language = 'en' | 'ja' | 'zh' | 'es' | 'id';
export const languages: { code: Language; label: string; flag: string }[] = [
{ code: 'en', label: 'English', flag: '🇺🇸' },
{ code: 'id', label: 'Bahasa Indonesia', flag: '🇮🇩' },
{ code: 'ja', label: '日本語', flag: '🇯🇵' },
{ code: 'zh', label: '简体中文', flag: '🇨🇳' },
{ code: 'es', label: 'Español', flag: '🇪🇸' }
];
export const translations = {
en: {
appTitle: "🎙️ ONNX VC",
appSubtitle: "Low-latency real-time AI voice conversion powered by ONNX Runtime acceleration.",
wsServerUrl: "WebSocket Server URL",
wsPlaceholder: "ws://localhost:8765",
connectionStatus: "Connection Status",
disconnected: "Disconnected",
connecting: "Connecting",
connected: "Connected",
connect: "Connect Server",
disconnect: "Disconnect Server",
startChanger: "Start Voice Changer",
stopChanger: "Stop Voice Changer",
listeningActive: "Listening: ACTIVE",
listeningMute: "Listening: MUTED",
// Tabs
tabDashboard: "Workspace",
tabModel: "Model Settings",
tabDsp: "Audio DSP",
tabShortcuts: "Shortcuts",
// Model Config
modelConfigTitle: "Model & Device Configuration",
quickPresets: "Quick Presets (Performance Profile)",
latencyPreset: "⚡ Instant Response (PM)",
qualityPreset: "🎙️ High Fidelity (RMVPE)",
selectModel: "Select Character Model (RVC ONNX)",
executionProvider: "Execution Provider (GPU Acceleration)",
routingMode: "Audio Routing Mode",
clientMode: "Client Mode (Browser Streaming)",
serverMode: "Server Mode (Direct Sounddevice)",
serverInput: "Server Input Microphone",
serverOutput: "Server Output Speaker",
pitchMethod: "Pitch Extraction Method",
transpose: "Transpose (Pitch Modifier)",
transposeMale: "-24 (Male Pitch)",
transposeNormal: "0 (Original)",
transposeFemale: "+24 (Female/Anime Pitch)",
// DSP
dspTitle: "Audio Processing Settings (DSP)",
noiseGate: "Noise Gate (Threshold)",
noiseGateSens: "-60 dB (Sensitive)",
noiseGateDefault: "-40 dB (Default)",
noiseGateStrict: "-10 dB (Strict)",
inputGain: "Input Gain (Microphone)",
outputGain: "Output Gain (AI Volume)",
noiseCancel: "Noise Cancellation (Filter)",
noiseCancelDesc: "Filters browser echo & background hum",
bufferSize: "Buffer Size (Chunk Size - Latency vs Stability)",
// Visualizers
visualizerTitle: "Real-Time Audio Visualizer",
micSignal: "Microphone Input Signal",
aiSignal: "AI Voice Output Signal",
activeSignal: "Active Signal",
pipStream: "PiP Waveform",
pipClose: "Close PiP",
// HUD
hudLatency: "RTT Latency",
hudInference: "Inference Speed",
hudDetector: "Voice Detector",
hudTalking: "Speaking",
hudSilent: "Silent",
hudSr: "Model Frequency",
hudHelp: "Press ? to view the hotkeys menu",
// Shortcuts Dialog
shortcutsTitle: "Keyboard Shortcuts Guide",
shortcutsDesc: "Use these keyboard shortcuts to navigate the dashboard without a mouse:",
shortcutsClose: "Close",
shortcutConnect: "Connect / Disconnect WebSocket Server",
shortcutStream: "Start / Stop AI Voice Changer",
shortcutMute: "Mute / Unmute Local Output Monitoring",
shortcutPreset1: "Apply Preset: Instant Response (PM)",
shortcutPreset2: "Apply Preset: High Fidelity (RMVPE)",
shortcutHelp: "Open / Close Shortcuts Help Dialog",
// Premium layouts
characterCardTitle: "Active Voice Character",
characterAvatarDesc: "Currently loaded voice weight profile.",
welcomeBack: "Real-Time Audio Control Center",
currentLang: "Language",
themeSettings: "Interface Theme & Accent",
themeMode: "Theme Mode",
themeDark: "Dark Mode",
themeLight: "Light Mode",
accentColorLabel: "Global Accent Color",
tabCredits: "Credits",
creditsTitle: "💖 Open Source Credits",
creditsDescription: "ONNX VC is made possible thanks to the following incredible open-source projects and libraries:",
liveTuningTitle: "Live Settings Tuning",
customCanvasTitle: "Custom Canvas Visualizer",
showMicInput: "Show Mic Input",
showAiOutput: "Show AI Output",
lineWidthLabel: "Line Width",
traceDecayLabel: "Trace Decay (Fading)",
inputLineColorLabel: "Input Line Color",
outputLineColorLabel: "Output Line Color",
creditCreatorTitle: "Creator & Integrator",
creditNeuralTitle: "Neural Conversion",
creditEngineTitle: "Inference Engine",
creditPitchTitle: "Pitch Extraction",
creditPipelineTitle: "Streaming Pipeline",
creditFrameworkTitle: "Frontend Framework",
creditDesignTitle: "Design & Animation",
creditCreatorDesc: "Creators of the ONNX VC client interface and low-latency audio control workspace integration layer.",
creditNeuralDesc: "The core neural network architecture for real-time voice feature retrieval and vocal conversion.",
creditEngineDesc: "Cross-platform accelerator for machine learning models running on CPU, NVIDIA CUDA, and DirectML GPU backends.",
creditPitchDesc: "Robust Model for Vocal Pitch Estimation (RMVPE), providing highly accurate vocal pitch tracking under ambient noise.",
creditPipelineDesc: "High-speed binary data transfer loops passing raw PCM float32 frames between the client browser and backend.",
creditFrameworkDesc: "Modern web framework that compiles React client-side components into optimized static exports.",
creditDesignDesc: "Utility-first CSS styling and fluid declarative animation libraries for interactive user interfaces."
},
id: {
appTitle: "🎙️ ONNX VC",
appSubtitle: "Pengubah suara real-time berbasis AI berlatensi ultra-rendah dengan akselerasi ONNX Runtime.",
wsServerUrl: "URL Server WebSocket",
wsPlaceholder: "ws://localhost:8765",
connectionStatus: "Status Koneksi",
disconnected: "Terputus",
connecting: "Menghubungkan",
connected: "Terhubung",
connect: "Hubungkan Server",
disconnect: "Putuskan Server",
startChanger: "Mulai Mengubah Suara",
stopChanger: "Hentikan Mengubah",
listeningActive: "Mendengarkan: AKTIF",
listeningMute: "Mendengarkan: SENYAP",
// Tabs
tabDashboard: "Ruang Kerja",
tabModel: "Setelan Model",
tabDsp: "Audio DSP",
tabShortcuts: "Shortcut",
// Model Config
modelConfigTitle: "Konfigurasi Model & Perangkat",
quickPresets: "Quick Presets (Profil Performa)",
latencyPreset: "⚡ Respon Kilat (PM)",
qualityPreset: "🎙️ Kualitas Tinggi (RMVPE)",
selectModel: "Pilih Model Suara (RVC ONNX)",
executionProvider: "Execution Provider (Akselerasi GPU)",
routingMode: "Mode Routing Audio",
clientMode: "Client Mode (Streaming Browser)",
serverMode: "Server Mode (Direct Sounddevice)",
serverInput: "Input Mikrofon Server",
serverOutput: "Output Speaker Server",
pitchMethod: "Metode Deteksi Nada (Pitch Extraction)",
transpose: "Transpose (Pengubah Nada)",
transposeMale: "-24 (Pria Berat)",
transposeNormal: "0 (Asli)",
transposeFemale: "+24 (Wanita/Anime)",
// DSP
dspTitle: "Pemrosesan Audio (DSP)",
noiseGate: "Noise Gate (Threshold)",
noiseGateSens: "-60 dB (Sensitif)",
noiseGateDefault: "-40 dB (Default)",
noiseGateStrict: "-10 dB (Ketat)",
inputGain: "Input Gain (Microphone)",
outputGain: "Output Gain (Volume AI)",
noiseCancel: "Peredam Bising (Noise Cancel)",
noiseCancelDesc: "Menyaring gema & dengung latar di browser",
bufferSize: "Ukuran Buffer (Chunk Size - Latensi vs Stabilitas)",
// Visualizers
visualizerTitle: "Visualisasi Waveform Live",
micSignal: "Sinyal Mikrofon (Input)",
aiSignal: "Hasil AI Voice (Output)",
activeSignal: "Sinyal Aktif",
pipStream: "PiP Waveform",
pipClose: "Batal PiP",
// HUD
hudLatency: "Latensi Bolak-balik (RTT)",
hudInference: "Kecepatan Inference",
hudDetector: "Detektor Suara",
hudTalking: "Bicara",
hudSilent: "Berdiam",
hudSr: "Frekuensi Model",
hudHelp: "Tekan ? untuk melihat menu hotkey",
// Shortcuts Dialog
shortcutsTitle: "Panduan Keyboard Shortcut",
shortcutsDesc: "Gunakan keyboard shortcuts berikut untuk navigasi dashboard tanpa mouse:",
shortcutsClose: "Tutup",
shortcutConnect: "Hubungkan / Putuskan Server WebSocket",
shortcutStream: "Mulai / Hentikan Pengubah Suara AI",
shortcutMute: "Bungkam / Dengarkan Audio Output Lokal",
shortcutPreset1: "Terapkan Profil: Respon Kilat (PM)",
shortcutPreset2: "Terapkan Profil: Kualitas Tinggi (RMVPE)",
shortcutHelp: "Buka / Tutup Dialog Panduan Shortcut",
// Premium layouts
characterCardTitle: "Karakter Suara Aktif",
characterAvatarDesc: "Profil bobot suara yang sedang dimuat saat ini.",
welcomeBack: "Pusat Kontrol Audio Real-Time",
currentLang: "Bahasa",
themeSettings: "Tema Antarmuka & Aksen",
themeMode: "Mode Tema",
themeDark: "Mode Gelap",
themeLight: "Mode Terang",
accentColorLabel: "Warna Aksen Global",
tabCredits: "Kredit Open Source",
creditsTitle: "💖 Kredit Lisensi & Open Source",
creditsDescription: "ONNX VC dimungkinkan berkat proyek dan pustaka open source luar biasa berikut:",
liveTuningTitle: "Setelan Cepat Pemrosesan",
customCanvasTitle: "Kustomisasi Canvas Visualizer",
showMicInput: "Tampilkan Input Mic",
showAiOutput: "Tampilkan Output AI",
lineWidthLabel: "Ketebalan Garis",
traceDecayLabel: "Intensitas Ekor (Trail Fading)",
inputLineColorLabel: "Warna Garis Input",
outputLineColorLabel: "Warna Garis Output",
creditCreatorTitle: "Pencipta & Integrator",
creditNeuralTitle: "Konversi Neural",
creditEngineTitle: "Mesin Inferensi",
creditPitchTitle: "Ekstraksi Nada Vokal",
creditPipelineTitle: "Streaming Pipeline",
creditFrameworkTitle: "Framework Frontend",
creditDesignTitle: "Desain & Animasi",
creditCreatorDesc: "Pengembang antarmuka audio ONNX VC dan pengintegrasi workspace kontrol audio real-time berlatensi ultra-rendah.",
creditNeuralDesc: "Kerangka kerja pengubah suara berbasis AI yang menggunakan fitur retrieval untuk transfer karakter suara berlatensi rendah.",
creditEngineDesc: "Mesin akselerasi inferensi model lintas platform untuk CPU, CUDA GPU, dan Windows DirectML GPU.",
creditPitchDesc: "Model deteksi pitch vokal berkinerja tinggi yang presisi terhadap desau latar belakang.",
creditPipelineDesc: "Pipa transfer data audio biner mentah PCM float32 yang berjalan lancar antara peramban dan server python.",
creditFrameworkDesc: "Kerangka kerja aplikasi web terstruktur yang dikompilasi menjadi ekspor HTML statis.",
creditDesignDesc: "Mesin animasi layout deklaratif dan utilitas CSS presisi untuk tampilan premium."
},
ja: {
appTitle: "🎙️ ONNX VC",
appSubtitle: "ONNX Runtime高速化による低遅延リアルタイムAI音声変換システム。",
wsServerUrl: "WebSocketサーバーURL",
wsPlaceholder: "ws://localhost:8765",
connectionStatus: "接続状態",
disconnected: "切断",
connecting: "接続中...",
connected: "接続完了",
connect: "サーバー接続",
disconnect: "接続解除",
startChanger: "音声変換開始",
stopChanger: "音声変換停止",
listeningActive: "モニター音:ON",
listeningMute: "モニター音:OFF",
// Tabs
tabDashboard: "ワークスペース",
tabModel: "モデル設定",
tabDsp: "オーディオDSP",
tabShortcuts: "ショートカット",
// Model Config
modelConfigTitle: "モデルとデバイスの構成",
quickPresets: "クイックプリセット (パフォーマンス)",
latencyPreset: "⚡ 低遅延優先 (PM)",
qualityPreset: "🎙️ 高音質優先 (RMVPE)",
selectModel: "キャラクターモデルの選択 (RVC ONNX)",
executionProvider: "実行プロバイダー (GPUアクセラレーション)",
routingMode: "音声ルーティングモード",
clientMode: "クライアントモード (ブラウザ再生)",
serverMode: "サーバーモード (ハードウェア直結)",
serverInput: "サーバー入力マイク",
serverOutput: "サーバー出力スピーカー",
pitchMethod: "ピッチ検出アルゴリズム",
transpose: "ピッチ変換 (トランスポーズ)",
transposeMale: "-24 (男声向け)",
transposeNormal: "0 (原音)",
transposeFemale: "+24 (女声/アニメ声)",
// DSP
dspTitle: "オーディオ処理設定 (DSP)",
noiseGate: "ノイズゲート (閾値)",
noiseGateSens: "-60 dB (高感度)",
noiseGateDefault: "-40 dB (推奨)",
noiseGateStrict: "-10 dB (厳格)",
inputGain: "入力ゲイン (マイク)",
outputGain: "出力ゲイン (AI音量)",
noiseCancel: "ノイズキャンセリング",
noiseCancelDesc: "ブラウザのエコーと環境音を除去します",
bufferSize: "バッファサイズ (遅延時間 vs 安定性)",
// Visualizers
visualizerTitle: "リアルタイム波形表示",
micSignal: "マイク入力信号",
aiSignal: "AI音声出力信号",
activeSignal: "音声検出中",
pipStream: "PiP波形ウィンドウ",
pipClose: "PiPを閉じる",
// HUD
hudLatency: "応答速度 (RTT)",
hudInference: "推論速度",
hudDetector: "音声検出",
hudTalking: "発話中",
hudSilent: "無音",
hudSr: "モデルサンプリングレート",
hudHelp: "?キーでショートカットヘルプを表示",
// Shortcuts Dialog
shortcutsTitle: "キーボードショートカット一覧",
shortcutsDesc: "キーボードを使ってマウスなしで素早く操作できます:",
shortcutsClose: "閉じる",
shortcutConnect: "WebSocketサーバーの接続 / 切断",
shortcutStream: "AI音声変換の開始 / 停止",
shortcutMute: "ローカル出力のミュート / 解除",
shortcutPreset1: "プリセット適用:低遅延優先 (PM)",
shortcutPreset2: "プリセット適用:高音質優先 (RMVPE)",
shortcutHelp: "ショートカット一覧の表示 / 非表示",
// Premium layouts
characterCardTitle: "現在のボイスモデル",
characterAvatarDesc: "現在ロードされている音声のキャラクタープロファイルです。",
welcomeBack: "リアルタイムオーディオコントロールセンター",
currentLang: "言語",
themeSettings: "テーマとアクセント",
themeMode: "テーマモード",
themeDark: "ダークモード",
themeLight: "ライトモード",
accentColorLabel: "グローバルアクセントカラー",
tabCredits: "オープンソース",
creditsTitle: "💖 オープンソースクレジット",
creditsDescription: "ONNX VCは、以下の素晴らしいオープンソースプロジェクトとライブラリのおかげで実現しました。",
liveTuningTitle: "常用パラメータ微調整",
customCanvasTitle: "カスタムビジュアライザ",
showMicInput: "マイク入力を表示",
showAiOutput: "AI出力を表示",
lineWidthLabel: "線の太さ",
traceDecayLabel: "残像フェード率",
inputLineColorLabel: "入力線の色",
outputLineColorLabel: "出力線の色",
creditCreatorTitle: "開発・統合元",
creditNeuralTitle: "ニューラル音声変換",
creditEngineTitle: "推論エンジン",
creditPitchTitle: "ピッチ検出",
creditPipelineTitle: "ストリーミング・パイプライン",
creditFrameworkTitle: "フロントエンドフレームワーク",
creditDesignTitle: "デザインとアニメーション",
creditCreatorDesc: "ONNX VCクライアントインターフェースおよび超低遅延リアルタイムオーディオ制御ワークスペースの統合開発チーム。",
creditNeuralDesc: "リアルタイムの音声特徴抽出および声質変換のためのコアニューラルネットワークアーキテクチャ。",
creditEngineDesc: "CPU、NVIDIA CUDA、およびWindows DirectML GPUバックエンド上で動作する、クロスプラットフォームの推論高速化エンジン。",
creditPitchDesc: "周囲のノイズ下でも高精度にボーカルのピッチ追跡を行うことができる高性能ピッチ推定モデル。",
creditPipelineDesc: "ブラウザクライアントとPythonサーバー間で生のPCM float32フレームを高速に送受信するバイナリデータパイプライン。",
creditFrameworkDesc: "Reactクライアントコンポーネントを静的に最適化されたHTMLにエクスポートするモダンウェブフレームワーク。",
creditDesignDesc: "インタラクティブで高品質なUIデザインのための、ユーティリティ優先CSSおよび宣言的アニメーションライブラリ。"
},
zh: {
appTitle: "🎙️ ONNX VC",
appSubtitle: "基于 ONNX 运行时加速的低延迟实时 AI 变声器系统。",
wsServerUrl: "WebSocket 服务器地址",
wsPlaceholder: "ws://localhost:8765",
connectionStatus: "连接状态",
disconnected: "已断开",
connecting: "连接中...",
connected: "已连接",
connect: "连接服务器",
disconnect: "断开连接",
startChanger: "开启变声",
stopChanger: "停止变声",
listeningActive: "声音监听:开启",
listeningMute: "声音监听:静音",
// Tabs
tabDashboard: "控制工作台",
tabModel: "模型设置",
tabDsp: "音频 DSP",
tabShortcuts: "快捷键",
// Model Config
modelConfigTitle: "变声模型与硬件设备配置",
quickPresets: "快速预设 (性能配置)",
latencyPreset: "⚡ 极速响应 (PM)",
qualityPreset: "🎙️ 高清音质 (RMVPE)",
selectModel: "选择声音模型 (RVC ONNX)",
executionProvider: "运行加速提供商 (GPU 加速)",
routingMode: "音频路由模式",
clientMode: "客户端模式 (浏览器音频流转换)",
serverMode: "服务器模式 (直连服务端硬件)",
serverInput: "服务器输入麦克风",
serverOutput: "服务器输出扬声器",
pitchMethod: "基频检测算法 (Pitch)",
transpose: "变调参数 (Transpose)",
transposeMale: "-24 (男声声调)",
transposeNormal: "0 (原音)",
transposeFemale: "+24 (女声/动漫声调)",
// DSP
dspTitle: "音频效果器配置 (DSP)",
noiseGate: "噪声门限阈值 (Noise Gate)",
noiseGateSens: "-60 dB (灵敏)",
noiseGateDefault: "-40 dB (默认)",
noiseGateStrict: "-10 dB (严格)",
inputGain: "输入增益 (麦克风音量)",
outputGain: "输出增益 (变声后音量)",
noiseCancel: "回声抑噪过滤",
noiseCancelDesc: "过滤浏览器的回声和杂音",
bufferSize: "缓冲区大小 (延迟时间 vs 稳定性)",
// Visualizers
visualizerTitle: "实时音频波形图",
micSignal: "麦克风输入波形",
aiSignal: "AI变声输出波形",
activeSignal: "正在输入",
pipStream: "画中画波形图",
pipClose: "关闭画中画",
// HUD
hudLatency: "双向延迟 (RTT)",
hudInference: "推理用时",
hudDetector: "声控指示器",
hudTalking: "检测到讲话",
hudSilent: "静音中",
hudSr: "模型音频采样率",
hudHelp: "按 ? 键打开快捷键指南",
// Shortcuts Dialog
shortcutsTitle: "键盘快捷键指南",
shortcutsDesc: "使用键盘快捷键可以在没有鼠标的情况下极速控制工作台:",
shortcutsClose: "关闭",
shortcutConnect: "连接 / 断开 WebSocket 服务器",
shortcutStream: "开启 / 停止 AI 变声器",
shortcutMute: "静音 / 开启本地输出监听",
shortcutPreset1: "加载预设:极速响应 (PM)",
shortcutPreset2: "加载预设:高清音质 (RMVPE)",
shortcutHelp: "打开 / 关闭快捷键帮助面板",
// Premium layouts
characterCardTitle: "当前声音人物",
characterAvatarDesc: "当前正在承载的音频权重包与神经网络特征。",
welcomeBack: "实时音频变声控制台",
currentLang: "语言",
themeSettings: "界面主题与强调色",
themeMode: "主题模式",
themeDark: "深色模式",
themeLight: "浅色模式",
accentColorLabel: "全局强调颜色",
tabCredits: "开源鸣谢",
creditsTitle: "💖 开源软件鸣谢",
creditsDescription: "ONNX VC 的诞生离不开以下优秀的开源项目与函数库的支持:",
liveTuningTitle: "常用变声微调",
customCanvasTitle: "画布自定设置",
showMicInput: "显示麦克风输入",
showAiOutput: "显示AI变声输出",
lineWidthLabel: "线条宽度",
traceDecayLabel: "余晖消退率 (渐变)",
inputLineColorLabel: "输入线颜色",
outputLineColorLabel: "输出线颜色",
creditCreatorTitle: "核心集成开发商",
creditNeuralTitle: "声线转换算法",
creditEngineTitle: "深度学习推理引擎",
creditPitchTitle: "基频音高提取",
creditPipelineTitle: "数据流通通道",
creditFrameworkTitle: "前端应用框架",
creditDesignTitle: "界面设计与动效",
creditCreatorDesc: "ONNX VC 客户端界面设计与超低延迟音频控制工作台的集成开发者。",
creditNeuralDesc: "基于检索的神经网络架构,用于实现低延迟的实时声音特征提取与音色转换。",
creditEngineDesc: "跨平台的机器学习模型推理加速引擎,支持 CPU、NVIDIA CUDA 以及 Windows DirectML GPU 后端。",
creditPitchDesc: "高性能人声基频检测模型,在背景嘈杂的环境下仍能提供极高精度的音高跟踪。",
creditPipelineDesc: "在浏览器客户端与 Python 服务端之间高速传输原始 PCM Float32 音频帧的双向二进制数据通道。",
creditFrameworkDesc: "现代网页开发框架,支持将 React 客户端组件编译并打包为高度优化的静态资源导出。",
creditDesignDesc: "功能类优先 CSS 框架与流式声明式动画库,用以打造流畅的高级交互式视觉界面。"
},
es: {
appTitle: "🎙️ ONNX VC",
appSubtitle: "Modulador de voz por IA en tiempo real y baja latencia acelerado por ONNX Runtime.",
wsServerUrl: "URL del Servidor WebSocket",
wsPlaceholder: "ws://localhost:8765",
connectionStatus: "Estado de la Conexión",
disconnected: "Desconectado",
connecting: "Conectando...",
connected: "Conectado",
connect: "Conectar Servidor",
disconnect: "Desconectar Servidor",
startChanger: "Iniciar Modulador",
stopChanger: "Detener Modulador",
listeningActive: "Escucha: ACTIVA",
listeningMute: "Escucha: SILENCIADO",
// Tabs
tabDashboard: "Espacio Trabajo",
tabModel: "Ajustes Modelo",
tabDsp: "Audio DSP",
tabShortcuts: "Atajos Teclado",
// Model Config
modelConfigTitle: "Configuración de Modelo y Dispositivo",
quickPresets: "Ajustes Rápidos (Perfil de Rendimiento)",
latencyPreset: "⚡ Respuesta Instantánea (PM)",
qualityPreset: "🎙️ Alta Fidelidad (RMVPE)",
selectModel: "Seleccionar Modelo de Voz (RVC ONNX)",
executionProvider: "Proveedor de Ejecución (Aceleración GPU)",
routingMode: "Modo de Ruta de Audio",
clientMode: "Modo Cliente (Streaming en Navegador)",
serverMode: "Modo Servidor (Sounddevice Directo)",
serverInput: "Micrófono de Entrada del Servidor",
serverOutput: "Altavoz de Salida del Servidor",
pitchMethod: "Método de Extracción de Tono",
transpose: "Transposición (Modificador de Tono)",
transposeMale: "-24 (Tono Grave Masculino)",
transposeNormal: "0 (Original)",
transposeFemale: "+24 (Tono Agudo/Anime)",
// DSP
dspTitle: "Configuración de Procesamiento de Audio (DSP)",
noiseGate: "Puerta de Ruido (Umbral)",
noiseGateSens: "-60 dB (Sensible)",
noiseGateDefault: "-40 dB (Predeterminado)",
noiseGateStrict: "-10 dB (Estricto)",
inputGain: "Ganancia de Entrada (Micrófono)",
outputGain: "Ganancia de Salida (Volumen IA)",
noiseCancel: "Cancelación de Ruido (Filtro)",
noiseCancelDesc: "Filtra el eco y el zumbido de fondo",
bufferSize: "Tamaño de Búfer (Tamaño de Chunk - Latencia vs Estabilidad)",
// Visualizers
visualizerTitle: "Visualizador de Ondas de Audio",
micSignal: "Señal de Entrada del Micrófono",
aiSignal: "Señal de Salida de Voz IA",
activeSignal: "Señal Activa",
pipStream: "Forma de Onda PiP",
pipClose: "Cerrar PiP",
// HUD
hudLatency: "Latencia RTT",
hudInference: "Velocidad de Inferencia",
hudDetector: "Detector de Voz",
hudTalking: "Hablando",
hudSilent: "Silencio",
hudSr: "Frecuencia del Modelo",
hudHelp: "Presione ? para ver el menú de atajos",
// Shortcuts Dialog
shortcutsTitle: "Guía de Atajos de Teclado",
shortcutsDesc: "Utilice los siguientes atajos para controlar el panel de control sin el mouse:",
shortcutsClose: "Cerrar",
shortcutConnect: "Conectar / Desconectar Servidor WebSocket",
shortcutStream: "Iniciar / Detener Modulador de Voz IA",
shortcutMute: "Silenciar / Activar Escucha Local de Salida",
shortcutPreset1: "Cargar Ajuste: Respuesta Instantánea (PM)",
shortcutPreset2: "Cargar Ajuste: Alta Fidelidad (RMVPE)",
shortcutHelp: "Abrir / Cerrar Diálogo de Ayuda de Atajos",
// Premium layouts
characterCardTitle: "Voz del Personaje Activo",
characterAvatarDesc: "Perfil de pesos de voz cargado actualmente.",
welcomeBack: "Centro de Control de Audio en Tiempo Real",
currentLang: "Idioma",
themeSettings: "Tema de Interfaz y Acento",
themeMode: "Modo de Tema",
themeDark: "Modo Oscuro",
themeLight: "Modo Claro",
accentColorLabel: "Color de Acento Global",
tabCredits: "Créditos",
creditsTitle: "💖 Créditos de Código Abierto",
creditsDescription: "ONNX VC es posible gracias a los siguientes increíbles proyectos y bibliotecas de código abierto:",
liveTuningTitle: "Ajustes en Vivo",
customCanvasTitle: "Ajustes de Canvas",
showMicInput: "Mostrar Entrada Mic",
showAiOutput: "Mostrar Salida IA",
lineWidthLabel: "Grosor de Línea",
traceDecayLabel: "Decaimiento del Trazo",
inputLineColorLabel: "Color de Línea de Entrada",
outputLineColorLabel: "Color de Línea de Salida",
creditCreatorTitle: "Creador e Integrador",
creditNeuralTitle: "Conversión Neuronal",
creditEngineTitle: "Motor de Inferencia",
creditPitchTitle: "Extracción de Tono",
creditPipelineTitle: "Línea de Transmisión",
creditFrameworkTitle: "Marco Frontend",
creditDesignTitle: "Diseño y Animación",
creditCreatorDesc: "Creadores de la interfaz de cliente ONNX VC e integradores del entorno de control de audio en tiempo real.",
creditNeuralDesc: "Arquitectura central de red neuronal para la extracción de características de voz y conversión vocal.",
creditEngineDesc: "Acelerador multiplataforma de inferencia de modelos de IA para CPU, GPU CUDA y GPU DirectML de Windows.",
creditPitchDesc: "Modelo robusto de estimación de tono vocal (RMVPE) para un seguimiento de tono de alta precisión.",
creditPipelineDesc: "Tubería binaria de alta velocidad para la transferencia de tramas PCM float32 nativas entre el cliente y el servidor.",
creditFrameworkDesc: "Marco de desarrollo web moderno que compila componentes de React para exportaciones estáticas optimizadas.",
creditDesignDesc: "Utilidad de estilos CSS y librerías de animación declarativa para interfaces de usuario interactivas de primera calidad."
}
};
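The `translations` object above is keyed first by language code and then by string key. One way such a table could be consumed is sketched below: a lookup that falls back to English and then to the raw key, plus a helper that lists keys a locale has not translated yet. The names `sampleTranslations`, `translate` and `missingKeys` are hypothetical, and the table is a deliberately trimmed stand-in (the Indonesian entry omits `tabCredits` on purpose to show the fallback); the real dashboard may resolve strings differently.

```typescript
type Language = 'en' | 'ja' | 'zh' | 'es' | 'id';

// Values may be undefined so that missing keys are visible to the type checker.
type Dictionary = { [key: string]: string | undefined };

// Trimmed stand-in for the full translations object shown in the diff.
const sampleTranslations: Record<Language, Dictionary> = {
  en: { connect: 'Connect Server', tabCredits: 'Credits' },
  id: { connect: 'Hubungkan Server' },
  ja: { connect: 'サーバー接続', tabCredits: 'オープンソース' },
  zh: { connect: '连接服务器', tabCredits: '开源鸣谢' },
  es: { connect: 'Conectar Servidor', tabCredits: 'Créditos' },
};

// Look up a key in the requested language, then in English, then return the key itself.
function translate(lang: Language, key: string): string {
  return sampleTranslations[lang][key] ?? sampleTranslations.en[key] ?? key;
}

// List the English keys that a given locale has not translated.
function missingKeys(lang: Language): string[] {
  return Object.keys(sampleTranslations.en).filter(
    (key) => sampleTranslations[lang][key] === undefined,
  );
}
```

Falling back to English keeps the interface readable when a locale lags behind, and `missingKeys` could be run in a test to catch untranslated strings before release.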
-595
@@ -1,595 +0,0 @@
/* ==========================================================================
CSS GLOBAL TOKENS & RESET
========================================================================== */
:root {
--bg-dark: #07080e;
--bg-card: rgba(13, 17, 30, 0.7);
--border-color: rgba(99, 102, 241, 0.18);
--primary: #6366f1;
--primary-glow: rgba(99, 102, 241, 0.4);
--accent: #a855f7;
--accent-glow: rgba(168, 85, 247, 0.45);
--emerald: #10b981;
--rose: #ef4444;
--text-main: #e2e8f0;
--text-muted: #94a3b8;
--font-header: 'Outfit', 'Inter', -apple-system, BlinkMacSystemFont, sans-serif;
--font-body: 'Inter', -apple-system, BlinkMacSystemFont, sans-serif;
--transition-smooth: all 0.3s cubic-bezier(0.25, 0.8, 0.25, 1);
}
* {
box-sizing: border-box;
margin: 0;
padding: 0;
}
body {
background-color: var(--bg-dark);
color: var(--text-main);
font-family: var(--font-body);
min-height: 100vh;
overflow-x: hidden;
position: relative;
padding: 2rem 1.5rem;
}
/* ==========================================================================
DYNAMIC GLOWING BACKGROUND
========================================================================== */
.glow-backdrop {
position: fixed;
top: 0;
left: 0;
right: 0;
bottom: 0;
z-index: -1;
background:
radial-gradient(circle at 10% 20%, rgba(99, 102, 241, 0.08) 0%, transparent 40%),
radial-gradient(circle at 90% 80%, rgba(168, 85, 247, 0.09) 0%, transparent 45%);
pointer-events: none;
}
/* ==========================================================================
LAYOUT CONTAINER & CARDS
========================================================================== */
.dashboard-container {
max-width: 1200px;
margin: 0 auto;
display: flex;
flex-direction: column;
gap: 1.5rem;
}
.glassmorphism {
background: var(--bg-card);
backdrop-filter: blur(16px);
-webkit-backdrop-filter: blur(16px);
border: 1px solid var(--border-color);
border-radius: 16px;
box-shadow: 0 8px 32px 0 rgba(0, 0, 0, 0.37);
transition: var(--transition-smooth);
}
.glassmorphism:hover {
border-color: rgba(99, 102, 241, 0.3);
box-shadow: 0 10px 40px 0 rgba(99, 102, 241, 0.1);
}
.card {
padding: 1.75rem;
}
.card-title {
font-family: var(--font-header);
font-size: 1.25rem;
font-weight: 600;
margin-bottom: 1.25rem;
background: linear-gradient(135deg, #fff 0%, var(--text-muted) 100%);
-webkit-background-clip: text;
-webkit-text-fill-color: transparent;
border-bottom: 1px solid rgba(255, 255, 255, 0.05);
padding-bottom: 0.75rem;
}
/* ==========================================================================
APP HEADER
========================================================================== */
.app-header {
text-align: center;
margin-bottom: 1rem;
}
.logo-area {
display: inline-flex;
align-items: center;
gap: 0.75rem;
margin-bottom: 0.5rem;
}
.logo-area h1 {
font-family: var(--font-header);
font-size: 2.5rem;
font-weight: 800;
letter-spacing: -0.5px;
background: linear-gradient(135deg, var(--primary) 0%, var(--accent) 100%);
-webkit-background-clip: text;
-webkit-text-fill-color: transparent;
text-shadow: 0 0 40px rgba(99, 102, 241, 0.2);
}
.pulse-indicator {
width: 10px;
height: 10px;
border-radius: 50%;
background-color: var(--rose);
box-shadow: 0 0 10px var(--rose);
}
.pulse-indicator.active {
background-color: var(--emerald);
box-shadow: 0 0 10px var(--emerald);
animation: pulse 1.8s infinite;
}
.tagline {
color: var(--text-muted);
font-size: 0.95rem;
font-weight: 400;
max-width: 600px;
margin: 0 auto;
}
/* ==========================================================================
DASHBOARD GRID LAYOUT
========================================================================== */
.dashboard-grid {
display: grid;
grid-template-columns: repeat(2, 1fr);
gap: 1.5rem;
}
@media (max-width: 768px) {
.dashboard-grid {
grid-template-columns: 1fr;
}
.col-span-2 {
grid-column: span 1 !important;
}
}
.col-span-2 {
grid-column: span 2;
}
/* ==========================================================================
INPUTS & CONTROLS
========================================================================== */
.control-group {
margin-bottom: 1.25rem;
}
.control-group:last-child {
margin-bottom: 0;
}
label {
display: block;
font-size: 0.85rem;
font-weight: 500;
color: var(--text-muted);
margin-bottom: 0.5rem;
text-transform: uppercase;
letter-spacing: 0.5px;
}
.custom-select {
width: 100%;
padding: 0.8rem 1rem;
background-color: rgba(20, 24, 45, 0.8);
border: 1px solid var(--border-color);
border-radius: 8px;
color: var(--text-main);
font-size: 0.9rem;
font-family: var(--font-body);
outline: none;
transition: var(--transition-smooth);
cursor: pointer;
appearance: none;
background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='24' height='24' viewBox='0 0 24 24' fill='none' stroke='%2394a3b8' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'%3E%3C/polyline%3E%3C/svg%3E");
background-repeat: no-repeat;
background-position: right 1rem center;
background-size: 1.2rem;
}
.custom-select:focus {
border-color: var(--primary);
box-shadow: 0 0 8px var(--primary-glow);
}
.input-group input {
background-color: rgba(20, 24, 45, 0.8);
border: 1px solid var(--border-color);
border-radius: 8px;
color: var(--text-main);
padding: 0.8rem 1rem;
width: 100%;
font-family: var(--font-body);
font-size: 0.9rem;
outline: none;
transition: var(--transition-smooth);
}
.input-group input:focus {
border-color: var(--primary);
box-shadow: 0 0 8px var(--primary-glow);
}
/* ==========================================================================
SLIDERS STYLING
========================================================================== */
.slider-header {
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 0.25rem;
}
.slider-value {
font-family: var(--font-header);
font-weight: 600;
color: var(--accent);
text-shadow: 0 0 8px var(--accent-glow);
font-size: 0.95rem;
}
.custom-slider {
-webkit-appearance: none;
width: 100%;
height: 6px;
border-radius: 3px;
background: rgba(99, 102, 241, 0.15);
outline: none;
margin: 0.75rem 0;
}
.custom-slider::-webkit-slider-thumb {
-webkit-appearance: none;
appearance: none;
width: 18px;
height: 18px;
border-radius: 50%;
background: linear-gradient(135deg, var(--primary) 0%, var(--accent) 100%);
cursor: pointer;
box-shadow: 0 0 10px var(--primary-glow);
transition: transform 0.1s ease;
}
.custom-slider::-webkit-slider-thumb:hover {
transform: scale(1.2);
}
.slider-ticks {
display: flex;
justify-content: space-between;
font-size: 0.75rem;
color: var(--text-muted);
}
/* ==========================================================================
BUTTONS
========================================================================== */
.btn {
padding: 0.8rem 1.5rem;
border-radius: 8px;
font-family: var(--font-header);
font-weight: 600;
font-size: 0.9rem;
cursor: pointer;
border: none;
outline: none;
transition: var(--transition-smooth);
display: inline-flex;
align-items: center;
justify-content: center;
gap: 0.5rem;
}
.btn-primary {
background: linear-gradient(135deg, var(--primary) 0%, #4f46e5 100%);
color: white;
box-shadow: 0 4px 14px 0 var(--primary-glow);
}
.btn-primary:hover:not(:disabled) {
transform: translateY(-2px);
box-shadow: 0 6px 20px 0 rgba(99, 102, 241, 0.6);
}
.btn-accent {
background: linear-gradient(135deg, var(--accent) 0%, #7c3aed 100%);
color: white;
box-shadow: 0 4px 14px 0 var(--accent-glow);
}
.btn-accent:hover:not(:disabled) {
transform: translateY(-2px);
box-shadow: 0 6px 20px 0 rgba(168, 85, 247, 0.65);
}
.btn:active:not(:disabled) {
transform: translateY(0);
}
.btn:disabled {
opacity: 0.5;
cursor: not-allowed;
box-shadow: none;
}
/* ==========================================================================
CONNECTION BAR
========================================================================== */
.connection-bar {
padding: 1rem 1.5rem !important;
}
.form-row {
display: flex;
align-items: flex-end;
gap: 1.5rem;
flex-wrap: wrap;
}
.form-row .input-group {
flex: 1;
min-width: 250px;
}
.connection-status-container {
display: flex;
align-items: center;
height: 48px;
}
.status-badge {
padding: 0.4rem 0.8rem;
border-radius: 20px;
font-size: 0.8rem;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.5px;
display: inline-flex;
align-items: center;
gap: 0.35rem;
}
.status-badge::before {
content: '';
display: inline-block;
width: 6px;
height: 6px;
border-radius: 50%;
}
.status-badge.connected {
background-color: rgba(16, 185, 129, 0.15);
color: var(--emerald);
border: 1px solid rgba(16, 185, 129, 0.3);
}
.status-badge.connected::before {
background-color: var(--emerald);
box-shadow: 0 0 6px var(--emerald);
}
.status-badge.disconnected {
background-color: rgba(239, 68, 68, 0.15);
color: var(--rose);
border: 1px solid rgba(239, 68, 68, 0.3);
}
.status-badge.disconnected::before {
background-color: var(--rose);
box-shadow: 0 0 6px var(--rose);
}
.status-badge.connecting {
background-color: rgba(168, 85, 247, 0.15);
color: var(--accent);
border: 1px solid rgba(168, 85, 247, 0.3);
}
.status-badge.connecting::before {
background-color: var(--accent);
box-shadow: 0 0 6px var(--accent);
animation: blink 1s infinite;
}
.btn-group-row {
display: flex;
gap: 0.75rem;
height: 48px;
}
/* ==========================================================================
MODERN RADIO TILES
========================================================================== */
.radio-group-modern {
display: grid;
grid-template-columns: repeat(2, 1fr);
gap: 0.5rem;
}
.radio-tile {
position: relative;
cursor: pointer;
margin: 0;
}
.radio-tile input {
position: absolute;
opacity: 0;
}
.tile-label {
display: block;
padding: 0.6rem;
background-color: rgba(20, 24, 45, 0.5);
border: 1px solid var(--border-color);
border-radius: 8px;
text-align: center;
font-size: 0.8rem;
font-weight: 500;
color: var(--text-muted);
transition: var(--transition-smooth);
}
.radio-tile input:checked + .tile-label {
background-color: rgba(99, 102, 241, 0.12);
border-color: var(--primary);
color: var(--text-main);
box-shadow: 0 0 10px rgba(99, 102, 241, 0.2);
}
.radio-tile:hover .tile-label {
border-color: rgba(99, 102, 241, 0.4);
}
/* ==========================================================================
OSCILLOSCOPE WAVEFORM CANVASES
========================================================================== */
.visualizer-row {
display: flex;
gap: 1.5rem;
flex-wrap: wrap;
}
.visualizer-container {
flex: 1;
min-width: 280px;
display: flex;
flex-direction: column;
gap: 0.5rem;
}
.vis-label {
display: flex;
align-items: center;
gap: 0.5rem;
font-size: 0.8rem;
font-weight: 500;
color: var(--text-muted);
}
.dot {
width: 6px;
height: 6px;
border-radius: 50%;
}
.input-dot {
background-color: var(--primary);
box-shadow: 0 0 6px var(--primary);
}
.output-dot {
background-color: var(--accent);
box-shadow: 0 0 6px var(--accent);
}
.waveform-canvas {
width: 100%;
height: 150px;
background-color: #0b0c13;
border-radius: 8px;
border: 1px solid rgba(255, 255, 255, 0.03);
}
/* ==========================================================================
PERFORMANCE HUD
========================================================================== */
.performance-hud {
display: flex;
justify-content: space-between;
align-items: center;
padding: 0.85rem 1.75rem !important;
}
.hud-item {
display: flex;
flex-direction: column;
gap: 0.15rem;
}
.hud-label {
font-size: 0.7rem;
text-transform: uppercase;
letter-spacing: 1px;
color: var(--text-muted);
font-weight: 500;
}
.hud-value {
font-family: var(--font-header);
font-size: 1.1rem;
font-weight: 700;
color: white;
}
.hud-separator {
width: 1px;
height: 30px;
background-color: rgba(255, 255, 255, 0.08);
}
.hud-value.text-accent {
color: var(--accent);
text-shadow: 0 0 8px var(--accent-glow);
}
.active-badge {
color: var(--emerald);
text-shadow: 0 0 6px rgba(16, 185, 129, 0.4);
}
@media (max-width: 600px) {
.performance-hud {
flex-direction: column;
align-items: flex-start;
gap: 0.75rem;
}
.hud-separator {
display: none;
}
}
/* ==========================================================================
KEYFRAME ANIMATIONS
========================================================================== */
@keyframes pulse {
0% {
transform: scale(0.9);
box-shadow: 0 0 0 0 rgba(16, 185, 129, 0.7);
}
70% {
transform: scale(1.1);
box-shadow: 0 0 0 10px rgba(16, 185, 129, 0);
}
100% {
transform: scale(0.9);
box-shadow: 0 0 0 0 rgba(16, 185, 129, 0);
}
}
@keyframes blink {
0%, 100% {
opacity: 1;
}
50% {
opacity: 0.4;
}
}
+34
@@ -0,0 +1,34 @@
{
"compilerOptions": {
"target": "ES2017",
"lib": ["dom", "dom.iterable", "esnext"],
"allowJs": true,
"skipLibCheck": true,
"strict": true,
"noEmit": true,
"esModuleInterop": true,
"module": "esnext",
"moduleResolution": "bundler",
"resolveJsonModule": true,
"isolatedModules": true,
"jsx": "react-jsx",
"incremental": true,
"plugins": [
{
"name": "next"
}
],
"paths": {
"@/*": ["./src/*"]
}
},
"include": [
"next-env.d.ts",
"**/*.ts",
"**/*.tsx",
".next/types/**/*.ts",
".next/dev/types/**/*.ts",
"**/*.mts"
],
"exclude": ["node_modules"]
}
+134
@@ -0,0 +1,134 @@
import os
import sys
import torch
import argparse
import traceback
# Add the current working directory to the path so that lib can be imported
sys.path.append(os.getcwd())
from lib.infer_pack.models_onnx import SynthesizerTrnMsNSFsidM
def export_model_to_onnx(model_path, output_onnx_path):
    print(f"Loading PyTorch checkpoint from: {model_path}")
    try:
        # Load the checkpoint onto the CPU
        cpt = torch.load(model_path, map_location="cpu")
    except Exception as e:
        print(f"Error loading checkpoint: {e}")
        return False

    # Read model metadata
    tgt_sr = cpt["config"][-1]
    # Read the number of speakers from the embedding weights
    if "emb_g.weight" in cpt["weight"]:
        n_spk = cpt["weight"]["emb_g.weight"].shape[0]
    else:
        n_spk = 1
    # Update spk_embed_dim in the config to match
    cpt["config"][-3] = n_spk
    version = cpt.get("version", "v1")
    if_f0 = cpt.get("f0", 1)
    print(f"Model Version: {version}")
    print(f"Pitch (F0) Enabled: {if_f0}")
    print(f"Target Sample Rate: {tgt_sr} Hz")
    print(f"Number of Speakers: {n_spk}")

    # Initialize the ONNX-specific model class (SynthesizerTrnMsNSFsidM)
    # is_half is set to False to export in FP32 for stable ONNX Runtime compatibility
    try:
        net_g = SynthesizerTrnMsNSFsidM(*cpt["config"], version=version, is_half=False)
        # Remove the posterior encoder, which is not used during inference
        if hasattr(net_g, "enc_q"):
            del net_g.enc_q
        # Load the model weights; strict=False so the deleted enc_q keys are ignored
        net_g.load_state_dict(cpt["weight"], strict=False)
        net_g.eval()
        print("PyTorch model loaded successfully. Preparing dummy inputs...")
    except Exception as e:
        print(f"Failed to initialize RVC ONNX model class: {e}")
        traceback.print_exc()
        return False

    # Prepare dummy inputs for export tracing
    test_len = 10  # Dummy sequence length
    feat_dim = 256 if version == "v1" else 768
    phone = torch.randn(1, test_len, feat_dim, dtype=torch.float32)
    phone_lengths = torch.tensor([test_len], dtype=torch.int64)
    pitch = torch.randint(1, 254, (1, test_len), dtype=torch.int64)
    nsff0 = torch.randn(1, test_len, dtype=torch.float32)
    g = torch.tensor([0], dtype=torch.int64)  # Speaker ID 0
    rnd = torch.randn(1, 192, test_len, dtype=torch.float32)
    input_names = ["phone", "phone_lengths", "pitch", "nsff0", "g", "rnd"]
    output_names = ["audio"]
    dynamic_axes = {
        "phone": {1: "length"},
        "pitch": {1: "length"},
        "nsff0": {1: "length"},
        "rnd": {2: "length"},
        "audio": {1: "audio_length"}
    }

    print(f"Exporting model to ONNX format at: {output_onnx_path}")
    try:
        torch.onnx.export(
            net_g,
            (phone, phone_lengths, pitch, nsff0, g, rnd),
            output_onnx_path,
            opset_version=17,
            input_names=input_names,
            output_names=output_names,
            dynamic_axes=dynamic_axes,
            verbose=False
        )
        print("ONNX model exported successfully!")
        return True
    except Exception as e:
        print(f"Error during ONNX export: {e}")
        traceback.print_exc()
        return False
if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Export RVC PyTorch .pth model to ONNX")
    parser.add_argument("--model_name", type=str, required=True, help="Model name in the weights folder (sub-folder name)")
    parser.add_argument("--output", type=str, default="", help="Output path for the ONNX file (optional)")
    args = parser.parse_args()

    model_root = "weights"
    model_dir = os.path.join(model_root, args.model_name)
    if not os.path.isdir(model_dir):
        print(f"Error: Folder '{model_dir}' not found!")
        sys.exit(1)

    pth_files = [f for f in os.listdir(model_dir) if f.endswith(".pth")]
    if not pth_files:
        print(f"Error: No .pth file found in folder '{model_dir}'!")
        sys.exit(1)

    pth_path = os.path.join(model_dir, pth_files[0])
    if args.output:
        onnx_path = args.output
    else:
        # By default, save into the same weights sub-folder
        onnx_name = os.path.splitext(pth_files[0])[0] + ".onnx"
        onnx_path = os.path.join(model_dir, onnx_name)

    success = export_model_to_onnx(pth_path, onnx_path)
    if success:
        print(f"\nDone! ONNX model saved at: {onnx_path}")
    else:
        print("\nExport failed!")
        sys.exit(1)
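The dummy tensors in `export_model_to_onnx` effectively define the input contract of the exported graph: six named inputs with fixed dtypes and a dynamic `length` axis. A minimal sketch of that contract as plain Python data follows. `dummy_input_spec` is a hypothetical helper, not part of the repository; it only mirrors the names, shapes, and dtypes used in the export script above and may be handy when checking what an inference caller has to feed the model.

```python
from typing import Dict, Tuple

Spec = Dict[str, Tuple[Tuple[int, ...], str]]


# Hypothetical helper (not in the repository): mirrors the dummy inputs that
# export_model_to_onnx passes to torch.onnx.export.
def dummy_input_spec(test_len: int = 10, version: str = "v1") -> Spec:
    # As in the export script: 256-dim features for "v1", 768-dim otherwise.
    feat_dim = 256 if version == "v1" else 768
    return {
        "phone": ((1, test_len, feat_dim), "float32"),
        "phone_lengths": ((1,), "int64"),
        "pitch": ((1, test_len), "int64"),
        "nsff0": ((1, test_len), "float32"),
        "g": ((1,), "int64"),  # speaker ID
        "rnd": ((1, 192, test_len), "float32"),
    }


if __name__ == "__main__":
    for name, (shape, dtype) in dummy_input_spec(10, "v2").items():
        print(f"{name}: shape={shape}, dtype={dtype}")
```

Only the axes listed in `dynamic_axes` (the `length` dimension of `phone`, `pitch`, `nsff0`, and `rnd`) are expected to vary at inference time; the other dimensions stay as traced.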
+1 -45
@@ -20,9 +20,6 @@ import logging
import traceback
import argparse
import threading
import webbrowser
from http.server import SimpleHTTPRequestHandler
import socketserver
import numpy as np
import torch
import onnxruntime as ort
@@ -616,27 +613,7 @@ async def start_websocket_server(host, port):
async with websockets.serve(websocket_handler, host, port):
await asyncio.Future()
# --- HTTP STATIC FILE SERVER FOR FRONTEND ---
def start_http_server(port, directory="frontend"):
class MyHandler(SimpleHTTPRequestHandler):
def __init__(self, *args, **kwargs):
# Force serve from directory relative to the project root
base_dir = os.path.dirname(os.path.abspath(__file__))
full_dir = os.path.join(base_dir, directory)
super().__init__(*args, directory=full_dir, **kwargs)
def log_message(self, format, *args):
# Suppress standard logging to prevent console pollution
pass
try:
# Create a TCPServer that allows address reuse
socketserver.TCPServer.allow_reuse_address = True
with socketserver.TCPServer(("", port), MyHandler) as httpd:
logger.info(f"Serving HTTP frontend on http://localhost:{port}")
httpd.serve_forever()
except Exception as e:
logger.error(f"Failed to start HTTP server: {e}")
# --- LOCAL AUDIO DEVICE STREAM MODE ---
def run_local_device_mode(model_name, f0_up_key, f0_method, device, input_device, output_device, chunk_size):
@@ -709,7 +686,6 @@ if __name__ == "__main__":
parser.add_argument("--mode", type=str, default="websocket", choices=["websocket", "device"], help="Server running mode")
parser.add_argument("--host", type=str, default="127.0.0.1", help="WebSocket host")
parser.add_argument("--port", type=int, default=8765, help="WebSocket port")
parser.add_argument("--http_port", type=int, default=8000, help="HTTP static server port for Web UI")
parser.add_argument("--model", type=str, default="", help="RVC Model folder name inside weights/")
parser.add_argument("--transpose", type=int, default=0, help="Pitch shift in semitones (transpose)")
parser.add_argument("--f0_method", type=str, default="pm", choices=["pm", "harvest", "dio", "rmvpe"], help="Pitch extraction method")
@@ -731,27 +707,7 @@ if __name__ == "__main__":
sys.exit(1)
if args.mode == "websocket":
# 1. Start HTTP Server in a background thread to serve the frontend!
http_thread = threading.Thread(
target=start_http_server,
args=(args.http_port, "frontend"),
daemon=True
)
http_thread.start()
# 2. Automatically open the Web UI in the default browser!
web_ui_url = f"http://127.0.0.1:{args.http_port}"
logger.info(f"Automatically launching Web UI at {web_ui_url} in browser...")
# We give it a tiny delay to ensure the HTTP server socket is open
def open_browser():
time.sleep(0.5)
webbrowser.open(web_ui_url)
browser_thread = threading.Thread(target=open_browser, daemon=True)
browser_thread.start()
# 3. Start the WebSocket server on the main event loop
# Start the WebSocket server on the main event loop
try:
asyncio.run(start_websocket_server(args.host, args.port))
except KeyboardInterrupt:
+2 -2
@@ -6,10 +6,10 @@ set VENV_PYTHON=..\rvc-tts-webui\venv\Scripts\python.exe
if exist "%VENV_PYTHON%" (
echo Running with the virtual environment from rvc-tts-webui...
"%VENV_PYTHON%" -u server.py --host 127.0.0.1 --port 8765 --http_port 8000
"%VENV_PYTHON%" -u server.py --host 127.0.0.1 --port 8765
) else (
echo Virtual environment not found, trying the system python...
python -u server.py --host 127.0.0.1 --port 8765 --http_port 8000
python -u server.py --host 127.0.0.1 --port 8765
)
pause
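The launcher change above matters because `server.py` no longer defines `--http_port`. A minimal sketch of the effect, assuming only the `--mode`, `--host`, and `--port` options visible in the diff hunk (the real parser defines more; `build_parser` is a hypothetical stand-in, not the repository's code):

```python
import argparse


# Hypothetical stand-in for the server.py parser after this change.
# Only options visible in the diff hunk are reproduced here.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Sketch of the server.py CLI")
    parser.add_argument("--mode", type=str, default="websocket",
                        choices=["websocket", "device"], help="Server running mode")
    parser.add_argument("--host", type=str, default="127.0.0.1", help="WebSocket host")
    parser.add_argument("--port", type=int, default=8765, help="WebSocket port")
    return parser


if __name__ == "__main__":
    old_style = ["--host", "127.0.0.1", "--port", "8765", "--http_port", "8000"]
    # parse_known_args collects unrecognized flags instead of exiting,
    # which makes the leftover option easy to inspect.
    args, unknown = build_parser().parse_known_args(old_style)
    print(args.host, args.port, unknown)
```

With `parse_args()`, which is what a script like this would normally call, the leftover `--http_port 8000` would instead abort with an "unrecognized arguments" error, so any launcher still passing the old flag would fail at startup.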