Compare commits: 68020c5990..master (15 commits)
@@ -29,3 +29,8 @@ Thumbs.db
|
||||
*.index
|
||||
weights/
|
||||
pretrained/
|
||||
|
||||
# Next.js workspace exclusions
|
||||
frontend/node_modules/
|
||||
frontend/.next/
|
||||
frontend/out/
|
||||
|
||||
@@ -0,0 +1,78 @@
|
||||
# 🤖 ONNX VC - Agent Guidelines & Architecture Map
|
||||
|
||||
This file serves as a guide for AI agents (Gemini, Claude, Cursor, etc.) working on the ONNX Voice Changer repository. It explains the project architecture, directory structure, core conventions, and how to maintain the codebase.
|
||||
|
||||
---
|
||||
|
||||
## 🛠️ Technology Stack
|
||||
1. **Backend:** Python 3.10+, WebSocket (using `websockets`), ONNX Runtime, NumPy, PyTorch (only for RVC export).
|
||||
2. **Frontend:** Next.js 15 (App Router), React 19, TypeScript, Tailwind CSS, Lucide React, Framer Motion.
|
||||
3. **Voice Conversion:** Retrieval-based Voice Conversion (RVC) models accelerated via ONNX Runtime (CPU, CUDA, DirectML).
|
||||
|
||||
---
|
||||
|
||||
## 📁 Repository Structure
|
||||
* [/server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py) — The main WebSocket API server.
|
||||
* [/frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend) — Next.js 15 client dashboard app.
|
||||
* [/frontend-deprecated/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend-deprecated) — Legacy static single-page web files (HTML/CSS/JS). Do not modify.
|
||||
* [/docs/](file:///M:/Users/ahmad/project/onnx-voice-changer/docs) — Holds localized README documentation files (Indonesian, Spanish, Japanese, Chinese).
|
||||
* [/lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib) — RVC models and export scripts (e.g. `export_onnx.py`).
|
||||
* [/weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) — Character voice models (e.g., `weights/HuTao/HuTao.onnx`).
|
||||
* [/pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) — Holds the pre-trained `vec-768-layer-12.onnx` ContentVec model.
|
||||
|
||||
---
|
||||
|
||||
## ⚙️ Core Architecture & Conventions
|
||||
|
||||
### 1. Pure API Backend (No Static Hosting)
|
||||
* **Rule:** The Python backend (`server.py`) operates **strictly as a WebSocket API**.
|
||||
* **Do NOT** configure Python to serve frontend static pages, build files, or index HTML.
|
||||
* The Next.js frontend client runs independently (via `npm run dev` or a separate production server).
|
||||
|
||||
### 2. WebSocket Audio Pipeline
|
||||
* Audio chunks are sent to and from the server as **binary WebSocket messages** containing raw `Float32` PCM audio data.
|
||||
* Configuration changes, telemetry, and status controls are sent as **JSON WebSocket messages** (text frames) over the same connection.
* In `server.py`, always check whether an incoming message is binary (audio) or text (JSON) before handling it.
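
As a rough illustration of the binary-versus-text split described above, the sketch below classifies one incoming message. It relies on the `websockets` behaviour of delivering binary frames as `bytes` and text frames as `str`. The function name `decode_message`, the `"audio"`/`"control"` tags, and any JSON keys are hypothetical and not taken from `server.py`; the sketch also assumes the client sends native-endian (in practice little-endian) Float32 samples.

```python
import json
from array import array


def decode_message(message):
    """Classify one WebSocket message as audio samples or a control payload."""
    if isinstance(message, (bytes, bytearray)):
        # Binary frame: raw Float32 PCM, 4 bytes per sample.
        if len(message) % 4 != 0:
            raise ValueError("binary payload is not a whole number of float32 samples")
        samples = array("f")
        samples.frombytes(bytes(message))
        return "audio", samples
    if isinstance(message, str):
        # Text frame: JSON control, telemetry, or status message.
        return "control", json.loads(message)
    raise TypeError(f"unsupported message type: {type(message).__name__}")
```

Inside a `websockets` handler this would typically sit in an `async for message in websocket:` loop. With NumPy, `np.frombuffer(message, dtype=np.float32)` yields the same samples without copying.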
|
||||
|
||||
### 3. Digital Signal Processing (DSP) Staging
|
||||
* Audio preprocessing is handled on the server side:
|
||||
1. **Low-Cut Filter:** First-order Butterworth high-pass filter at 80 Hz to attenuate AC hum and low-frequency rumble.
|
||||
2. **Noise Gate:** Threshold-based silence gate to bypass inference when the user is silent.
|
||||
3. **Gain Controls:** Input and output gain staging before and after inference.
|
||||
* Ensure all DSP math is optimized using `numpy` arrays to maintain low latency.
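
The three stages above can be sketched as follows. This is a minimal illustration, not the code in `server.py`: the function names, the RMS-based gate with a `0.01` threshold, and the decibel-based gain are all assumptions. The filter coefficients come from the standard bilinear transform of a first-order high-pass. The per-sample loop is kept for readability; a production version would more likely use a vectorized routine such as `scipy.signal.lfilter`.

```python
import numpy as np


def highpass_first_order(x, sample_rate, cutoff_hz=80.0, state=(0.0, 0.0)):
    """First-order Butterworth high-pass (bilinear transform), one chunk at a time.

    `state` carries (previous input, previous output) across chunks so the
    filter stays continuous at chunk boundaries.
    """
    k = np.tan(np.pi * cutoff_hz / sample_rate)
    b0 = 1.0 / (1.0 + k)
    a1 = (k - 1.0) / (k + 1.0)
    prev_x, prev_y = state
    y = np.empty(len(x), dtype=np.float32)
    for n in range(len(x)):
        cur = float(x[n])
        # y[n] = b0 * (x[n] - x[n-1]) - a1 * y[n-1]
        out = b0 * (cur - prev_x) - a1 * prev_y
        y[n] = out
        prev_x, prev_y = cur, out
    return y, (prev_x, prev_y)


def is_silent(x, threshold=0.01):
    """Noise gate decision: True when the chunk's RMS is below the threshold."""
    if len(x) == 0:
        return True
    rms = float(np.sqrt(np.mean(np.square(x, dtype=np.float64))))
    return rms < threshold


def apply_gain(x, gain_db):
    """Scale samples by a gain given in decibels (assumed unit)."""
    return (x * (10.0 ** (gain_db / 20.0))).astype(np.float32)
```

A caller would apply input gain and the filter first, skip inference when `is_silent` returns `True`, and apply output gain to the converted audio.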
|
||||
|
||||
### 4. RVC ONNX Export
|
||||
* PyTorch RVC models (`.pth`) must be converted to ONNX (`.onnx`) before inference.
|
||||
* Always use [/lib/export_onnx.py](file:///M:/Users/ahmad/project/onnx-voice-changer/lib/export_onnx.py) for conversion:
|
||||
  ```bash
  python lib/export_onnx.py --model_name <CharacterFolder>
  ```
|
||||
|
||||
---
|
||||
|
||||
## 🎨 Frontend Design Guidelines
|
||||
* **Responsive Layout:** Must support mobile and desktop views, utilizing a collapsible sidebar.
|
||||
* **Themes & Accent Colors:** Supports dark/light mode toggling, with a custom accent color system (Purple, Blue, Emerald, Rose, Amber) stored in state.
|
||||
* **i18n Translation:** Do not hardcode English/Indonesian strings. Ensure all labels, warnings, and messages are registered in [translations.ts](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend/src/utils/translations.ts).
|
||||
|
||||
---
|
||||
|
||||
## 🏃 Useful Development Commands
|
||||
|
||||
### Running Backend
|
||||
```bash
python server.py --host 127.0.0.1 --port 8765 --device cuda
```
|
||||
|
||||
### Running Frontend Dev Server
|
||||
```bash
cd frontend
npm run dev
```
|
||||
|
||||
### Building Frontend Production Server
|
||||
```bash
cd frontend
npm run build
npm run start
```
|
||||
@@ -0,0 +1,201 @@
|
||||
Apache License
|
||||
Version 2.0, January 2004
|
||||
http://www.apache.org/licenses/
|
||||
|
||||
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
|
||||
|
||||
1. Definitions.
|
||||
|
||||
"License" shall mean the terms and conditions for use, reproduction,
|
||||
and distribution as defined by Sections 1 through 9 of this document.
|
||||
|
||||
"Licensor" shall mean the copyright owner or entity authorized by
|
||||
the copyright owner that is granting the License.
|
||||
|
||||
"Legal Entity" shall mean the union of the acting entity and all
|
||||
other entities that control, are controlled by, or are under common
|
||||
control with that entity. For the purposes of this definition,
|
||||
"control" means (i) the power, direct or indirect, to cause the
|
||||
direction or management of such entity, whether by contract or
|
||||
otherwise, or (ii) ownership of fifty percent (50%) or more of the
|
||||
outstanding shares, or (iii) beneficial ownership of such entity.
|
||||
|
||||
"You" (or "Your") shall mean an individual or Legal Entity
|
||||
exercising permissions granted by this License.
|
||||
|
||||
"Source" form shall mean the preferred form for making modifications,
|
||||
including but not limited to software source code, documentation
|
||||
source, and configuration files.
|
||||
|
||||
"Object" form shall mean any form resulting from mechanical
|
||||
transformation or translation of a Source form, including but
|
||||
not limited to compiled object code, generated documentation,
|
||||
and conversions to other media types.
|
||||
|
||||
"Work" shall mean the work of authorship, whether in Source or
|
||||
Object form, made available under the License, as indicated by a
|
||||
copyright notice that is included in or attached to the work
|
||||
(an example is provided in the Appendix below).
|
||||
|
||||
"Derivative Works" shall mean any work, whether in Source or Object
|
||||
form, that is based on (or derived from) the Work and for which the
|
||||
editorial revisions, annotations, elaborations, or other modifications
|
||||
represent, as a whole, an original work of authorship. For the purposes
|
||||
of this License, Derivative Works shall not include works that remain
|
||||
separable from, or merely link (or bind by name) to the interfaces of,
|
||||
the Work and Derivative Works thereof.
|
||||
|
||||
"Contribution" shall mean any work of authorship, including
|
||||
the original version of the Work and any modifications or additions
|
||||
to that Work or Derivative Works thereof, that is intentionally
|
||||
submitted to Licensor for inclusion in the Work by the copyright owner
|
||||
or by an individual or Legal Entity authorized to submit on behalf
|
||||
of the copyright owner. For the purposes of this definition, "submitted"
|
||||
means any form of electronic, verbal, or written communication sent
|
||||
to the Licensor or its representatives, including but not limited to
|
||||
communication on electronic mailing lists, source code control systems,
|
||||
and issue tracking systems that are managed by, or on behalf of, the
|
||||
Licensor for the purpose of discussing and improving the Work, but
|
||||
excluding communication that is conspicuously marked or otherwise
|
||||
designated in writing by the copyright owner as "Not a Contribution."
|
||||
|
||||
"Contributor" shall mean Licensor and any individual or Legal Entity
|
||||
on behalf of whom a Contribution has been received by Licensor and
|
||||
subsequently incorporated within the Work.
|
||||
|
||||
2. Grant of Copyright License. Subject to the terms and conditions of
|
||||
this License, each Contributor hereby grants to You a perpetual,
|
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
||||
copyright license to reproduce, prepare Derivative Works of,
|
||||
publicly display, publicly perform, sublicense, and distribute the
|
||||
Work and such Derivative Works in Source or Object form.
|
||||
|
||||
3. Grant of Patent License. Subject to the terms and conditions of
|
||||
this License, each Contributor hereby grants to You a perpetual,
|
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable
|
||||
(except as stated in this section) patent license to make, have made,
|
||||
use, offer to sell, sell, import, and otherwise transfer the Work,
|
||||
where such license applies only to those patent claims licensable
|
||||
by such Contributor that are necessarily infringed by their
|
||||
Contribution(s) alone or by combination of their Contribution(s)
|
||||
with the Work to which such Contribution(s) was submitted. If You
|
||||
institute patent litigation against any entity (including a
|
||||
cross-claim or counterclaim in a lawsuit) alleging that the Work
|
||||
or a Contribution incorporated within the Work constitutes direct
|
||||
or contributory patent infringement, then any patent licenses
|
||||
granted to You under this License for that Work shall terminate
|
||||
as of the date such litigation is filed.
|
||||
|
||||
4. Redistribution. You may reproduce and distribute copies of the
|
||||
Work or Derivative Works thereof in any medium, with or without
|
||||
modifications, and in Source or Object form, provided that You
|
||||
meet the following conditions:
|
||||
|
||||
(a) You must give any other recipients of the Work or
|
||||
Derivative Works a copy of this License; and
|
||||
|
||||
(b) You must cause any modified files to carry prominent notices
|
||||
stating that You changed the files; and
|
||||
|
||||
(c) You must retain, in the Source form of any Derivative Works
|
||||
that You distribute, all copyright, patent, trademark, and
|
||||
attribution notices from the Source form of the Work,
|
||||
excluding those notices that do not pertain to any part of
|
||||
the Derivative Works; and
|
||||
|
||||
(d) If the Work includes a "NOTICE" text file as part of its
|
||||
distribution, then any Derivative Works that You distribute must
|
||||
include a readable copy of the attribution notices contained
|
||||
within such NOTICE file, excluding those notices that do not
|
||||
pertain to any part of the Derivative Works, in at least one
|
||||
of the following places: within a NOTICE text file distributed
|
||||
as part of the Derivative Works; within the Source form or
|
||||
documentation, if distributed along with the Derivative Works; or,
|
||||
within a display generated by the Derivative Works, if and
|
||||
wherever such third-party notices normally appear. The contents
|
||||
of the NOTICE file are for informational purposes only and
|
||||
do not modify the License. You may add Your own attribution
|
||||
notices within Derivative Works that You distribute, alongside
|
||||
or as an addendum to the Work text from the License, provided
|
||||
that such additional attribution notices cannot be construed
|
||||
as modifying the License.
|
||||
|
||||
You may add Your own copyright statement to Your modifications and
|
||||
may provide additional or different license terms and conditions
|
||||
for use, reproduction, or distribution of Your modifications, or
|
||||
for any such Derivative Works as a whole, provided Your use,
|
||||
reproduction, and distribution of the Work otherwise complies with
|
||||
the conditions stated in this License.
|
||||
|
||||
5. Submission of Contributions. Unless You explicitly state otherwise,
|
||||
any Contribution intentionally submitted for inclusion in the Work
|
||||
by You to the Licensor shall be under the terms and conditions of
|
||||
this License, without any additional terms or conditions.
|
||||
Notwithstanding the above, nothing herein shall supersede or modify
|
||||
the terms of any separate license agreement you may have executed
|
||||
with Licensor regarding such Contributions.
|
||||
|
||||
6. Trademarks. This License does not grant permission to use the trade
|
||||
names, trademarks, service marks, or product names of the Licensor,
|
||||
except as required for reasonable and customary use in describing the
|
||||
origin of the Work and reproducing the content of the NOTICE file.
|
||||
|
||||
7. Disclaimer of Warranty. Unless required by applicable law or
|
||||
agreed to in writing, Licensor provides the Work (and each
|
||||
Contributor provides its Contributions) on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
|
||||
implied, including, without limitation, any warranties or conditions
|
||||
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
|
||||
PARTICULAR PURPOSE. You are solely responsible for determining the
|
||||
appropriateness of using or redistributing the Work and assume any
|
||||
risks associated with Your exercise of permissions under this License.
|
||||
|
||||
8. Limitation of Liability. In no event and under no legal theory,
|
||||
whether in tort (including negligence), contract, or otherwise,
|
||||
unless required by applicable law (such as deliberate and grossly
|
||||
negligent acts) or agreed to in writing, shall any Contributor be
|
||||
liable to You for damages, including any direct, indirect, special,
|
||||
incidental, or consequential damages of any character arising as a
|
||||
result of this License or out of the use or inability to use the
|
||||
Work (including but not limited to damages for loss of goodwill,
|
||||
work stoppage, computer failure or malfunction, or any and all
|
||||
other commercial damages or losses), even if such Contributor
|
||||
has been advised of the possibility of such damages.
|
||||
|
||||
9. Accepting Warranty or Additional Liability. While redistributing
|
||||
the Work or Derivative Works thereof, You may choose to offer,
|
||||
and charge a fee for, acceptance of support, warranty, indemnity,
|
||||
or other liability obligations and/or rights consistent with this
|
||||
License. However, in accepting such obligations, You may act only
|
||||
on Your own behalf and on Your sole responsibility, not on behalf
|
||||
of any other Contributor, and only if You agree to indemnify,
|
||||
defend, and hold each Contributor harmless for any liability
|
||||
incurred by, or claims asserted against, such Contributor by reason
|
||||
of your accepting any such warranty or additional liability.
|
||||
|
||||
END OF TERMS AND CONDITIONS
|
||||
|
||||
APPENDIX: How to apply the Apache License to your work.
|
||||
|
||||
To apply the Apache License to your work, attach the following
|
||||
boilerplate notice, with the fields enclosed by brackets "[]"
|
||||
replaced with your own identifying information. (Don't include
|
||||
the brackets!) The text should be enclosed in the appropriate
|
||||
comment syntax for the file format. We also recommend that a
|
||||
file or class name and description of the purpose be included on the
|
||||
same "printed page" as the copyright notice for easier
|
||||
identification within third-party archives.
|
||||
|
||||
Copyright 2026 Kanara Technology
|
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License");
|
||||
you may not use this file except in compliance with the License.
|
||||
You may obtain a copy of the License at
|
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software
|
||||
distributed under the License is distributed on an "AS IS" BASIS,
|
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
See the License for the specific language governing permissions and
|
||||
limitations under the License.
|
||||
@@ -1,12 +1,16 @@
|
||||
# 🎙️ Standalone ONNX Real-Time Voice Changer Service
|
||||
# 🎙️ ONNX VC - Standalone Real-Time Voice Changer
|
||||
|
||||
A high-performance, low-latency, real-time voice conversion system powered by **ONNX Runtime** and **Retrieval-based Voice Conversion (RVC)**. This application enables real-time voice conversion from a microphone/browser source to a designated target character model with minimal processing latency.
|
||||
🌐 **Languages:** [English](README.md) | [Bahasa Indonesia](docs/README.id.md) | [Español](docs/README.es.md) | [日本語](docs/README.ja.md) | [简体中文](docs/README.zh.md)
|
||||
|
||||
A high-performance, low-latency, real-time AI voice conversion system powered by **ONNX Runtime** and **Retrieval-based Voice Conversion (RVC)**. Features a premium dashboard built with **Next.js App Router**, **TypeScript**, and **Tailwind CSS**, supporting full internationalization.
|
||||
|
||||
---
|
||||
|
||||
## ✨ Key Features
|
||||
* **🚀 WebSocket Audio Pipeline:** Streaming audio transfer using binary WebSocket connections (raw PCM float32) for minimal overhead.
|
||||
* **⚡ Multi-Backend ONNX Acceleration:** Supports execution providers including NVIDIA `CUDA`, AMD/Intel `DirectML`, and fallback `CPU`.
|
||||
* **🌐 Universal Localisation:** Fully translatable interface supporting English, Indonesian, Japanese, Chinese, and Spanish.
|
||||
* **🎨 Premium Dashboard:** Fully responsive workspace built using React 19, Radix UI, Framer Motion, and Tailwind CSS.
|
||||
* **🎼 High-Fidelity DSP Pipeline:**
|
||||
* **Low-Cut Filter:** First-order Butterworth high-pass filter at 80 Hz to attenuate AC hum and rumble.
|
||||
* **Noise Gate:** Threshold-based noise suppression to bypass inference during silence (saving CPU/GPU cycles).
|
||||
@@ -35,61 +39,122 @@ graph TD
|
||||
---
|
||||
|
||||
## 📁 Repository Structure
|
||||
* [server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py) — The main WebSocket backend and static HTTP server managing connection loops, audio resampling, and model execution.
|
||||
* [server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py) — The main WebSocket backend server managing connection loops, audio resampling, and model execution.
|
||||
* [start.bat](file:///M:/Users/ahmad/project/onnx-voice-changer/start.bat) — Windows launcher batch file that automatically resolves the Python virtual environment and executes the server.
|
||||
* [requirements.txt](file:///M:/Users/ahmad/project/onnx-voice-changer/requirements.txt) — Python dependencies list.
|
||||
* [frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend) — Contains client-side Web UI files:
|
||||
* [frontend/index.html](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend/index.html) — Control interface layout.
|
||||
* [frontend/app.js](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend/app.js) — WebSocket communication and client-side audio rendering.
|
||||
* [frontend/styles.css](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend/styles.css) — Custom dashboard styling.
|
||||
* [lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib) — core package containing inference models and prediction tools.
|
||||
* [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) — Directory for voice model weights. Place your custom `.onnx` and `.pth` model sub-directories here.
|
||||
* [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) — Directory containing base pre-trained models such as `vec-768-layer-12.onnx` or `vec-256-layer-12.onnx`.
|
||||
* [frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend) — The frontend client workspace built with Next.js (TypeScript, Tailwind CSS).
|
||||
* [frontend-deprecated/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend-deprecated) — The old deprecated frontend code.
|
||||
* [lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib) — Core package containing inference models, ONNX conversion scripts, and prediction tools.
|
||||
* [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) — Directory for character voice model weights (e.g. `weights/HuTao/`).
|
||||
* [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) — Directory containing base pre-trained models.
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Getting Started
|
||||
## 🚀 Installation & Setup
|
||||
|
||||
### 📋 Prerequisites
|
||||
* **Python 3.10+** (Recommended)
|
||||
* **Python 3.10+**
|
||||
* **FFmpeg** installed and added to the system PATH (Required for audio processing utilities).
|
||||
* **Node.js 18+** & **npm** (Required to run the Next.js frontend client).
|
||||
* (Optional) **NVIDIA CUDA Toolkit** (v11.x/12.x) and **cuDNN** for GPU execution acceleration.
|
||||
|
||||
### 📦 Installation
|
||||
---
|
||||
|
||||
### 📦 1. Python Backend Installation
|
||||
1. Clone this repository to your local directory.
|
||||
2. Initialize and activate a virtual environment (optional but recommended):
|
||||
2. Initialize and activate a virtual environment:
|
||||
   ```bash
   python -m venv venv
   # On Windows:
   .\venv\Scripts\activate
   # On Linux/macOS:
   source venv/bin/activate
   ```
|
||||
3. Install the required dependencies:
|
||||
   ```bash
   pip install -r requirements.txt
   ```
|
||||
4. Place your ContentVec base model (`vec-768-layer-12.onnx` or `vec-256-layer-12.onnx`) inside the [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) directory.
|
||||
5. Place your character models in [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) in structured folders (e.g., `weights/HuTao/` containing `HuTao.onnx` and `HuTao.pth`).
|
||||
|
||||
### 🏃 Running the Server
|
||||
---
|
||||
|
||||
#### Option A: Quick Launch (Windows)
|
||||
Simply double-click the [start.bat](file:///M:/Users/ahmad/project/onnx-voice-changer/start.bat) file. It will automatically detect Python, set up the directory paths, and launch the service.
|
||||
### 📥 2. Download Pre-trained ContentVec (Required)
|
||||
The pipeline requires a ContentVec base model to extract content features from incoming voice chunks.
|
||||
1. Download the `vec-768-layer-12.onnx` model from Hugging Face:
|
||||
👉 **[Download vec-768-layer-12.onnx](https://huggingface.co/DogManTC/test-rvc-onnx/blob/main/vec-768-layer-12.onnx)**
|
||||
2. Save the downloaded file inside the [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) directory:
|
||||
   ```
   pretrained/
   └── vec-768-layer-12.onnx
   ```
|
||||
|
||||
#### Option B: Manual CLI execution
|
||||
Execute the server using your terminal:
|
||||
---
|
||||
|
||||
### 🔄 3. Setup & Export RVC Models to ONNX
|
||||
To run character models on ONNX Runtime, you must place your standard PyTorch RVC models (`.pth`) under the [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) directory and convert them.
|
||||
|
||||
1. Create a sub-folder under `weights/` named after your character (e.g. `HuTao`):
|
||||
   ```
   weights/
   └── HuTao/
       └── HuTao.pth
   ```
|
||||
2. Run the ONNX conversion script by passing the folder name of the model:
|
||||
   ```bash
   python lib/export_onnx.py --model_name HuTao
   ```
|
||||
3. The script will automatically search for the `.pth` file inside `weights/HuTao/` and export a corresponding `HuTao.onnx` file inside the same directory:
|
||||
   ```
   weights/
   └── HuTao/
       ├── HuTao.pth
       └── HuTao.onnx
   ```
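
The folder convention described in the steps above can be sketched as a small path-resolution helper. This is not the code in `lib/export_onnx.py`: the function name `resolve_export_paths` is hypothetical, and naming the output after the folder (rather than after the `.pth` file) is an assumption based on the `HuTao` example.

```python
from pathlib import Path


def resolve_export_paths(model_name, weights_dir="weights"):
    """Return (pth_path, onnx_path) for a character folder under weights/."""
    model_dir = Path(weights_dir) / model_name
    if not model_dir.is_dir():
        raise FileNotFoundError(f"model folder not found: {model_dir}")
    # Pick the first .pth checkpoint found in the character folder.
    candidates = sorted(model_dir.glob("*.pth"))
    if not candidates:
        raise FileNotFoundError(f"no .pth file found in {model_dir}")
    pth_path = candidates[0]
    # Assumed convention: the ONNX file is named after the folder and
    # written next to the checkpoint.
    onnx_path = model_dir / f"{model_name}.onnx"
    return pth_path, onnx_path
```

Failing early with a clear error when the folder or checkpoint is missing keeps a mistyped `--model_name` from surfacing later as an opaque export failure.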
|
||||
|
||||
---
|
||||
|
||||
### 🖥️ 4. Running the Frontend Client
|
||||
The frontend client runs as a standalone Next.js development server or built production server.
|
||||
|
||||
1. Navigate to the frontend directory:
|
||||
   ```bash
   cd frontend
   ```
|
||||
2. Install npm dependencies:
|
||||
   ```bash
   npm install
   ```
|
||||
3. Start the development server:
|
||||
   ```bash
   npm run dev
   ```
|
||||
Open your browser and navigate to **`http://localhost:3000`**.
|
||||
|
||||
Alternatively, to build and run the production server:
|
||||
```bash
|
||||
python server.py --host 127.0.0.1 --port 8765 --http_port 8000 --device cuda
|
||||
npm run build
|
||||
npm run start
|
||||
```
|
||||
|
||||
### ⚙️ Command-Line Arguments
|
||||
---
|
||||
|
||||
## 🏃 Running the Voice Changer
|
||||
|
||||
### Step 1: Start the Python WebSocket Backend
|
||||
Run the server using your terminal (defaults to port `8765`):
|
||||
```bash
python server.py --host 127.0.0.1 --port 8765 --device cuda
```
|
||||
|
||||
#### ⚙️ Command-Line Arguments
|
||||
| Argument | Description | Default |
|---|---|---|
| `--host` | The address the WebSocket server binds to. | `127.0.0.1` |
| `--port` | WebSocket communication port. | `8765` |
| `--http_port` | Port serving the static frontend Web UI. | `8000` |
| `--device` | The ONNX Runtime execution device (`cpu`, `cuda`, `dml`). | `cuda` |
| `--model` | Target folder name in `weights/` to load directly upon startup. | `None` |
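
For orientation, here is a minimal `argparse` sketch covering `--host`, `--port`, `--device`, and `--model` with the defaults listed in the table. It is an illustration, not the parser in `server.py`: `build_parser` and the `PROVIDERS` mapping are hypothetical names. The provider strings themselves are the standard ONNX Runtime execution provider identifiers, and listing the CPU provider as a fallback is an assumption.

```python
import argparse

# ONNX Runtime execution provider names for each --device value.
# Falling back to the CPU provider is an assumption of this sketch.
PROVIDERS = {
    "cpu": ["CPUExecutionProvider"],
    "cuda": ["CUDAExecutionProvider", "CPUExecutionProvider"],
    "dml": ["DmlExecutionProvider", "CPUExecutionProvider"],
}


def build_parser():
    """Build a parser mirroring the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="ONNX VC WebSocket backend (sketch)")
    parser.add_argument("--host", default="127.0.0.1",
                        help="Address the WebSocket server binds to")
    parser.add_argument("--port", type=int, default=8765,
                        help="WebSocket communication port")
    parser.add_argument("--device", choices=sorted(PROVIDERS), default="cuda",
                        help="ONNX Runtime execution device")
    parser.add_argument("--model", default=None,
                        help="Folder name in weights/ to load on startup")
    return parser
```

The list selected by `PROVIDERS[args.device]` could then be passed as the `providers` argument of `onnxruntime.InferenceSession`.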
|
||||
|
||||
Once the server begins execution, it will spin up the local server, and your Web UI should open automatically at `http://localhost:8000`.
|
||||
### Step 2: Open the Frontend Dashboard
|
||||
Make sure the frontend client is running (via `npm run dev` or `npm run start`), then open `http://localhost:3000` in your browser. The dashboard connects to the WebSocket API backend automatically.
|
||||
|
||||
---
|
||||
|
||||
@@ -98,3 +163,14 @@ To achieve low latency without output artifacts, the audio processing utilizes:
|
||||
1. **Sliding Window Context Buffer:** Keeps a short historical buffer of the audio to feed the model the required context frames while minimizing output audio delay.
|
||||
2. **Convolution Padding Fadeout:** 120ms of trailing silent padding is temporarily appended to input segments to avoid edge-fading anomalies inherent to RVC convolutional steps.
|
||||
3. **Linear Resampling:** Low-overhead linear interpolation for quick sample rate adaptation.
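
The three techniques above can be sketched with NumPy as follows. The function names and the way history is carried between chunks are assumptions made for illustration; only the 120 ms padding length and the use of linear interpolation come from the text.

```python
import numpy as np


def resample_linear(x, src_rate, dst_rate):
    """Resample by linear interpolation between neighbouring samples."""
    if src_rate == dst_rate or len(x) == 0:
        return np.asarray(x, dtype=np.float32)
    n_out = int(round(len(x) * dst_rate / src_rate))
    src_t = np.arange(len(x)) / src_rate   # timestamps of input samples
    dst_t = np.arange(n_out) / dst_rate    # timestamps of output samples
    return np.interp(dst_t, src_t, x).astype(np.float32)


def pad_tail(x, sample_rate, pad_ms=120):
    """Append trailing silence so the model's convolutions see past the chunk end."""
    pad = np.zeros(int(sample_rate * pad_ms / 1000), dtype=np.float32)
    return np.concatenate([x, pad])


def with_context(history, chunk, context_len):
    """Prepend recent history to the chunk and return the updated history."""
    model_input = np.concatenate([history, chunk])
    new_history = model_input[-context_len:] if context_len > 0 else model_input[:0]
    return model_input, new_history
```

Linear interpolation does no anti-alias filtering, which is the trade-off behind "low-overhead": it is fast, but downsampling can fold high frequencies back into the signal.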
|
||||
|
||||
---
|
||||
|
||||
## 🤝 Credits & Acknowledgements
|
||||
* **Made with ❤️ by [Kanara Technology](https://github.com/kanaratechnologyindonesia)** (Mirror: [git.kanara.tech](https://git.kanara.tech/kanara))
|
||||
* Powered by [ONNX Runtime](https://onnxruntime.ai/) and [Retrieval-based Voice Conversion (RVC)](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)
|
||||
|
||||
---
|
||||
|
||||
## 📄 License
|
||||
This project is licensed under the Apache License 2.0. See the [LICENSE](file:///M:/Users/ahmad/project/onnx-voice-changer/LICENSE) file for details.
|
||||
|
||||
@@ -0,0 +1,176 @@
|
||||
# 🎙️ ONNX VC - Standalone Real-Time Voice Changer
|
||||
|
||||
🌐 **Idiomas:** [English](../README.md) | [Bahasa Indonesia](README.id.md) | [Español](README.es.md) | [日本語](README.ja.md) | [简体中文](README.zh.md)
|
||||
|
||||
Un sistema de conversión de voz por IA en tiempo real, de alto rendimiento y baja latencia, impulsado por **ONNX Runtime** y **Retrieval-based Voice Conversion (RVC)**. Cuenta con un panel de control premium desarrollado con **Next.js App Router**, **TypeScript** y **Tailwind CSS**, con soporte para internacionalización completa.
|
||||
|
||||
---
|
||||
|
||||
## ✨ Características Clave
|
||||
* **🚀 Canal de Audio WebSocket (Audio Pipeline):** Transferencia de audio en tiempo real mediante conexiones WebSocket binarias (PCM float32 sin procesar) para una latencia mínima.
|
||||
* **⚡ Aceleración ONNX Multi-Backend:** Soporta proveedores de ejecución que incluyen NVIDIA `CUDA`, AMD/Intel `DirectML` y CPU como respaldo.
|
||||
* **🌐 Localización Universal:** Interfaz completamente traducible que soporta inglés, indonesio, japonés, chino y español.
|
||||
* **🎨 Panel de Control Premium:** Entorno de trabajo responsivo construido con React 19, Radix UI, Framer Motion y Tailwind CSS.
|
||||
* **🎼 Canal DSP de Alta Fidelidad:**
|
||||
* **Filtro de Corte Bajo (Low-Cut Filter):** Filtro de paso alto Butterworth activo de primer orden a 80 Hz para eliminar el zumbido de CA y los ruidos de fondo.
|
||||
* **Puerta de Ruido (Noise Gate):** Supresión de ruido basada en umbral para omitir la inferencia durante el silencio (ahorrando ciclos de CPU/GPU).
|
||||
* **Controles de Ganancia:** Control de ganancia digital independiente de entrada y salida.
|
||||
* **🧠 Extracción Avanzada de Tono:** Predicción de tono optimizada a 16 kHz utilizando el modelo RMVPE (Robust Model for Vocal Pitch Estimation).
|
||||
* **🌐 Arquitectura de Enrutamiento Dual:** Permite enrutar el audio a través del navegador web (Web Audio API) o directamente mediante el hardware de audio local del servidor (utilizando `sounddevice`).
|
||||
|
||||
---
|
||||
|
||||
## 🛠️ Arquitectura del Sistema
|
||||
|
||||
```mermaid
graph TD
    A[Micrófono / Navegador Web] -->|Web Audio API| B(Conexión WebSocket)
    B -->|Fragmento de PCM Float32 sin procesar| C[Backend de server.py]
    C -->|1. Filtro de paso alto 80Hz| D[Etapa DSP]
    D -->|2. Ganancia y puerta de ruido| D
    D -->|3. Remuestreo a 16kHz| E[Hubert/ContentVec ONNX]
    D -->|4. Estimación de tono RMVPE| F[Predictor de tono]
    E --> G[Inferencia del modelo RVC ONNX]
    F --> G
    G -->|Fragmentos de audio de destino| H(Conexión WebSocket)
    H -->|Reproducir audio| I[Altavoces del navegador / Dispositivo de audio]
```
|
||||
|
||||
---
|
||||
|
||||
## 📁 Estructura del Repositorio
|
||||
* [server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py) — El servidor backend WebSocket principal que gestiona los bucles de conexión, el remuestreo de audio y la ejecución del modelo.
|
||||
* [start.bat](file:///M:/Users/ahmad/project/onnx-voice-changer/start.bat) — Archivo por lotes ejecutable para Windows que inicializa automáticamente el entorno virtual de Python y ejecuta el servidor.
|
||||
* [requirements.txt](file:///M:/Users/ahmad/project/onnx-voice-changer/requirements.txt) — Lista de dependencias de Python.
|
||||
* [frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend) — El espacio de trabajo del cliente frontend construido con Next.js (TypeScript, Tailwind CSS).
|
||||
* [frontend-deprecated/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend-deprecated) — Código frontend antiguo ya en desuso.
|
||||
* [lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib) — Paquete central que contiene los modelos de inferencia, scripts de conversión a ONNX y herramientas de predicción.
|
||||
* [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) — Directorio para los archivos de peso de los modelos de voz de personajes (ej. `weights/HuTao/`).
|
||||
* [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) — Directorio que contiene los modelos base preentrenados.
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Instalación y Configuración
|
||||
|
||||
### 📋 Requisitos Previos
|
||||
* **Python 3.10+**
|
||||
* **FFmpeg** instalado y agregado al PATH del sistema (necesario para las utilidades de procesamiento de audio).
|
||||
* **Node.js 18+** y **npm** (necesario para ejecutar el cliente frontend Next.js).
|
||||
* (Opcional) **NVIDIA CUDA Toolkit** (v11.x/12.x) y **cuDNN** para aceleración de ejecución en GPU.
|
||||
|
||||
---
|
||||
|
||||
### 📦 1. Instalación del Backend de Python
|
||||
1. Clone este repositorio en su directorio local.
|
||||
2. Inicialice y active un entorno virtual:
|
||||
```bash
python -m venv venv
# En Windows:
.\venv\Scripts\activate
# En Linux/macOS:
source venv/bin/activate
```
|
||||
3. Instale las dependencias requeridas:
|
||||
```bash
pip install -r requirements.txt
```
|
||||
|
||||
---
|
||||
|
||||
### 📥 2. Descargar ContentVec Preentrenado (Requerido)
|
||||
El sistema requiere un modelo base ContentVec para extraer las características de contenido del habla a partir de los fragmentos de voz.
|
||||
1. Descargue el modelo `vec-768-layer-12.onnx` desde Hugging Face:
|
||||
👉 **[Descargar vec-768-layer-12.onnx](https://huggingface.co/DogManTC/test-rvc-onnx/blob/main/vec-768-layer-12.onnx)**
|
||||
2. Guarde el archivo descargado dentro del directorio `pretrained/`:
|
||||
```
pretrained/
└── vec-768-layer-12.onnx
```
|
||||
|
||||
---
|
||||
|
||||
### 🔄 3. Configuración y Exportación de Modelos RVC a ONNX
|
||||
Para ejecutar modelos de personajes en ONNX Runtime, debe colocar sus modelos RVC estándar de PyTorch (`.pth`) bajo el directorio `weights/` y convertirlos.
|
||||
|
||||
1. Cree una subcarpeta bajo `weights/` con el nombre de su personaje (ej. `HuTao`):
|
||||
```
weights/
└── HuTao/
    └── HuTao.pth
```
|
||||
2. Ejecute el script de conversión a ONNX pasando el nombre de la carpeta del modelo:
|
||||
```bash
python lib/export_onnx.py --model_name HuTao
```
|
||||
3. El script buscará automáticamente el archivo `.pth` dentro de `weights/HuTao/` y exportará el archivo `HuTao.onnx` correspondiente en el mismo directorio:
|
||||
```
weights/
└── HuTao/
    ├── HuTao.pth
    └── HuTao.onnx
```
|
||||
|
||||
---
|
||||
|
||||
### 🖥️ 4. Ejecución del Cliente Frontend
|
||||
El cliente frontend se ejecuta como un servidor de desarrollo independiente de Next.js o como un servidor de producción compilado.
|
||||
|
||||
1. Navegue al directorio frontend:
|
||||
```bash
cd frontend
```
|
||||
2. Instale las dependencias de npm:
|
||||
```bash
npm install
```
|
||||
3. Inicie el servidor de desarrollo:
|
||||
```bash
npm run dev
```
|
||||
Abra su navegador y acceda a **`http://localhost:3000`**.
|
||||
|
||||
Alternativamente, para compilar y ejecutar el servidor de producción:
|
||||
```bash
npm run build
npm run start
```
|
||||
|
||||
---
|
||||
|
||||
## 🏃 Funcionamiento del Cambiador de Voz
|
||||
|
||||
### Paso 1: Iniciar el Backend WebSocket de Python
|
||||
Ejecute el servidor desde su terminal (puerto predeterminado `8765`):
|
||||
```bash
python server.py --host 127.0.0.1 --port 8765 --device cuda
```
|
||||
|
||||
#### ⚙️ Argumentos de Línea de Comandos
|
||||
| Argumento | Descripción | Predeterminado |
|---|---|---|
| `--host` | La dirección a la que se enlaza el servidor WebSocket. | `127.0.0.1` |
| `--port` | Puerto de comunicación WebSocket. | `8765` |
| `--device` | El dispositivo de ejecución de ONNX Runtime (`cpu`, `cuda`, `dml`). | `cuda` |
| `--model` | Nombre de la carpeta de destino en `weights/` para cargar directamente al inicio. | `None` |
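
Como referencia, el siguiente esbozo muestra cómo podrían declararse estos argumentos con `argparse`. Es una suposición ilustrativa basada únicamente en la tabla anterior, no el código real de `server.py`; la función `build_parser` es un nombre hipotético.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Analizador hipotético que refleja la tabla de argumentos."""
    parser = argparse.ArgumentParser(description="Servidor WebSocket de ONNX VC (esbozo)")
    parser.add_argument("--host", default="127.0.0.1",
                        help="Dirección a la que se enlaza el servidor")
    parser.add_argument("--port", type=int, default=8765,
                        help="Puerto de comunicación WebSocket")
    parser.add_argument("--device", choices=["cpu", "cuda", "dml"], default="cuda",
                        help="Dispositivo de ejecución de ONNX Runtime")
    parser.add_argument("--model", default=None,
                        help="Carpeta dentro de weights/ que se carga al inicio")
    return parser


# Ejemplo: forzar la ejecución en CPU y precargar un modelo.
args = build_parser().parse_args(["--device", "cpu", "--model", "HuTao"])
print(args.host, args.port, args.device, args.model)
```

Restringir `--device` con `choices` hace que un valor no admitido falle al arrancar en lugar de hacerlo más tarde, al crear la sesión de inferencia.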
|
||||
|
||||
### Paso 2: Abrir el Panel de Control Frontend
|
||||
Asegúrese de que su cliente frontend esté ejecutándose (a través de `npm run dev` o `npm run start` en `http://localhost:3000`), ábralo en su navegador y este se conectará automáticamente al backend de la API WebSocket.
|
||||
|
||||
---
|
||||
|
||||
## 🔊 Detalles de DSP de Audio
|
||||
Para lograr una latencia baja sin artefactos de salida, el procesamiento de audio utiliza:
|
||||
1. **Búfer de Contexto de Ventana Deslizante:** Mantiene un búfer histórico corto de audio para alimentar al modelo con los fragmentos de contexto necesarios, minimizando el retraso de la salida de audio.
|
||||
2. **Desvanecimiento por Relleno de Convolución:** Se añade temporalmente un relleno silencioso de 120 ms al final de los segmentos de entrada para evitar anomalías de desvanecimiento en los bordes, inherentes a los pasos de convolución RVC.
|
||||
3. **Remuestreo Lineal:** Interpolación lineal de bajo costo computacional para una rápida adaptación de la frecuencia de muestreo.
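
Los puntos 2 y 3 pueden ilustrarse con un esbozo mínimo en Python puro. Las funciones `pad_tail_silence` y `resample_linear` son nombres hipotéticos y no reproducen el código del proyecto; solo muestran la técnica, suponiendo audio mono representado como una lista de valores flotantes.

```python
def pad_tail_silence(samples, sample_rate, pad_ms=120):
    """Añade `pad_ms` milisegundos de silencio al final del segmento."""
    pad_len = int(sample_rate * pad_ms / 1000)
    return list(samples) + [0.0] * pad_len


def resample_linear(samples, src_rate, dst_rate):
    """Remuestreo por interpolación lineal entre muestras vecinas."""
    if not samples or src_rate == dst_rate:
        return list(samples)
    out_len = max(1, round(len(samples) * dst_rate / src_rate))
    step = src_rate / dst_rate  # avance en el origen por cada muestra de salida
    last = len(samples) - 1
    out = []
    for i in range(out_len):
        pos = i * step
        left = min(int(pos), last)
        right = min(left + 1, last)
        frac = pos - left
        out.append(samples[left] * (1.0 - frac) + samples[right] * frac)
    return out


chunk = [0.0, 0.5, 1.0, 0.5]             # fragmento de ejemplo a 48 kHz
padded = pad_tail_silence(chunk, 48000)   # 4 muestras + 5760 de silencio
at_16k = resample_linear(padded, 48000, 16000)
```

La interpolación lineal no filtra antes de reducir la frecuencia, por lo que cambia calidad por velocidad; en este esbozo el relleno se recortaría de la salida una vez terminada la inferencia.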
|
||||
|
||||
---
|
||||
|
||||
## 🤝 Créditos y Agradecimientos
|
||||
* **Hecho con ❤️ por [Kanara Technology](https://github.com/kanaratechnologyindonesia)** (Espejo: [git.kanara.tech](https://git.kanara.tech/kanara))
|
||||
* Impulsado por [ONNX Runtime](https://onnxruntime.ai/) y [Retrieval-based Voice Conversion (RVC)](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)
|
||||
|
||||
---
|
||||
|
||||
## 📄 Licencia
|
||||
Este proyecto está licenciado bajo la Apache License 2.0. Consulte el archivo [LICENSE](file:///M:/Users/ahmad/project/onnx-voice-changer/LICENSE) para más detalles.
|
||||
@@ -0,0 +1,176 @@
|
||||
# 🎙️ ONNX VC - Standalone Real-Time Voice Changer
|
||||
|
||||
🌐 **Bahasa:** [English](../README.md) | [Bahasa Indonesia](README.id.md) | [Español](README.es.md) | [日本語](README.ja.md) | [简体中文](README.zh.md)
|
||||
|
||||
Sistem konversi suara AI real-time berkinerja tinggi dan latensi rendah yang ditenagai oleh **ONNX Runtime** dan **Retrieval-based Voice Conversion (RVC)**. Dilengkapi dengan dashboard premium yang dibuat menggunakan **Next.js App Router**, **TypeScript**, dan **Tailwind CSS**, serta mendukung internasionalisasi penuh.
|
||||
|
||||
---
|
||||
|
||||
## ✨ Fitur Utama
|
||||
* **🚀 WebSocket Audio Pipeline:** Pengiriman audio streaming menggunakan koneksi WebSocket biner (raw PCM float32) untuk overhead minimal.
|
||||
* **⚡ Akselerasi ONNX Multi-Backend:** Mendukung execution providers termasuk NVIDIA `CUDA`, AMD/Intel `DirectML`, dan fallback `CPU`.
|
||||
* **🌐 Universal Localisation:** Antarmuka yang dapat diterjemahkan sepenuhnya, mendukung Bahasa Inggris, Indonesia, Jepang, Mandarin, dan Spanyol.
|
||||
* **🎨 Dashboard Premium:** Halaman kerja responsif yang dibangun menggunakan React 19, Radix UI, Framer Motion, dan Tailwind CSS.
|
||||
* **🎼 DSP Pipeline dengan Kualitas Tinggi:**
|
||||
    * **Low-Cut Filter:** Butterworth high-pass filter orde pertama aktif pada frekuensi 80Hz untuk menghilangkan hum AC dan gemuruh.
    * **Noise Gate:** Penekanan derau berbasis ambang batas (threshold) untuk melewati inferensi saat hening (menghemat siklus CPU/GPU).
    * **Gain Controls:** Pengaturan gain digital input/output independen.
|
||||
* **🧠 Ekstraksi Pitch Canggih:** Prediksi pitch 16kHz yang dioptimalkan menggunakan model RMVPE (Robust Model for Vocal Pitch Estimation).
|
||||
* **🌐 Arsitektur Dual Routing:** Mendukung perutean audio melalui browser web (Web Audio API) atau langsung melalui perangkat keras audio lokal server (menggunakan `sounddevice`).
|
||||
|
||||
---
|
||||
|
||||
## 🛠️ Arsitektur Sistem
|
||||
|
||||
```mermaid
graph TD
    A[Mikrofon / Browser Web] -->|Web Audio API| B(Koneksi WebSocket)
    B -->|Chunk PCM Float32 Mentah| C[Backend server.py]
    C -->|1. High-Pass Filter 80Hz| D[Tahap DSP]
    D -->|2. Gain & Noise Gate| D
    D -->|3. Resample ke 16kHz| E[Hubert/ContentVec ONNX]
    D -->|4. Estimasi Pitch RMVPE| F[Prediktor Pitch]
    E --> G[Inferensi Model ONNX RVC]
    F --> G
    G -->|Chunk Audio Target| H(Koneksi WebSocket)
    H -->|Putar Audio| I[Speaker Browser / Perangkat Audio]
```
|
||||
|
||||
---
|
||||
|
||||
## 📁 Struktur Repositori
|
||||
* [server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py) — Server backend WebSocket utama yang mengelola loop koneksi, resampling audio, dan eksekusi model.
|
||||
* [start.bat](file:///M:/Users/ahmad/project/onnx-voice-changer/start.bat) — File batch peluncur Windows yang secara otomatis menyiapkan environment virtual Python dan menjalankan server.
|
||||
* [requirements.txt](file:///M:/Users/ahmad/project/onnx-voice-changer/requirements.txt) — Daftar dependensi Python.
|
||||
* [frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend) — Ruang kerja klien frontend yang dibangun dengan Next.js (TypeScript, Tailwind CSS).
|
||||
* [frontend-deprecated/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend-deprecated) — Kode frontend lama yang sudah usang.
|
||||
* [lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib) — Paket inti yang berisi model inferensi, skrip konversi ONNX, dan alat prediksi.
|
||||
* [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) — Direktori untuk bobot model suara karakter (contoh: `weights/HuTao/`).
|
||||
* [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) — Direktori yang berisi model dasar pra-terlatih.
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Instalasi & Pengaturan
|
||||
|
||||
### 📋 Prasyarat
|
||||
* **Python 3.10+**
|
||||
* **FFmpeg** terinstal dan ditambahkan ke PATH sistem (Diperlukan untuk pemrosesan audio).
|
||||
* **Node.js 18+** & **npm** (Diperlukan untuk menjalankan klien frontend Next.js).
|
||||
* (Opsional) **NVIDIA CUDA Toolkit** (v11.x/12.x) dan **cuDNN** untuk akselerasi eksekusi GPU.
|
||||
|
||||
---
|
||||
|
||||
### 📦 1. Instalasi Backend Python
|
||||
1. Klon repositori ini ke direktori lokal Anda.
|
||||
2. Inisialisasi dan aktifkan virtual environment:
|
||||
```bash
python -m venv venv
# Di Windows:
.\venv\Scripts\activate
# Di Linux/macOS:
source venv/bin/activate
```
|
||||
3. Instal dependensi yang diperlukan:
|
||||
```bash
pip install -r requirements.txt
```
|
||||
|
||||
---
|
||||
|
||||
### 📥 2. Unduh Pre-trained ContentVec (Diperlukan)
|
||||
Sistem ini memerlukan model dasar ContentVec untuk mengekstrak fitur konten ucapan dari potongan suara.
|
||||
1. Unduh model `vec-768-layer-12.onnx` dari Hugging Face:
|
||||
👉 **[Unduh vec-768-layer-12.onnx](https://huggingface.co/DogManTC/test-rvc-onnx/blob/main/vec-768-layer-12.onnx)**
|
||||
2. Simpan file yang diunduh di dalam direktori `pretrained/`:
|
||||
```
pretrained/
└── vec-768-layer-12.onnx
```
|
||||
|
||||
---
|
||||
|
||||
### 🔄 3. Siapkan & Ekspor Model RVC ke ONNX
|
||||
Untuk menjalankan model karakter pada ONNX Runtime, Anda harus menempatkan model RVC PyTorch standar Anda (`.pth`) di bawah direktori `weights/` dan mengonversinya.
|
||||
|
||||
1. Buat sub-folder di bawah `weights/` yang dinamai sesuai karakter Anda (contoh: `HuTao`):
|
||||
```
weights/
└── HuTao/
    └── HuTao.pth
```
|
||||
2. Jalankan skrip konversi ONNX dengan memasukkan nama folder model:
|
||||
```bash
python lib/export_onnx.py --model_name HuTao
```
|
||||
3. Skrip akan secara otomatis mencari file `.pth` di dalam `weights/HuTao/` dan mengekspor file `HuTao.onnx` yang sesuai di dalam direktori yang sama:
|
||||
```
weights/
└── HuTao/
    ├── HuTao.pth
    └── HuTao.onnx
```
|
||||
|
||||
---
|
||||
|
||||
### 🖥️ 4. Menjalankan Klien Frontend
|
||||
Klien frontend berjalan sebagai server pengembangan Next.js mandiri atau server produksi yang telah di-build.
|
||||
|
||||
1. Navigasi ke direktori frontend:
|
||||
```bash
cd frontend
```
|
||||
2. Instal dependensi npm:
|
||||
```bash
npm install
```
|
||||
3. Jalankan server pengembangan:
|
||||
```bash
npm run dev
```
|
||||
Buka browser Anda dan arahkan ke **`http://localhost:3000`**.
|
||||
|
||||
Atau, untuk membuat build dan menjalankan server produksi:
|
||||
```bash
npm run build
npm run start
```
|
||||
|
||||
---
|
||||
|
||||
## 🏃 Menjalankan Pengubah Suara
|
||||
|
||||
### Langkah 1: Mulai Backend WebSocket Python
|
||||
Jalankan server menggunakan terminal Anda (default ke port `8765`):
|
||||
```bash
python server.py --host 127.0.0.1 --port 8765 --device cuda
```
|
||||
|
||||
#### ⚙️ Argumen Baris Perintah
|
||||
| Argumen | Deskripsi | Default |
|---|---|---|
| `--host` | Alamat yang diikat oleh server WebSocket. | `127.0.0.1` |
| `--port` | Port komunikasi WebSocket. | `8765` |
| `--device` | Perangkat eksekusi ONNX Runtime (`cpu`, `cuda`, `dml`). | `cuda` |
| `--model` | Nama folder target di `weights/` untuk dimuat langsung saat memulai. | `None` |
|
||||
|
||||
### Langkah 2: Buka Dashboard Frontend
|
||||
Pastikan klien frontend Anda berjalan (via `npm run dev` atau `npm run start` pada `http://localhost:3000`), buka di browser Anda, dan klien akan terhubung secara otomatis ke backend WebSocket API.
|
||||
|
||||
---
|
||||
|
||||
## 🔊 Detail DSP Audio
|
||||
Untuk mencapai latensi rendah tanpa artefak output, pemrosesan audio menggunakan:
|
||||
1. **Buffer Konteks Sliding Window:** Mempertahankan buffer historis pendek dari audio untuk memberikan frame konteks yang diperlukan ke model sambil meminimalkan penundaan audio output.
|
||||
2. **Fadeout Padding Konvolusi:** Padding senyap trailing sebesar 120ms ditambahkan sementara ke segmen input untuk menghindari anomali memudar di tepi yang melekat pada langkah konvolusional RVC.
|
||||
3. **Resampling Linear:** Interpolasi linear berbiaya komputasi rendah untuk adaptasi laju sampel yang cepat.
|
||||
|
||||
---
|
||||
|
||||
## 🤝 Kredit & Penghargaan
|
||||
* **Dibuat dengan ❤️ oleh [Kanara Technology](https://github.com/kanaratechnologyindonesia)** (Mirror: [git.kanara.tech](https://git.kanara.tech/kanara))
|
||||
* Ditenagai oleh [ONNX Runtime](https://onnxruntime.ai/) dan [Retrieval-based Voice Conversion (RVC)](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)
|
||||
|
||||
---
|
||||
|
||||
## 📄 Lisensi
|
||||
Proyek ini dilisensikan di bawah Apache License 2.0. Lihat file [LICENSE](file:///M:/Users/ahmad/project/onnx-voice-changer/LICENSE) untuk informasi lebih lanjut.
|
||||
@@ -0,0 +1,176 @@
|
||||
# 🎙️ ONNX VC - Standalone Real-Time Voice Changer
|
||||
|
||||
🌐 **言語:** [English](../README.md) | [Bahasa Indonesia](README.id.md) | [Español](README.es.md) | [日本語](README.ja.md) | [简体中文](README.zh.md)
|
||||
|
||||
**ONNX Runtime**と**Retrieval-based Voice Conversion (RVC)**を搭載した、高性能・低遅延のリアルタイムAI音声変換システム。**Next.js App Router**、**TypeScript**、**Tailwind CSS**で構築されたプレミアムなダッシュボードを備え、完全な多言語対応をサポートしています。
|
||||
|
||||
---
|
||||
|
||||
## ✨ 主な機能
|
||||
* **🚀 WebSocketオーディオパイプライン:** 遅延を最小限に抑えるため、バイナリWebSocket接続(生のPCM float32)を使用したストリーミングオーディオ転送。
|
||||
* **⚡ マルチバックエンドONNXアクセラレーション:** NVIDIA `CUDA`、AMD/Intel `DirectML`、およびフォールバック用の`CPU`を含む実行プロバイダーをサポート。
|
||||
* **🌐 ユニバーサルローカライズ:** 英語、インドネシア語、日本語、中国語、スペイン語をサポートする完全翻訳可能なインターフェース。
|
||||
* **🎨 プレミアムダッシュボード:** React 19、Radix UI、Framer Motion、Tailwind CSSを使用して構築された完全レスポンシブなワークスペース。
|
||||
* **🎼 高忠実度DSPパイプライン:**
|
||||
    * **ローカットフィルター:** ACハム音や低周波ノイズを除去する80Hzのアクティブ1次バターワースハイパスフィルター。
    * **ノイズゲート:** 無音時の推論をスキップ(CPU/GPUサイクルを節約)するためのしきい値ベースのノイズ抑制。
    * **ゲインコントロール:** 独立した入力/出力デジタルゲインステージ。
|
||||
* **🧠 高度なピッチ抽出:** RMVPE(Robust Model for Vocal Pitch Estimation)モデルを使用した、最適化された16kHzピッチ予測。
|
||||
* **🌐 デュアルルーティングアーキテクチャ:** Webブラウザ(Web Audio API)経由、またはサーバーのローカルオーディオハードウェア(`sounddevice`を使用)経由でのオーディオルーティングをサポート。
|
||||
|
||||
---
|
||||
|
||||
## 🛠️ システムアーキテクチャ
|
||||
|
||||
```mermaid
graph TD
    A[マイク / Webブラウザ] -->|Web Audio API| B(WebSocket接続)
    B -->|生のFloat32 PCMチャンク| C[server.py バックエンド]
    C -->|1. ハイパスフィルター 80Hz| D[DSP段階]
    D -->|2. ゲイン&ノイズゲート| D
    D -->|3. 16kHzへのリサンプリング| E[Hubert/ContentVec ONNX]
    D -->|4. ピッチ推定 RMVPE| F[ピッチ予測器]
    E --> G[RVC ONNXモデル推論]
    F --> G
    G -->|ターゲットオーディオチャンク| H(WebSocket接続)
    H -->|オーディオ再生| I[ブラウザスピーカー / オーディオデバイス]
```
|
||||
|
||||
---
|
||||
|
||||
## 📁 リポジトリ構成
|
||||
* [server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py) — 接続ループ、オーディオリサンプリング、およびモデル実行を管理するメインのWebSocketバックエンドサーバー。
|
||||
* [start.bat](file:///M:/Users/ahmad/project/onnx-voice-changer/start.bat) — Pythonの仮想環境を自動的に解決し、サーバーを実行するWindowsランチャーバッチファイル。
|
||||
* [requirements.txt](file:///M:/Users/ahmad/project/onnx-voice-changer/requirements.txt) — Pythonの依存関係リスト。
|
||||
* [frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend) — Next.js(TypeScript、Tailwind CSS)で構築されたフロントエンドクライアントワークスペース。
|
||||
* [frontend-deprecated/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend-deprecated) — 非推奨の古いフロントエンドコード。
|
||||
* [lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib) — 推論モデル、ONNX変換スクリプト、および予測ツールを含むコアパッケージ。
|
||||
* [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) — キャラクターボイスモデルの重み用のディレクトリ(例: `weights/HuTao/`)。
|
||||
* [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) — ベースとなる事前学習済みモデルを含むディレクトリ。
|
||||
|
||||
---
|
||||
|
||||
## 🚀 インストールとセットアップ
|
||||
|
||||
### 📋 前提条件
|
||||
* **Python 3.10+**
|
||||
* **FFmpeg** がインストールされ、システム環境変数 PATH に追加されていること(音声処理ユーティリティに必要)。
|
||||
* **Node.js 18+** & **npm**(Next.jsフロントエンドクライアントの実行に必要)。
|
||||
* (オプション)GPU実行アクセラレーション用の **NVIDIA CUDA Toolkit** (v11.x/12.x) および **cuDNN**。
|
||||
|
||||
---
|
||||
|
||||
### 📦 1. Pythonバックエンドのインストール
|
||||
1. このリポジトリをローカルディレクトリにクローンします。
|
||||
2. 仮想環境を初期化して有効化します:
|
||||
```bash
python -m venv venv
# Windowsの場合:
.\venv\Scripts\activate
# Linux/macOSの場合:
source venv/bin/activate
```
|
||||
3. 必要な依存関係をインストールします:
|
||||
```bash
pip install -r requirements.txt
```
|
||||
|
||||
---
|
||||
|
||||
### 📥 2. 事前学習済みContentVecのダウンロード(必須)
|
||||
音声チャンクから発話内容の特徴量を抽出するために、ContentVecベースモデルが必要です。
|
||||
1. Hugging Faceから`vec-768-layer-12.onnx`モデルをダウンロードします:
|
||||
👉 **[vec-768-layer-12.onnxをダウンロード](https://huggingface.co/DogManTC/test-rvc-onnx/blob/main/vec-768-layer-12.onnx)**
|
||||
2. ダウンロードしたファイルを`pretrained/`ディレクトリ内に保存します:
|
||||
```
pretrained/
└── vec-768-layer-12.onnx
```
|
||||
|
||||
---
|
||||
|
||||
### 🔄 3. RVCモデルのセットアップとONNXへのエクスポート
|
||||
ONNX Runtimeでキャラクターモデルを実行するには、標準のPyTorch RVCモデル(`.pth`)を`weights/`ディレクトリに配置し、変換する必要があります。
|
||||
|
||||
1. `weights/`の下にキャラクター名(例: `HuTao`)のサブフォルダを作成します:
|
||||
```
weights/
└── HuTao/
    └── HuTao.pth
```
|
||||
2. モデルのフォルダ名を指定してONNX変換スクリプトを実行します:
|
||||
```bash
python lib/export_onnx.py --model_name HuTao
```
|
||||
3. スクリプトは`weights/HuTao/`内の`.pth`ファイルを自動的に探索し、同じディレクトリ内に対応する`HuTao.onnx`ファイルをエクスポートします:
|
||||
```
weights/
└── HuTao/
    ├── HuTao.pth
    └── HuTao.onnx
```
|
||||
|
||||
---
|
||||
|
||||
### 🖥️ 4. フロントエンドクライアントの実行
|
||||
フロントエンドクライアントは、スタンドアロンのNext.js開発サーバーまたはビルド済みの本番サーバーとして実行されます。
|
||||
|
||||
1. フロントエンドディレクトリに移動します:
|
||||
```bash
cd frontend
```
|
||||
2. npm依存関係をインストールします:
|
||||
```bash
npm install
```
|
||||
3. 開発サーバーを起動します:
|
||||
```bash
npm run dev
```
|
||||
ブラウザを開き、**`http://localhost:3000`**にアクセスします。
|
||||
|
||||
または、本番サーバーをビルドして実行する場合:
|
||||
```bash
npm run build
npm run start
```
|
||||
|
||||
---
|
||||
|
||||
## 🏃 ボイスチェンジャーの実行
|
||||
|
||||
### ステップ 1:Python WebSocketバックエンドの起動
|
||||
ターミナルを使用してサーバーを実行します(デフォルトポートは`8765`):
|
||||
```bash
python server.py --host 127.0.0.1 --port 8765 --device cuda
```
|
||||
|
||||
#### ⚙️ コマンドライン引数
|
||||
| 引数 | 説明 | デフォルト |
|---|---|---|
| `--host` | WebSocketサーバーがバインドするアドレス。 | `127.0.0.1` |
| `--port` | WebSocket通信ポート。 | `8765` |
| `--device` | ONNX Runtimeの実行デバイス(`cpu`、`cuda`、`dml`)。 | `cuda` |
| `--model` | 起動時に直接読み込む`weights/`内のターゲットフォルダ名。 | `None` |
|
||||
|
||||
### ステップ 2:フロントエンドダッシュボードを開く
|
||||
フロントエンドクライアントが`http://localhost:3000`で実行されていること(`npm run dev`または`npm run start`経由)を確認し、ブラウザで開くと、自動的にWebSocket APIバックエンドに接続されます。
|
||||
|
||||
---
|
||||
|
||||
## 🔊 オーディオDSPの詳細
|
||||
出力アーティファクトなしで低遅延を実現するために、オーディオ処理では以下を使用しています:
|
||||
1. **スライディングウィンドウコンテキストバッファ:** 出力オーディオの遅延を最小限に抑えながら、モデルに必要なコンテキストフレームを供給するために、オーディオの短い履歴バッファを保持します。
|
||||
2. **畳み込みパディングフェードアウト:** RVCの畳み込みステップに固有のエッジフェード異常を回避するために、120msの末尾の無音パディングが入力セグメントに一時的に追加されます。
|
||||
3. **線形リサンプリング:** 素早いサンプリングレート適応のための低オーバーヘッドの線形補間。
|
||||
|
||||
---
|
||||
|
||||
## 🤝 クレジットと謝辞
|
||||
* **[Kanara Technology](https://github.com/kanaratechnologyindonesia)** (ミラー: [git.kanara.tech](https://git.kanara.tech/kanara)) **により ❤️ を込めて制作されました**
|
||||
* [ONNX Runtime](https://onnxruntime.ai/) および [Retrieval-based Voice Conversion (RVC)](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI) を使用しています
|
||||
|
||||
---
|
||||
|
||||
## 📄 ライセンス
|
||||
このプロジェクトは Apache License 2.0 の下でライセンスされています。詳細は [LICENSE](file:///M:/Users/ahmad/project/onnx-voice-changer/LICENSE) ファイルを参照してください。
|
||||
@@ -0,0 +1,176 @@
|
||||
# 🎙️ ONNX VC - Standalone Real-Time Voice Changer
|
||||
|
||||
🌐 **语言:** [English](../README.md) | [Bahasa Indonesia](README.id.md) | [Español](README.es.md) | [日本語](README.ja.md) | [简体中文](README.zh.md)
|
||||
|
||||
基于 **ONNX Runtime** 和 **检索式语音转换 (RVC)** 构建的高性能、低延迟实时 AI 变声系统。配有使用 **Next.js App Router**、**TypeScript** 和 **Tailwind CSS** 构建的高级仪表板,支持完整的国际化。
|
||||
|
||||
---
|
||||
|
||||
## ✨ 核心功能
|
||||
* **🚀 WebSocket 音频传输管道:** 使用二进制 WebSocket 连接(原始 PCM float32)进行流式音频传输,确保最低的系统开销。
|
||||
* **⚡ 多后端 ONNX 加速:** 支持包括 NVIDIA `CUDA`、AMD/Intel `DirectML` 以及备用 `CPU` 在内的多种执行提供程序。
|
||||
* **🌐 通用本地化:** 支持英文、印尼文、日文、中文和西班牙文的完全可翻译界面。
|
||||
* **🎨 高级仪表板:** 使用 React 19、Radix UI、Framer Motion 和 Tailwind CSS 构建的完全响应式工作区。
|
||||
* **🎼 高保真 DSP 处理管道:**
|
||||
    * **低切滤波器:** 80Hz 处的主动一阶巴特沃斯高通滤波器,用以消除交流蜂鸣声和隆隆声。
    * **噪声门:** 基于阈值的噪声抑制,可在静音期间绕过推理(以节省 CPU/GPU 周期)。
    * **增益控制:** 独立输入/输出数字增益级。
|
||||
* **🧠 先进的基频提取:** 使用 RMVPE (Robust Model for Vocal Pitch Estimation) 模型进行优化的 16kHz 基频预测。
|
||||
* **🌐 双路由架构:** 支持通过 Web 浏览器(Web Audio API)或直接通过服务器的本地音频硬件(使用 `sounddevice`)进行音频路由。
|
||||
|
||||
---
|
||||
|
||||
## 🛠️ 系统架构
|
||||
|
||||
```mermaid
graph TD
    A[麦克风 / Web 浏览器] -->|Web Audio API| B(WebSocket 连接)
    B -->|原始 Float32 PCM 块| C[server.py 后端]
    C -->|1. 高通滤波器 80Hz| D[DSP 阶段]
    D -->|2. 增益与噪声门| D
    D -->|3. 重采样至 16kHz| E[Hubert/ContentVec ONNX]
    D -->|4. 基频估计 RMVPE| F[基频预测器]
    E --> G[RVC ONNX 模型推理]
    F --> G
    G -->|目标音频块| H(WebSocket 连接)
    H -->|播放音频| I[浏览器扬声器 / 音频设备]
```
|
||||
|
||||
---
|
||||
|
||||
## 📁 仓库结构
|
||||
* [server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py) — 主要的 WebSocket 后端服务器,用于管理连接循环、音频重采样和模型执行。
|
||||
* [start.bat](file:///M:/Users/ahmad/project/onnx-voice-changer/start.bat) — Windows 启动批处理文件,可自动解析 Python 虚拟环境并执行服务器。
|
||||
* [requirements.txt](file:///M:/Users/ahmad/project/onnx-voice-changer/requirements.txt) — Python 依赖列表。
|
||||
* [frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend) — 使用 Next.js(TypeScript, Tailwind CSS)构建的前端客户端工作区。
|
||||
* [frontend-deprecated/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend-deprecated) — 已弃用的旧前端代码。
|
||||
* [lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib) — 核心包,包含推理模型、ONNX 转换脚本和预测工具。
|
||||
* [weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) — 角色声音模型权重目录(例如 `weights/HuTao/`)。
|
||||
* [pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) — 包含基础预训练模型的目录。
|
||||
|
||||
---
|
||||
|
||||
## 🚀 安装与设置
|
||||
|
||||
### 📋 准备工作
|
||||
* **Python 3.10+**
|
||||
* 已安装 **FFmpeg** 并添加到系统 PATH 中(音频处理工具所必需)。
|
||||
* **Node.js 18+** 和 **npm**(运行 Next.js 前端客户端所必需)。
|
||||
* (可选)用于 GPU 执行加速的 **NVIDIA CUDA Toolkit** (v11.x/12.x) 和 **cuDNN**。
|
||||
|
||||
---
|
||||
|
||||
### 📦 1. Python 后端安装
|
||||
1. 将此仓库克隆到您的本地目录。
|
||||
2. 初始化并激活虚拟环境:
|
||||
```bash
python -m venv venv
# 在 Windows 下:
.\venv\Scripts\activate
# 在 Linux/macOS 下:
source venv/bin/activate
```
|
||||
3. 安装所需依赖:
|
||||
```bash
pip install -r requirements.txt
```
|
||||
|
||||
---
|
||||
|
||||
### 📥 2. 下载预训练 ContentVec(必需)
|
||||
本系统需要 ContentVec 基础模型,用于从语音块中提取语音内容特征。
|
||||
1. 从 Hugging Face 下载 `vec-768-layer-12.onnx` 模型:
|
||||
👉 **[下载 vec-768-layer-12.onnx](https://huggingface.co/DogManTC/test-rvc-onnx/blob/main/vec-768-layer-12.onnx)**
|
||||
2. 将下载的文件保存到 `pretrained/` 目录中:
|
||||
```
pretrained/
└── vec-768-layer-12.onnx
```
|
||||
|
||||
---
|
||||
|
||||
### 🔄 3. 设置并导出 RVC 模型为 ONNX
|
||||
要在 ONNX Runtime 上运行角色模型,您必须将标准 PyTorch RVC 模型(`.pth`)放入 `weights/` 目录并进行转换。
|
||||
|
||||
1. 在 `weights/` 下创建一个以您的角色命名的子文件夹(例如 `HuTao`):
|
||||
```
weights/
└── HuTao/
    └── HuTao.pth
```
|
||||
2. 通过传递模型的文件夹名称来运行 ONNX 转换脚本:
|
||||
```bash
python lib/export_onnx.py --model_name HuTao
```
|
||||
3. 脚本将自动在 `weights/HuTao/` 中搜索 `.pth` 文件,并在同一目录下导出相应的 `HuTao.onnx` 文件:
|
||||
```
weights/
└── HuTao/
    ├── HuTao.pth
    └── HuTao.onnx
```
|
||||
|
||||
---
|
||||
|
||||
### 🖥️ 4. 运行前端客户端
|
||||
前端客户端可以作为独立的 Next.js 开发服务器或编译后的生产服务器运行。
|
||||
|
||||
1. 进入前端目录:
|
||||
```bash
cd frontend
```
|
||||
2. 安装 npm 依赖项:
|
||||
```bash
npm install
```
|
||||
3. 启动开发服务器:
|
||||
```bash
npm run dev
```
|
||||
打开浏览器并访问 **`http://localhost:3000`**。
|
||||
|
||||
或者,构建并运行生产服务器:
|
||||
```bash
npm run build
npm run start
```
|
||||
|
||||
---
|
||||
|
||||
## 🏃 运行变声器
|
||||
|
||||
### 步骤 1:启动 Python WebSocket 后端
|
||||
使用终端运行服务器(默认为端口 `8765`):
|
||||
```bash
python server.py --host 127.0.0.1 --port 8765 --device cuda
```
|
||||
|
||||
#### ⚙️ 命令行参数
|
||||
| 参数 | 说明 | 默认值 |
|---|---|---|
| `--host` | WebSocket 服务器绑定的地址。 | `127.0.0.1` |
| `--port` | WebSocket 通信端口。 | `8765` |
| `--device` | ONNX Runtime 执行设备(`cpu`、`cuda`、`dml`)。 | `cuda` |
| `--model` | 启动时直接加载的 `weights/` 中的目标文件夹名称。 | `None` |
|
||||
|
||||
### 步骤 2:打开前端仪表板
|
||||
确保您的前端客户端正在 `http://localhost:3000` 上运行(通过 `npm run dev` 或 `npm run start`),在浏览器中打开它,它将自动连接到 WebSocket API 后端。
|
||||
|
||||
---
|
||||
|
||||
## 🔊 音频 DSP 细节
|
||||
为了在没有输出伪影的情况下实现低延迟,音频处理利用了:
|
||||
1. **滑动窗口上下文缓冲区:** 保持较短的音频历史缓冲区,以向模型提供所需的上下文帧,同时最小化输出音频延迟。
|
||||
2. **卷积填充淡出:** 在输入片段中临时追加 120ms 的尾随静音填充,以避免 RVC 卷积步骤中固有的边缘淡入淡出异常。
|
||||
3. **线性重采样:** 低开销的线性插值,可快速适应采样率。
|
||||
|
||||
---
|
||||
|
||||
## 🤝 鸣谢与贡献
|
||||
* **由 [Kanara Technology](https://github.com/kanaratechnologyindonesia)** (镜像: [git.kanara.tech](https://git.kanara.tech/kanara)) **用 ❤️ 制作**
|
||||
* 基于 [ONNX Runtime](https://onnxruntime.ai/) 和 [Retrieval-based Voice Conversion (RVC)](https://github.com/RVC-Project/Retrieval-based-Voice-Conversion-WebUI)
|
||||
|
||||
---
|
||||
|
||||
## 📄 许可证
|
||||
本项目采用 Apache 2.0 许可证进行授权。详情请参阅 [LICENSE](file:///M:/Users/ahmad/project/onnx-voice-changer/LICENSE) 文件。
|
||||
@@ -0,0 +1,41 @@
|
||||
# ⚠️ Deprecated Frontend
|
||||
|
||||
This directory contains the original, legacy single-page static HTML/CSS/JS frontend application for the ONNX Voice Changer.
|
||||
|
||||
---
|
||||
|
||||
## 🚫 Status: Deprecated
|
||||
This frontend is **no longer maintained or active**.
|
||||
|
||||
### Why?
|
||||
1. **Upgraded Dashboard:** The voice changer client has been completely refactored and rewritten as a premium **Next.js 15 (TypeScript & Tailwind CSS)** application, which is now located in the main [/frontend](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend) workspace.
|
||||
2. **Pure API Backend:** The Python backend (`server.py`) has been simplified to run as a pure WebSocket API backend and no longer hosts static files.
|
||||
|
||||
---
|
||||
|
||||
## 📂 Files Included
|
||||
* `index.html` — The legacy single-page layout.
|
||||
* `styles.css` — Legacy stylesheets.
|
||||
* `app.js` — Legacy Audio DSP processing and WebSocket integration.
|
||||
|
||||
---
|
||||
|
||||
## 🏃 Running the Deprecated Client (Reference Only)
|
||||
If you still want to run this legacy frontend for reference, you can serve this directory using any static file server:
|
||||
|
||||
### Option A: Using Python
|
||||
```bash
python -m http.server 3000
```
|
||||
|
||||
### Option B: Using Node (npx)
|
||||
```bash
npx serve . -p 3000
```
|
||||
|
||||
After serving, open **`http://localhost:3000`** in your browser. Ensure the Python WebSocket backend (`server.py`) is running on port `8765`.
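
Before opening the page, it can help to confirm that both ports are actually listening. The helper below is a minimal sketch using only the Python standard library; `is_port_open` is a hypothetical name, not part of this repository, and it only checks that a TCP connection can be made, not that the service behind the port is the voice changer.

```python
import socket


def is_port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


# Ports taken from the instructions above: 3000 for the static files,
# 8765 for the WebSocket backend.
for name, port in (("static frontend", 3000), ("WebSocket backend", 8765)):
    state = "listening" if is_port_open("127.0.0.1", port) else "not reachable"
    print(f"{name} on port {port}: {state}")
```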
|
||||
|
||||
---
|
||||
|
||||
## 🤝 Credits & Acknowledgements
|
||||
* **Made with ❤️ by [Kanara Technology](https://github.com/kanaratechnologyindonesia)** (Mirror: [git.kanara.tech](https://git.kanara.tech/kanara))
|
||||
Binary file not shown.
Binary file not shown.
Binary file not shown.
@@ -0,0 +1,41 @@
|
||||
# See https://help.github.com/articles/ignoring-files/ for more about ignoring files.
|
||||
|
||||
# dependencies
|
||||
/node_modules
|
||||
/.pnp
|
||||
.pnp.*
|
||||
.yarn/*
|
||||
!.yarn/patches
|
||||
!.yarn/plugins
|
||||
!.yarn/releases
|
||||
!.yarn/versions
|
||||
|
||||
# testing
|
||||
/coverage
|
||||
|
||||
# next.js
|
||||
/.next/
|
||||
/out/
|
||||
|
||||
# production
|
||||
/build
|
||||
|
||||
# misc
|
||||
.DS_Store
|
||||
*.pem
|
||||
|
||||
# debug
|
||||
npm-debug.log*
|
||||
yarn-debug.log*
|
||||
yarn-error.log*
|
||||
.pnpm-debug.log*
|
||||
|
||||
# env files (can opt-in for committing if needed)
|
||||
.env*
|
||||
|
||||
# vercel
|
||||
.vercel
|
||||
|
||||
# typescript
|
||||
*.tsbuildinfo
|
||||
next-env.d.ts
|
||||
@@ -0,0 +1,5 @@
|
||||
<!-- BEGIN:nextjs-agent-rules -->
|
||||
# This is NOT the Next.js you know
|
||||
|
||||
This version has breaking changes — APIs, conventions, and file structure may all differ from your training data. Read the relevant guide in `node_modules/next/dist/docs/` before writing any code. Heed deprecation notices.
|
||||
<!-- END:nextjs-agent-rules -->
|
||||
@@ -0,0 +1 @@
|
||||
@AGENTS.md
|
||||
@@ -0,0 +1,54 @@
|
||||
# 🎨 ONNX VC - Frontend Client Dashboard
|
||||
|
||||
The premium user dashboard for the **ONNX VC (Real-Time Voice Changer)**. Built using **Next.js 15 (App Router)**, **React 19**, **TypeScript**, and **Tailwind CSS**, providing a high-fidelity control panel for real-time AI voice conversion.
|
||||
|
||||
---
|
||||
|
||||
## ✨ Key Features
|
||||
* **🌐 Complete Internationalization (i18n):** Supports English, Indonesian, Spanish, Japanese, and Chinese.
|
||||
* **🌓 Dark Mode & Custom Themes:** Seamless dark/light theme switching with custom accent colors (Purple, Blue, Emerald, Rose, Amber).
|
||||
* **📊 Dual Waveform Visualizer:** Displays real-time input and output audio waveform graphs side-by-side in a single row for compact, effective monitoring.
|
||||
* **📱 Collapsible Sidebar:** Optimized UI layout with a smooth collapsible sidebar for managing settings.
|
||||
* **🎛️ Interactive DSP Controls:** Adjust input/output gain staging, the active 80 Hz low-cut filter, the noise gate, and pitch transposition.
|
||||
* **⚙️ Dual Routing Toggle:** Switch between browser-side routing (Web Audio API) and server-side hardware routing (`sounddevice`).
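
To make the DSP controls listed above more concrete, the sketch below shows one plausible way each setting could map onto sample-level math. It is an illustration in plain Python, not the dashboard's or `server.py`'s actual code: every function name is hypothetical, the high-pass is the textbook one-pole RC discretization, and the gate threshold is assumed to be an RMS level in dBFS.

```python
import math


def apply_gain(samples, gain_db):
    """Scale samples by a gain expressed in decibels."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]


def high_pass(samples, sample_rate, cutoff_hz=80.0):
    """One-pole high-pass: y[n] = a * (y[n-1] + x[n] - x[n-1])."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    a = rc / (rc + 1.0 / sample_rate)
    out, prev_x, prev_y = [], 0.0, 0.0
    for x in samples:
        prev_y = a * (prev_y + x - prev_x)
        prev_x = x
        out.append(prev_y)
    return out


def gate_is_open(samples, threshold_db):
    """Return True when the chunk's RMS level exceeds the threshold (dBFS)."""
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    level_db = 20.0 * math.log10(max(rms, 1e-10))  # floor avoids log10(0)
    return level_db > threshold_db


def transpose_ratio(semitones):
    """Frequency ratio for a pitch shift in equal-tempered semitones."""
    return 2.0 ** (semitones / 12.0)
```

In a setup like this, a closed gate would let the caller skip inference for the chunk entirely, which is where the CPU/GPU savings would come from.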
|
||||
|
||||
---
|
||||
|
||||
## 🚀 Getting Started
|
||||
|
||||
### 📋 Prerequisites
|
||||
* **Node.js 18+**
|
||||
* **npm**, **yarn**, or **pnpm** package manager
|
||||
|
||||
### 📦 Installation
|
||||
Navigate to this directory and install the project dependencies:
|
||||
```bash
npm install
```
|
||||
|
||||
### 🏃 Running the Development Server
|
||||
Start the Next.js local development server:
|
||||
```bash
npm run dev
```
|
||||
Open **[http://localhost:3000](http://localhost:3000)** in your web browser.
|
||||
|
||||
### 🏗️ Building for Production
|
||||
To create an optimized production build and serve it:
|
||||
```bash
npm run build
npm run start
```

---

## 🔌 WebSocket Connection

The frontend connects to the Python WebSocket backend to stream binary audio chunks and receive converted audio.

* **Default backend URL:** `ws://127.0.0.1:8765`
* Ensure your backend `server.py` is running before starting voice conversion from the dashboard.
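
To make the message shapes concrete, here is a minimal TypeScript sketch of the two kinds of traffic on this socket. It follows the legacy `app.js` client removed in this same diff: configuration travels as a JSON text message with `type: 'config'`, and each binary reply is a `Float32Array` whose first element is the server processing time in milliseconds, followed by the converted PCM samples. The helper names and the `DecodedFrame` type are hypothetical, and the Next.js client is only assumed to speak the same protocol.

```typescript
// Sketch only: the frame layout mirrors the legacy client (app.js).
// Function and type names are hypothetical.

interface DecodedFrame {
  processingMs: number; // server-side processing time for this chunk
  pcm: Float32Array;    // converted mono audio samples
}

/** Split a binary frame into its timing header and PCM payload. */
function decodeServerFrame(buffer: ArrayBuffer): DecodedFrame {
  // Throws a RangeError if byteLength is not a multiple of 4.
  const payload = new Float32Array(buffer);
  if (payload.length === 0) {
    throw new Error("empty frame");
  }
  return { processingMs: payload[0] ?? 0, pcm: payload.subarray(1) };
}

/** Build a reduced config message; the legacy client sends more fields. */
function buildConfigMessage(
  routingMode: "browser" | "hardware",
  chunkSize: number,
  inputSampleRate: number
): string {
  return JSON.stringify({
    type: "config",
    routing_mode: routingMode,
    chunk_size: chunkSize,
    input_sr: inputSampleRate,
  });
}
```

In a browser these would typically be wired to a `WebSocket` opened on the default URL with `binaryType = 'arraybuffer'`, calling `decodeServerFrame(event.data)` for binary messages only.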

---

## 🤝 Credits & Acknowledgements

* **Made with ❤️ by [Kanara Technology](https://github.com/kanaratechnologyindonesia)** (Mirror: [git.kanara.tech](https://git.kanara.tech/kanara))
* Powered by [Next.js](https://nextjs.org/) and [Tailwind CSS](https://tailwindcss.com/)

-744
@@ -1,744 +0,0 @@
|
||||
/**
|
||||
* Omni Real-Time Voice Changer - Client App
|
||||
* High-performance browser-based mic streaming and RVC playback.
|
||||
*/
|
||||
|
||||
// UI Elements
|
||||
const wsUrlInput = document.getElementById('ws_url');
|
||||
const connectionStatus = document.getElementById('connection_status');
|
||||
const connectBtn = document.getElementById('connect_btn');
|
||||
const streamBtn = document.getElementById('stream_btn');
|
||||
const playToggleBtn = document.getElementById('play_toggle_btn');
|
||||
|
||||
const modelSelect = document.getElementById('model_select');
|
||||
const deviceSelect = document.getElementById('device_select');
|
||||
const transposeSlider = document.getElementById('transpose_slider');
|
||||
const transposeVal = document.getElementById('transpose_val');
|
||||
const gateSlider = document.getElementById('gate_slider');
|
||||
const gateVal = document.getElementById('gate_val');
|
||||
const inputGainSlider = document.getElementById('input_gain_slider');
|
||||
const inputGainVal = document.getElementById('input_gain_val');
|
||||
const outputGainSlider = document.getElementById('output_gain_slider');
|
||||
const outputGainVal = document.getElementById('output_gain_val');
|
||||
const chunkSelect = document.getElementById('chunk_select');
|
||||
const noiseCancelCheckbox = document.getElementById('noise_cancel_checkbox');
|
||||
const routingModeSelect = document.getElementById('routing_mode_select');
|
||||
const hardwareDevicesPanel = document.getElementById('hardware_devices_panel');
|
||||
const serverInputSelect = document.getElementById('server_input_select');
|
||||
const serverOutputSelect = document.getElementById('server_output_select');
|
||||
const browserNoiseCancelGroup = document.getElementById('browser_noise_cancel_group');
|
||||
|
||||
const presetLatencyBtn = document.getElementById('preset_latency_btn');
|
||||
const presetQualityBtn = document.getElementById('preset_quality_btn');
|
||||
|
||||
const inputCanvas = document.getElementById('input_canvas');
|
||||
const outputCanvas = document.getElementById('output_canvas');
|
||||
|
||||
const hudLatency = document.getElementById('hud_latency');
|
||||
const hudTime = document.getElementById('hud_time');
|
||||
const hudGateStatus = document.getElementById('hud_gate_status');
|
||||
const hudSr = document.getElementById('hud_sr');
|
||||
|
||||
// Audio Visualizer Contexts
|
||||
const inputCtx = inputCanvas.getContext('2d');
|
||||
const outputCtx = outputCanvas.getContext('2d');
|
||||
|
||||
// Web Audio State
|
||||
let audioContext = null;
|
||||
let micStream = null;
|
||||
let micSourceNode = null;
|
||||
let scriptProcessorNode = null;
|
||||
let micAccumulator = new Float32Array(0); // Accumulates audio for large/custom chunk sizes
|
||||
|
||||
// WebSocket State
|
||||
let socket = null;
|
||||
let isStreaming = false;
|
||||
let playOutput = true;
|
||||
let targetSampleRate = 40000; // RVC Model default, updated dynamically
|
||||
|
||||
// Playback Sync State
|
||||
let nextPlaybackTime = 0;
|
||||
const safetyDelay = 0.10; // 100ms buffer to absorb network/websocket jitter (increased for perfect smoothness!)
|
||||
|
||||
// Latency Tracking Queues
|
||||
let sentTimestamps = [];
|
||||
const maxSentLogs = 50;
|
||||
|
||||
// --- SMOOTH VISUALIZER (Rolling Display Buffers + RAF loop) ---
|
||||
// Fixed display buffer size: ~85ms window looks great at all chunk sizes.
|
||||
const VIS_DISPLAY_SIZE = 4096;
|
||||
let inputDisplayBuf = new Float32Array(VIS_DISPLAY_SIZE); // rolling input (updated ~85ms)
|
||||
let outputDisplayBuf = new Float32Array(VIS_DISPLAY_SIZE); // fallback for hardware mode
|
||||
let rafHandle = null;
|
||||
|
||||
// Time-synced output queue: each entry = { data: Float32Array, startTime: number (audioCtx seconds) }
|
||||
let outputChunkQueue = [];
|
||||
|
||||
function pushToDisplayBuf(displayBuf, newSamples) {
|
||||
if (newSamples.length >= VIS_DISPLAY_SIZE) {
|
||||
displayBuf.set(newSamples.slice(newSamples.length - VIS_DISPLAY_SIZE));
|
||||
} else {
|
||||
displayBuf.copyWithin(0, newSamples.length);
|
||||
displayBuf.set(newSamples, VIS_DISPLAY_SIZE - newSamples.length);
|
||||
}
|
||||
}
|
||||
|
||||
// Build a VIS_DISPLAY_SIZE window of output samples ending at audioContext.currentTime
|
||||
function buildTimeSyncedOutputBuf() {
|
||||
if (!audioContext || outputChunkQueue.length === 0) return outputDisplayBuf;
|
||||
|
||||
const now = audioContext.currentTime;
|
||||
const windowDuration = VIS_DISPLAY_SIZE / targetSampleRate;
|
||||
const windowStart = now - windowDuration;
|
||||
|
||||
// Drop chunks that ended before our window start
|
||||
while (outputChunkQueue.length > 0) {
|
||||
const c = outputChunkQueue[0];
|
||||
if (c.startTime + c.data.length / targetSampleRate < windowStart) {
|
||||
outputChunkQueue.shift();
|
||||
} else break;
|
||||
}
|
||||
|
||||
const out = new Float32Array(VIS_DISPLAY_SIZE);
|
||||
for (const chunk of outputChunkQueue) {
|
||||
const chunkEnd = chunk.startTime + chunk.data.length / targetSampleRate;
|
||||
// Overlap between [windowStart, now] and [chunk.startTime, chunkEnd]
|
||||
const overlapStart = Math.max(windowStart, chunk.startTime);
|
||||
const overlapEnd = Math.min(now, chunkEnd);
|
||||
if (overlapStart >= overlapEnd) continue;
|
||||
|
||||
const srcOffset = Math.floor((overlapStart - chunk.startTime) * targetSampleRate);
|
||||
const destOffset = Math.floor((overlapStart - windowStart) * targetSampleRate);
|
||||
const count = Math.floor((overlapEnd - overlapStart) * targetSampleRate);
|
||||
const safeCount = Math.min(count,
|
||||
chunk.data.length - srcOffset,
|
||||
VIS_DISPLAY_SIZE - destOffset);
|
||||
if (safeCount > 0) out.set(chunk.data.subarray(srcOffset, srcOffset + safeCount), destOffset);
|
||||
}
|
||||
return out;
|
||||
}
|
||||
|
||||
function startVisualizerLoop() {
|
||||
if (rafHandle) return;
|
||||
function frame() {
|
||||
drawWaveform(inputDisplayBuf, inputCanvas, '#6366f1');
|
||||
// Time-synced output: scrub through queued chunks using audioContext clock
|
||||
drawWaveform(buildTimeSyncedOutputBuf(), outputCanvas, '#a855f7');
|
||||
rafHandle = requestAnimationFrame(frame);
|
||||
}
|
||||
rafHandle = requestAnimationFrame(frame);
|
||||
}
|
||||
|
||||
function stopVisualizerLoop() {
|
||||
if (rafHandle) {
|
||||
cancelAnimationFrame(rafHandle);
|
||||
rafHandle = null;
|
||||
}
|
||||
outputChunkQueue = [];
|
||||
}
|
||||
|
||||
// Setup Canvas Sizes dynamically
|
||||
function resizeCanvases() {
|
||||
inputCanvas.width = inputCanvas.clientWidth * window.devicePixelRatio;
|
||||
inputCanvas.height = inputCanvas.clientHeight * window.devicePixelRatio;
|
||||
outputCanvas.width = outputCanvas.clientWidth * window.devicePixelRatio;
|
||||
outputCanvas.height = outputCanvas.clientHeight * window.devicePixelRatio;
|
||||
}
|
||||
resizeCanvases();
|
||||
window.addEventListener('resize', resizeCanvases);
|
||||
|
||||
// Connect / Disconnect WebSocket
|
||||
connectBtn.addEventListener('click', () => {
|
||||
if (socket && (socket.readyState === WebSocket.OPEN || socket.readyState === WebSocket.CONNECTING)) {
|
||||
disconnectServer();
|
||||
} else {
|
||||
connectServer();
|
||||
}
|
||||
});
|
||||
|
||||
function connectServer() {
|
||||
const url = wsUrlInput.value.trim();
|
||||
updateConnectionStatus('connecting');
|
||||
|
||||
try {
|
||||
socket = new WebSocket(url);
|
||||
socket.binaryType = 'arraybuffer';
|
||||
|
||||
socket.onopen = () => {
|
||||
console.log('Connected to RVC Server');
|
||||
updateConnectionStatus('connected');
|
||||
sendConfigToServer(); // Send initial configurations
|
||||
streamBtn.disabled = false;
|
||||
playToggleBtn.disabled = false;
|
||||
};
|
||||
|
||||
socket.onclose = () => {
|
||||
console.log('WebSocket Connection Closed');
|
||||
disconnectServer();
|
||||
};
|
||||
|
||||
socket.onerror = (err) => {
|
||||
console.error('WebSocket Error:', err);
|
||||
disconnectServer();
|
||||
};
|
||||
|
||||
socket.onmessage = (event) => {
|
||||
if (typeof event.data === 'string') {
|
||||
// Config or control response
|
||||
try {
|
||||
const response = JSON.parse(event.data);
|
||||
if (response.type === 'config_success') {
|
||||
targetSampleRate = response.target_sr;
|
||||
console.log('Server configuration synced successfully:', response);
|
||||
} else if (response.type === 'init_devices') {
|
||||
populateServerDevices(response.devices, response.default_input, response.default_output);
|
||||
} else if (response.type === 'visualizer') {
|
||||
// Feed rolling display buffers — RAF loop handles drawing at 60fps
|
||||
pushToDisplayBuf(inputDisplayBuf, new Float32Array(response.input));
|
||||
pushToDisplayBuf(outputDisplayBuf, new Float32Array(response.output));
|
||||
if (!rafHandle) startVisualizerLoop();
|
||||
} else if (response.type === 'error') {
|
||||
alert('Server Error: ' + response.message);
|
||||
}
|
||||
} catch (e) {
|
||||
console.error('Error parsing text message:', e);
|
||||
}
|
||||
} else if (event.data instanceof ArrayBuffer) {
|
||||
// Binary processed PCM audio chunk returned from server (Browser Mode only)
|
||||
handleServerAudioChunk(event.data);
|
||||
}
|
||||
};
|
||||
|
||||
} catch (e) {
|
||||
console.error('Connection failed:', e);
|
||||
disconnectServer();
|
||||
}
|
||||
}
|
||||
|
||||
function disconnectServer() {
|
||||
if (isStreaming) {
|
||||
stopStreaming();
|
||||
}
|
||||
|
||||
if (socket) {
|
||||
try {
|
||||
socket.close();
|
||||
} catch (e) {}
|
||||
socket = null;
|
||||
}
|
||||
|
||||
updateConnectionStatus('disconnected');
|
||||
streamBtn.disabled = true;
|
||||
playToggleBtn.disabled = true;
|
||||
}
|
||||
|
||||
function updateConnectionStatus(status) {
|
||||
connectionStatus.className = 'status-badge ' + status;
|
||||
if (status === 'connected') {
|
||||
connectionStatus.textContent = 'Terhubung';
|
||||
connectBtn.textContent = 'Putuskan Server';
|
||||
connectBtn.className = 'btn btn-primary';
|
||||
} else if (status === 'connecting') {
|
||||
connectionStatus.textContent = 'Menghubungkan';
|
||||
connectBtn.textContent = 'Batal';
|
||||
} else {
|
||||
connectionStatus.textContent = 'Terputus';
|
||||
connectBtn.textContent = 'Hubungkan Server';
|
||||
connectBtn.className = 'btn btn-primary';
|
||||
}
|
||||
}
|
||||
|
||||
// Config synchronization
|
||||
function sendConfigToServer() {
|
||||
if (!socket || socket.readyState !== WebSocket.OPEN) return;
|
||||
|
||||
const activeF0 = document.querySelector('input[name="f0_method"]:checked').value;
|
||||
|
||||
const config = {
|
||||
type: 'config',
|
||||
model_name: modelSelect.value,
|
||||
device: deviceSelect.value,
|
||||
f0_method: activeF0,
|
||||
f0_up_key: parseInt(transposeSlider.value),
|
||||
noise_gate: parseFloat(gateSlider.value),
|
||||
input_gain: parseFloat(inputGainSlider.value),
|
||||
output_gain: parseFloat(outputGainSlider.value),
|
||||
input_sr: audioContext ? audioContext.sampleRate : 44100,
|
||||
routing_mode: routingModeSelect.value,
|
||||
input_device: serverInputSelect.value ? parseInt(serverInputSelect.value) : null,
|
||||
output_device: serverOutputSelect.value ? parseInt(serverOutputSelect.value) : null,
|
||||
chunk_size: parseInt(chunkSelect.value)
|
||||
};
|
||||
|
||||
socket.send(jsonEncode(config));
|
||||
console.log('Sent configuration change:', config);
|
||||
}
|
||||
|
||||
// Populate Server Audio Devices dropdowns
|
||||
function populateServerDevices(devices, defaultInput, defaultOutput) {
|
||||
serverInputSelect.innerHTML = '';
|
||||
serverOutputSelect.innerHTML = '';
|
||||
|
||||
if (devices.length === 0) {
|
||||
const optIn = document.createElement('option');
|
||||
optIn.textContent = 'Tidak ada mic terdeteksi di server';
|
||||
serverInputSelect.appendChild(optIn);
|
||||
|
||||
const optOut = document.createElement('option');
|
||||
optOut.textContent = 'Tidak ada output terdeteksi di server';
|
||||
serverOutputSelect.appendChild(optOut);
|
||||
return;
|
||||
}
|
||||
|
||||
devices.forEach(device => {
|
||||
if (device.max_input_channels > 0) {
|
||||
const opt = document.createElement('option');
|
||||
opt.value = device.id;
|
||||
opt.textContent = `[ID ${device.id}] ${device.name}`;
|
||||
if (device.id === defaultInput) opt.selected = true;
|
||||
serverInputSelect.appendChild(opt);
|
||||
}
|
||||
|
||||
if (device.max_output_channels > 0) {
|
||||
const opt = document.createElement('option');
|
||||
opt.value = device.id;
|
||||
opt.textContent = `[ID ${device.id}] ${device.name}`;
|
||||
if (device.id === defaultOutput) opt.selected = true;
|
||||
serverOutputSelect.appendChild(opt);
|
||||
}
|
||||
});
|
||||
|
||||
console.log('Successfully populated server hardware devices in UI.');
|
||||
}
|
||||
|
||||
// UI Event Listeners to trigger instant sync
|
||||
modelSelect.addEventListener('change', sendConfigToServer);
|
||||
deviceSelect.addEventListener('change', sendConfigToServer);
|
||||
document.querySelectorAll('input[name="f0_method"]').forEach(radio => {
|
||||
radio.addEventListener('change', sendConfigToServer);
|
||||
});
|
||||
|
||||
transposeSlider.addEventListener('input', () => {
|
||||
transposeVal.textContent = (transposeSlider.value >= 0 ? '+' : '') + transposeSlider.value + ' semitone';
|
||||
});
|
||||
transposeSlider.addEventListener('change', sendConfigToServer);
|
||||
|
||||
gateSlider.addEventListener('input', () => {
|
||||
gateVal.textContent = gateSlider.value + ' dB';
|
||||
});
|
||||
gateSlider.addEventListener('change', sendConfigToServer);
|
||||
|
||||
inputGainSlider.addEventListener('input', () => {
|
||||
inputGainVal.textContent = parseFloat(inputGainSlider.value).toFixed(1) + 'x';
|
||||
});
|
||||
inputGainSlider.addEventListener('change', sendConfigToServer);
|
||||
|
||||
outputGainSlider.addEventListener('input', () => {
|
||||
outputGainVal.textContent = parseFloat(outputGainSlider.value).toFixed(1) + 'x';
|
||||
});
|
||||
outputGainSlider.addEventListener('change', sendConfigToServer);
|
||||
|
||||
chunkSelect.addEventListener('change', () => {
|
||||
// Reinitialize stream if buffer size is changed during active streaming
|
||||
if (isStreaming) {
|
||||
stopStreaming();
|
||||
startStreaming();
|
||||
}
|
||||
});
|
||||
|
||||
noiseCancelCheckbox.addEventListener('change', () => {
|
||||
// Reinitialize microphone with new noise cancellation constraints if streaming
|
||||
if (isStreaming) {
|
||||
stopStreaming();
|
||||
startStreaming();
|
||||
}
|
||||
});
|
||||
|
||||
// Helper to dynamically adjust UI layout based on Routing Mode
|
||||
function applyAudioRoutingUI() {
|
||||
if (routingModeSelect.value === 'hardware') {
|
||||
hardwareDevicesPanel.style.display = 'block';
|
||||
playToggleBtn.style.display = 'none'; // Hide browser-only "Mendengarkan" (Listening) button
|
||||
browserNoiseCancelGroup.style.display = 'none'; // Hide browser-only Noise Cancel checkbox
|
||||
} else {
|
||||
hardwareDevicesPanel.style.display = 'none';
|
||||
playToggleBtn.style.display = 'inline-block'; // Show browser-only "Mendengarkan" (Listening) button
|
||||
browserNoiseCancelGroup.style.display = 'block'; // Show browser-only Noise Cancel checkbox
|
||||
}
|
||||
}
|
||||
|
||||
// Routing Mode Event Listeners
|
||||
routingModeSelect.addEventListener('change', () => {
|
||||
applyAudioRoutingUI();
|
||||
sendConfigToServer();
|
||||
|
||||
if (isStreaming) {
|
||||
stopStreaming();
|
||||
startStreaming();
|
||||
}
|
||||
});
|
||||
|
||||
serverInputSelect.addEventListener('change', sendConfigToServer);
|
||||
serverOutputSelect.addEventListener('change', sendConfigToServer);
|
||||
|
||||
// Quick Presets Event Listeners
|
||||
presetLatencyBtn.addEventListener('click', () => {
|
||||
const radioPM = document.querySelector('input[name="f0_method"][value="pm"]');
|
||||
if (radioPM) radioPM.checked = true;
|
||||
chunkSelect.value = "8192";
|
||||
|
||||
console.log("Preset loaded: Latency (PM + 8192)");
|
||||
sendConfigToServer();
|
||||
|
||||
if (isStreaming) {
|
||||
stopStreaming();
|
||||
startStreaming();
|
||||
}
|
||||
});
|
||||
|
||||
presetQualityBtn.addEventListener('click', () => {
|
||||
const radioRMVPE = document.querySelector('input[name="f0_method"][value="rmvpe"]');
|
||||
if (radioRMVPE) radioRMVPE.checked = true;
|
||||
chunkSelect.value = "16384";
|
||||
|
||||
console.log("Preset loaded: Quality (RMVPE + 16384)");
|
||||
sendConfigToServer();
|
||||
|
||||
if (isStreaming) {
|
||||
stopStreaming();
|
||||
startStreaming();
|
||||
}
|
||||
});
|
||||
|
||||
// Helper to serialize UI config objects as JSON
|
||||
function jsonEncode(obj) {
|
||||
return JSON.stringify(obj);
|
||||
}
|
||||
|
||||
playToggleBtn.addEventListener('click', () => {
|
||||
playOutput = !playOutput;
|
||||
if (playOutput) {
|
||||
playToggleBtn.textContent = '🔊 Mendengarkan: AKTIF';
|
||||
playToggleBtn.className = 'btn btn-primary';
|
||||
} else {
|
||||
playToggleBtn.textContent = '🔇 Mendengarkan: SENYAP';
|
||||
playToggleBtn.className = 'btn btn-accent';
|
||||
}
|
||||
});
|
||||
|
||||
// Stream Toggle
|
||||
streamBtn.addEventListener('click', () => {
|
||||
if (isStreaming) {
|
||||
stopStreaming();
|
||||
} else {
|
||||
startStreaming();
|
||||
}
|
||||
});
|
||||
|
||||
async function startStreaming() {
|
||||
isStreaming = true;
|
||||
streamBtn.textContent = 'Hentikan Pengubah Suara';
|
||||
streamBtn.className = 'btn btn-primary';
|
||||
|
||||
const isHardwareMode = (routingModeSelect.value === 'hardware');
|
||||
|
||||
if (isHardwareMode) {
|
||||
// --- SERVER HARDWARE ROUTING MODE ---
|
||||
inputDisplayBuf = new Float32Array(VIS_DISPLAY_SIZE);
|
||||
outputDisplayBuf = new Float32Array(VIS_DISPLAY_SIZE);
|
||||
startVisualizerLoop();
|
||||
sendConfigToServer(); // Sends config with routing_mode: 'hardware' which triggers stream start on server
|
||||
console.log('Server Hardware Mode initialized.');
|
||||
return;
|
||||
}
|
||||
|
||||
// --- CLIENT BROWSER MODE ---
|
||||
// 1. Create AudioContext if not active
|
||||
if (!audioContext) {
|
||||
audioContext = new (window.AudioContext || window.webkitAudioContext)({
|
||||
latencyHint: 'interactive'
|
||||
});
|
||||
}
|
||||
|
||||
if (audioContext.state === 'suspended') {
|
||||
await audioContext.resume();
|
||||
}
|
||||
|
||||
hudSr.textContent = audioContext.sampleRate + ' Hz';
|
||||
sendConfigToServer(); // sync actual input sample rate
|
||||
|
||||
// 2. Request user microphone with high-fidelity, lowest possible latency constraints
|
||||
try {
|
||||
const useNoiseCancel = noiseCancelCheckbox.checked;
|
||||
micStream = await navigator.mediaDevices.getUserMedia({
|
||||
audio: {
|
||||
echoCancellation: useNoiseCancel,
|
||||
noiseSuppression: useNoiseCancel,
|
||||
autoGainControl: useNoiseCancel
|
||||
}
|
||||
});
|
||||
|
||||
micSourceNode = audioContext.createMediaStreamSource(micStream);
|
||||
|
||||
// 3. Create Audio Processing Loop Node (ScriptProcessorNode)
|
||||
// BaseAudioContext's createScriptProcessor buffer size MUST be a power of two between 256 and 16384.
|
||||
// We use a fixed, highly supported buffer size of 4096 for recording, and accumulate samples in-memory
|
||||
// to support ANY arbitrary or extremely large chunk size (like 12288, 24576, 32768) selected by the user!
|
||||
const recordBufferSize = 4096;
|
||||
scriptProcessorNode = audioContext.createScriptProcessor(recordBufferSize, 1, 1);
|
||||
|
||||
scriptProcessorNode.onaudioprocess = (event) => {
|
||||
if (!isStreaming) return;
|
||||
|
||||
const inputBuffer = event.inputBuffer;
|
||||
const inputData = inputBuffer.getChannelData(0); // 4096 samples
|
||||
|
||||
// Push latest mic samples into the rolling display buffer every callback (~85ms)
|
||||
pushToDisplayBuf(inputDisplayBuf, inputData);
|
||||
|
||||
// Append incoming recorded samples to our accumulator
|
||||
const temp = new Float32Array(micAccumulator.length + inputData.length);
|
||||
temp.set(micAccumulator);
|
||||
temp.set(inputData, micAccumulator.length);
|
||||
micAccumulator = temp;
|
||||
|
||||
const targetChunkSize = parseInt(chunkSelect.value);
|
||||
|
||||
// Process and send chunks of the user's selected target size
|
||||
while (micAccumulator.length >= targetChunkSize) {
|
||||
const chunkToSend = micAccumulator.slice(0, targetChunkSize);
|
||||
micAccumulator = micAccumulator.slice(targetChunkSize); // Keep remainder
|
||||
|
||||
// Voice Activity Detection for gate status badge
|
||||
let maxVal = 0;
|
||||
for (let i = 0; i < chunkToSend.length; i++) maxVal = Math.max(maxVal, Math.abs(chunkToSend[i]));
|
||||
if (maxVal > 0.005) {
|
||||
hudGateStatus.textContent = 'Bicara';
|
||||
hudGateStatus.className = 'hud-value active-badge';
|
||||
} else {
|
||||
hudGateStatus.textContent = 'Berdiam';
|
||||
hudGateStatus.className = 'hud-value text-muted';
|
||||
}
|
||||
|
||||
// Send binary PCM Float32 audio chunk of target size to Python Server
|
||||
if (socket && socket.readyState === WebSocket.OPEN) {
|
||||
const packetTime = performance.now();
|
||||
sentTimestamps.push({ id: packetTime, sent: packetTime });
|
||||
if (sentTimestamps.length > maxSentLogs) {
|
||||
sentTimestamps.shift();
|
||||
}
|
||||
|
||||
socket.send(chunkToSend.buffer); // Send direct array buffer
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
micSourceNode.connect(scriptProcessorNode);
|
||||
scriptProcessorNode.connect(audioContext.destination); // Required to trigger onaudioprocess
|
||||
|
||||
// Reset playback sync clock
|
||||
nextPlaybackTime = 0;
|
||||
micAccumulator = new Float32Array(0); // Reset accumulator
|
||||
inputDisplayBuf = new Float32Array(VIS_DISPLAY_SIZE);
|
||||
outputDisplayBuf = new Float32Array(VIS_DISPLAY_SIZE);
|
||||
startVisualizerLoop();
|
||||
|
||||
console.log('Browser Streaming active. Recording buffer size: 4096 | Target chunk size:', chunkSelect.value);
|
||||
} catch (e) {
|
||||
console.error('Failed to access microphone:', e);
|
||||
alert('Gagal mengakses mikrofon Anda: ' + e.message);
|
||||
stopStreaming();
|
||||
}
|
||||
}
|
||||
|
||||
function stopStreaming() {
|
||||
isStreaming = false;
|
||||
streamBtn.textContent = 'Mulai Mengubah Suara';
|
||||
streamBtn.className = 'btn btn-accent';
|
||||
|
||||
playOutput = true;
|
||||
playToggleBtn.textContent = '🔊 Mendengarkan: AKTIF';
|
||||
playToggleBtn.className = 'btn btn-primary';
|
||||
|
||||
const isHardwareMode = (routingModeSelect.value === 'hardware');
|
||||
|
||||
if (isHardwareMode) {
|
||||
// --- SERVER HARDWARE ROUTING MODE ---
|
||||
if (socket && socket.readyState === WebSocket.OPEN) {
|
||||
const config = {
|
||||
type: 'config',
|
||||
routing_mode: 'browser' // Tells server to stop local hardware stream
|
||||
};
|
||||
socket.send(jsonEncode(config));
|
||||
}
|
||||
console.log('Server Hardware Mode stopped.');
|
||||
|
||||
hudGateStatus.textContent = 'Berdiam';
|
||||
hudGateStatus.className = 'hud-value text-muted';
|
||||
hudLatency.textContent = '-- ms';
|
||||
hudTime.textContent = '-- ms';
|
||||
|
||||
stopVisualizerLoop();
|
||||
inputDisplayBuf = new Float32Array(VIS_DISPLAY_SIZE);
|
||||
outputDisplayBuf = new Float32Array(VIS_DISPLAY_SIZE);
|
||||
clearCanvas(inputCanvas);
|
||||
clearCanvas(outputCanvas);
|
||||
return;
|
||||
}
|
||||
|
||||
// --- CLIENT BROWSER MODE ---
|
||||
// Stop microphone stream tracks
|
||||
if (micStream) {
|
||||
micStream.getTracks().forEach(track => track.stop());
|
||||
micStream = null;
|
||||
}
|
||||
|
||||
// Disconnect Web Audio nodes
|
||||
if (micSourceNode) {
|
||||
micSourceNode.disconnect();
|
||||
micSourceNode = null;
|
||||
}
|
||||
if (scriptProcessorNode) {
|
||||
scriptProcessorNode.disconnect();
|
||||
scriptProcessorNode = null;
|
||||
}
|
||||
|
||||
micAccumulator = new Float32Array(0); // Reset accumulator
|
||||
|
||||
stopVisualizerLoop();
|
||||
inputDisplayBuf = new Float32Array(VIS_DISPLAY_SIZE);
|
||||
outputDisplayBuf = new Float32Array(VIS_DISPLAY_SIZE);
|
||||
|
||||
hudGateStatus.textContent = 'Berdiam';
|
||||
hudGateStatus.className = 'hud-value text-muted';
|
||||
hudLatency.textContent = '-- ms';
|
||||
hudTime.textContent = '-- ms';
|
||||
|
||||
clearCanvas(inputCanvas);
|
||||
clearCanvas(outputCanvas);
|
||||
}
|
||||
|
||||
// Seamless Audio Playback Scheduler (Absorbs WebSocket & processing jitter)
|
||||
function handleServerAudioChunk(arrayBuffer) {
|
||||
if (!isStreaming) return;
|
||||
|
||||
// 1. Measure Round-Trip Time Latency (RTT)
|
||||
const now = performance.now();
|
||||
let rtt = 0;
|
||||
if (sentTimestamps.length > 0) {
|
||||
const oldestSent = sentTimestamps.shift();
|
||||
rtt = now - oldestSent.sent;
|
||||
hudLatency.textContent = Math.round(rtt) + ' ms';
|
||||
}
|
||||
|
||||
// Convert arrayBuffer to Float32 samples
|
||||
const payload = new Float32Array(arrayBuffer);
|
||||
const processingTime = payload[0]; // first float32 is the server processing time in ms
|
||||
const pcmData = payload.subarray(1); // the rest is the audio
|
||||
|
||||
// 2. Schedule chunk smoothly inside the AudioContext timeline
|
||||
const audioBuf = audioContext.createBuffer(1, pcmData.length, targetSampleRate);
|
||||
audioBuf.getChannelData(0).set(pcmData);
|
||||
|
||||
const source = audioContext.createBufferSource();
|
||||
source.buffer = audioBuf;
|
||||
|
||||
if (playOutput) {
|
||||
source.connect(audioContext.destination);
|
||||
}
|
||||
|
||||
// Calculate precise playback clock scheduling
|
||||
const currentTime = audioContext.currentTime;
|
||||
const chunkDuration = audioBuf.duration; // actual chunk duration in seconds
|
||||
// Adaptive buffer: enough headroom so next chunk always arrives before this one ends.
|
||||
// 2.5× chunk or 500ms cap — absorbs even 300ms+ processing spikes.
|
||||
const adaptiveBuf = Math.min(chunkDuration * 2.5, 0.50);
|
||||
|
||||
if (nextPlaybackTime < currentTime) {
|
||||
// Clock behind — first chunk or dropout recovery.
|
||||
// Use full adaptiveBuf on BOTH cases so recovery fully rebuilds headroom.
|
||||
// (0.5× recovery was causing cascading dropouts: one late chunk → the next also late)
|
||||
nextPlaybackTime = currentTime + adaptiveBuf;
|
||||
} else if (nextPlaybackTime > currentTime + chunkDuration * 5.0) {
|
||||
// --- ADAPTIVE LATENCY BUSTER ---
|
||||
// Only snap when queue is >5 chunk-durations ahead (genuine backlog, not normal look-ahead).
|
||||
// At 8192 (170ms): threshold = 850ms
|
||||
// At 65536 (1.6s): threshold = 8s
|
||||
const snapTarget = currentTime + adaptiveBuf;
|
||||
console.log(`Latency Buster: ${Math.round((nextPlaybackTime-currentTime)*1000)}ms → ${Math.round(adaptiveBuf*1000)}ms`);
|
||||
nextPlaybackTime = snapTarget;
|
||||
}
|
||||
|
||||
// Record schedule start time BEFORE advancing the clock (for time-synced visualizer)
|
||||
const scheduleStartTime = nextPlaybackTime;
|
||||
|
||||
// Schedule play
|
||||
source.start(nextPlaybackTime);
|
||||
|
||||
hudTime.textContent = Math.max(0, Math.round(processingTime)) + ' ms';
|
||||
|
||||
// Advance playback sync clock
|
||||
nextPlaybackTime += audioBuf.duration;
|
||||
|
||||
// Push to time-synced output queue for visualizer (keyed by when audio actually plays)
|
||||
outputChunkQueue.push({ data: pcmData, startTime: scheduleStartTime });
|
||||
// Keep queue bounded: drop chunks that finished playing more than 2 seconds ago
|
||||
while (outputChunkQueue.length > 0) {
|
||||
const c = outputChunkQueue[0];
|
||||
if (c.startTime + c.data.length / targetSampleRate < audioContext.currentTime - 2.0) {
|
||||
outputChunkQueue.shift();
|
||||
} else break;
|
||||
}
|
||||
}
|
||||
|
||||
// --- VISUALIZATION / DRAWING ROUTINES ---
|
||||
function drawWaveform(dataArray, canvas, strokeColor) {
|
||||
const ctx = canvas.getContext('2d');
|
||||
const width = canvas.width;
|
||||
const height = canvas.height;
|
||||
|
||||
// Dark transparent redraw for trace/motion-blur effect
|
||||
ctx.fillStyle = 'rgba(11, 12, 19, 0.4)';
|
||||
ctx.fillRect(0, 0, width, height);
|
||||
|
||||
ctx.lineWidth = 2 * window.devicePixelRatio;
|
||||
ctx.strokeStyle = strokeColor;
|
||||
ctx.beginPath();
|
||||
|
||||
const sliceWidth = width / dataArray.length;
|
||||
let x = 0;
|
||||
|
||||
for (let i = 0; i < dataArray.length; i++) {
|
||||
// Center the wave around half-height and scale the amplitude
|
||||
const v = dataArray[i] * 1.5;
|
||||
const y = (v * (height / 2)) + (height / 2);
|
||||
|
||||
if (i === 0) {
|
||||
ctx.moveTo(x, y);
|
||||
} else {
|
||||
ctx.lineTo(x, y);
|
||||
}
|
||||
|
||||
x += sliceWidth;
|
||||
}
|
||||
|
||||
ctx.lineTo(width, height / 2);
|
||||
ctx.stroke();
|
||||
|
||||
// Draw a subtle baseline center glowing path
|
||||
ctx.strokeStyle = 'rgba(255, 255, 255, 0.05)';
|
||||
ctx.lineWidth = 1;
|
||||
ctx.beginPath();
|
||||
ctx.moveTo(0, height / 2);
|
||||
ctx.lineTo(width, height / 2);
|
||||
ctx.stroke();
|
||||
}
|
||||
|
||||
function clearCanvas(canvas) {
|
||||
const ctx = canvas.getContext('2d');
|
||||
ctx.fillStyle = '#0b0c13';
|
||||
ctx.fillRect(0, 0, canvas.width, canvas.height);
|
||||
}
|
||||
|
||||
// Apply initial UI layout on startup
|
||||
applyAudioRoutingUI();
|
||||
@@ -0,0 +1,18 @@
|
||||
import { defineConfig, globalIgnores } from "eslint/config";
|
||||
import nextVitals from "eslint-config-next/core-web-vitals";
|
||||
import nextTs from "eslint-config-next/typescript";
|
||||
|
||||
const eslintConfig = defineConfig([
|
||||
...nextVitals,
|
||||
...nextTs,
|
||||
// Override default ignores of eslint-config-next.
|
||||
globalIgnores([
|
||||
// Default ignores of eslint-config-next:
|
||||
".next/**",
|
||||
"out/**",
|
||||
"build/**",
|
||||
"next-env.d.ts",
|
||||
]),
|
||||
]);
|
||||
|
||||
export default eslintConfig;
|
||||
@@ -1,243 +0,0 @@
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <meta name="description" content="Omni Real-time Voice Changer - A very low-latency, AI-based real-time voice changer powered by ONNX Runtime.">
    <title>🎙️ Omni Real-Time Voice Changer - High-Performance AI Audio</title>

    <!-- Modern Typography: Inter & Outfit from Google Fonts -->
    <link rel="preconnect" href="https://fonts.googleapis.com">
    <link rel="preconnect" href="https://fonts.gstatic.com" crossorigin>
    <link href="https://fonts.googleapis.com/css2?family=Inter:wght@300;400;500;600;700&family=Outfit:wght@400;600;800&display=swap" rel="stylesheet">

    <!-- Link to premium Vanilla CSS -->
    <link rel="stylesheet" href="styles.css">
</head>
<body>
    <div class="glow-backdrop"></div>

    <div class="dashboard-container">
        <!-- HEADER -->
        <header class="app-header">
            <div class="logo-area">
                <span class="pulse-indicator active"></span>
                <h1>🎙️ OMNI VOICE CHANGER</h1>
            </div>
            <p class="tagline">Ultra-Low-Latency Real-Time AI Voice Changer with ONNX Runtime Acceleration</p>
        </header>

        <!-- CONNECTION BAR -->
        <div class="connection-bar card glassmorphism">
            <div class="form-row">
                <div class="input-group">
                    <label for="ws_url">WebSocket Server URL</label>
                    <input type="text" id="ws_url" value="ws://127.0.0.1:8765" placeholder="ws://localhost:8765">
                </div>
                <div class="connection-status-container">
                    <span id="connection_status" class="status-badge disconnected">Disconnected</span>
                </div>
                <div class="btn-group-row">
                    <button id="connect_btn" class="btn btn-primary">Connect to Server</button>
                    <button id="stream_btn" class="btn btn-accent" disabled>Start Voice Conversion</button>
                    <button id="play_toggle_btn" class="btn btn-primary" disabled>🔊 Monitoring: ON</button>
                </div>
            </div>
        </div>

        <!-- MAIN DASHBOARD CONTENT -->
        <main class="dashboard-grid">

            <!-- MODEL CONFIGURATION -->
            <section class="card glassmorphism col-span-1" aria-labelledby="model-config-title">
                <h2 id="model-config-title" class="card-title">⚙️ Model & Device Configuration</h2>

                <!-- QUICK PRESETS PANEL -->
                <div class="control-group">
                    <label>⚡ Quick Presets (Performance Profiles)</label>
                    <div class="btn-group-row" style="width: 100%; display: grid; grid-template-columns: repeat(2, 1fr); gap: 0.5rem; height: auto; margin-bottom: 0.75rem;">
                        <button id="preset_latency_btn" class="btn btn-primary" style="font-size: 0.8rem; padding: 0.65rem 0.5rem;">⚡ Lightning Response (PM)</button>
                        <button id="preset_quality_btn" class="btn btn-accent" style="font-size: 0.8rem; padding: 0.65rem 0.5rem;">🎙️ High Quality (RMVPE)</button>
                    </div>
                </div>

                <div class="control-group">
                    <label for="model_select">Select Voice Model (RVC ONNX)</label>
                    <select id="model_select" class="custom-select">
                        <option value="HuTao">HuTao (Genshin Impact)</option>
                        <option value="HuoHuo">HuoHuo (Honkai Star Rail)</option>
                    </select>
                </div>

                <div class="control-group">
                    <label for="device_select">Execution Provider (GPU Acceleration)</label>
                    <select id="device_select" class="custom-select">
                        <option value="cpu">CPU (Very Stable)</option>
                        <option value="cuda" selected>CUDA (NVIDIA GPU - Super Fast)</option>
                        <option value="dml">DirectML (AMD/Intel GPU on Windows)</option>
                    </select>
                </div>

                <!-- DUAL AUDIO ROUTING MODE (SERVER VS CLIENT) -->
                <div class="control-group" style="border-top: 1px solid rgba(255, 255, 255, 0.05); padding-top: 0.75rem; margin-top: 0.75rem;">
                    <label for="routing_mode_select">Audio Mode (Routing Mode)</label>
                    <select id="routing_mode_select" class="custom-select">
                        <option value="browser" selected>Client Mode (Browser Streaming - Portable)</option>
                        <option value="hardware">Server Mode (Hardware Direct - Zero Latency)</option>
                    </select>
                </div>

                <div id="hardware_devices_panel" class="control-group" style="display: none; border: 1px solid rgba(99, 102, 241, 0.2); padding: 0.75rem; border-radius: 8px; background: rgba(11, 12, 19, 0.5); box-shadow: 0 0 10px rgba(99, 102, 241, 0.05);">
                    <div style="margin-bottom: 0.75rem;">
                        <label for="server_input_select" style="font-size: 0.75rem; margin-bottom: 0.25rem; color: var(--primary); text-transform: uppercase; font-weight: 600;">🎙️ Server Microphone Input</label>
                        <select id="server_input_select" class="custom-select" style="font-size: 0.8rem; padding: 0.4rem;"></select>
                    </div>
                    <div>
                        <label for="server_output_select" style="font-size: 0.75rem; margin-bottom: 0.25rem; color: var(--accent); text-transform: uppercase; font-weight: 600;">🔊 Server Speaker/Cable Output</label>
                        <select id="server_output_select" class="custom-select" style="font-size: 0.8rem; padding: 0.4rem;"></select>
                    </div>
                </div>

                <div class="control-group">
                    <label>Pitch Detection Method (Pitch Extraction)</label>
                    <div class="radio-group-modern">
                        <label class="radio-tile">
                            <input type="radio" name="f0_method" value="pm" checked>
                            <span class="tile-label">PM (Fastest)</span>
                        </label>
                        <label class="radio-tile">
                            <input type="radio" name="f0_method" value="dio">
                            <span class="tile-label">DIO (Lightweight)</span>
                        </label>
                        <label class="radio-tile">
                            <input type="radio" name="f0_method" value="harvest">
                            <span class="tile-label">Harvest (Stable)</span>
                        </label>
                        <label class="radio-tile">
                            <input type="radio" name="f0_method" value="rmvpe">
                            <span class="tile-label">RMVPE (High Fidelity)</span>
                        </label>
                    </div>
                </div>

                <div class="control-group">
                    <div class="slider-header">
                        <label for="transpose_slider">Transpose (Pitch Shift)</label>
                        <span id="transpose_val" class="slider-value">0 semitone</span>
                    </div>
                    <input type="range" id="transpose_slider" min="-24" max="24" value="0" step="1" class="custom-slider">
                    <div class="slider-ticks">
                        <span>-24 (Deep Male)</span>
                        <span>0 (Original)</span>
                        <span>+24 (Female/Anime)</span>
                    </div>
                </div>
            </section>

            <!-- AUDIO DSP & PROCESSING -->
            <section class="card glassmorphism col-span-1" aria-labelledby="dsp-title">
                <h2 id="dsp-title" class="card-title">🎛️ Audio Processing (DSP)</h2>

                <div class="control-group">
                    <div class="slider-header">
                        <label for="gate_slider">Noise Gate (Threshold)</label>
                        <span id="gate_val" class="slider-value">-40 dB</span>
                    </div>
                    <input type="range" id="gate_slider" min="-60" max="-10" value="-40" step="1" class="custom-slider">
                    <div class="slider-ticks">
                        <span>-60 dB (Sensitive)</span>
                        <span>-40 dB (Default)</span>
                        <span>-10 dB (Strict)</span>
                    </div>
                </div>

                <div class="control-group">
                    <div class="slider-header">
                        <label for="input_gain_slider">Input Gain (Mic Boost)</label>
                        <span id="input_gain_val" class="slider-value">1.0x</span>
                    </div>
                    <input type="range" id="input_gain_slider" min="0" max="3" value="1" step="0.1" class="custom-slider">
                </div>

                <div class="control-group">
                    <div class="slider-header">
                        <label for="output_gain_slider">Output Gain (Voice Volume)</label>
                        <span id="output_gain_val" class="slider-value">1.0x</span>
                    </div>
                    <input type="range" id="output_gain_slider" min="0" max="3" value="1" step="0.1" class="custom-slider">
                </div>

                <div id="browser_noise_cancel_group" class="control-group">
                    <label class="checkbox-container" style="display: flex; align-items: center; gap: 0.5rem; cursor: pointer; user-select: none;">
                        <input type="checkbox" id="noise_cancel_checkbox" checked style="width: 18px; height: 18px; cursor: pointer; accent-color: var(--primary);">
                        <span class="checkbox-label" style="font-size: 0.85rem; font-weight: 500; color: var(--text-muted); text-transform: uppercase;">🚫 Noise Suppression (Noise Cancel)</span>
                    </label>
                </div>

                <div class="control-group">
                    <label for="chunk_select">Buffer Size (Chunk Size - Latency vs Stability)</label>
                    <select id="chunk_select" class="custom-select">
                        <option value="8192" selected>8192 samples (~170ms - Recommended, Minimal Distortion)</option>
                        <option value="12288">12288 samples (~250ms - Very Smooth & Melodious)</option>
                        <option value="16384">16384 samples (~340ms - Studio Quality, Very Stable)</option>
                        <option value="24576">24576 samples (~510ms - Super Smooth & Robust)</option>
                        <option value="32768">32768 samples (~680ms - Maximum Fidelity)</option>
                        <option value="49152">49152 samples (~1.0 s - Ultra Smooth Cinema)</option>
                        <option value="65536">65536 samples (~1.3 s - Maximum Stability)</option>
                        <option value="98304">98304 samples (~2.0 s - Broadcasting Mode)</option>
                    </select>
                </div>
            </section>

            <!-- OSCILLOSCOPES / WAVEFORM VISUALIZERS -->
            <section class="card glassmorphism col-span-2" aria-labelledby="visualizer-title">
                <h2 id="visualizer-title" class="card-title">📊 Live Audio Waveform & Visualizer</h2>

                <div class="visualizer-row">
                    <div class="visualizer-container">
                        <div class="vis-label">
                            <span class="dot input-dot"></span>
                            <span>Microphone Signal (Input)</span>
                        </div>
                        <canvas id="input_canvas" class="waveform-canvas"></canvas>
                    </div>
                    <div class="visualizer-container">
                        <div class="vis-label">
                            <span class="dot output-dot"></span>
                            <span>AI Voice Result (Output)</span>
                        </div>
                        <canvas id="output_canvas" class="waveform-canvas"></canvas>
                    </div>
                </div>
            </section>

        </main>

        <!-- PERFORMANCE HUD FOOTER -->
        <footer class="performance-hud card glassmorphism">
            <div class="hud-item">
                <span class="hud-label">Round-Trip Latency (RTT)</span>
                <span id="hud_latency" class="hud-value italic">-- ms</span>
            </div>
            <div class="hud-separator"></div>
            <div class="hud-item">
                <span class="hud-label">Processing Ratio</span>
                <span id="hud_time" class="hud-value text-accent">-- ms</span>
            </div>
            <div class="hud-separator"></div>
            <div class="hud-item">
                <span class="hud-label">Voice Signal</span>
                <span id="hud_gate_status" class="hud-value active-badge">Idle</span>
            </div>
            <div class="hud-separator"></div>
            <div class="hud-item">
                <span class="hud-label">Audio Sample Rate</span>
                <span id="hud_sr" class="hud-value">44100 Hz</span>
            </div>
        </footer>
    </div>

    <!-- Link to premium Javascript logic -->
    <script src="app.js"></script>
</body>
</html>
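Reviewer note on the numeric controls in the removed page: the chunk-size labels line up with a 48000 Hz sample rate (8192 samples is about 170.7 ms), while the HUD's default text reads 44100 Hz, where the same chunk lasts about 185.8 ms. The sketch below shows the arithmetic behind three of the controls. The function names are hypothetical and do not come from `app.js`; how the server actually applies the gate threshold (peak or RMS) is not shown in this diff and is an assumption.

```typescript
// Hypothetical helpers for illustration only; none of these names appear in app.js.

/** Duration of one audio chunk in milliseconds. */
function chunkDurationMs(samples: number, sampleRate: number): number {
  if (sampleRate <= 0) {
    throw new RangeError("sampleRate must be positive");
  }
  return (samples / sampleRate) * 1000;
}

/** Convert a decibel value (e.g. the noise gate threshold) to a linear amplitude factor. */
function dbToLinear(db: number): number {
  return Math.pow(10, db / 20);
}

/** Frequency ratio for a transpose of n semitones in 12-tone equal temperament. */
function semitoneRatio(semitones: number): number {
  return Math.pow(2, semitones / 12);
}

// 8192 samples at 48000 Hz is roughly 170.7 ms; at 44100 Hz roughly 185.8 ms.
const at48k = chunkDurationMs(8192, 48000);
const at44k = chunkDurationMs(8192, 44100);
console.log(at48k.toFixed(1), at44k.toFixed(1));
```

With these definitions, the default -40 dB gate corresponds to a linear amplitude of about 0.01, and the slider extremes of ±24 semitones correspond to frequency ratios of 4 and 0.25.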
@@ -0,0 +1,10 @@
import type { NextConfig } from "next";

const nextConfig: NextConfig = {
  output: 'export',
  images: {
    unoptimized: true,
  },
};

export default nextConfig;
Generated file, +6829 lines. File diff suppressed because it is too large.
@@ -0,0 +1,30 @@
{
  "name": "frontend",
  "version": "0.1.0",
  "private": true,
  "scripts": {
    "dev": "next dev",
    "build": "next build",
    "start": "next start",
    "lint": "eslint"
  },
  "dependencies": {
    "clsx": "^2.1.1",
    "framer-motion": "^12.40.0",
    "lucide-react": "^1.17.0",
    "next": "16.2.6",
    "react": "19.2.4",
    "react-dom": "19.2.4",
    "tailwind-merge": "^3.6.0"
  },
  "devDependencies": {
    "@tailwindcss/postcss": "^4",
    "@types/node": "^20",
    "@types/react": "^19",
    "@types/react-dom": "^19",
    "eslint": "^9",
    "eslint-config-next": "16.2.6",
    "tailwindcss": "^4",
    "typescript": "^5"
  }
}
@@ -0,0 +1,7 @@
const config = {
  plugins: {
    "@tailwindcss/postcss": {},
  },
};

export default config;
@@ -0,0 +1 @@
<svg fill="none" viewBox="0 0 16 16" xmlns="http://www.w3.org/2000/svg"><path d="M14.5 13.5V5.41a1 1 0 0 0-.3-.7L9.8.29A1 1 0 0 0 9.08 0H1.5v13.5A2.5 2.5 0 0 0 4 16h8a2.5 2.5 0 0 0 2.5-2.5m-1.5 0v-7H8v-5H3v12a1 1 0 0 0 1 1h8a1 1 0 0 0 1-1M9.5 5V2.12L12.38 5zM5.13 5h-.62v1.25h2.12V5zm-.62 3h7.12v1.25H4.5zm.62 3h-.62v1.25h7.12V11z" clip-rule="evenodd" fill="#666" fill-rule="evenodd"/></svg>
@@ -0,0 +1 @@
<svg fill="none" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"><g clip-path="url(#a)"><path fill-rule="evenodd" clip-rule="evenodd" d="M10.27 14.1a6.5 6.5 0 0 0 3.67-3.45q-1.24.21-2.7.34-.31 1.83-.97 3.1M8 16A8 8 0 1 0 8 0a8 8 0 0 0 0 16m.48-1.52a7 7 0 0 1-.96 0H7.5a4 4 0 0 1-.84-1.32q-.38-.89-.63-2.08a40 40 0 0 0 3.92 0q-.25 1.2-.63 2.08a4 4 0 0 1-.84 1.31zm2.94-4.76q1.66-.15 2.95-.43a7 7 0 0 0 0-2.58q-1.3-.27-2.95-.43a18 18 0 0 1 0 3.44m-1.27-3.54a17 17 0 0 1 0 3.64 39 39 0 0 1-4.3 0 17 17 0 0 1 0-3.64 39 39 0 0 1 4.3 0m1.1-1.17q1.45.13 2.69.34a6.5 6.5 0 0 0-3.67-3.44q.65 1.26.98 3.1M8.48 1.5l.01.02q.41.37.84 1.31.38.89.63 2.08a40 40 0 0 0-3.92 0q.25-1.2.63-2.08a4 4 0 0 1 .85-1.32 7 7 0 0 1 .96 0m-2.75.4a6.5 6.5 0 0 0-3.67 3.44 29 29 0 0 1 2.7-.34q.31-1.83.97-3.1M4.58 6.28q-1.66.16-2.95.43a7 7 0 0 0 0 2.58q1.3.27 2.95.43a18 18 0 0 1 0-3.44m.17 4.71q-1.45-.12-2.69-.34a6.5 6.5 0 0 0 3.67 3.44q-.65-1.27-.98-3.1" fill="#666"/></g><defs><clipPath id="a"><path fill="#fff" d="M0 0h16v16H0z"/></clipPath></defs></svg>
@@ -0,0 +1 @@
<svg xmlns="http://www.w3.org/2000/svg" fill="none" viewBox="0 0 394 80"><path fill="#000" d="M262 0h68.5v12.7h-27.2v66.6h-13.6V12.7H262V0ZM149 0v12.7H94v20.4h44.3v12.6H94v21h55v12.6H80.5V0h68.7zm34.3 0h-17.8l63.8 79.4h17.9l-32-39.7 32-39.6h-17.9l-23 28.6-23-28.6zm18.3 56.7-9-11-27.1 33.7h17.8l18.3-22.7z"/><path fill="#000" d="M81 79.3 17 0H0v79.3h13.6V17l50.2 62.3H81Zm252.6-.4c-1 0-1.8-.4-2.5-1s-1.1-1.6-1.1-2.6.3-1.8 1-2.5 1.6-1 2.6-1 1.8.3 2.5 1a3.4 3.4 0 0 1 .6 4.3 3.7 3.7 0 0 1-3 1.8zm23.2-33.5h6v23.3c0 2.1-.4 4-1.3 5.5a9.1 9.1 0 0 1-3.8 3.5c-1.6.8-3.5 1.3-5.7 1.3-2 0-3.7-.4-5.3-1s-2.8-1.8-3.7-3.2c-.9-1.3-1.4-3-1.4-5h6c.1.8.3 1.6.7 2.2s1 1.2 1.6 1.5c.7.4 1.5.5 2.4.5 1 0 1.8-.2 2.4-.6a4 4 0 0 0 1.6-1.8c.3-.8.5-1.8.5-3V45.5zm30.9 9.1a4.4 4.4 0 0 0-2-3.3 7.5 7.5 0 0 0-4.3-1.1c-1.3 0-2.4.2-3.3.5-.9.4-1.6 1-2 1.6a3.5 3.5 0 0 0-.3 4c.3.5.7.9 1.3 1.2l1.8 1 2 .5 3.2.8c1.3.3 2.5.7 3.7 1.2a13 13 0 0 1 3.2 1.8 8.1 8.1 0 0 1 3 6.5c0 2-.5 3.7-1.5 5.1a10 10 0 0 1-4.4 3.5c-1.8.8-4.1 1.2-6.8 1.2-2.6 0-4.9-.4-6.8-1.2-2-.8-3.4-2-4.5-3.5a10 10 0 0 1-1.7-5.6h6a5 5 0 0 0 3.5 4.6c1 .4 2.2.6 3.4.6 1.3 0 2.5-.2 3.5-.6 1-.4 1.8-1 2.4-1.7a4 4 0 0 0 .8-2.4c0-.9-.2-1.6-.7-2.2a11 11 0 0 0-2.1-1.4l-3.2-1-3.8-1c-2.8-.7-5-1.7-6.6-3.2a7.2 7.2 0 0 1-2.4-5.7 8 8 0 0 1 1.7-5 10 10 0 0 1 4.3-3.5c2-.8 4-1.2 6.4-1.2 2.3 0 4.4.4 6.2 1.2 1.8.8 3.2 2 4.3 3.4 1 1.4 1.5 3 1.5 5h-5.8z"/></svg>
@@ -0,0 +1 @@
<svg fill="none" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 1155 1000"><path d="m577.3 0 577.4 1000H0z" fill="#fff"/></svg>
@@ -0,0 +1 @@
<svg fill="none" xmlns="http://www.w3.org/2000/svg" viewBox="0 0 16 16"><path fill-rule="evenodd" clip-rule="evenodd" d="M1.5 2.5h13v10a1 1 0 0 1-1 1h-11a1 1 0 0 1-1-1zM0 1h16v11.5a2.5 2.5 0 0 1-2.5 2.5h-11A2.5 2.5 0 0 1 0 12.5zm3.75 4.5a.75.75 0 1 0 0-1.5.75.75 0 0 0 0 1.5M7 4.75a.75.75 0 1 1-1.5 0 .75.75 0 0 1 1.5 0m1.75.75a.75.75 0 1 0 0-1.5.75.75 0 0 0 0 1.5" fill="#666"/></svg>
Binary file not shown.
@@ -0,0 +1,82 @@
@import "tailwindcss";

@custom-variant dark (&:where(.dark, .dark *));

:root {
  --track-bg: #e4e4e7;
}

.dark {
  --track-bg: #27272a;
}

@theme {
  --color-background: #fafafa;
  --color-foreground: #18181b;

  --color-primary: #84cc16; /* Lime-500 */
  --color-hover: #65a30d; /* Lime-600 */
  --color-soft-accent: #d9f99d; /* Lime-200 */
  --color-success: #10b981; /* Emerald-500 */

  --color-text-primary: #18181b;
  --color-text-secondary: #52525b;

  --radius-lg: 1rem; /* rounded-2xl */
  --radius-md: 0.75rem; /* rounded-xl */
  --radius-sm: 0.5rem; /* rounded-lg */
}

/* Custom styling presets */
body {
  background-color: #fafafa;
  color: #18181b;
  font-family: 'Inter', system-ui, -apple-system, sans-serif;
  overflow-x: hidden;
}

/* Glowing Aura Background */
.glow-backdrop {
  position: fixed;
  top: 0;
  left: 0;
  right: 0;
  bottom: 0;
  z-index: -10;
  pointer-events: none;
  background-image:
    radial-gradient(circle at 10% 20%, rgba(132, 204, 22, 0.04) 0%, transparent 40%),
    radial-gradient(circle at 90% 80%, rgba(16, 185, 129, 0.04) 0%, transparent 40%);
}

/* Glassmorphism panel overlay */
.glass-panel {
  background: rgba(255, 255, 255, 0.85);
  backdrop-filter: blur(12px);
  -webkit-backdrop-filter: blur(12px);
  border: 1px solid rgba(24, 24, 27, 0.05);
}

/* Custom pulse animations for recording signals */
@keyframes signal-pulse {
  0%, 100% {
    transform: scale(1);
    box-shadow: 0 0 0 0 rgba(132, 204, 22, 0.4);
  }
  50% {
    transform: scale(1.1);
    box-shadow: 0 0 10px 4px rgba(132, 204, 22, 0.2);
  }
}

.pulse-indicator {
  display: inline-block;
  width: 10px;
  height: 10px;
  border-radius: 9999px;
  background-color: #84cc16;
}

.pulse-indicator.active {
  animation: signal-pulse 2s infinite ease-in-out;
}
@@ -0,0 +1,33 @@
import type { Metadata } from "next";
import { Geist, Geist_Mono } from "next/font/google";
import "./globals.css";

const geistSans = Geist({
  variable: "--font-geist-sans",
  subsets: ["latin"],
});

const geistMono = Geist_Mono({
  variable: "--font-geist-mono",
  subsets: ["latin"],
});

export const metadata: Metadata = {
  title: "🎙️ ONNX VC - Real-Time AI Voice Changer",
  description: "ONNX VC - An ultra-low-latency, AI-based real-time voice changer powered by ONNX Runtime.",
};

export default function RootLayout({
  children,
}: Readonly<{
  children: React.ReactNode;
}>) {
  return (
    <html
      lang="en"
      className={`${geistSans.variable} ${geistMono.variable} h-full antialiased`}
    >
      <body className="min-h-full flex flex-col">{children}</body>
    </html>
  );
}
File diff suppressed because it is too large
@@ -0,0 +1,32 @@
import * as React from "react";
import { twMerge } from "tailwind-merge";

export interface BadgeProps extends React.HTMLAttributes<HTMLSpanElement> {
  variant?: 'default' | 'secondary' | 'outline' | 'success' | 'warning' | 'danger' | 'info';
}

const Badge = React.forwardRef<HTMLSpanElement, BadgeProps>(
  ({ className, variant = "default", ...props }, ref) => {
    return (
      <span
        ref={ref}
        className={twMerge(
          "inline-flex items-center rounded-full px-2.5 py-0.5 text-xs font-semibold select-none border tracking-wide uppercase",
          variant === 'default' && "bg-[var(--accent-color)] text-white border-transparent",
          variant === 'secondary' && "bg-[var(--accent-soft)] text-[var(--accent-text)] border-transparent",
          variant === 'success' && "bg-[#10b981]/10 text-[#059669] border-[#10b981]/20",
          variant === 'warning' && "bg-amber-500/10 text-amber-700 border-amber-500/20",
          variant === 'danger' && "bg-red-500/10 text-red-700 border-red-500/20",
          variant === 'info' && "bg-sky-500/10 text-sky-700 border-sky-500/20",
          variant === 'outline' && "text-zinc-600 dark:text-zinc-400 border-zinc-200 dark:border-zinc-800 bg-white dark:bg-zinc-900",
          className
        )}
        {...props}
      />
    );
  }
);
Badge.displayName = "Badge";

export { Badge };
export default Badge;
@@ -0,0 +1,40 @@
import * as React from "react";
import { clsx } from "clsx";
import { twMerge } from "tailwind-merge";

export interface ButtonProps extends React.ButtonHTMLAttributes<HTMLButtonElement> {
  variant?: 'primary' | 'secondary' | 'outline' | 'ghost' | 'accent' | 'success' | 'danger';
  size?: 'default' | 'sm' | 'lg' | 'icon';
}

const Button = React.forwardRef<HTMLButtonElement, ButtonProps>(
  ({ className, variant = "primary", size = "default", ...props }, ref) => {
    return (
      <button
        ref={ref}
        className={twMerge(
          "inline-flex items-center justify-center font-medium rounded-xl transition-all duration-200 focus-visible:outline-none focus-visible:ring-2 focus-visible:ring-lime-500 focus-visible:ring-offset-2 disabled:pointer-events-none disabled:opacity-50 active:scale-[0.98] cursor-pointer",
          // Sizing
          size === 'default' && "h-11 px-5 py-2.5 text-sm",
          size === 'sm' && "h-9 px-3.5 text-xs rounded-lg",
          size === 'lg' && "h-12 px-7 py-3 text-base rounded-2xl",
          size === 'icon' && "h-10 w-10 rounded-xl",
          // Variants
          variant === 'primary' && "bg-[var(--accent-color)] hover:bg-[var(--accent-hover)] text-white shadow-sm shadow-lime-500/10 font-semibold",
          variant === 'secondary' && "bg-[var(--accent-soft)] hover:bg-[var(--accent-soft)]/80 text-[var(--accent-text)] font-semibold",
          variant === 'accent' && "bg-zinc-900 dark:bg-[var(--accent-color)] hover:bg-zinc-800 dark:hover:bg-[var(--accent-hover)] text-white dark:text-zinc-950 shadow-sm font-semibold",
          variant === 'outline' && "border border-zinc-200 dark:border-zinc-800 bg-white dark:bg-zinc-900 hover:bg-zinc-50 dark:hover:bg-zinc-800 text-zinc-700 dark:text-zinc-300",
          variant === 'ghost' && "hover:bg-zinc-50 dark:hover:bg-zinc-800 text-zinc-700 dark:text-zinc-300",
          variant === 'success' && "bg-[#10b981] hover:bg-[#059669] text-white shadow-sm font-semibold",
          variant === 'danger' && "bg-red-500 hover:bg-red-600 text-white shadow-sm font-semibold",
          className
        )}
        {...props}
      />
    );
  }
);
Button.displayName = "Button";

export { Button };
export default Button;
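Reviewer note: the Badge and Button components pick classes with a chain of `variant === '…' && "…"` expressions. One possible alternative is a lookup table keyed by variant, which lets the compiler flag a variant that has no classes. The sketch below is self-contained and uses a hypothetical `cx` joiner rather than `tailwind-merge`, so unlike `twMerge` it does not resolve conflicting Tailwind utilities. The variant names and class strings are shortened placeholders, not the ones from this diff.

```typescript
// Illustrative only: a reduced variant set with placeholder class strings.
type Variant = "primary" | "danger" | "ghost";

// Record<Variant, string> forces an entry for every variant at compile time.
const VARIANT_CLASSES: Record<Variant, string> = {
  primary: "bg-lime-500 text-white",
  danger: "bg-red-500 text-white",
  ghost: "text-zinc-700",
};

/** Hypothetical minimal class joiner: drops falsy parts, joins the rest with spaces. */
function cx(...parts: Array<string | false | null | undefined>): string {
  return parts
    .filter((p): p is string => typeof p === "string" && p.length > 0)
    .join(" ");
}

/** Build the class string for a button: base classes, variant classes, caller extras. */
function buttonClasses(variant: Variant = "primary", extra?: string): string {
  return cx("inline-flex items-center", VARIANT_CLASSES[variant], extra);
}

console.log(buttonClasses("danger", "mt-2"));
```

In the real components, wrapping the result in `twMerge` would still be needed so that a caller-supplied `className` can override a variant's utilities.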
@@ -0,0 +1,78 @@
import * as React from "react";
import { motion, AnimatePresence } from "framer-motion";
import { X } from "lucide-react";
import { twMerge } from "tailwind-merge";

export interface DialogProps {
  isOpen: boolean;
  onClose: () => void;
  title: string;
  children: React.ReactNode;
  className?: string;
}

export const Dialog: React.FC<DialogProps> = ({
  isOpen,
  onClose,
  title,
  children,
  className
}) => {
  // Close dialog on escape key press
  React.useEffect(() => {
    if (!isOpen) return;
    const handleKeyDown = (e: KeyboardEvent) => {
      if (e.key === "Escape") onClose();
    };
    window.addEventListener("keydown", handleKeyDown);
    return () => window.removeEventListener("keydown", handleKeyDown);
  }, [isOpen, onClose]);

  return (
    <AnimatePresence>
      {isOpen && (
        <div className="fixed inset-0 z-50 flex items-center justify-center p-4">
          {/* Backdrop Overlay */}
          <motion.div
            className="fixed inset-0 bg-zinc-950/40 backdrop-blur-sm"
            initial={{ opacity: 0 }}
            animate={{ opacity: 1 }}
            exit={{ opacity: 0 }}
            onClick={onClose}
          />

          {/* Modal content */}
          <motion.div
            className={twMerge(
              "relative bg-white dark:bg-zinc-900 w-full max-w-lg rounded-2xl p-6 shadow-xl border border-zinc-200/50 dark:border-zinc-800/80 z-10 overflow-hidden flex flex-col max-h-[85vh] text-zinc-800 dark:text-zinc-100",
              className
            )}
            initial={{ opacity: 0, scale: 0.95, y: 10 }}
            animate={{ opacity: 1, scale: 1, y: 0 }}
            exit={{ opacity: 0, scale: 0.95, y: 10 }}
            transition={{ duration: 0.2, ease: "easeOut" }}
          >
            {/* Header */}
            <div className="flex justify-between items-center mb-4 pb-2 border-b border-zinc-100 dark:border-zinc-800">
              <h2 className="text-lg font-bold text-zinc-900 dark:text-zinc-100 leading-none">
                {title}
              </h2>
              <button
                onClick={onClose}
                aria-label="Close"
                className="p-1.5 rounded-lg text-zinc-400 dark:text-zinc-500 hover:bg-zinc-50 dark:hover:bg-zinc-800 hover:text-zinc-700 dark:hover:text-zinc-300 transition-colors focus:outline-none focus:ring-2 focus:ring-[var(--accent-color)] cursor-pointer"
              >
                <X className="w-4 h-4" />
              </button>
            </div>

            {/* Body */}
            <div className="overflow-y-auto pr-1 flex-1">
              {children}
            </div>
          </motion.div>
        </div>
      )}
    </AnimatePresence>
  );
};
export default Dialog;
@@ -0,0 +1,50 @@
import * as React from "react";
import { twMerge } from "tailwind-merge";

export interface SelectProps extends React.SelectHTMLAttributes<HTMLSelectElement> {
  options: { value: string | number; label: string }[];
}

const Select = React.forwardRef<HTMLSelectElement, SelectProps>(
  ({ className, options, ...props }, ref) => {
    return (
      <div className="relative w-full">
        <select
          ref={ref}
          className={twMerge(
            "w-full h-11 px-4 text-sm bg-white dark:bg-zinc-900 border border-zinc-200 dark:border-zinc-800 rounded-xl focus:outline-none focus:ring-2 focus:ring-[var(--accent-color)] focus:border-[var(--accent-color)] transition-all cursor-pointer appearance-none text-zinc-800 dark:text-zinc-100 pr-10",
            className
          )}
          {...props}
        >
          {options.map((opt) => (
            <option key={opt.value} value={opt.value}>
              {opt.label}
            </option>
          ))}
        </select>

        {/* Custom Arrow Icon */}
        <div className="pointer-events-none absolute right-4 top-1/2 -translate-y-1/2 flex items-center justify-center text-zinc-400">
          <svg
            className="w-4 h-4"
            fill="none"
            stroke="currentColor"
            viewBox="0 0 24 24"
          >
            <path
              strokeLinecap="round"
              strokeLinejoin="round"
              strokeWidth="2"
              d="M19 9l-7 7-7-7"
            />
          </svg>
        </div>
      </div>
    );
  }
);
Select.displayName = "Select";

export { Select };
export default Select;
@@ -0,0 +1,46 @@
import * as React from "react";
import { clsx } from "clsx";
import { twMerge } from "tailwind-merge";

export interface SliderProps extends Omit<React.InputHTMLAttributes<HTMLInputElement>, 'type'> {
  value: number;
  min: number;
  max: number;
  step?: number;
  onValueChange?: (val: number) => void;
}

const Slider = React.forwardRef<HTMLInputElement, SliderProps>(
  ({ className, value, min, max, step = 1, onValueChange, ...props }, ref) => {
    // Guard against max === min, which would otherwise divide by zero and yield NaN
    const percentage = max > min ? ((value - min) / (max - min)) * 100 : 0;

    return (
      <div className="relative w-full flex items-center select-none group">
        <input
          type="range"
          ref={ref}
          min={min}
          max={max}
          step={step}
          value={value}
          onChange={(e) => onValueChange?.(parseFloat(e.target.value))}
          className={twMerge(
            "w-full h-2 rounded-lg bg-zinc-200 dark:bg-zinc-800 appearance-none cursor-pointer outline-none focus:outline-none",
            // custom thumb styles
            "[&::-webkit-slider-thumb]:appearance-none [&::-webkit-slider-thumb]:w-5 [&::-webkit-slider-thumb]:h-5 [&::-webkit-slider-thumb]:rounded-full [&::-webkit-slider-thumb]:bg-white [&::-webkit-slider-thumb]:border-2 [&::-webkit-slider-thumb]:border-[var(--accent-color)] [&::-webkit-slider-thumb]:shadow-md [&::-webkit-slider-thumb]:transition-all [&::-webkit-slider-thumb]:active:scale-110",
            "[&::-moz-range-thumb]:w-5 [&::-moz-range-thumb]:h-5 [&::-moz-range-thumb]:rounded-full [&::-moz-range-thumb]:bg-white [&::-moz-range-thumb]:border-2 [&::-moz-range-thumb]:border-[var(--accent-color)] [&::-moz-range-thumb]:shadow-md [&::-moz-range-thumb]:transition-all [&::-moz-range-thumb]:active:scale-110",
            className
          )}
          style={{
            background: `linear-gradient(to right, var(--accent-color) 0%, var(--accent-color) ${percentage}%, var(--track-bg, #e4e4e7) ${percentage}%, var(--track-bg, #e4e4e7) 100%)`
          }}
          {...props}
        />
      </div>
    );
  }
);
Slider.displayName = "Slider";

export { Slider };
export default Slider;
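Reviewer note: the Slider paints its filled track with a two-stop `linear-gradient` whose split point is the value's position within `[min, max]`. A minimal sketch of that computation as pure functions follows, assuming it is acceptable to clamp out-of-range values and to treat an empty range as 0%. `fillPercentage` and `trackGradient` are hypothetical names, not part of this diff.

```typescript
/** Position of value within [min, max] as a percentage, clamped to 0..100. */
function fillPercentage(value: number, min: number, max: number): number {
  if (max <= min) return 0; // empty or inverted range: avoid dividing by zero
  const pct = ((value - min) / (max - min)) * 100;
  return Math.min(100, Math.max(0, pct));
}

/** CSS background for a range track filled up to pct percent. */
function trackGradient(
  pct: number,
  fill: string = "var(--accent-color)",
  track: string = "var(--track-bg, #e4e4e7)"
): string {
  return `linear-gradient(to right, ${fill} 0%, ${fill} ${pct}%, ${track} ${pct}%, ${track} 100%)`;
}

// A transpose slider at 0 within [-24, 24] sits at the midpoint of the track.
console.log(trackGradient(fillPercentage(0, -24, 24)));
```

Keeping the arithmetic outside the component makes the edge cases (equal bounds, values outside the range) easy to unit test without rendering.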
@@ -0,0 +1,52 @@
import * as React from "react";
import { twMerge } from "tailwind-merge";

export interface SwitchProps extends Omit<React.InputHTMLAttributes<HTMLInputElement>, 'type'> {
  checked: boolean;
  onCheckedChange?: (checked: boolean) => void;
  label?: string;
  variant?: 'default' | 'dark';
}

const Switch = React.forwardRef<HTMLInputElement, SwitchProps>(
  ({ className, checked, onCheckedChange, label, variant = 'default', ...props }, ref) => {
    return (
      <label className="flex items-center gap-3 cursor-pointer select-none group">
        <div className="relative">
          <input
            type="checkbox"
            ref={ref}
            checked={checked}
            onChange={(e) => onCheckedChange?.(e.target.checked)}
            className="sr-only"
            {...props}
          />
          <div
            className={twMerge(
              "w-10 h-6 bg-zinc-200 dark:bg-zinc-800 rounded-full transition-all duration-200 group-focus-within:ring-2 group-focus-within:ring-offset-2",
              variant === 'dark' ? "group-focus-within:ring-zinc-500" : "group-focus-within:ring-[var(--accent-color)]",
              checked && (variant === 'dark' ? "bg-zinc-800 dark:bg-zinc-700 border border-zinc-700/50 dark:border-zinc-600/80" : "bg-[var(--accent-color)]"),
              className
            )}
          />
          <div
            className={twMerge(
              "absolute left-0.5 top-0.5 w-5 h-5 bg-white dark:bg-zinc-900 rounded-full transition-all duration-200 shadow-sm border border-zinc-200/50 dark:border-zinc-800/80",
              checked && "translate-x-4",
              checked && (variant === 'dark' ? "border-zinc-500" : "border-[var(--accent-color)]")
            )}
          />
        </div>
        {label && (
          <span className="text-sm font-medium text-zinc-700 dark:text-zinc-300 select-none">
            {label}
          </span>
        )}
      </label>
    );
  }
);
Switch.displayName = "Switch";

export { Switch };
export default Switch;
@@ -0,0 +1,126 @@
'use client';

import React, { useEffect } from 'react';
import { motion, AnimatePresence } from 'framer-motion';
import { useWaveformCanvas } from '../../../hooks/useWaveformCanvas';
import { usePictureInPicture } from '../../../hooks/usePictureInPicture';
import { Button } from '../../../components/ui/button';
import { Maximize2, MonitorOff, Activity } from 'lucide-react';
import { translations, Language } from '../../../utils/translations';

interface WaveformPanelProps {
  title: string;
  buffer: React.MutableRefObject<Float32Array>;
  strokeColor: string;
  isTalking?: boolean;
  lineWidth?: number;
  traceFade?: number;
  isDark?: boolean;
  lang?: Language;
}

export const WaveformPanel: React.FC<WaveformPanelProps> = ({
  title,
  buffer,
  strokeColor,
  isTalking = false,
  lineWidth = 2,
  traceFade = 0.4,
  isDark = false,
  lang = 'en',
}) => {
  const t = translations[lang];
  // Background clear color: white for light mode, dark zinc-950 for dark mode
  const canvasBgColor = isDark ? `rgba(9, 9, 11, ${traceFade})` : `rgba(255, 255, 255, ${traceFade})`;

  const { canvasRef, updateData } = useWaveformCanvas({
    strokeColor,
    fillColor: canvasBgColor, // dynamic trail alpha blending
    scaleAmplitude: 2.0,
    lineWidth,
  });

  const { togglePip, isPipActive, isSupported } = usePictureInPicture();

  // Draw buffer updates
  useEffect(() => {
    let active = true;
    const loop = () => {
      if (!active) return;
      updateData(buffer.current);
      requestAnimationFrame(loop);
    };
    loop();
    return () => {
      active = false;
    };
  }, [buffer, updateData]);

  return (
    <motion.div
      className="bg-white dark:bg-zinc-900 border border-zinc-200/50 dark:border-zinc-800/80 shadow-sm rounded-2xl p-5 relative overflow-hidden flex flex-col h-full transition-colors"
      initial={{ opacity: 0, y: 15 }}
      animate={{ opacity: 1, y: 0 }}
      transition={{ duration: 0.4 }}
    >
      <div className="flex justify-between items-center mb-3">
        <div className="flex items-center gap-2">
          {isTalking && (
            <motion.span
              className="w-2.5 h-2.5 rounded-full bg-[var(--accent-color)]"
              animate={{ scale: [1, 1.4, 1], opacity: [1, 0.4, 1] }}
              transition={{ repeat: Infinity, duration: 1.2 }}
            />
          )}
          <h3 className="font-bold text-zinc-800 dark:text-zinc-200 text-xs tracking-wider uppercase flex items-center gap-1.5">
            <Activity className="w-4 h-4 text-[var(--accent-color)]" />
            {title}
          </h3>
        </div>

        {isSupported && (
          <Button
            variant="outline"
            size="sm"
            onClick={() => togglePip(canvasRef.current)}
            className="text-[10px] h-8 px-2.5 flex items-center gap-1 border-zinc-200 dark:border-zinc-800 hover:bg-[var(--accent-soft)] dark:hover:bg-[var(--accent-soft)]/20 hover:text-[var(--accent-text)] dark:hover:text-[var(--accent-color)] text-zinc-700 dark:text-zinc-300 transition-colors"
          >
            {isPipActive ? (
              <>
                <MonitorOff className="w-3 h-3" />
                {t.pipClose}
              </>
            ) : (
              <>
                <Maximize2 className="w-3 h-3" />
                {t.pipStream}
              </>
            )}
          </Button>
        )}
      </div>

      <div className="relative flex-1 min-h-[140px] bg-zinc-50 dark:bg-zinc-950 border border-zinc-100 dark:border-zinc-900 rounded-xl overflow-hidden shadow-inner transition-colors">
|
||||
<canvas
|
||||
ref={canvasRef}
|
||||
className="w-full h-full block cursor-pointer"
|
||||
/>
|
||||
|
||||
<AnimatePresence>
|
||||
{isTalking && (
|
||||
<motion.div
|
||||
className="absolute top-2.5 right-2.5 bg-[var(--accent-color)]/90 text-white text-[9px] font-bold px-2 py-0.5 rounded-full shadow-sm uppercase tracking-wider backdrop-blur-sm"
|
||||
initial={{ opacity: 0, scale: 0.8 }}
|
||||
animate={{ opacity: 1, scale: 1 }}
|
||||
exit={{ opacity: 0, scale: 0.8 }}
|
||||
transition={{ duration: 0.2 }}
|
||||
>
|
||||
{t.activeSignal}
|
||||
</motion.div>
|
||||
)}
|
||||
</AnimatePresence>
|
||||
</div>
|
||||
</motion.div>
|
||||
);
|
||||
};
|
||||
export default WaveformPanel;
|
||||
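Review note: the `buffer` prop drawn by this panel is a fixed-length `Float32Array` that the audio hook in this diff treats as a rolling window (shift left with `copyWithin`, then append with `set`). The sketch below isolates that update as a standalone function. `pushRolling` is a hypothetical name that does not appear in the diff, and the 4-sample window only stands in for the 4096-sample buffers used there.

```typescript
// Hypothetical helper (not part of the diff): keep a fixed-length window of the
// most recent samples, oldest first.
function pushRolling(display: Float32Array, samples: Float32Array): void {
  const size = display.length;
  if (samples.length >= size) {
    // The new chunk alone fills the window: keep only its trailing `size` samples.
    display.set(samples.subarray(samples.length - size));
    return;
  }
  // Shift the window left by samples.length, dropping the oldest values...
  display.copyWithin(0, samples.length);
  // ...then write the new samples into the freed tail.
  display.set(samples, size - samples.length);
}

const demoWindow = new Float32Array(4); // the diff allocates 4096
pushRolling(demoWindow, Float32Array.from([1, 2]));
pushRolling(demoWindow, Float32Array.from([3]));
console.log(Array.from(demoWindow)); // values are now 0, 1, 2, 3
```

The length guard matters because `TypedArray.prototype.set` throws a `RangeError` when the source does not fit at the given offset. Any path that writes into the display buffers without such a guard relies on the producer never sending more samples than the window holds.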
@@ -0,0 +1,320 @@
|
||||
import { useEffect, useRef, useState, useCallback } from 'react';
|
||||
import { AudioConfig, ConnectionStatus, HardwareDevice } from '../types/audio';
|
||||
|
||||
export const useAudioPipeline = (
|
||||
wsUrl: string,
|
||||
config: AudioConfig,
|
||||
onConfigSync: (sr: number, list: HardwareDevice[]) => void
|
||||
) => {
|
||||
const [status, setStatus] = useState<ConnectionStatus>('disconnected');
|
||||
const [rtt, setRtt] = useState<number | null>(null);
|
||||
const [processingTime, setProcessingTime] = useState<number | null>(null);
|
||||
const [isTalking, setIsTalking] = useState<boolean>(false);
|
||||
const [isStreaming, setIsStreaming] = useState<boolean>(false);
|
||||
const [playOutput, setPlayOutput] = useState<boolean>(true);
|
||||
|
||||
const socketRef = useRef<WebSocket | null>(null);
|
||||
const audioCtxRef = useRef<AudioContext | null>(null);
|
||||
const micStreamRef = useRef<MediaStream | null>(null);
|
||||
const micSourceRef = useRef<MediaStreamAudioSourceNode | null>(null);
|
||||
const processorRef = useRef<ScriptProcessorNode | null>(null);
|
||||
const sampleRateRef = useRef<number>(40000);
|
||||
|
||||
// High-performance canvas rolling buffers
|
||||
const inputDisplayBuf = useRef<Float32Array>(new Float32Array(4096));
|
||||
const outputDisplayBuf = useRef<Float32Array>(new Float32Array(4096));
|
||||
const micAccumulator = useRef<Float32Array>(new Float32Array(0));
|
||||
|
||||
// Playback scheduling & timing
|
||||
const sentTimestamps = useRef<{ id: number; sent: number }[]>([]);
|
||||
const nextPlaybackTime = useRef<number>(0);
|
||||
const outputChunkQueue = useRef<{ data: Float32Array; startTime: number }[]>([]);
|
||||
|
||||
// Serialize the current config and send it to the server (no-op unless the socket is open)
|
||||
const sendConfig = useCallback(() => {
|
||||
const socket = socketRef.current;
|
||||
if (!socket || socket.readyState !== WebSocket.OPEN) return;
|
||||
|
||||
socket.send(JSON.stringify({
|
||||
type: 'config',
|
||||
model_name: config.model_name,
|
||||
device: config.device,
|
||||
f0_method: config.f0_method,
|
||||
f0_up_key: config.f0_up_key,
|
||||
noise_gate: config.noise_gate,
|
||||
input_gain: config.input_gain,
|
||||
output_gain: config.output_gain,
|
||||
input_sr: audioCtxRef.current ? audioCtxRef.current.sampleRate : 44100,
|
||||
routing_mode: config.routing_mode,
|
||||
input_device: config.input_device,
|
||||
output_device: config.output_device,
|
||||
chunk_size: config.chunk_size
|
||||
}));
|
||||
}, [config]);
|
||||
|
||||
// Decodes array buffers from Python server
|
||||
const handleServerAudio = useCallback((arrayBuffer: ArrayBuffer) => {
|
||||
if (!audioCtxRef.current) return;
|
||||
|
||||
const now = performance.now();
|
||||
if (sentTimestamps.current.length > 0) {
|
||||
const oldest = sentTimestamps.current.shift();
|
||||
if (oldest) {
|
||||
setRtt(Math.round(now - oldest.sent));
|
||||
}
|
||||
}
|
||||
|
||||
const payload = new Float32Array(arrayBuffer);
|
||||
const procTime = payload[0];
|
||||
const pcmData = payload.subarray(1);
|
||||
|
||||
setProcessingTime(Math.max(0, Math.round(procTime)));
|
||||
|
||||
const ctx = audioCtxRef.current;
|
||||
const audioBuf = ctx.createBuffer(1, pcmData.length, sampleRateRef.current);
|
||||
audioBuf.getChannelData(0).set(pcmData);
|
||||
|
||||
const source = ctx.createBufferSource();
|
||||
source.buffer = audioBuf;
|
||||
|
||||
// Only route node to speaker output if user didn't mute local listening
|
||||
if (playOutput) {
|
||||
source.connect(ctx.destination);
|
||||
}
|
||||
|
||||
// Schedule this chunk on the AudioContext clock, back-to-back with the previous chunk
|
||||
const currentTime = ctx.currentTime;
|
||||
const duration = audioBuf.duration;
|
||||
const adaptiveBuf = Math.min(duration * 2.5, 0.50);
|
||||
|
||||
if (nextPlaybackTime.current < currentTime) {
|
||||
nextPlaybackTime.current = currentTime + adaptiveBuf;
|
||||
} else if (nextPlaybackTime.current > currentTime + duration * 5.0) {
|
||||
nextPlaybackTime.current = currentTime + adaptiveBuf; // Backlog exceeds 5 chunks: jump back toward real time
|
||||
}
|
||||
|
||||
const startSchedule = nextPlaybackTime.current;
|
||||
source.start(startSchedule);
|
||||
nextPlaybackTime.current += duration;
|
||||
|
||||
// Queue for syncing waveform outputs
|
||||
outputChunkQueue.current.push({ data: pcmData, startTime: startSchedule });
|
||||
while (outputChunkQueue.current.length > 0) {
|
||||
const c = outputChunkQueue.current[0];
|
||||
if (c.startTime + c.data.length / sampleRateRef.current < ctx.currentTime - 2.0) {
|
||||
outputChunkQueue.current.shift();
|
||||
} else break;
|
||||
}
|
||||
|
||||
// Push output PCM samples to rolling display buffers
|
||||
const size = 4096;
|
||||
const display = outputDisplayBuf.current;
|
||||
if (pcmData.length >= size) {
|
||||
display.set(pcmData.slice(pcmData.length - size));
|
||||
} else {
|
||||
display.copyWithin(0, pcmData.length);
|
||||
display.set(pcmData, size - pcmData.length);
|
||||
}
|
||||
}, [playOutput]);
|
||||
|
||||
const disconnect = useCallback(() => {
|
||||
if (socketRef.current) {
|
||||
try {
|
||||
socketRef.current.close();
|
||||
} catch (e) {}
|
||||
socketRef.current = null;
|
||||
}
|
||||
setStatus('disconnected');
|
||||
}, []);
|
||||
|
||||
const connect = useCallback(() => {
|
||||
disconnect();
|
||||
setStatus('connecting');
|
||||
|
||||
try {
|
||||
const ws = new WebSocket(wsUrl);
|
||||
ws.binaryType = 'arraybuffer';
|
||||
|
||||
ws.onopen = () => {
|
||||
setStatus('connected');
|
||||
socketRef.current = ws;
|
||||
sendConfig();
|
||||
};
|
||||
|
||||
ws.onclose = () => {
|
||||
setStatus('disconnected');
|
||||
socketRef.current = null;
|
||||
};
|
||||
|
||||
ws.onerror = () => {
|
||||
setStatus('disconnected');
|
||||
socketRef.current = null;
|
||||
};
|
||||
|
||||
ws.onmessage = (event) => {
|
||||
if (typeof event.data === 'string') {
|
||||
try {
|
||||
const data = JSON.parse(event.data);
|
||||
if (data.type === 'config_success') {
|
||||
sampleRateRef.current = data.target_sr;
|
||||
} else if (data.type === 'init_devices') {
|
||||
onConfigSync(data.target_sr || 40000, data.devices || []);
|
||||
} else if (data.type === 'visualizer') {
|
||||
// Hardware mode visualizer data stream
|
||||
inputDisplayBuf.current.set(new Float32Array(data.input));
|
||||
outputDisplayBuf.current.set(new Float32Array(data.output));
|
||||
}
|
||||
} catch (e) {
|
||||
console.error('WS message handling error:', e);
|
||||
}
|
||||
} else if (event.data instanceof ArrayBuffer) {
|
||||
handleServerAudio(event.data);
|
||||
}
|
||||
};
|
||||
} catch (e) {
|
||||
console.error('WS Connection failed:', e);
|
||||
setStatus('disconnected');
|
||||
}
|
||||
}, [wsUrl, sendConfig, handleServerAudio, onConfigSync, disconnect]);
|
||||
|
||||
const stopStream = useCallback(() => {
|
||||
setIsStreaming(false);
|
||||
setIsTalking(false);
|
||||
|
||||
if (config.routing_mode === 'hardware') {
|
||||
const socket = socketRef.current;
|
||||
if (socket && socket.readyState === WebSocket.OPEN) {
|
||||
socket.send(JSON.stringify({
|
||||
type: 'config',
|
||||
routing_mode: 'browser' // tells server hardware stream to stop
|
||||
}));
|
||||
}
|
||||
}
|
||||
|
||||
if (micStreamRef.current) {
|
||||
micStreamRef.current.getTracks().forEach(t => t.stop());
|
||||
micStreamRef.current = null;
|
||||
}
|
||||
if (micSourceRef.current) {
|
||||
micSourceRef.current.disconnect();
|
||||
micSourceRef.current = null;
|
||||
}
|
||||
if (processorRef.current) {
|
||||
processorRef.current.disconnect();
|
||||
processorRef.current = null;
|
||||
}
|
||||
|
||||
micAccumulator.current = new Float32Array(0);
sentTimestamps.current = []; // drop stale send times so the next session's RTT starts clean
|
||||
setRtt(null);
|
||||
setProcessingTime(null);
|
||||
}, [config.routing_mode]);
|
||||
|
||||
const startStream = useCallback(async () => {
|
||||
if (config.routing_mode === 'hardware') {
|
||||
setIsStreaming(true);
|
||||
sendConfig();
|
||||
return;
|
||||
}
|
||||
|
||||
if (!audioCtxRef.current) {
|
||||
audioCtxRef.current = new (window.AudioContext || (window as any).webkitAudioContext)({
|
||||
latencyHint: 'interactive',
|
||||
});
|
||||
}
|
||||
|
||||
const ctx = audioCtxRef.current;
|
||||
if (ctx.state === 'suspended') {
|
||||
await ctx.resume();
|
||||
}
|
||||
|
||||
try {
|
||||
micStreamRef.current = await navigator.mediaDevices.getUserMedia({
|
||||
audio: {
|
||||
echoCancellation: true,
|
||||
noiseSuppression: true,
|
||||
autoGainControl: true,
|
||||
},
|
||||
});
|
||||
|
||||
micSourceRef.current = ctx.createMediaStreamSource(micStreamRef.current);
|
||||
processorRef.current = ctx.createScriptProcessor(4096, 1, 1);
|
||||
|
||||
processorRef.current.onaudioprocess = (e) => {
|
||||
const inputData = e.inputBuffer.getChannelData(0);
|
||||
|
||||
// Update input waveform display buffer
|
||||
const display = inputDisplayBuf.current;
|
||||
display.copyWithin(0, inputData.length);
|
||||
display.set(inputData, display.length - inputData.length);
|
||||
|
||||
// Append to local accumulator
|
||||
const nextAcc = new Float32Array(micAccumulator.current.length + inputData.length);
|
||||
nextAcc.set(micAccumulator.current);
|
||||
nextAcc.set(inputData, micAccumulator.current.length);
|
||||
micAccumulator.current = nextAcc;
|
||||
|
||||
const size = config.chunk_size;
|
||||
while (micAccumulator.current.length >= size) {
|
||||
const chunk = micAccumulator.current.slice(0, size);
|
||||
micAccumulator.current = micAccumulator.current.slice(size);
|
||||
|
||||
// Simple RMS for Voice Activity Badge
|
||||
let sum = 0;
|
||||
for (let i = 0; i < chunk.length; i++) sum += chunk[i] * chunk[i];
|
||||
const rms = Math.sqrt(sum / chunk.length);
|
||||
setIsTalking(rms > 0.005);
|
||||
|
||||
// Stream raw float PCM bytes
|
||||
const ws = socketRef.current;
|
||||
if (ws && ws.readyState === WebSocket.OPEN) {
|
||||
const time = performance.now();
|
||||
sentTimestamps.current.push({ id: time, sent: time });
|
||||
ws.send(chunk.buffer);
|
||||
}
|
||||
}
|
||||
};
|
||||
|
||||
micSourceRef.current.connect(processorRef.current);
|
||||
processorRef.current.connect(ctx.destination);
|
||||
nextPlaybackTime.current = 0;
|
||||
setIsStreaming(true);
|
||||
} catch (e) {
|
||||
console.error('Failed to start microphone streaming:', e);
|
||||
alert('Microphone access failed: ' + (e instanceof Error ? e.message : String(e)));
|
||||
stopStream();
|
||||
}
|
||||
}, [config.routing_mode, config.chunk_size, sendConfig, stopStream]);
|
||||
|
||||
// Sync config whenever React props config changes
|
||||
useEffect(() => {
|
||||
sendConfig();
|
||||
}, [config, sendConfig]);
|
||||
|
||||
// Lifecycle cleanups
|
||||
useEffect(() => {
|
||||
return () => {
|
||||
disconnect();
|
||||
stopStream();
|
||||
if (audioCtxRef.current) {
|
||||
audioCtxRef.current.close().catch(() => {});
|
||||
}
|
||||
};
|
||||
}, [disconnect, stopStream]);
|
||||
|
||||
return {
|
||||
status,
|
||||
rtt,
|
||||
processingTime,
|
||||
isTalking,
|
||||
isStreaming,
|
||||
playOutput,
|
||||
setPlayOutput,
|
||||
connect,
|
||||
disconnect,
|
||||
startStream,
|
||||
stopStream,
|
||||
inputBuffer: inputDisplayBuf,
|
||||
outputBuffer: outputDisplayBuf
|
||||
};
|
||||
};
|
||||
export default useAudioPipeline;
|
||||
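Review note: the playback scheduling inside `handleServerAudio` is easier to reason about as a pure function. The sketch below restates the same branches with the constants taken from the diff (2.5x chunk duration capped at 0.5 s, and a reset once the cursor runs more than 5 chunks ahead). `scheduleChunk` and `ScheduleResult` are hypothetical names introduced only for illustration.

```typescript
interface ScheduleResult {
  startAt: number;  // value to pass to source.start()
  nextTime: number; // updated playback cursor for the following chunk
}

// Hypothetical pure restatement of the scheduling branch in handleServerAudio.
// All times are in seconds on the AudioContext clock.
function scheduleChunk(nextTime: number, currentTime: number, duration: number): ScheduleResult {
  const adaptiveBuf = Math.min(duration * 2.5, 0.5);
  let cursor = nextTime;
  if (cursor < currentTime) {
    // Underrun: the cursor fell into the past, restart with a small safety buffer.
    cursor = currentTime + adaptiveBuf;
  } else if (cursor > currentTime + duration * 5.0) {
    // Backlog: more than five chunks are queued, jump back toward real time.
    cursor = currentTime + adaptiveBuf;
  }
  return { startAt: cursor, nextTime: cursor + duration };
}

// A 100 ms chunk arriving while the cursor is stale gets a 250 ms lead.
console.log(scheduleChunk(0, 10, 0.1));
```

One consequence worth checking during review: the backlog branch moves the cursor earlier without stopping sources that were already scheduled, so a new chunk may overlap audio that is still queued.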
@@ -0,0 +1,64 @@
|
||||
import { useEffect, useRef, useCallback } from 'react';
|
||||
|
||||
export interface ShortcutBinding {
|
||||
keys: string; // e.g. "Control+k", " ", "m", "alt+1"
|
||||
description: string;
|
||||
action: () => void;
|
||||
}
|
||||
|
||||
export const useKeyboardShortcuts = (bindings: ShortcutBinding[], enabled: boolean = true) => {
|
||||
const bindingsRef = useRef<ShortcutBinding[]>(bindings);
|
||||
|
||||
useEffect(() => {
|
||||
bindingsRef.current = bindings;
|
||||
}, [bindings]);
|
||||
|
||||
const handleKeyDown = useCallback((e: KeyboardEvent) => {
|
||||
if (!enabled) return;
|
||||
|
||||
// Avoid hijacking keystrokes when editing inputs
|
||||
const active = document.activeElement;
|
||||
if (active) {
|
||||
const name = active.tagName.toLowerCase();
|
||||
if (name === 'input' || name === 'textarea' || active.getAttribute('contenteditable') === 'true') {
|
||||
return;
|
||||
}
|
||||
}
|
||||
|
||||
const pressedKey = e.key.toLowerCase();
|
||||
const isCtrl = e.ctrlKey || e.metaKey;
|
||||
const isAlt = e.altKey;
|
||||
const isShift = e.shiftKey;
|
||||
|
||||
for (const binding of bindingsRef.current) {
|
||||
const keys = binding.keys.toLowerCase().split('+');
|
||||
|
||||
const requiresCtrl = keys.includes('control') || keys.includes('ctrl');
|
||||
const requiresAlt = keys.includes('alt');
|
||||
const requiresShift = keys.includes('shift');
|
||||
|
||||
const baseKey = keys.filter(k => !['control', 'ctrl', 'alt', 'shift'].includes(k))[0];
|
||||
|
||||
const matchesCtrl = requiresCtrl ? isCtrl : !isCtrl;
|
||||
const matchesAlt = requiresAlt ? isAlt : !isAlt;
|
||||
const matchesShift = requiresShift ? isShift : !isShift;
|
||||
|
||||
const normalizedBaseKey = baseKey === 'space' ? ' ' : baseKey;
|
||||
const matchesBase = pressedKey === normalizedBaseKey;
|
||||
|
||||
if (matchesCtrl && matchesAlt && matchesShift && matchesBase) {
|
||||
e.preventDefault();
|
||||
binding.action();
|
||||
break;
|
||||
}
|
||||
}
|
||||
}, [enabled]);
|
||||
|
||||
useEffect(() => {
|
||||
window.addEventListener('keydown', handleKeyDown);
|
||||
return () => {
|
||||
window.removeEventListener('keydown', handleKeyDown);
|
||||
};
|
||||
}, [handleKeyDown]);
|
||||
};
|
||||
export default useKeyboardShortcuts;
|
||||
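Review note: the matching rules in `handleKeyDown` can be exercised without a DOM by extracting them into a pure function. The sketch below mirrors the logic in the diff. `matchesBinding` and `KeyLike` are hypothetical names, and `KeyLike` is only the subset of `KeyboardEvent` fields the hook reads.

```typescript
// Subset of KeyboardEvent used by the hook (assumed shape for illustration).
interface KeyLike {
  key: string;
  ctrlKey: boolean;
  metaKey: boolean;
  altKey: boolean;
  shiftKey: boolean;
}

const MODIFIER_NAMES = ['control', 'ctrl', 'alt', 'shift'];

// Hypothetical pure extraction of the per-binding check in handleKeyDown.
function matchesBinding(keys: string, e: KeyLike): boolean {
  const parts = keys.toLowerCase().split('+');
  const requiresCtrl = parts.includes('control') || parts.includes('ctrl');
  const requiresAlt = parts.includes('alt');
  const requiresShift = parts.includes('shift');

  // The first non-modifier token is the base key; "space" is an alias for " ".
  const baseKey = parts.filter((p) => !MODIFIER_NAMES.includes(p))[0];
  const normalized = baseKey === 'space' ? ' ' : baseKey;

  // Modifiers must match exactly: an unrequested modifier blocks the binding.
  const isCtrl = e.ctrlKey || e.metaKey;
  return (
    requiresCtrl === isCtrl &&
    requiresAlt === e.altKey &&
    requiresShift === e.shiftKey &&
    e.key.toLowerCase() === normalized
  );
}

const shiftQuestion: KeyLike = { key: '?', ctrlKey: false, metaKey: false, altKey: false, shiftKey: true };
console.log(matchesBinding('?', shiftQuestion), matchesBinding('shift+?', shiftQuestion));
```

Because modifiers are matched exactly, a key that is typed with Shift on the user's layout (for example `?` on a US keyboard) only matches when the binding string includes `shift`. How the app registers its help shortcut is not visible in this part of the diff, so that is worth confirming.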
@@ -0,0 +1,71 @@
|
||||
import { useState, useCallback, useRef, useEffect } from 'react';
|
||||
|
||||
export const usePictureInPicture = () => {
|
||||
const [isPipActive, setIsPipActive] = useState(false);
|
||||
const videoRef = useRef<HTMLVideoElement | null>(null);
|
||||
|
||||
useEffect(() => {
|
||||
if (typeof window === 'undefined') return;
|
||||
|
||||
const video = document.createElement('video');
|
||||
video.muted = true;
|
||||
video.playsInline = true;
|
||||
videoRef.current = video;
|
||||
|
||||
const handleLeavePip = () => {
|
||||
setIsPipActive(false);
|
||||
};
|
||||
|
||||
video.addEventListener('leavepictureinpicture', handleLeavePip);
|
||||
|
||||
return () => {
|
||||
video.removeEventListener('leavepictureinpicture', handleLeavePip);
|
||||
if (document.pictureInPictureElement === video) {
|
||||
document.exitPictureInPicture().catch(() => {});
|
||||
}
|
||||
};
|
||||
}, []);
|
||||
|
||||
const togglePip = useCallback(async (canvas: HTMLCanvasElement | null) => {
|
||||
if (!canvas || !videoRef.current) return;
|
||||
|
||||
const video = videoRef.current;
|
||||
|
||||
try {
|
||||
if (document.pictureInPictureElement === video) {
|
||||
await document.exitPictureInPicture();
|
||||
setIsPipActive(false);
|
||||
} else {
|
||||
// Capture a stream of the Canvas at 30 fps
|
||||
const stream = canvas.captureStream
|
||||
? canvas.captureStream(30)
|
||||
: (canvas as any).mozCaptureStream
|
||||
? (canvas as any).mozCaptureStream(30)
|
||||
: null;
|
||||
|
||||
if (!stream) {
|
||||
throw new Error("Canvas.captureStream() is not supported on this browser.");
|
||||
}
|
||||
|
||||
video.srcObject = stream;
|
||||
|
||||
await new Promise<void>((resolve, reject) => {
video.onloadedmetadata = () => {
// Forward play() failures so this await cannot hang; the outer catch reports them
video.play().then(() => resolve(), reject);
};
});
|
||||
|
||||
await video.requestPictureInPicture();
|
||||
setIsPipActive(true);
|
||||
}
|
||||
} catch (error) {
|
||||
console.error("Picture-in-Picture failed:", error);
|
||||
alert("Picture-in-Picture error: " + (error instanceof Error ? error.message : String(error)));
|
||||
}
|
||||
}, []);
|
||||
|
||||
const isSupported = typeof window !== 'undefined' && 'pictureInPictureEnabled' in document;
|
||||
|
||||
return { togglePip, isPipActive, isSupported };
|
||||
};
|
||||
export default usePictureInPicture;
|
||||
@@ -0,0 +1,101 @@
|
||||
import { useEffect, useRef, useCallback } from 'react';
|
||||
|
||||
interface UseWaveformCanvasOptions {
|
||||
strokeColor: string;
|
||||
fillColor?: string;
|
||||
scaleAmplitude?: number;
|
||||
lineWidth?: number;
|
||||
}
|
||||
|
||||
export const useWaveformCanvas = (options: UseWaveformCanvasOptions) => {
|
||||
const canvasRef = useRef<HTMLCanvasElement | null>(null);
|
||||
const animationFrameRef = useRef<number | null>(null);
|
||||
const bufferRef = useRef<Float32Array | null>(null);
|
||||
|
||||
const updateData = useCallback((data: Float32Array) => {
|
||||
bufferRef.current = data;
|
||||
}, []);
|
||||
|
||||
useEffect(() => {
|
||||
const canvas = canvasRef.current;
|
||||
if (!canvas) return;
|
||||
|
||||
const ctx = canvas.getContext('2d');
|
||||
if (!ctx) return;
|
||||
|
||||
const handleResize = () => {
|
||||
const rect = canvas.getBoundingClientRect();
|
||||
canvas.width = rect.width * window.devicePixelRatio;
|
||||
canvas.height = rect.height * window.devicePixelRatio;
|
||||
|
||||
const baseColor = (options.fillColor || 'rgba(10, 10, 10, 0.4)').replace(/[\d.]+\)$/, '1)');
|
||||
ctx.fillStyle = baseColor;
|
||||
ctx.fillRect(0, 0, canvas.width, canvas.height);
|
||||
};
|
||||
|
||||
handleResize();
|
||||
window.addEventListener('resize', handleResize);
|
||||
|
||||
// Initial canvas clear with solid color
|
||||
const baseColor = (options.fillColor || 'rgba(10, 10, 10, 0.4)').replace(/[\d.]+\)$/, '1)');
|
||||
ctx.fillStyle = baseColor;
|
||||
ctx.fillRect(0, 0, canvas.width, canvas.height);
|
||||
|
||||
const draw = () => {
|
||||
const width = canvas.width;
|
||||
const height = canvas.height;
|
||||
const dataArray = bufferRef.current;
|
||||
|
||||
// Semi-transparent background fill; repainting it every frame fades older traces into a trail
|
||||
ctx.fillStyle = options.fillColor || 'rgba(10, 10, 10, 0.4)';
|
||||
ctx.fillRect(0, 0, width, height);
|
||||
|
||||
if (dataArray && dataArray.length > 0) {
|
||||
ctx.lineWidth = (options.lineWidth ?? 2) * window.devicePixelRatio;
|
||||
ctx.strokeStyle = options.strokeColor;
|
||||
ctx.lineJoin = 'round';
|
||||
ctx.beginPath();
|
||||
|
||||
const sliceWidth = width / dataArray.length;
|
||||
let x = 0;
|
||||
|
||||
for (let i = 0; i < dataArray.length; i++) {
|
||||
const v = dataArray[i] * (options.scaleAmplitude ?? 1.5);
|
||||
const y = (v * (height / 2)) + (height / 2);
|
||||
|
||||
if (i === 0) {
|
||||
ctx.moveTo(x, y);
|
||||
} else {
|
||||
ctx.lineTo(x, y);
|
||||
}
|
||||
x += sliceWidth;
|
||||
}
|
||||
|
||||
ctx.lineTo(width, height / 2);
|
||||
ctx.stroke();
|
||||
}
|
||||
|
||||
// Draw subtle zero amplitude baseline
|
||||
ctx.strokeStyle = options.fillColor?.includes('255') ? 'rgba(0, 0, 0, 0.06)' : 'rgba(255, 255, 255, 0.06)';
|
||||
ctx.lineWidth = 1;
|
||||
ctx.beginPath();
|
||||
ctx.moveTo(0, height / 2);
|
||||
ctx.lineTo(width, height / 2);
|
||||
ctx.stroke();
|
||||
|
||||
animationFrameRef.current = requestAnimationFrame(draw);
|
||||
};
|
||||
|
||||
animationFrameRef.current = requestAnimationFrame(draw);
|
||||
|
||||
return () => {
|
||||
window.removeEventListener('resize', handleResize);
|
||||
if (animationFrameRef.current) {
|
||||
cancelAnimationFrame(animationFrameRef.current);
|
||||
}
|
||||
};
|
||||
}, [options.strokeColor, options.fillColor, options.scaleAmplitude, options.lineWidth]);
|
||||
|
||||
return { canvasRef, updateData };
|
||||
};
|
||||
export default useWaveformCanvas;
|
||||
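Review note: two small computations in `useWaveformCanvas` are worth checking in isolation: the regex that turns the trail colour into an opaque clear colour, and the mapping from a sample to a canvas y coordinate. The sketch below copies both expressions into pure functions. `toOpaque` and `sampleToY` are hypothetical names that do not appear in the diff.

```typescript
// Same regex as the hook: swap the final numeric component for 1.
function toOpaque(color: string): string {
  return color.replace(/[\d.]+\)$/, '1)');
}

// Same arithmetic as the draw loop: 0 maps to the vertical centre and the
// scaled sample is added on top. Canvas y grows downward, so positive
// samples are drawn below the centre line.
function sampleToY(sample: number, height: number, scaleAmplitude: number = 1.5): number {
  return sample * scaleAmplitude * (height / 2) + height / 2;
}

console.log(toOpaque('rgba(9, 9, 11, 0.4)')); // rgba(9, 9, 11, 1)
console.log(sampleToY(0, 200));               // 100
```

The regex assumes a four-component `rgba(...)` string. Given a three-component `rgb(...)` value it would overwrite the blue channel instead of an alpha value, so callers passing a custom `fillColor` should stick to `rgba`. The y mapping is not clamped: with `scaleAmplitude: 2.0`, any sample beyond ±0.5 lands outside the canvas and is simply clipped.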
@@ -0,0 +1,23 @@
|
||||
export interface AudioConfig {
|
||||
model_name: string;
|
||||
device: 'cpu' | 'cuda' | 'dml';
|
||||
f0_method: 'pm' | 'dio' | 'harvest' | 'rmvpe';
|
||||
f0_up_key: number;
|
||||
noise_gate: number;
|
||||
input_gain: number;
|
||||
output_gain: number;
|
||||
input_sr: number;
|
||||
routing_mode: 'browser' | 'hardware';
|
||||
input_device: number | null;
|
||||
output_device: number | null;
|
||||
chunk_size: number;
|
||||
}
|
||||
|
||||
export interface HardwareDevice {
|
||||
id: number;
|
||||
name: string;
|
||||
max_input_channels: number;
|
||||
max_output_channels: number;
|
||||
}
|
||||
|
||||
export type ConnectionStatus = 'disconnected' | 'connecting' | 'connected';
|
||||
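Review note: the types above cover what the client sends, but the JSON messages handled in `ws.onmessage` are read without a type. A discriminated union with a small parser is one way to make those shapes explicit. The sketch below is an assumption built only from the three `type` values and field names visible in the handler; `ServerMessage` and `parseServerMessage` are hypothetical names, and the real server payloads may carry more fields.

```typescript
interface HardwareDevice {
  id: number;
  name: string;
  max_input_channels: number;
  max_output_channels: number;
}

// Assumed message shapes, inferred from the fields the handler reads.
type ServerMessage =
  | { type: 'config_success'; target_sr: number }
  | { type: 'init_devices'; target_sr: number; devices: HardwareDevice[] }
  | { type: 'visualizer'; input: number[]; output: number[] };

// Returns null for anything that is not JSON or lacks the expected fields.
function parseServerMessage(raw: string): ServerMessage | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== 'object' || data === null) return null;
  const msg = data as Record<string, unknown>;
  const sr = msg.target_sr;

  switch (msg.type) {
    case 'config_success':
      // Reject a missing rate instead of storing undefined as the sample rate.
      return typeof sr === 'number' ? { type: 'config_success', target_sr: sr } : null;
    case 'init_devices':
      // Same fallbacks as the handler: 40000 Hz and an empty device list.
      return {
        type: 'init_devices',
        target_sr: typeof sr === 'number' ? sr : 40000,
        devices: Array.isArray(msg.devices) ? (msg.devices as HardwareDevice[]) : [],
      };
    case 'visualizer':
      return Array.isArray(msg.input) && Array.isArray(msg.output)
        ? { type: 'visualizer', input: msg.input as number[], output: msg.output as number[] }
        : null;
    default:
      return null;
  }
}

console.log(parseServerMessage('{"type":"init_devices"}'));
```

Validating `target_sr` before it reaches `sampleRateRef` matters because that value is later passed to `AudioContext.createBuffer`, which throws on a missing or out-of-range sample rate.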
@@ -0,0 +1,592 @@
|
||||
export type Language = 'en' | 'ja' | 'zh' | 'es' | 'id';
|
||||
|
||||
export const languages: { code: Language; label: string; flag: string }[] = [
|
||||
{ code: 'en', label: 'English', flag: '🇺🇸' },
|
||||
{ code: 'id', label: 'Bahasa Indonesia', flag: '🇮🇩' },
|
||||
{ code: 'ja', label: '日本語', flag: '🇯🇵' },
|
||||
{ code: 'zh', label: '简体中文', flag: '🇨🇳' },
|
||||
{ code: 'es', label: 'Español', flag: '🇪🇸' }
|
||||
];
|
||||
|
||||
export const translations = {
|
||||
en: {
|
||||
appTitle: "🎙️ ONNX VC",
|
||||
appSubtitle: "Low-latency real-time AI voice conversion powered by ONNX Runtime acceleration.",
|
||||
wsServerUrl: "WebSocket Server URL",
|
||||
wsPlaceholder: "ws://localhost:8765",
|
||||
connectionStatus: "Connection Status",
|
||||
disconnected: "Disconnected",
|
||||
connecting: "Connecting",
|
||||
connected: "Connected",
|
||||
connect: "Connect Server",
|
||||
disconnect: "Disconnect Server",
|
||||
startChanger: "Start Voice Changer",
|
||||
stopChanger: "Stop Voice Changer",
|
||||
listeningActive: "Listening: ACTIVE",
|
||||
listeningMute: "Listening: MUTED",
|
||||
|
||||
// Tabs
|
||||
tabDashboard: "Workspace",
|
||||
tabModel: "Model Settings",
|
||||
tabDsp: "Audio DSP",
|
||||
tabShortcuts: "Shortcuts",
|
||||
|
||||
// Model Config
|
||||
modelConfigTitle: "Model & Device Configuration",
|
||||
quickPresets: "Quick Presets (Performance Profile)",
|
||||
latencyPreset: "⚡ Instant Response (PM)",
|
||||
qualityPreset: "🎙️ High Fidelity (RMVPE)",
|
||||
selectModel: "Select Character Model (RVC ONNX)",
|
||||
executionProvider: "Execution Provider (GPU Acceleration)",
|
||||
routingMode: "Audio Routing Mode",
|
||||
clientMode: "Client Mode (Browser Streaming)",
|
||||
serverMode: "Server Mode (Direct Sounddevice)",
|
||||
serverInput: "Server Input Microphone",
|
||||
serverOutput: "Server Output Speaker",
|
||||
pitchMethod: "Pitch Extraction Method",
|
||||
transpose: "Transpose (Pitch Modifier)",
|
||||
transposeMale: "-24 (Male Pitch)",
|
||||
transposeNormal: "0 (Original)",
|
||||
transposeFemale: "+24 (Female/Anime Pitch)",
|
||||
|
||||
// DSP
|
||||
dspTitle: "Audio Processing Settings (DSP)",
|
||||
noiseGate: "Noise Gate (Threshold)",
|
||||
noiseGateSens: "-60 dB (Sensitive)",
|
||||
noiseGateDefault: "-40 dB (Default)",
|
||||
noiseGateStrict: "-10 dB (Strict)",
|
||||
inputGain: "Input Gain (Microphone)",
|
||||
outputGain: "Output Gain (AI Volume)",
|
||||
noiseCancel: "Noise Cancellation (Filter)",
|
||||
noiseCancelDesc: "Filters browser echo & background hum",
|
||||
bufferSize: "Buffer Size (Chunk Size - Latency vs Stability)",
|
||||
|
||||
// Visualizers
|
||||
visualizerTitle: "Real-Time Audio Visualizer",
|
||||
micSignal: "Microphone Input Signal",
|
||||
aiSignal: "AI Voice Output Signal",
|
||||
activeSignal: "Active Signal",
|
||||
pipStream: "PiP Waveform",
|
||||
pipClose: "Close PiP",
|
||||
|
||||
// HUD
|
||||
hudLatency: "RTT Latency",
|
||||
hudInference: "Inference Speed",
|
||||
hudDetector: "Voice Detector",
|
||||
hudTalking: "Speaking",
|
||||
hudSilent: "Silent",
|
||||
hudSr: "Model Frequency",
|
||||
hudHelp: "Press ? to view hotkeys menu",
|
||||
|
||||
// Shortcuts Dialog
|
||||
shortcutsTitle: "Keyboard Shortcuts Guide",
|
||||
shortcutsDesc: "Use these keyboard shortcuts to navigate the dashboard without a mouse:",
|
||||
shortcutsClose: "Close",
|
||||
shortcutConnect: "Connect / Disconnect WebSocket Server",
|
||||
shortcutStream: "Start / Stop AI Voice Changer",
|
||||
shortcutMute: "Mute / Unmute Local Listening (Output Audio)",
|
||||
shortcutPreset1: "Apply Preset: Instant Response (PM)",
|
||||
shortcutPreset2: "Apply Preset: High Fidelity (RMVPE)",
|
||||
shortcutHelp: "Open / Close Shortcuts Help Dialog",
|
||||
|
||||
// Premium layouts
|
||||
characterCardTitle: "Active Voice Character",
|
||||
characterAvatarDesc: "Currently loaded voice weight profile.",
|
||||
welcomeBack: "Real-Time Audio Control Center",
|
||||
currentLang: "Language",
|
||||
themeSettings: "Interface Theme & Accent",
|
||||
themeMode: "Theme Mode",
|
||||
themeDark: "Dark Mode",
|
||||
themeLight: "Light Mode",
|
||||
accentColorLabel: "Global Accent Color",
|
||||
tabCredits: "Credits",
|
||||
creditsTitle: "💖 Open Source Credits",
|
||||
creditsDescription: "ONNX VC is made possible thanks to the following incredible open-source projects and libraries:",
|
||||
liveTuningTitle: "Live Settings Tuning",
|
||||
customCanvasTitle: "Custom Canvas Visualizer",
|
||||
showMicInput: "Show Mic Input",
|
||||
showAiOutput: "Show AI Output",
|
||||
lineWidthLabel: "Line Width",
|
||||
traceDecayLabel: "Trace Decay (Fading)",
|
||||
inputLineColorLabel: "Input Line Color",
|
||||
outputLineColorLabel: "Output Line Color",
|
||||
creditCreatorTitle: "Creator & Integrator",
|
||||
creditNeuralTitle: "Neural Conversion",
|
||||
creditEngineTitle: "Inference Engine",
|
||||
creditPitchTitle: "Pitch Extraction",
|
||||
creditPipelineTitle: "Streaming Pipeline",
|
||||
creditFrameworkTitle: "Frontend Framework",
|
||||
creditDesignTitle: "Design & Animation",
|
||||
creditCreatorDesc: "Creators of the ONNX VC client interface and low-latency audio control workspace integration layer.",
|
||||
creditNeuralDesc: "The core neural network architecture for real-time voice feature retrieval and vocal conversion.",
|
||||
creditEngineDesc: "Cross-platform accelerator for machine learning models running on CPU, NVIDIA CUDA, and DirectML GPU backends.",
|
||||
creditPitchDesc: "Robust Model for Vocal Pitch Estimation (RMVPE), providing highly accurate vocal pitch tracking under ambient noise.",
|
||||
creditPipelineDesc: "High-speed binary data transfer loops passing raw PCM float32 frames between the client browser and backend.",
|
||||
creditFrameworkDesc: "Modern web framework that compiles React client-side components into optimized static exports.",
|
||||
creditDesignDesc: "Utility-first styling and fluid declarative animation libraries for interactive user interfaces."
|
||||
},
|
||||
id: {
|
||||
appTitle: "🎙️ ONNX VC",
|
||||
appSubtitle: "Pengubah suara real-time berbasis AI berlatensi ultra-rendah dengan akselerasi ONNX Runtime.",
|
||||
wsServerUrl: "URL Server WebSocket",
|
||||
wsPlaceholder: "ws://localhost:8765",
|
||||
connectionStatus: "Status Koneksi",
|
||||
disconnected: "Terputus",
|
||||
connecting: "Menghubungkan",
|
||||
connected: "Terhubung",
|
||||
connect: "Hubungkan Server",
|
||||
disconnect: "Putuskan Server",
|
||||
startChanger: "Mulai Mengubah Suara",
|
||||
stopChanger: "Hentikan Mengubah",
|
||||
listeningActive: "Mendengarkan: AKTIF",
|
||||
listeningMute: "Mendengarkan: SENYAP",
|
||||
|
||||
// Tabs
|
||||
tabDashboard: "Ruang Kerja",
|
||||
tabModel: "Setelan Model",
|
||||
tabDsp: "Audio DSP",
|
||||
tabShortcuts: "Shortcut",
|
||||
|
||||
// Model Config
|
||||
modelConfigTitle: "Konfigurasi Model & Perangkat",
|
||||
quickPresets: "Quick Presets (Profil Performa)",
|
||||
latencyPreset: "⚡ Respon Kilat (PM)",
|
||||
qualityPreset: "🎙️ Kualitas Tinggi (RMVPE)",
|
||||
selectModel: "Pilih Model Suara (RVC ONNX)",
|
||||
executionProvider: "Execution Provider (Akselerasi GPU)",
|
||||
routingMode: "Mode Routing Audio",
|
||||
clientMode: "Client Mode (Streaming Browser)",
|
||||
serverMode: "Server Mode (Direct Sounddevice)",
|
||||
serverInput: "Input Mikrofon Server",
|
||||
serverOutput: "Output Speaker Server",
|
||||
pitchMethod: "Metode Deteksi Nada (Pitch Extraction)",
|
||||
transpose: "Transpose (Pengubah Nada)",
|
||||
transposeMale: "-24 (Pria Berat)",
|
||||
transposeNormal: "0 (Asli)",
|
||||
transposeFemale: "+24 (Wanita/Anime)",
|
||||
|
||||
// DSP
|
||||
dspTitle: "Pemrosesan Audio (DSP)",
|
||||
noiseGate: "Noise Gate (Threshold)",
|
||||
noiseGateSens: "-60 dB (Sensitif)",
|
||||
noiseGateDefault: "-40 dB (Default)",
|
||||
noiseGateStrict: "-10 dB (Ketat)",
|
||||
inputGain: "Input Gain (Microphone)",
|
||||
outputGain: "Output Gain (Volume AI)",
|
||||
noiseCancel: "Peredam Bising (Noise Cancel)",
|
||||
noiseCancelDesc: "Filter gema & desah di browser",
|
||||
bufferSize: "Ukuran Buffer (Chunk Size - Latensi vs Stabilitas)",
|
||||
|
||||
// Visualizers
|
||||
visualizerTitle: "Visualisasi Waveform Live",
|
||||
micSignal: "Sinyal Mikrofon (Input)",
|
||||
aiSignal: "Hasil AI Voice (Output)",
|
||||
activeSignal: "Signal Aktif",
|
||||
pipStream: "PiP Waveform",
|
||||
pipClose: "Batal PiP",
|
||||
|
||||
// HUD
|
||||
hudLatency: "Latensi Bulat (RTT)",
|
||||
hudInference: "Kecepatan Inference",
|
||||
hudDetector: "Detektor Suara",
|
||||
hudTalking: "Bicara",
|
||||
hudSilent: "Berdiam",
|
||||
hudSr: "Frekuensi Model",
|
||||
hudHelp: "Tekan ? untuk melihat menu hotkey",
|
||||
|
||||
// Shortcuts Dialog
|
||||
shortcutsTitle: "Panduan Keyboard Shortcut",
|
||||
shortcutsDesc: "Gunakan keyboard shortcuts berikut untuk navigasi dashboard tanpa mouse:",
|
||||
shortcutsClose: "Tutup",
|
||||
shortcutConnect: "Hubungkan / Putuskan Server WebSocket",
|
||||
shortcutStream: "Mulai / Hentikan Pengubah Suara AI",
|
||||
shortcutMute: "Bungkam / Dengarkan Audio Output Lokal",
|
||||
shortcutPreset1: "Terapkan Profil: Respon Kilat (PM)",
|
||||
shortcutPreset2: "Terapkan Profil: Kualitas Tinggi (RMVPE)",
|
||||
shortcutHelp: "Buka / Tutup Dialog Panduan Shortcut",
|
||||
|
||||
// Premium layouts
|
||||
characterCardTitle: "Karakter Suara Aktif",
|
||||
characterAvatarDesc: "Profil bobot suara yang sedang dimuat saat ini.",
|
||||
welcomeBack: "Pusat Kontrol Audio Real-Time",
|
||||
currentLang: "Bahasa",
|
||||
themeSettings: "Tema Antarmuka & Aksen",
|
||||
themeMode: "Mode Tema",
|
||||
themeDark: "Mode Gelap",
|
||||
themeLight: "Mode Terang",
|
||||
accentColorLabel: "Warna Aksen Global",
|
||||
tabCredits: "Kredit Open Source",
|
||||
creditsTitle: "💖 Kredit Lisensi & Open Source",
|
||||
creditsDescription: "ONNX VC dimungkinkan berkat proyek dan pustaka open source luar biasa berikut:",
|
||||
liveTuningTitle: "Setelan Cepat Pemrosesan",
|
||||
customCanvasTitle: "Kustomisasi Canvas Visualizer",
|
||||
showMicInput: "Tampilkan Input Mic",
|
||||
showAiOutput: "Tampilkan Output AI",
|
||||
lineWidthLabel: "Ketebalan Garis",
|
||||
traceDecayLabel: "Intensitas Ekor (Trail Fading)",
|
||||
inputLineColorLabel: "Warna Garis Input",
|
||||
outputLineColorLabel: "Warna Garis Output",
|
||||
creditCreatorTitle: "Pencipta & Integrator",
|
||||
creditNeuralTitle: "Konversi Neural",
|
||||
creditEngineTitle: "Mesin Inferensi",
|
||||
creditPitchTitle: "Ekstraksi Nada Vokal",
|
||||
creditPipelineTitle: "Streaming Pipeline",
|
||||
creditFrameworkTitle: "Framework Frontend",
|
||||
creditDesignTitle: "Desain & Animasi",
|
||||
creditCreatorDesc: "Pengembang antarmuka audio ONNX VC dan pengintegrasi workspace kontrol audio real-time berlatensi ultra-rendah.",
|
||||
creditNeuralDesc: "Kerangka kerja pengubah suara berbasis AI yang menggunakan fitur retrieval untuk transfer karakter suara berlatensi rendah.",
|
||||
creditEngineDesc: "Mesin akselerasi inferensi model lintas platform untuk CPU, CUDA GPU, dan Windows DirectML GPU.",
|
||||
creditPitchDesc: "Model deteksi pitch vokal berkinerja tinggi yang tetap presisi di tengah bising latar belakang.",
|
||||
creditPipelineDesc: "Pipa transfer data audio biner mentah PCM float32 yang berjalan lancar antara peramban dan server Python.",
|
||||
creditFrameworkDesc: "Kerangka kerja aplikasi web terstruktur yang dikompilasi menjadi ekspor HTML statis.",
|
||||
creditDesignDesc: "Mesin animasi layout deklaratif dan utilitas CSS presisi untuk tampilan premium."
|
||||
},
|
||||
ja: {
|
||||
appTitle: "🎙️ ONNX VC",
|
||||
appSubtitle: "ONNX Runtime高速化による低遅延リアルタイムAI音声変換システム。",
|
||||
wsServerUrl: "WebSocketサーバーURL",
|
||||
wsPlaceholder: "ws://localhost:8765",
|
||||
connectionStatus: "接続状態",
|
||||
disconnected: "切断",
|
||||
connecting: "接続中...",
|
||||
connected: "接続完了",
|
||||
connect: "サーバー接続",
|
||||
disconnect: "接続解除",
|
||||
startChanger: "音声変換開始",
|
||||
stopChanger: "音声変換停止",
|
||||
listeningActive: "モニター音:ON",
|
||||
listeningMute: "モニター音:OFF",
|
||||
|
||||
// Tabs
|
||||
tabDashboard: "ワークスペース",
|
||||
tabModel: "モデル設定",
|
||||
tabDsp: "オーディオDSP",
|
||||
tabShortcuts: "ショートカット",
|
||||
|
||||
// Model Config
|
||||
modelConfigTitle: "モデルとデバイスの構成",
|
||||
quickPresets: "クイックプリセット (パフォーマンス)",
|
||||
latencyPreset: "⚡ 低遅延優先 (PM)",
|
||||
qualityPreset: "🎙️ 高音質優先 (RMVPE)",
|
||||
selectModel: "キャラクターモデルの選択 (RVC ONNX)",
|
||||
executionProvider: "実行プロバイダー (GPUアクセラレーション)",
|
||||
routingMode: "音声ルーティングモード",
|
||||
clientMode: "クライアントモード (ブラウザ再生)",
|
||||
serverMode: "サーバーモード (ハードウェア直結)",
|
||||
serverInput: "サーバー入力マイク",
|
||||
serverOutput: "サーバー出力スピーカー",
|
||||
pitchMethod: "ピッチ検出アルゴリズム",
|
||||
transpose: "ピッチ変換 (トランスポーズ)",
|
||||
transposeMale: "-24 (男声向け)",
|
||||
transposeNormal: "0 (原音)",
|
||||
transposeFemale: "+24 (女声/アニメ声)",
|
||||
|
||||
// DSP
|
||||
dspTitle: "オーディオ処理設定 (DSP)",
|
||||
noiseGate: "ノイズゲート (閾値)",
|
||||
noiseGateSens: "-60 dB (高感度)",
|
||||
noiseGateDefault: "-40 dB (推奨)",
|
||||
noiseGateStrict: "-10 dB (厳格)",
|
||||
inputGain: "入力ゲイン (マイク)",
|
||||
outputGain: "出力ゲイン (AI音量)",
|
||||
noiseCancel: "ノイズキャンセリング",
|
||||
noiseCancelDesc: "ブラウザのエコーと環境音を除去します",
|
||||
bufferSize: "バッファサイズ (遅延時間 vs 安定性)",
|
||||
|
||||
// Visualizers
|
||||
visualizerTitle: "リアルタイム波形表示",
|
||||
micSignal: "マイク入力信号",
|
||||
aiSignal: "AI音声出力信号",
|
||||
activeSignal: "音声検出中",
|
||||
pipStream: "PiP波形ウィンドウ",
|
||||
pipClose: "PiPを閉じる",
|
||||
|
||||
// HUD
|
||||
hudLatency: "応答速度 (RTT)",
|
||||
hudInference: "推論速度",
|
||||
hudDetector: "音声検出",
|
||||
hudTalking: "発話中",
|
||||
hudSilent: "無音",
|
||||
hudSr: "モデルサンプリングレート",
|
||||
hudHelp: "?キーでショートカットヘルプを表示",
|
||||
|
||||
// Shortcuts Dialog
|
||||
shortcutsTitle: "キーボードショートカット一覧",
|
||||
shortcutsDesc: "キーボードを使ってマウスなしで素早く操作できます:",
|
||||
shortcutsClose: "閉じる",
|
||||
shortcutConnect: "WebSocketサーバーの接続 / 切断",
|
||||
shortcutStream: "AI音声変換の開始 / 停止",
|
||||
shortcutMute: "ローカル出力のミュート / 解除",
|
||||
shortcutPreset1: "プリセット適用:低遅延優先 (PM)",
|
||||
shortcutPreset2: "プリセット適用:高音質優先 (RMVPE)",
|
||||
shortcutHelp: "ショートカット一覧の表示 / 非表示",
|
||||
|
||||
// Premium layouts
|
||||
characterCardTitle: "現在のボイスモデル",
|
||||
characterAvatarDesc: "現在ロードされている音声のキャラクタープロファイルです。",
|
||||
welcomeBack: "リアルタイムオーディオコントロールセンター",
|
||||
currentLang: "言語",
|
||||
themeSettings: "テーマとアクセント",
|
||||
themeMode: "テーマモード",
|
||||
themeDark: "ダークモード",
|
||||
themeLight: "ライトモード",
|
||||
accentColorLabel: "グローバルアクセントカラー",
|
||||
tabCredits: "オープンソース",
|
||||
creditsTitle: "💖 オープンソースクレジット",
|
||||
creditsDescription: "ONNX VCは、以下の素晴らしいオープンソースプロジェクトとライブラリのおかげで実現しました。",
|
||||
liveTuningTitle: "常用パラメータ微調整",
|
||||
customCanvasTitle: "カスタムビジュアライザ",
|
||||
showMicInput: "マイク入力を表示",
|
||||
showAiOutput: "AI出力を表示",
|
||||
lineWidthLabel: "線の太さ",
|
||||
traceDecayLabel: "残像フェード率",
|
||||
inputLineColorLabel: "入力線の色",
|
||||
outputLineColorLabel: "出力線の色",
|
||||
creditCreatorTitle: "開発・統合元",
|
||||
creditNeuralTitle: "ニューラル音声変換",
|
||||
creditEngineTitle: "推論エンジン",
|
||||
creditPitchTitle: "ピッチ検出",
|
||||
creditPipelineTitle: "ストリーミング・パイプライン",
|
||||
creditFrameworkTitle: "フロントエンドフレームワーク",
|
||||
creditDesignTitle: "デザインとアニメーション",
|
||||
creditCreatorDesc: "ONNX VCクライアントインターフェースおよび超低遅延リアルタイムオーディオ制御ワークスペースの統合開発チーム。",
|
||||
creditNeuralDesc: "リアルタイムの音声特徴抽出および声質変換のためのコアニューラルネットワークアーキテクチャ。",
|
||||
creditEngineDesc: "CPU、NVIDIA CUDA、およびWindows DirectML GPUバックエンド上で動作する、クロスプラットフォームの推論高速化エンジン。",
|
||||
creditPitchDesc: "周囲のノイズ下でも高精度にボーカルのピッチ追跡を行うことができる高性能ピッチ推定モデル。",
|
||||
creditPipelineDesc: "ブラウザクライアントとPythonサーバー間で生のPCM float32フレームを高速に送受信するバイナリデータパイプライン。",
|
||||
creditFrameworkDesc: "Reactクライアントコンポーネントを静的に最適化されたHTMLにエクスポートするモダンウェブフレームワーク。",
|
||||
creditDesignDesc: "インタラクティブで高品質なUIデザインのための、ユーティリティ優先CSSおよび宣言的アニメーションライブラリ。"
|
||||
},
|
||||
zh: {
|
||||
appTitle: "🎙️ ONNX VC",
|
||||
appSubtitle: "基于 ONNX 运行时加速的低延迟实时 AI 变声器系统。",
|
||||
wsServerUrl: "WebSocket 服务器地址",
|
||||
wsPlaceholder: "ws://localhost:8765",
|
||||
connectionStatus: "连接状态",
|
||||
disconnected: "已断开",
|
||||
connecting: "连接中...",
|
||||
connected: "已连接",
|
||||
connect: "连接服务器",
|
||||
disconnect: "断开连接",
|
||||
startChanger: "开启变声",
|
||||
stopChanger: "停止变声",
|
||||
listeningActive: "声音监听:开启",
|
||||
listeningMute: "声音监听:静音",
|
||||
|
||||
// Tabs
|
||||
tabDashboard: "控制工作台",
|
||||
tabModel: "模型设置",
|
||||
tabDsp: "音频 DSP",
|
||||
tabShortcuts: "快捷键",
|
||||
|
||||
// Model Config
|
||||
modelConfigTitle: "变声模型与硬件设备配置",
|
||||
quickPresets: "快速预设 (性能配置)",
|
||||
latencyPreset: "⚡ 极速响应 (PM)",
|
||||
qualityPreset: "🎙️ 高清音质 (RMVPE)",
|
||||
selectModel: "选择声音模型 (RVC ONNX)",
|
||||
executionProvider: "运行加速提供商 (GPU 加速)",
|
||||
routingMode: "音频路由模式",
|
||||
clientMode: "客户端模式 (浏览器音频流转换)",
|
||||
serverMode: "服务器模式 (直连服务端硬件)",
|
||||
serverInput: "服务器输入麦克风",
|
||||
serverOutput: "服务器输出扬声器",
|
||||
pitchMethod: "基频检测算法 (Pitch)",
|
||||
transpose: "变调参数 (Transpose)",
|
||||
transposeMale: "-24 (男声声调)",
|
||||
transposeNormal: "0 (原音)",
|
||||
transposeFemale: "+24 (女声/动漫声调)",
|
||||
|
||||
// DSP
|
||||
dspTitle: "音频效果器配置 (DSP)",
|
||||
noiseGate: "噪声门限阈值 (Noise Gate)",
|
||||
noiseGateSens: "-60 dB (灵敏)",
|
||||
noiseGateDefault: "-40 dB (默认)",
|
||||
noiseGateStrict: "-10 dB (严格)",
|
||||
inputGain: "输入增益 (麦克风音量)",
|
||||
outputGain: "输出增益 (变声后音量)",
|
||||
noiseCancel: "回声抑噪过滤",
|
||||
noiseCancelDesc: "过滤浏览器的回声和杂音",
|
||||
bufferSize: "缓冲区大小 (延迟时间 vs 稳定性)",
|
||||
|
||||
// Visualizers
|
||||
visualizerTitle: "实时音频波形图",
|
||||
micSignal: "麦克风输入波形",
|
||||
aiSignal: "AI变声输出波形",
|
||||
activeSignal: "正在输入",
|
||||
pipStream: "画中画波形图",
|
||||
pipClose: "关闭画中画",
|
||||
|
||||
// HUD
|
||||
hudLatency: "双向延迟 (RTT)",
|
||||
hudInference: "推理用时",
|
||||
hudDetector: "声控指示器",
|
||||
hudTalking: "检测到讲话",
|
||||
hudSilent: "静音中",
|
||||
hudSr: "模型音频采样率",
|
||||
hudHelp: "按 ? 键打开快捷键指南",
|
||||
|
||||
// Shortcuts Dialog
|
||||
shortcutsTitle: "键盘快捷键指南",
|
||||
shortcutsDesc: "使用键盘快捷键可以在没有鼠标的情况下极速控制工作台:",
|
||||
shortcutsClose: "关闭",
|
||||
shortcutConnect: "连接 / 断开 WebSocket 服务器",
|
||||
shortcutStream: "开启 / 停止 AI 变声器",
|
||||
shortcutMute: "静音 / 开启本地输出监听",
|
||||
shortcutPreset1: "加载预设:极速响应 (PM)",
|
||||
shortcutPreset2: "加载预设:高清音质 (RMVPE)",
|
||||
shortcutHelp: "打开 / 关闭快捷键帮助面板",
|
||||
|
||||
// Premium layouts
|
||||
characterCardTitle: "当前声音人物",
|
||||
characterAvatarDesc: "当前正在承载的音频权重包与神经网络特征。",
|
||||
welcomeBack: "实时音频变声控制台",
|
||||
currentLang: "语言",
|
||||
themeSettings: "界面主题与强调色",
|
||||
themeMode: "主题模式",
|
||||
themeDark: "深色模式",
|
||||
themeLight: "浅色模式",
|
||||
accentColorLabel: "全局强调颜色",
|
||||
tabCredits: "开源鸣谢",
|
||||
creditsTitle: "💖 开源软件鸣谢",
|
||||
creditsDescription: "ONNX VC 的诞生离不开以下优秀的开源项目与函数库的支持:",
|
||||
liveTuningTitle: "常用变声微调",
|
||||
customCanvasTitle: "画布自定设置",
|
||||
showMicInput: "显示麦克风输入",
|
||||
showAiOutput: "显示AI变声输出",
|
||||
lineWidthLabel: "线条宽度",
|
||||
traceDecayLabel: "余晖消退率 (渐变)",
|
||||
inputLineColorLabel: "输入线颜色",
|
||||
outputLineColorLabel: "输出线颜色",
|
||||
creditCreatorTitle: "核心集成开发商",
|
||||
creditNeuralTitle: "声线转换算法",
|
||||
creditEngineTitle: "深度学习推理引擎",
|
||||
creditPitchTitle: "基频音高提取",
|
||||
creditPipelineTitle: "数据流通通道",
|
||||
creditFrameworkTitle: "前端应用框架",
|
||||
creditDesignTitle: "界面设计与动效",
|
||||
creditCreatorDesc: "ONNX VC 客户端界面设计与超低延迟音频控制工作台的集成开发者。",
|
||||
creditNeuralDesc: "基于检索的神经网络架构,用于实现低延迟的实时声音特征提取与音色转换。",
|
||||
creditEngineDesc: "跨平台的机器学习模型推理加速引擎,支持 CPU、NVIDIA CUDA 以及 Windows DirectML GPU 后端。",
|
||||
creditPitchDesc: "高性能人声基频检测模型,在背景嘈杂的环境下仍能提供极高精度的音高跟踪。",
|
||||
creditPipelineDesc: "在浏览器客户端与 Python 服务端之间高速传输原始 PCM Float32 音频帧的双向二进制数据通道。",
|
||||
creditFrameworkDesc: "现代网页开发框架,支持将 React 客户端组件编译并打包为高度优化的静态资源导出。",
|
||||
creditDesignDesc: "功能类优先 CSS 框架与流式声明式动画库,用以打造流畅的高级交互式视觉界面。"
|
||||
},
|
||||
es: {
|
||||
appTitle: "🎙️ ONNX VC",
|
||||
appSubtitle: "Modulador de voz por IA en tiempo real y baja latencia acelerado por ONNX Runtime.",
|
||||
wsServerUrl: "URL del Servidor WebSocket",
|
||||
wsPlaceholder: "ws://localhost:8765",
|
||||
connectionStatus: "Estado de la Conexión",
|
||||
disconnected: "Desconectado",
|
||||
connecting: "Conectando...",
|
||||
connected: "Conectado",
|
||||
connect: "Conectar Servidor",
|
||||
disconnect: "Desconectar Servidor",
|
||||
startChanger: "Iniciar Modulador",
|
||||
stopChanger: "Detener Modulador",
|
||||
listeningActive: "Escucha: ACTIVA",
|
||||
listeningMute: "Escucha: SILENCIADA",
|
||||
|
||||
// Tabs
|
||||
tabDashboard: "Espacio Trabajo",
|
||||
tabModel: "Ajustes Modelo",
|
||||
tabDsp: "Audio DSP",
|
||||
tabShortcuts: "Atajos Teclado",
|
||||
|
||||
// Model Config
|
||||
modelConfigTitle: "Configuración de Modelo y Dispositivo",
|
||||
quickPresets: "Ajustes Rápidos (Perfil de Rendimiento)",
|
||||
latencyPreset: "⚡ Respuesta Instantánea (PM)",
|
||||
qualityPreset: "🎙️ Alta Fidelidad (RMVPE)",
|
||||
selectModel: "Seleccionar Modelo de Voz (RVC ONNX)",
|
||||
executionProvider: "Proveedor de Ejecución (Aceleración GPU)",
|
||||
routingMode: "Modo de Ruta de Audio",
|
||||
clientMode: "Modo Cliente (Streaming en Navegador)",
|
||||
serverMode: "Modo Servidor (Sounddevice Directo)",
|
||||
serverInput: "Micrófono de Entrada del Servidor",
|
||||
serverOutput: "Altavoz de Salida del Servidor",
|
||||
pitchMethod: "Método de Extracción de Tono",
|
||||
transpose: "Transposición (Modificador de Tono)",
|
||||
transposeMale: "-24 (Tono Grave Masculino)",
|
||||
transposeNormal: "0 (Original)",
|
||||
transposeFemale: "+24 (Tono Agudo/Anime)",
|
||||
|
||||
// DSP
|
||||
dspTitle: "Configuración de Procesamiento de Audio (DSP)",
|
||||
noiseGate: "Puerta de Ruido (Umbral)",
|
||||
noiseGateSens: "-60 dB (Sensible)",
|
||||
noiseGateDefault: "-40 dB (Predeterminado)",
|
||||
noiseGateStrict: "-10 dB (Estricto)",
|
||||
inputGain: "Ganancia de Entrada (Micrófono)",
|
||||
outputGain: "Ganancia de Salida (Volumen IA)",
|
||||
noiseCancel: "Cancelación de Ruido (Filtro)",
|
||||
noiseCancelDesc: "Filtra el eco y el zumbido de fondo",
|
||||
bufferSize: "Tamaño de Búfer (Tamaño de Chunk - Latencia vs Estabilidad)",
|
||||
|
||||
// Visualizers
|
||||
visualizerTitle: "Visualizador de Ondas de Audio",
|
||||
micSignal: "Señal de Entrada del Micrófono",
|
||||
aiSignal: "Señal de Salida de Voz IA",
|
||||
activeSignal: "Señal Activa",
|
||||
pipStream: "Forma de Onda PiP",
|
||||
pipClose: "Cerrar PiP",
|
||||
|
||||
// HUD
|
||||
hudLatency: "Latencia RTT",
|
||||
hudInference: "Velocidad de Inferencia",
|
||||
hudDetector: "Detector de Voz",
|
||||
hudTalking: "Hablando",
|
||||
hudSilent: "Silencio",
|
||||
hudSr: "Frecuencia del Modelo",
|
||||
hudHelp: "Presione ? para ver el menú de atajos",
|
||||
|
||||
// Shortcuts Dialog
|
||||
shortcutsTitle: "Guía de Atajos de Teclado",
|
||||
shortcutsDesc: "Utilice los siguientes atajos para controlar el panel de control sin el mouse:",
|
||||
shortcutsClose: "Cerrar",
|
||||
shortcutConnect: "Conectar / Desconectar Servidor WebSocket",
|
||||
shortcutStream: "Iniciar / Detener Modulador de Voz IA",
|
||||
shortcutMute: "Silenciar / Activar Escucha Local de Salida",
|
||||
shortcutPreset1: "Cargar Ajuste: Respuesta Instantánea (PM)",
|
||||
shortcutPreset2: "Cargar Ajuste: Alta Fidelidad (RMVPE)",
|
||||
shortcutHelp: "Abrir / Cerrar Diálogo de Ayuda de Atajos",
|
||||
|
||||
// Premium layouts
|
||||
characterCardTitle: "Voz del Personaje Activo",
|
||||
characterAvatarDesc: "Perfil de pesos de voz cargado actualmente.",
|
||||
welcomeBack: "Centro de Control de Audio en Tiempo Real",
|
||||
currentLang: "Idioma",
|
||||
themeSettings: "Tema de Interfaz y Acento",
|
||||
themeMode: "Modo de Tema",
|
||||
themeDark: "Modo Oscuro",
|
||||
themeLight: "Modo Claro",
|
||||
accentColorLabel: "Color de Acento Global",
|
||||
tabCredits: "Créditos",
|
||||
creditsTitle: "💖 Créditos de Código Abierto",
|
||||
creditsDescription: "ONNX VC es posible gracias a los siguientes increíbles proyectos y bibliotecas de código abierto:",
|
||||
liveTuningTitle: "Ajustes en Vivo",
|
||||
customCanvasTitle: "Ajustes de Canvas",
|
||||
showMicInput: "Mostrar Entrada Mic",
|
||||
showAiOutput: "Mostrar Salida IA",
|
||||
lineWidthLabel: "Grosor de Línea",
|
||||
traceDecayLabel: "Decaimiento del Trazo",
|
||||
inputLineColorLabel: "Color de Línea de Entrada",
|
||||
outputLineColorLabel: "Color de Línea de Salida",
|
||||
creditCreatorTitle: "Creador e Integrador",
|
||||
creditNeuralTitle: "Conversión Neuronal",
|
||||
creditEngineTitle: "Motor de Inferencia",
|
||||
creditPitchTitle: "Extracción de Tono",
|
||||
creditPipelineTitle: "Línea de Transmisión",
|
||||
creditFrameworkTitle: "Marco Frontend",
|
||||
creditDesignTitle: "Diseño y Animación",
|
||||
creditCreatorDesc: "Creadores de la interfaz de cliente ONNX VC e integradores del entorno de control de audio en tiempo real.",
|
||||
creditNeuralDesc: "Arquitectura central de red neuronal para la extracción de características de voz y conversión vocal.",
|
||||
creditEngineDesc: "Acelerador multiplataforma de inferencia de modelos de IA para CPU, GPU CUDA y GPU DirectML de Windows.",
|
||||
creditPitchDesc: "Modelo robusto de estimación de tono para un seguimiento de tono vocal de alta precisión.",
|
||||
creditPipelineDesc: "Tubería binaria de alta velocidad para la transferencia de tramas PCM float32 nativas entre el cliente y el servidor.",
|
||||
creditFrameworkDesc: "Marco de desarrollo web moderno que compila componentes de React para exportaciones estáticas optimizadas.",
|
||||
creditDesignDesc: "Utilidades de estilos CSS y bibliotecas de animación declarativa para interfaces de usuario interactivas de primera calidad."
|
||||
}
|
||||
};
|
||||
@@ -1,595 +0,0 @@
|
||||
/* ==========================================================================
|
||||
CSS GLOBAL TOKENS & RESET
|
||||
========================================================================== */
|
||||
:root {
|
||||
--bg-dark: #07080e;
|
||||
--bg-card: rgba(13, 17, 30, 0.7);
|
||||
--border-color: rgba(99, 102, 241, 0.18);
|
||||
|
||||
--primary: #6366f1;
|
||||
--primary-glow: rgba(99, 102, 241, 0.4);
|
||||
--accent: #a855f7;
|
||||
--accent-glow: rgba(168, 85, 247, 0.45);
|
||||
--emerald: #10b981;
|
||||
--rose: #ef4444;
|
||||
|
||||
--text-main: #e2e8f0;
|
||||
--text-muted: #94a3b8;
|
||||
--font-header: 'Outfit', 'Inter', -apple-system, BlinkMacSystemFont, sans-serif;
|
||||
--font-body: 'Inter', -apple-system, BlinkMacSystemFont, sans-serif;
|
||||
|
||||
--transition-smooth: all 0.3s cubic-bezier(0.25, 0.8, 0.25, 1);
|
||||
}
|
||||
|
||||
* {
|
||||
box-sizing: border-box;
|
||||
margin: 0;
|
||||
padding: 0;
|
||||
}
|
||||
|
||||
body {
|
||||
background-color: var(--bg-dark);
|
||||
color: var(--text-main);
|
||||
font-family: var(--font-body);
|
||||
min-height: 100vh;
|
||||
overflow-x: hidden;
|
||||
position: relative;
|
||||
padding: 2rem 1.5rem;
|
||||
}
|
||||
|
||||
/* ==========================================================================
|
||||
DYNAMIC GLOWING BACKGROUND
|
||||
========================================================================== */
|
||||
.glow-backdrop {
|
||||
position: fixed;
|
||||
top: 0;
|
||||
left: 0;
|
||||
right: 0;
|
||||
bottom: 0;
|
||||
z-index: -1;
|
||||
background:
|
||||
radial-gradient(circle at 10% 20%, rgba(99, 102, 241, 0.08) 0%, transparent 40%),
|
||||
radial-gradient(circle at 90% 80%, rgba(168, 85, 247, 0.09) 0%, transparent 45%);
|
||||
pointer-events: none;
|
||||
}
|
||||
|
||||
/* ==========================================================================
|
||||
LAYOUT CONTAINER & CARDS
|
||||
========================================================================== */
|
||||
.dashboard-container {
|
||||
max-width: 1200px;
|
||||
margin: 0 auto;
|
||||
display: flex;
|
||||
flex-direction: column;
|
||||
gap: 1.5rem;
|
||||
}
|
||||
|
||||
.glassmorphism {
|
||||
background: var(--bg-card);
|
||||
backdrop-filter: blur(16px);
|
||||
-webkit-backdrop-filter: blur(16px);
|
||||
border: 1px solid var(--border-color);
|
||||
border-radius: 16px;
|
||||
box-shadow: 0 8px 32px 0 rgba(0, 0, 0, 0.37);
|
||||
transition: var(--transition-smooth);
|
||||
}
|
||||
|
||||
.glassmorphism:hover {
|
||||
border-color: rgba(99, 102, 241, 0.3);
|
||||
box-shadow: 0 10px 40px 0 rgba(99, 102, 241, 0.1);
|
||||
}
|
||||
|
||||
.card {
|
||||
padding: 1.75rem;
|
||||
}
|
||||
|
||||
.card-title {
|
||||
font-family: var(--font-header);
|
||||
font-size: 1.25rem;
|
||||
font-weight: 600;
|
||||
margin-bottom: 1.25rem;
|
||||
background: linear-gradient(135deg, #fff 0%, var(--text-muted) 100%);
|
||||
-webkit-background-clip: text;
|
||||
-webkit-text-fill-color: transparent;
|
||||
border-bottom: 1px solid rgba(255, 255, 255, 0.05);
|
||||
padding-bottom: 0.75rem;
|
||||
}
|
||||
|
||||
/* ==========================================================================
|
||||
APP HEADER
|
||||
========================================================================== */
|
||||
.app-header {
|
||||
text-align: center;
|
||||
margin-bottom: 1rem;
|
||||
}
|
||||
|
||||
.logo-area {
|
||||
display: inline-flex;
|
||||
align-items: center;
|
||||
gap: 0.75rem;
|
||||
margin-bottom: 0.5rem;
|
||||
}
|
||||
|
||||
.logo-area h1 {
|
||||
font-family: var(--font-header);
|
||||
font-size: 2.5rem;
|
||||
font-weight: 800;
|
||||
letter-spacing: -0.5px;
|
||||
background: linear-gradient(135deg, var(--primary) 0%, var(--accent) 100%);
|
||||
-webkit-background-clip: text;
|
||||
-webkit-text-fill-color: transparent;
|
||||
text-shadow: 0 0 40px rgba(99, 102, 241, 0.2);
|
||||
}
|
||||
|
||||
.pulse-indicator {
|
||||
width: 10px;
|
||||
height: 10px;
|
||||
border-radius: 50%;
|
||||
background-color: var(--rose);
|
||||
box-shadow: 0 0 10px var(--rose);
|
||||
}
|
||||
|
||||
.pulse-indicator.active {
|
||||
background-color: var(--emerald);
|
||||
box-shadow: 0 0 10px var(--emerald);
|
||||
animation: pulse 1.8s infinite;
|
||||
}
|
||||
|
||||
.tagline {
|
||||
color: var(--text-muted);
|
||||
font-size: 0.95rem;
|
||||
font-weight: 400;
|
||||
max-width: 600px;
|
||||
margin: 0 auto;
|
||||
}
|
||||
|
||||
/* ==========================================================================
|
||||
DASHBOARD GRID LAYOUT
|
||||
========================================================================== */
|
||||
.dashboard-grid {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(2, 1fr);
|
||||
gap: 1.5rem;
|
||||
}
|
||||
|
||||
@media (max-width: 768px) {
|
||||
.dashboard-grid {
|
||||
grid-template-columns: 1fr;
|
||||
}
|
||||
.col-span-2 {
|
||||
grid-column: span 1 !important;
|
||||
}
|
||||
}
|
||||
|
||||
.col-span-2 {
|
||||
grid-column: span 2;
|
||||
}
|
||||
|
||||
/* ==========================================================================
|
||||
INPUTS & CONTROLS
|
||||
========================================================================== */
|
||||
.control-group {
|
||||
margin-bottom: 1.25rem;
|
||||
}
|
||||
|
||||
.control-group:last-child {
|
||||
margin-bottom: 0;
|
||||
}
|
||||
|
||||
label {
|
||||
display: block;
|
||||
font-size: 0.85rem;
|
||||
font-weight: 500;
|
||||
color: var(--text-muted);
|
||||
margin-bottom: 0.5rem;
|
||||
text-transform: uppercase;
|
||||
letter-spacing: 0.5px;
|
||||
}
|
||||
|
||||
.custom-select {
|
||||
width: 100%;
|
||||
padding: 0.8rem 1rem;
|
||||
background-color: rgba(20, 24, 45, 0.8);
|
||||
border: 1px solid var(--border-color);
|
||||
border-radius: 8px;
|
||||
color: var(--text-main);
|
||||
font-size: 0.9rem;
|
||||
font-family: var(--font-body);
|
||||
outline: none;
|
||||
transition: var(--transition-smooth);
|
||||
cursor: pointer;
|
||||
appearance: none;
|
||||
background-image: url("data:image/svg+xml,%3Csvg xmlns='http://www.w3.org/2000/svg' width='24' height='24' viewBox='0 0 24 24' fill='none' stroke='%2394a3b8' stroke-width='2' stroke-linecap='round' stroke-linejoin='round'%3E%3Cpolyline points='6 9 12 15 18 9'%3E%3C/polyline%3E%3C/svg%3E");
|
||||
background-repeat: no-repeat;
|
||||
background-position: right 1rem center;
|
||||
background-size: 1.2rem;
|
||||
}
|
||||
|
||||
.custom-select:focus {
|
||||
border-color: var(--primary);
|
||||
box-shadow: 0 0 8px var(--primary-glow);
|
||||
}
|
||||
|
||||
.input-group input {
|
||||
background-color: rgba(20, 24, 45, 0.8);
|
||||
border: 1px solid var(--border-color);
|
||||
border-radius: 8px;
|
||||
color: var(--text-main);
|
||||
padding: 0.8rem 1rem;
|
||||
width: 100%;
|
||||
font-family: var(--font-body);
|
||||
font-size: 0.9rem;
|
||||
outline: none;
|
||||
transition: var(--transition-smooth);
|
||||
}
|
||||
|
||||
.input-group input:focus {
|
||||
border-color: var(--primary);
|
||||
box-shadow: 0 0 8px var(--primary-glow);
|
||||
}
|
||||
|
||||
/* ==========================================================================
|
||||
SLIDERS STYLING
|
||||
========================================================================== */
|
||||
.slider-header {
|
||||
display: flex;
|
||||
justify-content: space-between;
|
||||
align-items: center;
|
||||
margin-bottom: 0.25rem;
|
||||
}
|
||||
|
||||
.slider-value {
|
||||
font-family: var(--font-header);
|
||||
font-weight: 600;
|
||||
color: var(--accent);
|
||||
text-shadow: 0 0 8px var(--accent-glow);
|
||||
font-size: 0.95rem;
|
||||
}
|
||||
|
||||
.custom-slider {
|
||||
-webkit-appearance: none;
|
||||
width: 100%;
|
||||
height: 6px;
|
||||
border-radius: 3px;
|
||||
background: rgba(99, 102, 241, 0.15);
|
||||
outline: none;
|
||||
margin: 0.75rem 0;
|
||||
}
|
||||
|
||||
.custom-slider::-webkit-slider-thumb {
|
||||
-webkit-appearance: none;
|
||||
appearance: none;
|
||||
width: 18px;
|
||||
height: 18px;
|
||||
border-radius: 50%;
|
||||
background: linear-gradient(135deg, var(--primary) 0%, var(--accent) 100%);
|
||||
cursor: pointer;
|
||||
box-shadow: 0 0 10px var(--primary-glow);
|
||||
transition: transform 0.1s ease;
|
||||
}
|
||||
|
||||
.custom-slider::-webkit-slider-thumb:hover {
|
||||
transform: scale(1.2);
|
||||
}
|
||||
|
||||
.slider-ticks {
|
||||
display: flex;
|
||||
justify-content: space-between;
|
||||
font-size: 0.75rem;
|
||||
color: var(--text-muted);
|
||||
}
|
||||
|
||||
/* ==========================================================================
|
||||
BUTTONS
|
||||
========================================================================== */
|
||||
.btn {
|
||||
padding: 0.8rem 1.5rem;
|
||||
border-radius: 8px;
|
||||
font-family: var(--font-header);
|
||||
font-weight: 600;
|
||||
font-size: 0.9rem;
|
||||
cursor: pointer;
|
||||
border: none;
|
||||
outline: none;
|
||||
transition: var(--transition-smooth);
|
||||
display: inline-flex;
|
||||
align-items: center;
|
||||
justify-content: center;
|
||||
gap: 0.5rem;
|
||||
}
|
||||
|
||||
.btn-primary {
|
||||
background: linear-gradient(135deg, var(--primary) 0%, #4f46e5 100%);
|
||||
color: white;
|
||||
box-shadow: 0 4px 14px 0 var(--primary-glow);
|
||||
}
|
||||
|
||||
.btn-primary:hover:not(:disabled) {
|
||||
transform: translateY(-2px);
|
||||
box-shadow: 0 6px 20px 0 rgba(99, 102, 241, 0.6);
|
||||
}
|
||||
|
||||
.btn-accent {
|
||||
background: linear-gradient(135deg, var(--accent) 0%, #7c3aed 100%);
|
||||
color: white;
|
||||
box-shadow: 0 4px 14px 0 var(--accent-glow);
|
||||
}
|
||||
|
||||
.btn-accent:hover:not(:disabled) {
|
||||
transform: translateY(-2px);
|
||||
box-shadow: 0 6px 20px 0 rgba(168, 85, 247, 0.65);
|
||||
}
|
||||
|
||||
.btn:active:not(:disabled) {
|
||||
transform: translateY(0);
|
||||
}
|
||||
|
||||
.btn:disabled {
|
||||
opacity: 0.5;
|
||||
cursor: not-allowed;
|
||||
box-shadow: none;
|
||||
}
|
||||
|
||||
/* ==========================================================================
|
||||
CONNECTION BAR
|
||||
========================================================================== */
|
||||
.connection-bar {
|
||||
padding: 1rem 1.5rem !important;
|
||||
}
|
||||
|
||||
.form-row {
|
||||
display: flex;
|
||||
align-items: flex-end;
|
||||
gap: 1.5rem;
|
||||
flex-wrap: wrap;
|
||||
}
|
||||
|
||||
.form-row .input-group {
|
||||
flex: 1;
|
||||
min-width: 250px;
|
||||
}
|
||||
|
||||
.connection-status-container {
|
||||
display: flex;
|
||||
align-items: center;
|
||||
height: 48px;
|
||||
}
|
||||
|
||||
.status-badge {
|
||||
padding: 0.4rem 0.8rem;
|
||||
border-radius: 20px;
|
||||
font-size: 0.8rem;
|
||||
font-weight: 600;
|
||||
text-transform: uppercase;
|
||||
letter-spacing: 0.5px;
|
||||
display: inline-flex;
|
||||
align-items: center;
|
||||
gap: 0.35rem;
|
||||
}
|
||||
|
||||
.status-badge::before {
|
||||
content: '';
|
||||
display: inline-block;
|
||||
width: 6px;
|
||||
height: 6px;
|
||||
border-radius: 50%;
|
||||
}
|
||||
|
||||
.status-badge.connected {
|
||||
background-color: rgba(16, 185, 129, 0.15);
|
||||
color: var(--emerald);
|
||||
border: 1px solid rgba(16, 185, 129, 0.3);
|
||||
}
|
||||
|
||||
.status-badge.connected::before {
|
||||
background-color: var(--emerald);
|
||||
box-shadow: 0 0 6px var(--emerald);
|
||||
}
|
||||
|
||||
.status-badge.disconnected {
|
||||
background-color: rgba(239, 68, 68, 0.15);
|
||||
color: var(--rose);
|
||||
border: 1px solid rgba(239, 68, 68, 0.3);
|
||||
}
|
||||
|
||||
.status-badge.disconnected::before {
|
||||
background-color: var(--rose);
|
||||
box-shadow: 0 0 6px var(--rose);
|
||||
}
|
||||
|
||||
.status-badge.connecting {
|
||||
background-color: rgba(168, 85, 247, 0.15);
|
||||
color: var(--accent);
|
||||
border: 1px solid rgba(168, 85, 247, 0.3);
|
||||
}
|
||||
|
||||
.status-badge.connecting::before {
|
||||
background-color: var(--accent);
|
||||
box-shadow: 0 0 6px var(--accent);
|
||||
animation: blink 1s infinite;
|
||||
}
|
||||
|
||||
.btn-group-row {
|
||||
display: flex;
|
||||
gap: 0.75rem;
|
||||
height: 48px;
|
||||
}
|
||||
|
||||
/* ==========================================================================
|
||||
MODERN RADIO TILES
|
||||
========================================================================== */
|
||||
.radio-group-modern {
|
||||
display: grid;
|
||||
grid-template-columns: repeat(2, 1fr);
|
||||
gap: 0.5rem;
|
||||
}
|
||||
|
||||
.radio-tile {
|
||||
position: relative;
|
||||
cursor: pointer;
|
||||
margin: 0;
|
||||
}
|
||||
|
||||
.radio-tile input {
|
||||
position: absolute;
|
||||
opacity: 0;
|
||||
}
|
||||
|
||||
.tile-label {
|
||||
display: block;
|
||||
padding: 0.6rem;
|
||||
background-color: rgba(20, 24, 45, 0.5);
|
||||
border: 1px solid var(--border-color);
|
||||
border-radius: 8px;
|
||||
text-align: center;
|
||||
font-size: 0.8rem;
|
||||
font-weight: 500;
|
||||
color: var(--text-muted);
|
||||
transition: var(--transition-smooth);
|
||||
}
|
||||
|
||||
.radio-tile input:checked + .tile-label {
|
||||
background-color: rgba(99, 102, 241, 0.12);
|
||||
border-color: var(--primary);
|
||||
color: var(--text-main);
|
||||
box-shadow: 0 0 10px rgba(99, 102, 241, 0.2);
|
||||
}
|
||||
|
||||
.radio-tile:hover .tile-label {
  border-color: rgba(99, 102, 241, 0.4);
}

/* ==========================================================================
   OSCILLOSCOPE WAVEFORM CANVASES
   ========================================================================== */
.visualizer-row {
  display: flex;
  gap: 1.5rem;
  flex-wrap: wrap;
}

.visualizer-container {
  flex: 1;
  min-width: 280px;
  display: flex;
  flex-direction: column;
  gap: 0.5rem;
}

.vis-label {
  display: flex;
  align-items: center;
  gap: 0.5rem;
  font-size: 0.8rem;
  font-weight: 500;
  color: var(--text-muted);
}

.dot {
  width: 6px;
  height: 6px;
  border-radius: 50%;
}

.input-dot {
  background-color: var(--primary);
  box-shadow: 0 0 6px var(--primary);
}

.output-dot {
  background-color: var(--accent);
  box-shadow: 0 0 6px var(--accent);
}

.waveform-canvas {
  width: 100%;
  height: 150px;
  background-color: #0b0c13;
  border-radius: 8px;
  border: 1px solid rgba(255, 255, 255, 0.03);
}

/* ==========================================================================
   PERFORMANCE HUD
   ========================================================================== */
.performance-hud {
  display: flex;
  justify-content: space-between;
  align-items: center;
  padding: 0.85rem 1.75rem !important;
}

.hud-item {
  display: flex;
  flex-direction: column;
  gap: 0.15rem;
}

.hud-label {
  font-size: 0.7rem;
  text-transform: uppercase;
  letter-spacing: 1px;
  color: var(--text-muted);
  font-weight: 500;
}

.hud-value {
  font-family: var(--font-header);
  font-size: 1.1rem;
  font-weight: 700;
  color: white;
}

.hud-separator {
  width: 1px;
  height: 30px;
  background-color: rgba(255, 255, 255, 0.08);
}

.hud-value.text-accent {
  color: var(--accent);
  text-shadow: 0 0 8px var(--accent-glow);
}

.active-badge {
  color: var(--emerald);
  text-shadow: 0 0 6px rgba(16, 185, 129, 0.4);
}

@media (max-width: 600px) {
  .performance-hud {
    flex-direction: column;
    align-items: flex-start;
    gap: 0.75rem;
  }
  .hud-separator {
    display: none;
  }
}

/* ==========================================================================
   KEYFRAME ANIMATIONS
   ========================================================================== */
@keyframes pulse {
  0% {
    transform: scale(0.9);
    box-shadow: 0 0 0 0 rgba(16, 185, 129, 0.7);
  }
  70% {
    transform: scale(1.1);
    box-shadow: 0 0 0 10px rgba(16, 185, 129, 0);
  }
  100% {
    transform: scale(0.9);
    box-shadow: 0 0 0 0 rgba(16, 185, 129, 0);
  }
}

@keyframes blink {
  0%, 100% {
    opacity: 1;
  }
  50% {
    opacity: 0.4;
  }
}
@@ -0,0 +1,34 @@
{
  "compilerOptions": {
    "target": "ES2017",
    "lib": ["dom", "dom.iterable", "esnext"],
    "allowJs": true,
    "skipLibCheck": true,
    "strict": true,
    "noEmit": true,
    "esModuleInterop": true,
    "module": "esnext",
    "moduleResolution": "bundler",
    "resolveJsonModule": true,
    "isolatedModules": true,
    "jsx": "react-jsx",
    "incremental": true,
    "plugins": [
      {
        "name": "next"
      }
    ],
    "paths": {
      "@/*": ["./src/*"]
    }
  },
  "include": [
    "next-env.d.ts",
    "**/*.ts",
    "**/*.tsx",
    ".next/types/**/*.ts",
    ".next/dev/types/**/*.ts",
    "**/*.mts"
  ],
  "exclude": ["node_modules"]
}
@@ -0,0 +1,134 @@
import os
import sys
import torch
import argparse
import traceback

# Add the current working directory to sys.path so that lib can be imported
sys.path.append(os.getcwd())

from lib.infer_pack.models_onnx import SynthesizerTrnMsNSFsidM

def export_model_to_onnx(model_path, output_onnx_path):
    print(f"Loading PyTorch checkpoint from: {model_path}")
    try:
        # Load the checkpoint onto the CPU
        cpt = torch.load(model_path, map_location="cpu")
    except Exception as e:
        print(f"Error loading checkpoint: {e}")
        return False

    # Read the model metadata
    tgt_sr = cpt["config"][-1]

    # Derive the number of speakers from the embedding weights
    if "emb_g.weight" in cpt["weight"]:
        n_spk = cpt["weight"]["emb_g.weight"].shape[0]
    else:
        n_spk = 1

    # Update spk_embed_dim in the config to match
    cpt["config"][-3] = n_spk

    version = cpt.get("version", "v1")
    if_f0 = cpt.get("f0", 1)

    print(f"Model Version: {version}")
    print(f"Pitch (F0) Enabled: {if_f0}")
    print(f"Target Sample Rate: {tgt_sr} Hz")
    print(f"Number of Speakers: {n_spk}")

    # Initialize the ONNX-specific model (SynthesizerTrnMsNSFsidM)
    # is_half is set to False to export in FP32 for stable ONNX Runtime compatibility
    try:
        net_g = SynthesizerTrnMsNSFsidM(*cpt["config"], version=version, is_half=False)

        # Remove the posterior encoder, which is not used during inference
        if hasattr(net_g, "enc_q"):
            del net_g.enc_q

        # Load the model weights; strict=False ignores the removed enc_q
        net_g.load_state_dict(cpt["weight"], strict=False)
        net_g.eval()

        print("PyTorch model loaded successfully. Preparing dummy inputs...")
    except Exception as e:
        print(f"Failed to initialize RVC ONNX model class: {e}")
        traceback.print_exc()
        return False

    # Prepare dummy inputs for export tracing
    test_len = 10  # Dummy sequence length
    feat_dim = 256 if version == "v1" else 768

    phone = torch.randn(1, test_len, feat_dim, dtype=torch.float32)
    phone_lengths = torch.tensor([test_len], dtype=torch.int64)
    pitch = torch.randint(1, 254, (1, test_len), dtype=torch.int64)
    nsff0 = torch.randn(1, test_len, dtype=torch.float32)
    g = torch.tensor([0], dtype=torch.int64)  # Speaker ID 0
    rnd = torch.randn(1, 192, test_len, dtype=torch.float32)

    input_names = ["phone", "phone_lengths", "pitch", "nsff0", "g", "rnd"]
    output_names = ["audio"]

    dynamic_axes = {
        "phone": {1: "length"},
        "pitch": {1: "length"},
        "nsff0": {1: "length"},
        "rnd": {2: "length"},
        "audio": {1: "audio_length"}
    }

    print(f"Exporting model to ONNX format at: {output_onnx_path}")
    try:
        torch.onnx.export(
            net_g,
            (phone, phone_lengths, pitch, nsff0, g, rnd),
            output_onnx_path,
            opset_version=17,
            input_names=input_names,
            output_names=output_names,
            dynamic_axes=dynamic_axes,
            verbose=False
        )
        print("ONNX model exported successfully!")
        return True
    except Exception as e:
        print(f"Error during ONNX export: {e}")
        traceback.print_exc()
        return False

if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Export RVC PyTorch .pth model to ONNX")
    parser.add_argument("--model_name", type=str, required=True, help="Model name inside the weights folder (sub-folder name)")
    parser.add_argument("--output", type=str, default="", help="Output path for the ONNX file (optional)")

    args = parser.parse_args()

    model_root = "weights"
    model_dir = os.path.join(model_root, args.model_name)

    if not os.path.isdir(model_dir):
        print(f"Error: Folder '{model_dir}' not found!")
        sys.exit(1)

    pth_files = [f for f in os.listdir(model_dir) if f.endswith(".pth")]
    if not pth_files:
        print(f"Error: No .pth file found in folder '{model_dir}'!")
        sys.exit(1)

    pth_path = os.path.join(model_dir, pth_files[0])

    if args.output:
        onnx_path = args.output
    else:
        # By default, save into the same weights sub-folder
        onnx_name = os.path.splitext(pth_files[0])[0] + ".onnx"
        onnx_path = os.path.join(model_dir, onnx_name)

    success = export_model_to_onnx(pth_path, onnx_path)
    if success:
        print(f"\nDone! ONNX model saved to: {onnx_path}")
    else:
        print("\nExport failed!")
        sys.exit(1)
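For orientation, the input signature traced by the export script above can be restated as plain shape tuples. This is only a sketch: `expected_input_shapes` and `DYNAMIC_AXES` are hypothetical names that do not exist in the repository, and the values simply mirror the dummy tensors and the `dynamic_axes` dict in `export_model_to_onnx` (feature width 256 for `v1`, 768 otherwise, and 192 noise channels).

```python
from typing import Dict, Tuple


# Hypothetical helper (not part of the repository): restates the shapes of
# the dummy inputs built in export_model_to_onnx for a given sequence length.
def expected_input_shapes(version: str, length: int) -> Dict[str, Tuple[int, ...]]:
    feat_dim = 256 if version == "v1" else 768
    return {
        "phone": (1, length, feat_dim),   # float32 content features
        "phone_lengths": (1,),            # int64, holds the sequence length
        "pitch": (1, length),             # int64 coarse pitch values
        "nsff0": (1, length),             # float32 pitch curve
        "g": (1,),                        # int64 speaker ID
        "rnd": (1, 192, length),          # float32 noise
    }


# Axis index exported as dynamic for each tensor, copied from dynamic_axes.
DYNAMIC_AXES = {"phone": 1, "pitch": 1, "nsff0": 1, "rnd": 2, "audio": 1}

if __name__ == "__main__":
    for name, shape in expected_input_shapes("v2", 10).items():
        print(f"{name}: {shape}")
```

A caller feeding the exported model would presumably vary only the axes listed in `DYNAMIC_AXES`; all other dimensions stay fixed at the traced values.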
@@ -20,9 +20,6 @@ import logging
import traceback
import argparse
import threading
import webbrowser
from http.server import SimpleHTTPRequestHandler
import socketserver
import numpy as np
import torch
import onnxruntime as ort
@@ -616,27 +613,7 @@ async def start_websocket_server(host, port):
    async with websockets.serve(websocket_handler, host, port):
        await asyncio.Future()

# --- HTTP STATIC FILE SERVER FOR FRONTEND ---
def start_http_server(port, directory="frontend"):
    class MyHandler(SimpleHTTPRequestHandler):
        def __init__(self, *args, **kwargs):
            # Force serve from directory relative to the project root
            base_dir = os.path.dirname(os.path.abspath(__file__))
            full_dir = os.path.join(base_dir, directory)
            super().__init__(*args, directory=full_dir, **kwargs)

        def log_message(self, format, *args):
            # Suppress standard logging to prevent console pollution
            pass

    try:
        # Create a TCPServer that allows address reuse
        socketserver.TCPServer.allow_reuse_address = True
        with socketserver.TCPServer(("", port), MyHandler) as httpd:
            logger.info(f"Serving HTTP frontend on http://localhost:{port}")
            httpd.serve_forever()
    except Exception as e:
        logger.error(f"Failed to start HTTP server: {e}")

# --- LOCAL AUDIO DEVICE STREAM MODE ---
def run_local_device_mode(model_name, f0_up_key, f0_method, device, input_device, output_device, chunk_size):
@@ -709,7 +686,6 @@ if __name__ == "__main__":
    parser.add_argument("--mode", type=str, default="websocket", choices=["websocket", "device"], help="Server running mode")
    parser.add_argument("--host", type=str, default="127.0.0.1", help="WebSocket host")
    parser.add_argument("--port", type=int, default=8765, help="WebSocket port")
    parser.add_argument("--http_port", type=int, default=8000, help="HTTP static server port for Web UI")
    parser.add_argument("--model", type=str, default="", help="RVC Model folder name inside weights/")
    parser.add_argument("--transpose", type=int, default=0, help="Pitch shift in semitones (transpose)")
    parser.add_argument("--f0_method", type=str, default="pm", choices=["pm", "harvest", "dio", "rmvpe"], help="Pitch extraction method")
@@ -731,27 +707,7 @@ if __name__ == "__main__":
        sys.exit(1)

    if args.mode == "websocket":
        # 1. Start HTTP Server in a background thread to serve the frontend!
        http_thread = threading.Thread(
            target=start_http_server,
            args=(args.http_port, "frontend"),
            daemon=True
        )
        http_thread.start()

        # 2. Automatically open the Web UI in the default browser!
        web_ui_url = f"http://127.0.0.1:{args.http_port}"
        logger.info(f"Automatically launching Web UI at {web_ui_url} in browser...")

        # We give it a tiny delay to ensure the HTTP server socket is open
        def open_browser():
            time.sleep(0.5)
            webbrowser.open(web_ui_url)

        browser_thread = threading.Thread(target=open_browser, daemon=True)
        browser_thread.start()

        # 3. Start the WebSocket server on the main event loop
        # Start the WebSocket server on the main event loop
        try:
            asyncio.run(start_websocket_server(args.host, args.port))
        except KeyboardInterrupt:
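The hunk headers for `server.py` (`-709,7 +686,6`) indicate that the `--http_port` argument is dropped along with the static HTTP server. The sketch below rebuilds only the flags visible in that hunk, minus `--http_port`, to show what a caller would then see. `build_parser` and its description string are assumptions for illustration; the real parser in `server.py` likely defines further flags that are not shown in this diff.

```python
import argparse


# Hypothetical reconstruction of the post-change CLI. Only the flags visible
# in the diff hunk are included; the description text is an assumption.
def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="ONNX voice changer server (sketch)")
    parser.add_argument("--mode", type=str, default="websocket",
                        choices=["websocket", "device"], help="Server running mode")
    parser.add_argument("--host", type=str, default="127.0.0.1", help="WebSocket host")
    parser.add_argument("--port", type=int, default=8765, help="WebSocket port")
    parser.add_argument("--model", type=str, default="",
                        help="RVC Model folder name inside weights/")
    parser.add_argument("--transpose", type=int, default=0,
                        help="Pitch shift in semitones (transpose)")
    parser.add_argument("--f0_method", type=str, default="pm",
                        choices=["pm", "harvest", "dio", "rmvpe"],
                        help="Pitch extraction method")
    return parser


if __name__ == "__main__":
    parser = build_parser()
    # parse_known_args collects unrecognized flags instead of exiting.
    args, unknown = parser.parse_known_args(["--port", "9000", "--http_port", "8000"])
    print(args.port, unknown)
```

With plain `parse_args`, passing `--http_port` to such a parser would exit with an "unrecognized arguments" error, which is why launch scripts that still pass the flag need updating.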
@@ -6,10 +6,10 @@ set VENV_PYTHON=..\rvc-tts-webui\venv\Scripts\python.exe

if exist "%VENV_PYTHON%" (
    echo Running with the virtual environment from rvc-tts-webui...
    "%VENV_PYTHON%" -u server.py --host 127.0.0.1 --port 8765 --http_port 8000
    "%VENV_PYTHON%" -u server.py --host 127.0.0.1 --port 8765
) else (
    echo Virtual environment not found, trying the system Python...
    python -u server.py --host 127.0.0.1 --port 8765 --http_port 8000
    python -u server.py --host 127.0.0.1 --port 8765
)

pause
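The interpreter fallback in the batch launcher can also be expressed in Python, which may be handy on platforms without `cmd.exe`. This is a sketch under assumptions: `build_server_command` is a hypothetical helper, and the `exists` parameter is injected only so the choice can be exercised without touching the filesystem. The command line matches the updated invocation without `--http_port`.

```python
import os
from typing import Callable, List

# Same relative location as VENV_PYTHON in the batch file.
VENV_PYTHON = os.path.join("..", "rvc-tts-webui", "venv", "Scripts", "python.exe")


# Hypothetical helper: prefer the rvc-tts-webui virtual environment when its
# interpreter exists, otherwise fall back to the system "python".
def build_server_command(
    venv_python: str = VENV_PYTHON,
    exists: Callable[[str], bool] = os.path.exists,
) -> List[str]:
    interpreter = venv_python if exists(venv_python) else "python"
    return [interpreter, "-u", "server.py", "--host", "127.0.0.1", "--port", "8765"]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run to actually launch the server.
    print(" ".join(build_server_command()))
```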