# 🤖 ONNX VC - Agent Guidelines & Architecture Map
This file serves as a guide for AI agents (Gemini, Claude, Cursor, etc.) working on the ONNX Voice Changer repository. It explains the project architecture, directory structure, core conventions, and how to maintain the codebase.
---
## 🛠️ Technology Stack
1. **Backend:** Python 3.10+, WebSocket (using `websockets`), ONNX Runtime, NumPy, PyTorch (only for RVC export).
2. **Frontend:** Next.js 15 (App Router), React 19, TypeScript, Tailwind CSS, Lucide React, Framer Motion.
3. **Voice Conversion:** Retrieval-based Voice Conversion (RVC) models accelerated via ONNX Runtime (CPU, CUDA, DirectML).
---
## 📁 Repository Structure
* [/server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py) — The main WebSocket API server.
* [/frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend) — Next.js 15 client dashboard app.
* [/frontend-deprecated/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend-deprecated) — Legacy static single-page web files (HTML/CSS/JS). Do not modify.
* [/docs/](file:///M:/Users/ahmad/project/onnx-voice-changer/docs) — Holds localized README documentation files (Indonesian, Spanish, Japanese, Chinese).
* [/lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib) — RVC models and export scripts (e.g. `export_onnx.py`).
* [/weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) — Character voice models (e.g., `weights/HuTao/HuTao.onnx`).
* [/pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) — Holds the pre-trained `vec-768-layer-12.onnx` ContentVec model.
---
## ⚙️ Core Architecture & Conventions
### 1. Pure API Backend (No Static Hosting)
* **Rule:** The Python backend (`server.py`) operates **strictly as a WebSocket API**.
* **Do NOT** configure Python to serve frontend static pages, build files, or index HTML.
* The Next.js frontend client runs independently (via `npm run dev` or a separate production server).
### 2. WebSocket Audio Pipeline
* Audio chunks are sent to and from the server as **binary WebSocket messages** containing raw `Float32` PCM audio data.
* Configuration changes, telemetry, and status controls are handled using **JSON WebSocket messages** sent over the same connection.
* Always check the message payload type (binary audio vs. JSON text) in `server.py` before processing it.
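
The payload check described above might look like the following minimal sketch. It is not taken from `server.py`: the function name `classify_message`, the little-endian byte order, and the JSON keys in the usage note are assumptions made for illustration. It relies only on the fact that the `websockets` library delivers binary frames as `bytes` and text frames as `str`.

```python
import json

import numpy as np


def classify_message(message):
    """Split an incoming WebSocket message into audio or control data.

    Hypothetical helper: returns ("audio", float32 array) for binary
    frames and ("control", dict) for JSON text frames.
    """
    if isinstance(message, (bytes, bytearray)):
        # Raw Float32 PCM uses 4 bytes per sample.
        if len(message) % 4 != 0:
            raise ValueError("binary frame length is not a multiple of 4 bytes")
        # Assumption: samples are little-endian float32 ("<f4").
        samples = np.frombuffer(message, dtype="<f4")
        return "audio", samples
    if isinstance(message, str):
        # Text frames carry JSON (configuration, telemetry, status).
        return "control", json.loads(message)
    raise TypeError(f"unsupported message type: {type(message).__name__}")
```

For example, `classify_message(np.zeros(480, dtype="<f4").tobytes())` yields an `"audio"` result with 480 samples, while `classify_message('{"type": "config"}')` yields a `"control"` result holding the parsed dictionary (the `type` key is hypothetical).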
### 3. Digital Signal Processing (DSP) Staging
* Audio processing around inference is handled on the server side:
1. **Low-Cut Filter:** First-order Butterworth high-pass filter at 80 Hz to attenuate AC mains hum.
2. **Noise Gate:** Threshold-based silence gate to bypass inference when the user is silent.
3. **Gain Controls:** Input and output gain staging before and after inference.
* Keep all DSP math vectorized on `numpy` arrays to maintain low latency.
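
The gate and gain stages above can be sketched with vectorized NumPy as follows. This is an illustration under stated assumptions, not the code in `server.py`: the names `rms` and `process_chunk`, the RMS-based gate, the default threshold of `0.01`, applying the gate after input gain, and returning silence when the gate closes are all hypothetical choices. The low-cut filter is omitted for brevity, and `infer` stands in for the ONNX inference call.

```python
import numpy as np


def rms(chunk: np.ndarray) -> float:
    """Root-mean-square level of a chunk (0.0 for an empty chunk)."""
    if chunk.size == 0:
        return 0.0
    return float(np.sqrt(np.mean(chunk.astype(np.float64) ** 2)))


def process_chunk(chunk, infer, input_gain=1.0, output_gain=1.0,
                  gate_threshold=0.01):
    """Apply input gain, noise gate, inference, then output gain.

    `infer` is any callable mapping a float32 array to a float32 array;
    it is a placeholder for the real model call.
    """
    # Input gain staging before inference.
    staged = chunk.astype(np.float32) * np.float32(input_gain)

    # Noise gate: skip inference entirely when the chunk is below threshold.
    # Assumption: a closed gate outputs silence of the same length.
    if rms(staged) < gate_threshold:
        return np.zeros_like(staged)

    converted = np.asarray(infer(staged), dtype=np.float32)

    # Output gain staging after inference.
    return converted * np.float32(output_gain)
```

Skipping `infer` on silent chunks is what saves inference time while the user is not speaking; whether the real server emits zeros or passes the input through in that case is not specified here.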
### 4. RVC ONNX Export
* PyTorch RVC models (`.pth`) must be converted to ONNX (`.onnx`) before inference.
* Always use [/lib/export_onnx.py](file:///M:/Users/ahmad/project/onnx-voice-changer/lib/export_onnx.py) for conversion:
```bash
python lib/export_onnx.py --model_name <CharacterFolder>
```
---
## 🎨 Frontend Design Guidelines
* **Responsive Layout:** Must support mobile and desktop views, utilizing a collapsible sidebar.
* **Themes & Accent Colors:** Supports dark/light mode toggling, with a custom accent color system (Purple, Blue, Emerald, Rose, Amber) stored in state.
* **i18n Translation:** Do not hardcode English/Indonesian strings. Ensure all labels, warnings, and messages are registered in [translations.ts](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend/src/utils/translations.ts).
---
## 🏃 Useful Development Commands
### Running Backend
```bash
python server.py --host 127.0.0.1 --port 8765 --device cuda
```
### Running Frontend Dev Server
```bash
cd frontend
npm run dev
```
### Building and Running Frontend Production Server
```bash
cd frontend
npm run build
npm run start
```