🤖 ONNX VC - Agent Guidelines & Architecture Map
This file serves as a guide for AI agents (Gemini, Claude, Cursor, etc.) working on the ONNX Voice Changer repository. It explains the project architecture, directory structure, core conventions, and how to maintain the codebase.
🛠️ Technology Stack
- Backend: Python 3.10+, WebSocket (using websockets), ONNX Runtime, NumPy, PyTorch (only for RVC export).
- Frontend: Next.js 15 (App Router), React 19, TypeScript, Tailwind CSS, Lucide React, Framer Motion.
- Voice Conversion: Retrieval-based Voice Conversion (RVC) models accelerated via ONNX Runtime (CPU, CUDA, DirectML).
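One way to picture the CPU, CUDA, and DirectML acceleration options is as a mapping from a device name to an ordered list of ONNX Runtime execution providers, with CPU as the fallback. The sketch below is illustrative only: the function name and the "cpu" and "directml" device strings are assumptions (only "cuda" appears in this guide's commands), and server.py may select providers differently.

```python
# Hypothetical mapping from a device name to ONNX Runtime execution providers.
# The provider strings are ONNX Runtime's own identifiers; the device names
# "cpu" and "directml" are assumed here for illustration.
PROVIDERS_BY_DEVICE = {
    "cpu": ["CPUExecutionProvider"],
    "cuda": ["CUDAExecutionProvider", "CPUExecutionProvider"],
    "directml": ["DmlExecutionProvider", "CPUExecutionProvider"],
}


def providers_for(device: str) -> list[str]:
    """Return the provider list for a device name, in priority order."""
    try:
        # Copy the list so callers cannot mutate the shared table.
        return list(PROVIDERS_BY_DEVICE[device.lower()])
    except KeyError:
        raise ValueError(f"unsupported device: {device!r}") from None
```

The resulting list could then be passed as the providers argument of onnxruntime.InferenceSession, which tries the providers in the order given.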
📁 Repository Structure
- /server.py — The main WebSocket API server.
- /frontend/ — Next.js 15 client dashboard app.
- /frontend-deprecated/ — Legacy static single-page web files (HTML/CSS/JS). Do not modify.
- /docs/ — Holds localized README documentation files (Indonesian, Spanish, Japanese, Chinese).
- /lib/ — RVC models and export scripts (e.g., export_onnx.py).
- /weights/ — Character voice models (e.g., weights/HuTao/HuTao.onnx).
- /pretrained/ — Holds the pre-trained vec-768-layer-12.onnx ContentVec model.
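To make the /weights/ layout concrete, here is a minimal sketch that lists the available character voices by scanning for one .onnx file per character folder. It assumes the weights/<Character>/<file>.onnx layout suggested by the HuTao example. The function name is hypothetical and is not taken from server.py.

```python
from pathlib import Path


def list_voice_models(weights_dir: Path) -> dict[str, Path]:
    """Map each character folder name to the first .onnx file found in it."""
    models: dict[str, Path] = {}
    # Assumed layout: <weights_dir>/<Character>/<anything>.onnx
    for onnx_path in sorted(weights_dir.glob("*/*.onnx")):
        # Keep only the first match per folder; folders with no .onnx are skipped.
        models.setdefault(onnx_path.parent.name, onnx_path)
    return models
```

Calling list_voice_models(Path("weights")) from the repository root would return an empty dict if no model has been exported yet.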
⚙️ Core Architecture & Conventions
1. Pure API Backend (No Static Hosting)
- Rule: The Python backend (server.py) operates strictly as a WebSocket API.
- Do NOT configure Python to serve frontend static pages, build files, or index HTML.
- The Next.js frontend client runs independently (via npm run dev or a separate production server).
2. WebSocket Audio Pipeline
- Audio chunks are sent to and from the server as binary WebSocket messages containing raw Float32 PCM audio data.
- Configuration changes, telemetry, and status controls are handled using JSON WebSocket messages sent in the same connection.
- Always check the message payload type (binary vs. string JSON text) in server.py.
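The payload-type check described above might look roughly like the following. With the websockets library, binary frames arrive as bytes and text frames as str, so the type of the message is enough to tell audio from control data. The function name, the "audio"/"control" tags, and the native byte order of the samples are assumptions for illustration, not the actual server.py code.

```python
import json
from typing import Any, Union


def classify_message(message: Union[bytes, bytearray, str]) -> tuple[str, Any]:
    """Split an incoming WebSocket message into an audio chunk or a control dict."""
    if isinstance(message, (bytes, bytearray)):
        # Binary frame: raw Float32 PCM, so the length must be a multiple of 4 bytes.
        if len(message) % 4 != 0:
            raise ValueError("binary payload is not a whole number of float32 samples")
        # Zero-copy view of the samples, assuming the machine's native byte order.
        return "audio", memoryview(message).cast("f")
    if isinstance(message, str):
        # Text frame: JSON configuration, telemetry, or status control.
        return "control", json.loads(message)
    raise TypeError(f"unexpected message type: {type(message).__name__}")
```

Inside a connection handler, each received message would pass through a check like this before being routed to the audio path or the configuration path.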
3. Digital Signal Processing (DSP) Staging
- Audio preprocessing is handled on the server side:
  - Low-Cut Filter: Active 1st-order Butterworth high-pass filter at 80 Hz to reduce AC hum.
  - Noise Gate: Threshold-based silence gate to bypass inference when the user is silent.
  - Gain Controls: Input and output gain staging before and after inference.
- Ensure all DSP math is optimized using numpy arrays to maintain low latency.
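As a rough illustration of the noise gate and gain stages, the sketch below applies input gain, skips inference when the chunk is quieter than a threshold, and applies output gain to the converted audio. The function names, the RMS level measure, the -50 dBFS default, and the choice to return silence when the gate is closed are all assumptions, not the behaviour of server.py. The low-cut filter is left out to keep the example short.

```python
from typing import Callable

import numpy as np


def rms_dbfs(chunk: np.ndarray) -> float:
    """Return the RMS level of a chunk in dBFS, where full scale is 1.0."""
    if chunk.size == 0:
        return float("-inf")
    # Accumulate in float64 so quiet signals do not lose precision.
    rms = np.sqrt(np.mean(np.square(chunk, dtype=np.float64)))
    return float(20.0 * np.log10(max(rms, 1e-10)))


def process_chunk(
    chunk: np.ndarray,
    infer: Callable[[np.ndarray], np.ndarray],
    input_gain: float = 1.0,
    output_gain: float = 1.0,
    gate_threshold_db: float = -50.0,  # assumed default, not from server.py
) -> np.ndarray:
    """Apply input gain, gate silent chunks, run inference, apply output gain."""
    staged = chunk.astype(np.float32) * np.float32(input_gain)
    if rms_dbfs(staged) < gate_threshold_db:
        # Gate closed: bypass inference entirely and emit silence.
        return np.zeros_like(staged)
    converted = infer(staged)
    return (converted * np.float32(output_gain)).astype(np.float32)
```

Every operation here works on whole arrays, so the only per-chunk cost besides inference is a handful of vectorized passes over the samples.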
4. RVC ONNX Export
- PyTorch RVC models (
.pth) must be converted to ONNX (.onnx) before inference. - Always use /lib/export_onnx.py for conversion:
python lib/export_onnx.py --model_name <CharacterFolder>
🎨 Frontend Design Guidelines
- Responsive Layout: Must support mobile and desktop views, utilizing a collapsible sidebar.
- Themes & Accent Colors: Supports dark/light mode toggling, with a custom accent color system (Purple, Blue, Emerald, Rose, Amber) stored in state.
- i18n Translation: Do not hardcode English/Indonesian strings. Ensure all labels, warnings, and messages are registered in translations.ts.
🏃 Useful Development Commands
Running Backend
python server.py --host 127.0.0.1 --port 8765 --device cuda
Running Frontend Dev Server
cd frontend
npm run dev
Building Frontend Production Server
cd frontend
npm run build
npm run start