🤖 ONNX VC - Agent Guidelines & Architecture Map

This file serves as a guide for AI agents (Gemini, Claude, Cursor, etc.) working on the ONNX Voice Changer repository. It explains the project architecture, directory structure, core conventions, and how to maintain the codebase.

🛠️ Technology Stack

Backend: Python 3.10+, WebSocket (using websockets), ONNX Runtime, NumPy, PyTorch (only for RVC export).
Frontend: Next.js 15 (App Router), React 19, TypeScript, Tailwind CSS, Lucide React, Framer Motion.
Voice Conversion: Retrieval-based Voice Conversion (RVC) models accelerated via ONNX Runtime (CPU, CUDA, DirectML).
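
The three acceleration targets listed above correspond to ONNX Runtime execution providers. The following is a minimal sketch of one way to turn a device choice into a provider list with a CPU fallback. The function name pick_providers and the device labels "cpu", "cuda", and "directml" are assumptions for illustration, not taken from server.py. The provider strings are ONNX Runtime's own identifiers.

```python
# Hypothetical helper: maps a device label to an ONNX Runtime provider list.
# The device labels on the left are assumptions; the provider names on the
# right are the identifiers ONNX Runtime itself uses.
PROVIDERS = {
    "cpu": "CPUExecutionProvider",
    "cuda": "CUDAExecutionProvider",
    "directml": "DmlExecutionProvider",
}


def pick_providers(device, available):
    """Return the preferred provider (if available) followed by a CPU fallback."""
    wanted = PROVIDERS.get(device.lower())
    if wanted is None:
        raise ValueError(f"unknown device: {device!r}")
    chain = [wanted] if wanted in available else []
    if "CPUExecutionProvider" not in chain:
        chain.append("CPUExecutionProvider")
    return chain


# Typical use (not executed here, requires onnxruntime to be installed):
#   import onnxruntime as ort
#   providers = pick_providers("cuda", ort.get_available_providers())
#   session = ort.InferenceSession("weights/HuTao/HuTao.onnx", providers=providers)
```

Passing the available providers in as an argument keeps the helper free of any import-time dependency on a GPU build of ONNX Runtime.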

📁 Repository Structure

/server.py — The main WebSocket API server.
/frontend/ — Next.js 15 client dashboard app.
/frontend-deprecated/ — Legacy static single-page web files (HTML/CSS/JS). Do not modify.
/docs/ — Holds localized README documentation files (Indonesian, Spanish, Japanese, Chinese).
/lib/ — RVC models and export scripts (e.g. export_onnx.py).
/weights/ — Character voice models (e.g., weights/HuTao/HuTao.onnx).
/pretrained/ — Holds the pre-trained vec-768-layer-12.onnx ContentVec model.

⚙️ Core Architecture & Conventions

1. Pure API Backend (No Static Hosting)

Rule: The Python backend (server.py) operates strictly as a WebSocket API.
Do NOT configure Python to serve frontend static pages, build files, or index HTML.
The Next.js frontend client runs independently (via npm run dev or a separate production server).

2. WebSocket Audio Pipeline

Audio chunks are sent to and from the server as binary WebSocket messages containing raw Float32 PCM audio data.
Configuration changes, telemetry, and status controls are handled as JSON text messages sent over the same connection.
Always check the message payload type (binary audio vs. JSON text) in server.py before processing it.
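
The payload check described above can be sketched as a small decoding function. This is an illustration, not the code in server.py: the name decode_message is hypothetical, and little-endian sample order is an assumption (it matches what a browser Float32Array produces on common hardware). The websockets library delivers binary frames as bytes and text frames as str, which is what the type checks rely on.

```python
import json

import numpy as np


def decode_message(message):
    """Classify one WebSocket message as ("audio", samples) or ("control", dict)."""
    if isinstance(message, (bytes, bytearray)):
        # Raw Float32 PCM: the payload must be a whole number of 4-byte samples.
        if len(message) % 4 != 0:
            raise ValueError("binary payload is not a whole number of float32 samples")
        # Assumption: samples are little-endian float32.
        return "audio", np.frombuffer(message, dtype="<f4")
    if isinstance(message, str):
        payload = json.loads(message)
        if not isinstance(payload, dict):
            raise ValueError("control message must be a JSON object")
        return "control", payload
    raise TypeError(f"unsupported message type: {type(message).__name__}")
```

Inside a connection handler, each received message would be passed through this function first, so that audio chunks go to the DSP and inference path while JSON objects go to the configuration path.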

3. Digital Signal Processing (DSP) Staging

Audio preprocessing is handled on the server side:
1. Low-Cut Filter: First-order Butterworth high-pass filter at 80 Hz to attenuate low-frequency rumble and AC mains hum (50/60 Hz).
2. Noise Gate: Threshold-based silence gate to bypass inference when the user is silent.
3. Gain Controls: Input and output gain staging before and after inference.
Keep all DSP math vectorized with NumPy arrays to maintain low latency.
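
The staging above can be sketched as follows. This is a hedged illustration under stated assumptions, not the server's implementation: the names LowCut, is_silent, apply_gain, and preprocess are hypothetical, as are the 48 kHz sample rate and the -50 dB gate threshold. The filter coefficients come from the standard bilinear transform of a first-order high-pass. The per-sample loop is kept for readability; a vectorized IIR routine such as scipy.signal.lfilter (with its zi state argument) would be the usual low-latency replacement, although SciPy is not listed in this project's stack.

```python
import math

import numpy as np

SAMPLE_RATE = 48000  # assumption: use the stream's real sample rate


class LowCut:
    """First-order Butterworth high-pass; filter state carries across chunks."""

    def __init__(self, cutoff_hz=80.0, sample_rate=SAMPLE_RATE):
        k = math.tan(math.pi * cutoff_hz / sample_rate)
        self.b0 = 1.0 / (1.0 + k)
        self.a1 = (k - 1.0) / (k + 1.0)
        self.prev_x = 0.0
        self.prev_y = 0.0

    def process(self, chunk):
        out = np.empty(len(chunk), dtype=np.float32)
        x1, y1 = self.prev_x, self.prev_y
        for i, x in enumerate(chunk.tolist()):
            # y[n] = b0 * (x[n] - x[n-1]) - a1 * y[n-1]
            y = self.b0 * (x - x1) - self.a1 * y1
            out[i] = y
            x1, y1 = x, y
        self.prev_x, self.prev_y = x1, y1
        return out


def is_silent(chunk, threshold_db=-50.0):
    """Noise gate decision: True when the chunk's RMS level is below the threshold."""
    if chunk.size == 0:
        return True
    rms = float(np.sqrt(np.mean(np.square(chunk, dtype=np.float64))))
    return 20.0 * math.log10(max(rms, 1e-10)) < threshold_db


def apply_gain(chunk, gain_db):
    """Scale a chunk by a gain given in decibels."""
    return (chunk * (10.0 ** (gain_db / 20.0))).astype(np.float32)


def preprocess(chunk, low_cut, input_gain_db=0.0, threshold_db=-50.0):
    """Low-cut, then gate, then input gain. Returns None to signal 'skip inference'."""
    filtered = low_cut.process(chunk)
    if is_silent(filtered, threshold_db):
        return None
    return apply_gain(filtered, input_gain_db)
```

When preprocess returns None, the caller can bypass the model for that chunk. Output gain would be applied with apply_gain on the inference result.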

4. RVC ONNX Export

PyTorch RVC models (.pth) must be converted to ONNX (.onnx) before inference.

Always use /lib/export_onnx.py for conversion:

python lib/export_onnx.py --model_name <CharacterFolder>

🎨 Frontend Design Guidelines

Responsive Layout: Must support mobile and desktop views, utilizing a collapsible sidebar.
Themes & Accent Colors: Supports dark/light mode toggling, with a custom accent color system (Purple, Blue, Emerald, Rose, Amber) stored in state.
i18n Translation: Do not hardcode English/Indonesian strings. Ensure all labels, warnings, and messages are registered in translations.ts.

🏃 Useful Development Commands

Running Backend

python server.py --host 127.0.0.1 --port 8765 --device cuda

Running Frontend Dev Server

cd frontend
npm run dev

Building Frontend Production Server

cd frontend
npm run build
npm run start
