# 🤖 ONNX VC - Agent Guidelines & Architecture Map

This file serves as a guide for AI agents (Gemini, Claude, Cursor, etc.) working on the ONNX Voice Changer repository. It explains the project architecture, directory structure, core conventions, and how to maintain the codebase.

---

## 🛠️ Technology Stack

1. **Backend:** Python 3.10+, WebSocket (using `websockets`), ONNX Runtime, NumPy, PyTorch (only for RVC export).
2. **Frontend:** Next.js 15 (App Router), React 19, TypeScript, Tailwind CSS, Lucide React, Framer Motion.
3. **Voice Conversion:** Retrieval-based Voice Conversion (RVC) models accelerated via ONNX Runtime (CPU, CUDA, DirectML).

---

## 📁 Repository Structure

* [/server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py) — The main WebSocket API server.
* [/frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend) — Next.js 15 client dashboard app.
* [/frontend-deprecated/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend-deprecated) — Legacy static single-page web files (HTML/CSS/JS). Do not modify.
* [/docs/](file:///M:/Users/ahmad/project/onnx-voice-changer/docs) — Holds localized README documentation files (Indonesian, Spanish, Japanese, Chinese).
* [/lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib) — RVC models and export scripts (e.g. `export_onnx.py`).
* [/weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) — Character voice models (e.g., `weights/HuTao/HuTao.onnx`).
* [/pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) — Holds the pre-trained `vec-768-layer-12.onnx` ContentVec model.

---

## ⚙️ Core Architecture & Conventions

### 1. Pure API Backend (No Static Hosting)

* **Rule:** The Python backend (`server.py`) operates **strictly as a WebSocket API**.
* **Do NOT** configure Python to serve frontend static pages, build files, or index HTML.
* The Next.js frontend client runs independently (via `npm run dev` or a separate production server).

### 2. WebSocket Audio Pipeline

* Audio chunks are sent to and from the server as **binary WebSocket messages** containing raw `Float32` PCM audio data.
* Configuration changes, telemetry, and status controls are handled using **JSON WebSocket messages** sent in the same connection.
* Always check the message payload type (binary vs. string JSON text) in `server.py`.

### 3. Digital Signal Processing (DSP) Staging

* Audio preprocessing is handled on the server side:
    1. **Low-Cut Filter:** Active Butterworth 1st order high-pass filter at 80Hz to eliminate AC hum.
    2. **Noise Gate:** Threshold-based silence gate to bypass inference when the user is silent.
    3. **Gain Controls:** Input and output gain staging before and after inference.
* Ensure all DSP math is optimized using `numpy` arrays to maintain low latency.

### 4. RVC ONNX Export

* PyTorch RVC models (`.pth`) must be converted to ONNX (`.onnx`) before inference.
* Always use [/lib/export_onnx.py](file:///M:/Users/ahmad/project/onnx-voice-changer/lib/export_onnx.py) for conversion:

  ```bash
  python lib/export_onnx.py --model_name
  ```

---

## 🎨 Frontend Design Guidelines

* **Responsive Layout:** Must support mobile and desktop views, utilizing a collapsible sidebar.
* **Themes & Accent Colors:** Supports dark/light mode toggling, with a custom accent color system (Purple, Blue, Emerald, Rose, Amber) stored in state.
* **i18n Translation:** Do not hardcode English/Indonesian strings. Ensure all labels, warnings, and messages are registered in [translations.ts](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend/src/utils/translations.ts).

---

## 🏃 Useful Development Commands

### Running Backend

```bash
python server.py --host 127.0.0.1 --port 8765 --device cuda
```

### Running Frontend Dev Server

```bash
cd frontend
npm run dev
```

### Building Frontend Production Server

```bash
cd frontend
npm run build
npm run start
```
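
As a rough illustration of the payload-type check described under "WebSocket Audio Pipeline", the sketch below separates binary audio frames from JSON control messages. It relies on the fact that the `websockets` library delivers binary frames as `bytes` and text frames as `str`. The names `decode_pcm` and `classify_message` are hypothetical and not taken from `server.py`, and little-endian sample order is an assumption.

```python
import json
import struct
from typing import Union


def decode_pcm(payload: bytes) -> list[float]:
    """Decode raw Float32 PCM bytes into Python floats.

    Assumption: samples are little-endian, 4 bytes each.
    """
    if len(payload) % 4 != 0:
        raise ValueError("Float32 PCM payload length must be a multiple of 4")
    count = len(payload) // 4
    return list(struct.unpack(f"<{count}f", payload))


def classify_message(message: Union[bytes, bytearray, str]) -> tuple[str, object]:
    """Return ("audio", samples) for binary frames, ("control", data) for JSON text."""
    if isinstance(message, (bytes, bytearray)):
        return "audio", decode_pcm(bytes(message))
    if isinstance(message, str):
        return "control", json.loads(message)
    raise TypeError(f"Unsupported WebSocket payload type: {type(message).__name__}")
```

A handler loop could call `classify_message` on every received frame and route audio to inference and control data to the settings logic. In the real server the samples would more likely be viewed as a NumPy array than unpacked into a list.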
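
The DSP staging described under "Digital Signal Processing (DSP) Staging" might look roughly like the following. This is a minimal sketch under stated assumptions, not the code in `server.py`: the function names, the 48 kHz sample rate, the -50 dB gate threshold, and the exact ordering of gate and input gain are all hypothetical. The high-pass filter uses first-order Butterworth coefficients derived with the bilinear transform, and a plain Python loop is used for clarity where a production path would want a vectorized or compiled filter.

```python
import math

import numpy as np


def highpass_1st_order(x, cutoff_hz=80.0, sample_rate=48000, state=(0.0, 0.0)):
    """First-order Butterworth high-pass (bilinear transform).

    `state` is (previous input, previous output), carried between chunks.
    """
    k = math.tan(math.pi * cutoff_hz / sample_rate)
    b0 = 1.0 / (1.0 + k)
    a1 = (k - 1.0) / (k + 1.0)
    prev_x, prev_y = state
    y = np.empty(len(x), dtype=np.float32)
    for i, sample in enumerate(x.tolist()):
        # y[n] = b0 * (x[n] - x[n-1]) - a1 * y[n-1]
        prev_y = b0 * (sample - prev_x) - a1 * prev_y
        prev_x = sample
        y[i] = prev_y
    return y, (prev_x, prev_y)


def is_silent(x, threshold_db=-50.0):
    """Noise gate test: True when the chunk's RMS level is below the threshold."""
    if x.size == 0:
        return True
    rms = float(np.sqrt(np.mean(x.astype(np.float64) ** 2)))
    return rms < 10.0 ** (threshold_db / 20.0)


def process_chunk(chunk, infer, input_gain=1.0, output_gain=1.0,
                  threshold_db=-50.0, state=(0.0, 0.0)):
    """Low-cut, gate, input gain, inference, output gain (assumed order)."""
    filtered, state = highpass_1st_order(chunk, state=state)
    if is_silent(filtered, threshold_db):
        # Bypass inference entirely and return silence.
        return np.zeros_like(chunk), state
    staged = filtered * np.float32(input_gain)
    converted = np.asarray(infer(staged), dtype=np.float32)
    return converted * np.float32(output_gain), state
```

Here `infer` stands in for whatever callable runs the ONNX model. Returning `state` lets the caller feed it into the next chunk so the filter does not click at chunk boundaries.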