🤖 ONNX VC - Agent Guidelines & Architecture Map

This file serves as a guide for AI agents (Gemini, Claude, Cursor, etc.) working on the ONNX Voice Changer repository. It explains the project architecture, directory structure, core conventions, and how to maintain the codebase.


🛠️ Technology Stack

  1. Backend: Python 3.10+, WebSocket (using websockets), ONNX Runtime, NumPy, PyTorch (only for RVC export).
  2. Frontend: Next.js 15 (App Router), React 19, TypeScript, Tailwind CSS, Lucide React, Framer Motion.
  3. Voice Conversion: Retrieval-based Voice Conversion (RVC) models accelerated via ONNX Runtime (CPU, CUDA, DirectML).
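
The three acceleration targets above correspond to ONNX Runtime execution providers. A minimal sketch of how a device name might be mapped to a provider list follows. The helper name `providers_for` and the device strings `cpu` and `dml` are assumptions for illustration (only `cuda` appears in this guide), and this is not necessarily how server.py does it.

```python
# Provider identifiers as used by ONNX Runtime.
# The device keys "cpu" and "dml" are assumed names for illustration.
PROVIDERS_BY_DEVICE = {
    "cpu": ["CPUExecutionProvider"],
    "cuda": ["CUDAExecutionProvider", "CPUExecutionProvider"],
    "dml": ["DmlExecutionProvider", "CPUExecutionProvider"],
}


def providers_for(device, available):
    """Return the providers to try for `device`, in priority order.

    `available` is the list reported by the installed runtime, for example
    the result of onnxruntime.get_available_providers().
    """
    try:
        wanted = PROVIDERS_BY_DEVICE[device.lower()]
    except KeyError:
        raise ValueError(f"unknown device: {device!r}") from None
    usable = [name for name in wanted if name in available]
    # Fall back to CPU if none of the requested providers is installed.
    return usable or ["CPUExecutionProvider"]
```

The returned list can be passed as the `providers` argument of `onnxruntime.InferenceSession`, so a machine without CUDA still runs on CPU instead of failing at startup.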

📁 Repository Structure

  • /server.py — The main WebSocket API server.
  • /frontend/ — Next.js 15 client dashboard app.
  • /frontend-deprecated/ — Legacy static single-page web files (HTML/CSS/JS). Do not modify.
  • /docs/ — Holds localized README documentation files (Indonesian, Spanish, Japanese, Chinese).
  • /lib/ — RVC models and export scripts (e.g. export_onnx.py).
  • /weights/ — Character voice models (e.g., weights/HuTao/HuTao.onnx).
  • /pretrained/ — Holds the pre-trained vec-768-layer-12.onnx ContentVec model.

⚙️ Core Architecture & Conventions

1. Pure API Backend (No Static Hosting)

  • Rule: The Python backend (server.py) operates strictly as a WebSocket API.
  • Do NOT configure Python to serve frontend static pages, build files, or index HTML.
  • The Next.js frontend client runs independently (via npm run dev or a separate production server).

2. WebSocket Audio Pipeline

  • Audio chunks are sent to and from the server as binary WebSocket messages containing raw Float32 PCM audio data.
  • Configuration changes, telemetry, and status controls are handled using JSON WebSocket messages sent over the same connection.
  • Always check the message payload type (binary audio vs. JSON text) in server.py before processing a message.
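
The payload check described above can be sketched as follows. With the websockets library, binary frames arrive as `bytes` and text frames as `str`, so an `isinstance` test is enough to separate the two. The function names `decode_message` and `encode_audio` and the little-endian byte order are assumptions for illustration, not the actual server.py code.

```python
import json

import numpy as np


def decode_message(message):
    """Classify one incoming WebSocket message.

    Returns ("audio", float32 array) for binary frames and
    ("control", parsed JSON) for text frames.
    """
    if isinstance(message, (bytes, bytearray)):
        # Binary frame: raw Float32 PCM. Little-endian is an assumption.
        if len(message) % 4 != 0:
            raise ValueError("binary frame is not a whole number of float32 samples")
        return "audio", np.frombuffer(message, dtype="<f4")
    if isinstance(message, str):
        # Text frame: JSON configuration, telemetry, or status message.
        return "control", json.loads(message)
    raise TypeError(f"unsupported message type: {type(message).__name__}")


def encode_audio(samples):
    """Serialize processed samples back into a binary Float32 frame."""
    return np.asarray(samples, dtype="<f4").tobytes()
```

Rejecting frames whose length is not a multiple of 4 bytes catches truncated audio early, before it reaches the DSP or inference stages.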

3. Digital Signal Processing (DSP) Staging

  • Audio preprocessing is handled on the server side:
    1. Low-Cut Filter: First-order Butterworth high-pass filter with an 80 Hz cutoff, which attenuates (but does not fully remove) 50/60 Hz AC hum.
    2. Noise Gate: Threshold-based silence gate to bypass inference when the user is silent.
    3. Gain Controls: Input and output gain staging before and after inference.
  • Keep all DSP math vectorized with NumPy arrays to maintain low latency.
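
The three stages above can be sketched as follows. This is an illustration under stated assumptions, not the code in server.py: the 48 kHz sample rate, the RMS-based gate, the linear (not dB) gains, the stage order, and the `infer` callable are all hypothetical. The filter uses a per-sample loop for readability; a low-latency version would more likely use a vectorized routine such as `scipy.signal.lfilter` and carry the filter state across chunks.

```python
import math

import numpy as np


def highpass(samples, cutoff_hz=80.0, sample_rate=48000):
    """First-order Butterworth high-pass (bilinear transform), starting from silence."""
    k = math.tan(math.pi * cutoff_hz / sample_rate)
    b0 = 1.0 / (1.0 + k)
    a1 = (k - 1.0) / (1.0 + k)
    x = np.asarray(samples, dtype=np.float64)
    y = np.empty_like(x)
    prev_x = prev_y = 0.0
    for n in range(x.size):
        # y[n] = b0*x[n] - b0*x[n-1] - a1*y[n-1]
        prev_y = b0 * (x[n] - prev_x) - a1 * prev_y
        prev_x = x[n]
        y[n] = prev_y
    return y.astype(np.float32)


def is_silent(samples, threshold=0.01):
    """Noise gate decision: True when the chunk's RMS level is below threshold."""
    x = np.asarray(samples, dtype=np.float64)
    if x.size == 0:
        return True
    return float(np.sqrt(np.mean(x * x))) < threshold


def process_chunk(samples, infer, input_gain=1.0, output_gain=1.0, threshold=0.01):
    """Low-cut, input gain, gate, inference, output gain (order is an assumption)."""
    x = highpass(samples) * input_gain
    if is_silent(x, threshold):
        # Silent chunk: skip the (expensive) inference call entirely.
        return np.zeros_like(x)
    return infer(x) * output_gain
```

Here `infer` stands in for whatever function runs the ONNX voice conversion model on one chunk.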

4. RVC ONNX Export

  • PyTorch RVC models (.pth) must be converted to ONNX (.onnx) before inference.
  • Always use /lib/export_onnx.py for conversion:
    python lib/export_onnx.py --model_name <CharacterFolder>
    

🎨 Frontend Design Guidelines

  • Responsive Layout: Must support both mobile and desktop views, using a collapsible sidebar.
  • Themes & Accent Colors: Supports dark/light mode toggling, with a custom accent color system (Purple, Blue, Emerald, Rose, Amber) stored in state.
  • i18n Translation: Do not hardcode English/Indonesian strings. Ensure all labels, warnings, and messages are registered in translations.ts.

🏃 Useful Development Commands

Running Backend

python server.py --host 127.0.0.1 --port 8765 --device cuda

Running Frontend Dev Server

cd frontend
npm run dev

Building and Starting the Frontend Production Server

cd frontend
npm run build
npm run start