From 34fa3718975da3c6c1ed5875f288ac0a7fb94739 Mon Sep 17 00:00:00 2001
From: akukanara
Date: Sun, 31 May 2026 17:15:37 +0700
Subject: [PATCH] docs: add root AGENTS.md guide for AI assistants

---
 AGENTS.md | 78 +++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 78 insertions(+)
 create mode 100644 AGENTS.md

diff --git a/AGENTS.md b/AGENTS.md
new file mode 100644
index 0000000..1593357
--- /dev/null
+++ b/AGENTS.md
@@ -0,0 +1,78 @@
+# 🤖 ONNX VC - Agent Guidelines & Architecture Map
+
+This file serves as a guide for AI agents (Gemini, Claude, Cursor, etc.) working on the ONNX Voice Changer repository. It explains the project architecture, directory structure, core conventions, and how to maintain the codebase.
+
+---
+
+## 🛠️ Technology Stack
+1. **Backend:** Python 3.10+, WebSocket (using `websockets`), ONNX Runtime, NumPy, PyTorch (only for RVC export).
+2. **Frontend:** Next.js 15 (App Router), React 19, TypeScript, Tailwind CSS, Lucide React, Framer Motion.
+3. **Voice Conversion:** Retrieval-based Voice Conversion (RVC) models accelerated via ONNX Runtime (CPU, CUDA, DirectML).
+
+---
+
+## 📁 Repository Structure
+* [/server.py](file:///M:/Users/ahmad/project/onnx-voice-changer/server.py) — The main WebSocket API server.
+* [/frontend/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend) — Next.js 15 client dashboard app.
+* [/frontend-deprecated/](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend-deprecated) — Legacy static single-page web files (HTML/CSS/JS). Do not modify.
+* [/docs/](file:///M:/Users/ahmad/project/onnx-voice-changer/docs) — Holds localized README documentation files (Indonesian, Spanish, Japanese, Chinese).
+* [/lib/](file:///M:/Users/ahmad/project/onnx-voice-changer/lib) — RVC models and export scripts (e.g. `export_onnx.py`).
+* [/weights/](file:///M:/Users/ahmad/project/onnx-voice-changer/weights) — Character voice models (e.g., `weights/HuTao/HuTao.onnx`).
+* [/pretrained/](file:///M:/Users/ahmad/project/onnx-voice-changer/pretrained) — Holds the pre-trained `vec-768-layer-12.onnx` ContentVec model.
+
+---
+
+## ⚙️ Core Architecture & Conventions
+
+### 1. Pure API Backend (No Static Hosting)
+* **Rule:** The Python backend (`server.py`) operates **strictly as a WebSocket API**.
+* **Do NOT** configure Python to serve frontend static pages, build files, or index HTML.
+* The Next.js frontend client runs independently (via `npm run dev` or a separate production server).
+
+### 2. WebSocket Audio Pipeline
+* Audio chunks are sent to and from the server as **binary WebSocket messages** containing raw `Float32` PCM audio data.
+* Configuration changes, telemetry, and status controls are handled using **JSON WebSocket messages** sent in the same connection.
+* Always check the message payload type (binary vs. string JSON text) in `server.py`.
+
+### 3. Digital Signal Processing (DSP) Staging
+* Audio preprocessing is handled on the server side:
+  1. **Low-Cut Filter:** Active Butterworth 1st order high-pass filter at 80Hz to eliminate AC hum.
+  2. **Noise Gate:** Threshold-based silence gate to bypass inference when the user is silent.
+  3. **Gain Controls:** Input and output gain staging before and after inference.
+* Ensure all DSP math is optimized using `numpy` arrays to maintain low latency.
+
+### 4. RVC ONNX Export
+* PyTorch RVC models (`.pth`) must be converted to ONNX (`.onnx`) before inference.
+* Always use [/lib/export_onnx.py](file:///M:/Users/ahmad/project/onnx-voice-changer/lib/export_onnx.py) for conversion:
+  ```bash
+  python lib/export_onnx.py --model_name
+  ```
+
+---
+
+## 🎨 Frontend Design Guidelines
+* **Responsive Layout:** Must support mobile and desktop views, utilizing a collapsible sidebar.
+* **Themes & Accent Colors:** Supports dark/light mode toggling, with a custom accent color system (Purple, Blue, Emerald, Rose, Amber) stored in state.
+* **i18n Translation:** Do not hardcode English/Indonesian strings. Ensure all labels, warnings, and messages are registered in [translations.ts](file:///M:/Users/ahmad/project/onnx-voice-changer/frontend/src/utils/translations.ts).
+
+---
+
+## 🏃 Useful Development Commands
+
+### Running Backend
+```bash
+python server.py --host 127.0.0.1 --port 8765 --device cuda
+```
+
+### Running Frontend Dev Server
+```bash
+cd frontend
+npm run dev
+```
+
+### Building Frontend Production Server
+```bash
+cd frontend
+npm run build
+npm run start
+```
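
The binary-versus-JSON check that the "WebSocket Audio Pipeline" section asks for in `server.py` could look roughly like the sketch below. It is an illustration only and not part of the patch. The function name `classify_message`, the native byte order of the Float32 samples, and the example JSON keys are all assumptions; the patch does not specify them. It relies on the fact that the `websockets` library delivers binary frames as `bytes` and text frames as `str`.

```python
import json
from array import array


def classify_message(message):
    """Split an incoming WebSocket message into audio or control data.

    Hypothetical helper: returns ("audio", samples) for binary frames and
    ("control", payload) for JSON text frames.
    """
    if isinstance(message, (bytes, bytearray)):
        # Assumption: 4-byte float samples in native byte order.
        if len(message) % 4 != 0:
            raise ValueError("binary frame length is not a multiple of 4 bytes")
        samples = array("f")
        samples.frombytes(bytes(message))
        return "audio", samples
    if isinstance(message, str):
        # Text frames carry JSON control or telemetry messages.
        return "control", json.loads(message)
    raise TypeError(f"unsupported message type: {type(message).__name__}")


# Usage: a three-sample audio frame and a made-up control message.
kind, samples = classify_message(array("f", [0.0, 0.5, -1.0]).tobytes())
print(kind, list(samples))
kind, payload = classify_message('{"type": "config", "pitch": 12}')
print(kind, payload)
```

In a real handler the audio branch would more likely hand the buffer to `numpy.frombuffer(message, dtype=numpy.float32)` so the DSP stages can operate on arrays; the standard-library `array` is used here only to keep the sketch dependency-free.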