
🎙️ Standalone ONNX Real-Time Voice Changer Service

A low-latency, real-time voice conversion system powered by ONNX Runtime and Retrieval-based Voice Conversion (RVC). It converts audio from a microphone or browser source into the voice of a chosen target character model with minimal processing delay.


Key Features

  • 🚀 WebSocket Audio Pipeline: Streaming audio transfer using binary WebSocket connections (raw PCM float32) for minimal overhead.
  • Multi-Backend ONNX Acceleration: Supports the NVIDIA CUDA and AMD/Intel DirectML execution providers, with a CPU fallback.
  • 🎼 High-Fidelity DSP Pipeline:
    • Low-Cut Filter: First-order Butterworth high-pass filter at 80 Hz to attenuate mains hum and low-frequency rumble.
    • Noise Gate: Threshold-based noise suppression to bypass inference during silence (saving CPU/GPU cycles).
    • Gain Controls: Independent input/output digital gain staging.
  • 🧠 Advanced Pitch Extraction: Optimized 16 kHz pitch prediction using the RMVPE (Robust Model for Vocal Pitch Estimation) model.
  • 🌐 Dual Routing Architecture: Supports routing audio via the web browser (Web Audio API) or directly through the server's local audio hardware (using sounddevice).
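
The DSP features listed above can be illustrated with a short NumPy sketch. It is not taken from server.py: the class and function names, the 48 kHz sample rate, and the -50 dB gate threshold are assumptions chosen for illustration. The filter coefficients come from the standard bilinear transform of a first-order Butterworth high-pass.

```python
import math
import numpy as np


class OnePoleHighPass:
    """First-order Butterworth high-pass that keeps state across chunks."""

    def __init__(self, cutoff_hz=80.0, sample_rate=48000):
        # Bilinear-transform coefficients for a first-order high-pass.
        k = math.tan(math.pi * cutoff_hz / sample_rate)
        self.b0 = 1.0 / (1.0 + k)
        self.a1 = (k - 1.0) / (k + 1.0)
        self.x_prev = 0.0
        self.y_prev = 0.0

    def process(self, chunk):
        # Plain Python loop for clarity; a real-time path would vectorize this.
        out = np.empty(len(chunk), dtype=np.float32)
        x_prev, y_prev = self.x_prev, self.y_prev
        for i, sample in enumerate(chunk):
            x = float(sample)
            y = self.b0 * (x - x_prev) - self.a1 * y_prev
            out[i] = y
            x_prev, y_prev = x, y
        self.x_prev, self.y_prev = x_prev, y_prev
        return out


def is_silent(chunk, threshold_db=-50.0):
    """Return True when the chunk's RMS level is below the gate threshold."""
    if len(chunk) == 0:
        return True
    rms = float(np.sqrt(np.mean(np.asarray(chunk, dtype=np.float64) ** 2)))
    level_db = 20.0 * math.log10(max(rms, 1e-10))
    return level_db < threshold_db


def apply_gain(chunk, gain_db):
    """Scale a chunk by a gain expressed in decibels."""
    return (np.asarray(chunk) * (10.0 ** (gain_db / 20.0))).astype(np.float32)
```

A caller would typically run `process` on every incoming chunk, then skip model inference whenever `is_silent` returns True.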

🛠️ System Architecture

graph TD
    A[Microphone / Web Browser] -->|Web Audio API| B(WebSocket Connection)
    B -->|Raw Float32 PCM Chunk| C[server.py Backend]
    C -->|1. High-Pass Filter 80Hz| D[DSP Stage]
    D -->|2. Gain & Noise Gate| D
    D -->|3. Resample to 16kHz| E[Hubert/ContentVec ONNX]
    D -->|4. Pitch Estimation RMVPE| F[Pitch Predictor]
    E --> G[RVC ONNX Model Inference]
    F --> G
    G -->|Target Audio Chunks| H(WebSocket Connection)
    H -->|Play audio| I[Browser Speakers / Audio Device]
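
The "Raw Float32 PCM Chunk" edges in the diagram amount to converting between WebSocket binary messages and sample arrays. The sketch below shows one way to do that with NumPy. The function names are hypothetical, and it assumes mono, little-endian float32 samples (the byte order a browser `Float32Array` produces on common hardware); the actual wire format used by server.py may differ.

```python
import numpy as np


def decode_pcm(message: bytes) -> np.ndarray:
    """Turn a binary WebSocket message into a float32 sample array."""
    if len(message) % 4 != 0:
        raise ValueError("payload length is not a multiple of 4 bytes")
    # Copy so the result is writable and independent of the message buffer.
    return np.frombuffer(message, dtype="<f4").copy()


def encode_pcm(samples: np.ndarray) -> bytes:
    """Clip to [-1, 1] and serialize samples as little-endian float32."""
    return np.clip(samples, -1.0, 1.0).astype("<f4").tobytes()
```

Sending raw samples this way avoids any container or codec overhead, at the cost of 4 bytes per sample on the wire.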

📁 Repository Structure

  • server.py — The main WebSocket backend and static HTTP server managing connection loops, audio resampling, and model execution.
  • start.bat — Windows launcher batch file that automatically resolves the Python virtual environment and executes the server.
  • requirements.txt — Python dependencies list.
  • frontend/ — Client-side Web UI files.
  • lib/ — Core package containing inference models and prediction tools.
  • weights/ — Directory for voice model weights. Place each custom model's sub-directory (containing its .onnx and .pth files) here.
  • pretrained/ — Directory containing base pre-trained models such as vec-768-layer-12.onnx or vec-256-layer-12.onnx.

🚀 Getting Started

📋 Prerequisites

  • Python 3.10+ (Recommended)
  • FFmpeg installed and added to the system PATH (Required for audio processing utilities).
  • (Optional) NVIDIA CUDA Toolkit (v11.x/12.x) and cuDNN for GPU execution acceleration.
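
A quick way to confirm these prerequisites before installing is a small check script. This is a sketch, not part of the repository: the function name is hypothetical, and it assumes the `onnxruntime` Python package is what provides ONNX Runtime in this project.

```python
import shutil
import sys


def check_prerequisites(min_python=(3, 10)):
    """Return a list of human-readable problems; empty means all checks passed."""
    problems = []
    if sys.version_info < min_python:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python {min_python[0]}.{min_python[1]}+ recommended, found {found}")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on the system PATH")
    try:
        import onnxruntime as ort  # assumed dependency, see lead-in
    except ImportError:
        problems.append("onnxruntime is not installed")
    else:
        # Lists the execution providers usable on this machine.
        print("ONNX Runtime providers:", ort.get_available_providers())
    return problems


if __name__ == "__main__":
    for problem in check_prerequisites():
        print("WARNING:", problem)
```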

📦 Installation

  1. Clone this repository to your local directory.
  2. Initialize and activate a virtual environment (optional but recommended):
    python -m venv venv
    .\venv\Scripts\activate
    
  3. Install the required dependencies:
    pip install -r requirements.txt
    
  4. Place your ContentVec base model (vec-768-layer-12.onnx or vec-256-layer-12.onnx) inside the pretrained/ directory.
  5. Place your character models in weights/, one folder per model (e.g., weights/HuTao/ containing HuTao.onnx and HuTao.pth).
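
To verify that steps 4 and 5 produced the expected layout, a helper along these lines can list what the server would find. The function names are hypothetical and the lookup rules (first .onnx file per folder, the two ContentVec file names mentioned above) are assumptions based on this README, not on the actual loader in lib/.

```python
from pathlib import Path


def find_models(weights_dir="weights"):
    """Map each model folder name to the first .onnx file inside it."""
    models = {}
    root = Path(weights_dir)
    if not root.is_dir():
        return models
    for folder in sorted(p for p in root.iterdir() if p.is_dir()):
        onnx_files = sorted(folder.glob("*.onnx"))
        if onnx_files:
            models[folder.name] = onnx_files[0]
    return models


def find_contentvec(pretrained_dir="pretrained"):
    """Return the path of a known ContentVec base model, or None if absent."""
    for name in ("vec-768-layer-12.onnx", "vec-256-layer-12.onnx"):
        candidate = Path(pretrained_dir) / name
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    print("Models:", find_models())
    print("ContentVec:", find_contentvec())
```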

🏃 Running the Server

Option A: Quick Launch (Windows)

Simply double-click the start.bat file. It will automatically detect Python, set up the directory paths, and launch the service.

Option B: Manual CLI execution

Execute the server using your terminal:

python server.py --host 127.0.0.1 --port 8765 --http_port 8000 --device cuda

⚙️ Command-Line Arguments

| Argument | Description | Default |
| --- | --- | --- |
| --host | The address the WebSocket server binds to. | 127.0.0.1 |
| --port | WebSocket communication port. | 8765 |
| --http_port | Port serving the static frontend Web UI. | 8000 |
| --device | The ONNX Runtime execution device (cpu, cuda, dml). | cuda |
| --model | Target folder name in weights/ to load directly upon startup. | None |

Once the server starts, the Web UI should open automatically at http://localhost:8000.
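
For reference, the command-line arguments listed above could be declared with `argparse` as sketched below. This mirrors the documented flags and defaults only; the real parser in server.py may be organized differently. The mapping from `--device` to provider lists is an assumption, although the provider names themselves are the standard ONNX Runtime identifiers.

```python
import argparse

# Assumed mapping: each accelerated device falls back to the CPU provider.
PROVIDERS = {
    "cpu": ["CPUExecutionProvider"],
    "cuda": ["CUDAExecutionProvider", "CPUExecutionProvider"],
    "dml": ["DmlExecutionProvider", "CPUExecutionProvider"],
}


def build_parser():
    """Build a parser matching the documented flags and defaults."""
    parser = argparse.ArgumentParser(description="ONNX real-time voice changer")
    parser.add_argument("--host", default="127.0.0.1", help="WebSocket bind address")
    parser.add_argument("--port", type=int, default=8765, help="WebSocket port")
    parser.add_argument("--http_port", type=int, default=8000, help="Web UI port")
    parser.add_argument("--device", choices=sorted(PROVIDERS), default="cuda",
                        help="ONNX Runtime execution device")
    parser.add_argument("--model", default=None,
                        help="folder name in weights/ to load at startup")
    return parser


def providers_for(device):
    """Return the ordered provider list to pass to an ONNX Runtime session."""
    return PROVIDERS[device]


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args, providers_for(args.device))
```

Listing the CPU provider last lets ONNX Runtime fall back to it when the preferred provider is unavailable.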


🔊 Audio DSP Details

To keep latency low without introducing output artifacts, the audio pipeline uses:

  1. Sliding Window Context Buffer: Keeps a short historical buffer of the audio to feed the model the required context frames while minimizing output audio delay.
  2. Convolution Padding Fadeout: 120 ms of trailing silence is temporarily appended to each input segment to avoid the edge-fading artifacts inherent to RVC's convolutional layers.
  3. Linear Resampling: Low-overhead linear interpolation for quick sample rate adaptation.
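
The three techniques above can be sketched in a few lines of NumPy. This is an illustration under stated assumptions rather than the code in server.py: the names are hypothetical, the context length is left to the caller, and the trimming helper assumes the model returns one output sample per input sample.

```python
import numpy as np


class ContextBuffer:
    """Sliding window: prepends recent history to each new chunk."""

    def __init__(self, context_samples):
        self.size = context_samples
        self.history = np.zeros(context_samples, dtype=np.float32)

    def push(self, chunk):
        window = np.concatenate([self.history, np.asarray(chunk, dtype=np.float32)])
        # Keep only the newest `size` samples as context for the next call.
        self.history = window[len(window) - self.size:]
        return window


def pad_tail(window, sample_rate, pad_ms=120):
    """Append trailing silence so edge artifacts land in the padding."""
    pad = np.zeros(int(sample_rate * pad_ms / 1000), dtype=np.float32)
    return np.concatenate([window, pad])


def extract_new(output, context_len, chunk_len):
    """Drop the context and padding, keeping only the new chunk's audio."""
    return output[context_len:context_len + chunk_len]


def resample_linear(x, src_rate, dst_rate):
    """Resample by linear interpolation between neighboring samples."""
    x = np.asarray(x, dtype=np.float32)
    if src_rate == dst_rate or len(x) == 0:
        return x
    n_out = int(round(len(x) * dst_rate / src_rate))
    src_t = np.arange(len(x)) / src_rate
    dst_t = np.arange(n_out) / dst_rate
    return np.interp(dst_t, src_t, x).astype(np.float32)
```

Linear interpolation applies no anti-aliasing filter, so it trades some fidelity for speed when downsampling.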