	# CLAUDE.md

	This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.

	## Project Overview

	IndexTTS-Rust is a high-performance Text-to-Speech engine, a complete Rust rewrite of the Python IndexTTS system. It uses ONNX Runtime for neural network inference and provides zero-shot voice cloning with emotion control.

	## Build and Development Commands

	```bash
	# Build (always build release for performance testing)
	cargo build --release

	# Run linter (MANDATORY before commits - catches many issues)
	cargo clippy -- -D warnings

	# Run tests
	cargo test

	# Run specific test
	cargo test test_name

	# Run benchmarks (Criterion-based)
	cargo bench

	# Run specific benchmark
	cargo bench --bench mel_spectrogram
	cargo bench --bench inference

	# Type-check without producing binaries (faster than a full build)
	cargo check

	# Format code
	cargo fmt

	# Full pre-commit workflow (BUILD -> CLIPPY -> BUILD)
	cargo build --release && cargo clippy -- -D warnings && cargo build --release
	```

	## CLI Usage

	```bash
	# Show help
	./target/release/indextts --help

	# Synthesize speech
	./target/release/indextts synthesize \
	  --text "Hello world" \
	  --voice examples/voice_01.wav \
	  --output output.wav

	# Generate default config
	./target/release/indextts init-config -o config.yaml

	# Show system info
	./target/release/indextts info

	# Run built-in benchmarks
	./target/release/indextts benchmark --iterations 100
	```

	## Architecture

	The codebase follows a modular pipeline architecture where each stage processes data sequentially:

	```
	Text Input → Normalization → Tokenization → Model Inference → Vocoding → Audio Output
	```
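As a rough illustration of the sequential flow above, the stages can be thought of as functions chained so that each one consumes the previous stage's output. The sketch below is not the crate's API: every function name, signature, and toy body is hypothetical, and the 80-channel frames and 256 samples per frame simply mirror the constants listed under Key Constants.

```rust
// Hypothetical stage functions: names, signatures, and toy bodies are
// illustrative only and do not reflect the crate's real API.

/// Normalization stand-in: trims whitespace and lowercases the text.
fn normalize(text: &str) -> String {
    text.trim().to_lowercase()
}

/// Tokenization stand-in: one toy ID per word (the word's character count).
fn tokenize(text: &str) -> Vec<u32> {
    text.split_whitespace()
        .map(|word| word.chars().count() as u32)
        .collect()
}

/// Model inference stand-in: one all-zero 80-channel mel frame per token.
fn infer_mel(tokens: &[u32]) -> Vec<[f32; 80]> {
    tokens.iter().map(|_| [0.0f32; 80]).collect()
}

/// Vocoding stand-in: 256 silent samples per mel frame.
fn vocode(mel: &[[f32; 80]]) -> Vec<f32> {
    vec![0.0; mel.len() * 256]
}

/// Runs the stages in order, passing each output to the next stage.
fn synthesize(text: &str) -> Vec<f32> {
    let normalized = normalize(text);
    let tokens = tokenize(&normalized);
    let mel = infer_mel(&tokens);
    vocode(&mel)
}
```

Keeping each stage a plain function over owned or borrowed data makes the stages easy to test in isolation, which is one plausible reason to structure a pipeline this way.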

	### Core Modules (src/)

	- `audio/` - Audio DSP operations
	  - `mel.rs` - Mel-spectrogram computation (STFT, filterbanks)
	  - `io.rs` - WAV file I/O using hound
	  - `dsp.rs` - Signal processing utilities
	  - `resample.rs` - Audio resampling using rubato

	- `text/` - Text processing pipeline
	  - `normalizer.rs` - Text normalization (Chinese/English/mixed)
	  - `tokenizer.rs` - BPE tokenization via HuggingFace tokenizers
	  - `phoneme.rs` - Grapheme-to-phoneme conversion

	- `model/` - Neural network inference
	  - `session.rs` - ONNX Runtime wrapper (load-dynamic feature)
	  - `gpt.rs` - GPT-based sequence generation
	  - `embedding.rs` - Speaker and emotion encoders

	- `vocoder/` - Neural vocoding
	  - `bigvgan.rs` - BigVGAN waveform synthesis
	  - `activations.rs` - Snake/SnakeBeta activation functions

	- `pipeline/` - TTS orchestration
	  - `synthesis.rs` - Main synthesis logic, coordinates all modules

	- `config/` - Configuration management (YAML-based via serde)

	- `error.rs` - Error types using thiserror

	- `lib.rs` - Library entry point, exposes public API

	- `main.rs` - CLI entry point using clap
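For orientation on the Snake/SnakeBeta activations named in the vocoder module, here is a minimal scalar sketch of the commonly published formulas: Snake is `x + sin²(αx)/α`, and SnakeBeta uses a separate `β` for the magnitude term. This is an assumption about the general technique, not a copy of `activations.rs`; the real module may operate on tensors, store parameters in log scale, or use a different epsilon. The names `snake`, `snake_beta`, and `EPS` are hypothetical.

```rust
/// Small constant guarding against division by zero (assumed value).
const EPS: f32 = 1e-9;

/// Snake activation: x + sin^2(alpha * x) / alpha.
/// `alpha` controls the frequency of the periodic component.
fn snake(x: f32, alpha: f32) -> f32 {
    let s = (alpha * x).sin();
    x + s * s / (alpha + EPS)
}

/// SnakeBeta activation: x + sin^2(alpha * x) / beta.
/// `alpha` controls frequency; `beta` separately controls magnitude.
fn snake_beta(x: f32, alpha: f32, beta: f32) -> f32 {
    let s = (alpha * x).sin();
    x + s * s / (beta + EPS)
}

/// Applies Snake element-wise to a slice, one value per sample.
fn snake_slice(xs: &[f32], alpha: f32) -> Vec<f32> {
    xs.iter().map(|&x| snake(x, alpha)).collect()
}
```

With `beta == alpha`, `snake_beta` reduces to `snake`, which is a quick sanity check when comparing against a reference implementation.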

	### Key Constants (lib.rs)

	```rust
	pub const SAMPLE_RATE: u32 = 22050; // Output audio sample rate
	pub const N_MELS: usize = 80; // Mel filterbank channels
	pub const N_FFT: usize = 1024; // FFT size
	pub const HOP_LENGTH: usize = 256; // STFT hop length
	```
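A few derived quantities follow directly from these constants and can be useful when reasoning about buffer sizes. The sketch below repeats the constants so it is self-contained; the helper names are hypothetical, and the frame count assumes an STFT without centering or padding, which may not match what `mel.rs` actually does.

```rust
pub const SAMPLE_RATE: u32 = 22050;
pub const N_MELS: usize = 80;
pub const N_FFT: usize = 1024;
pub const HOP_LENGTH: usize = 256;

/// Duration of one STFT hop in milliseconds (about 11.61 ms).
fn hop_ms() -> f64 {
    HOP_LENGTH as f64 * 1000.0 / SAMPLE_RATE as f64
}

/// Width of one FFT bin in Hz (about 21.53 Hz).
fn bin_hz() -> f64 {
    SAMPLE_RATE as f64 / N_FFT as f64
}

/// Number of one-sided FFT bins per frame: N_FFT / 2 + 1.
fn num_bins() -> usize {
    N_FFT / 2 + 1
}

/// Frame count for an STFT with no centering or padding (assumption).
/// Inputs shorter than one window yield zero frames.
fn num_frames(num_samples: usize) -> usize {
    if num_samples < N_FFT {
        0
    } else {
        (num_samples - N_FFT) / HOP_LENGTH + 1
    }
}

/// Shape of the resulting mel spectrogram as (channels, frames).
fn mel_shape(num_samples: usize) -> (usize, usize) {
    (N_MELS, num_frames(num_samples))
}
```

Under these assumptions, one second of audio (22050 samples) gives 83 frames of 80 mel channels each. A centered STFT would give a different count.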

	### Dependencies Pattern

	- Audio: hound (WAV), rustfft/realfft (DSP), rubato (resampling), dasp (signal processing)
	- ML Inference: ort (ONNX Runtime with load-dynamic), ndarray, safetensors
	- Text: tokenizers (HuggingFace), jieba-rs (Chinese), regex, unicode-segmentation
	- Parallelism: rayon (data parallelism), tokio (async)
	- CLI: clap (derive), env_logger, indicatif

	## Important Notes

	1. ONNX Runtime: The `ort` crate is built with its `load-dynamic` feature, so the ONNX Runtime shared library is loaded at runtime and must be installed on the system
	2. Model Files: ONNX models go in the `models/` directory (not tracked in git; download separately)
	3. Reference Implementation: Python code in `indextts - REMOVING - REF ONLY/` is kept for reference only
	4. Performance: Release builds use LTO and a single codegen unit for maximum optimization
	5. Audio Format: All internal processing runs at 22050 Hz with 80-band mel spectrograms

	## Testing Strategy

	- Unit tests inline in modules
	- Criterion benchmarks in `benches/` for performance regression testing
	- Python regression tests in `tests/` for end-to-end validation
	- Example audio files in `examples/` for testing voice cloning
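As an illustration of the "unit tests inline in modules" convention, the sketch below places a `#[cfg(test)]` module next to the code it tests. The helper `samples_to_seconds` is hypothetical and exists only to give the test something to check; it is not claimed to be part of the crate.

```rust
/// Hypothetical helper: converts a sample count to seconds at a given rate.
pub fn samples_to_seconds(num_samples: usize, sample_rate: u32) -> f64 {
    num_samples as f64 / sample_rate as f64
}

// Compiled only under `cargo test`, so it adds nothing to release builds.
#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn one_second_at_22050_hz() {
        let seconds = samples_to_seconds(22050, 22050);
        assert!((seconds - 1.0).abs() < 1e-12);
    }

    #[test]
    fn zero_samples_is_zero_seconds() {
        assert_eq!(samples_to_seconds(0, 22050), 0.0);
    }
}
```

A single test from such a module can then be run by name, for example `cargo test one_second_at_22050_hz`.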

	## Missing Infrastructure (TODO)

	- No `scripts/manage.sh` yet (should include build, test, clean, docker controls)
	- No `context.md` yet for conversation continuity
	- No integration tests with actual ONNX models