# YourMT3+ Local Setup Guide
## πŸš€ Quick Start (Local Installation)
### 1. Install Dependencies
```bash
pip install torch torchaudio transformers gradio pytorch-lightning einops numpy librosa
```
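If you want to verify the install before moving on, a small sanity check (plain Python, nothing project-specific) confirms each package imports cleanly:

```python
import importlib

def check_imports(packages):
    """Return {package: importable} without raising on missing packages."""
    result = {}
    for pkg in packages:
        try:
            importlib.import_module(pkg)
            result[pkg] = True
        except ImportError:
            result[pkg] = False
    return result

status = check_imports(["torch", "torchaudio", "transformers", "gradio",
                        "pytorch_lightning", "einops", "numpy", "librosa"])
for pkg, ok in status.items():
    print(("ok: " if ok else "MISSING: ") + pkg)
```

Any `MISSING:` line means the corresponding `pip install` did not take effect in the current environment.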
### 2. Setup Model Weights
- Download YourMT3 model weights
- Place them in: `amt/logs/2024/`
- Expected default filename: `mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b36_nops@last.ckpt`
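A quick way to confirm the weights are in place before running the setup check (the path and filename below are the defaults from this guide; adjust if you renamed the file):

```python
from pathlib import Path

# Default checkpoint location from this guide.
ckpt = Path("amt/logs/2024") / (
    "mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b36_nops@last.ckpt"
)

if ckpt.is_file():
    size_mb = ckpt.stat().st_size / 1e6
    print(f"Checkpoint found: {ckpt} ({size_mb:.0f} MB)")
else:
    print(f"Missing checkpoint: {ckpt}")
```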
### 3. Run Setup Check
```bash
cd /path/to/YourMT3
python setup_local.py
```
### 4. Quick Test
```bash
python test_local.py
```
### 5. Launch Web Interface
```bash
python app.py
```
Then open: http://127.0.0.1:7860
## 🎯 New Features
### Instrument Conditioning
- **Problem**: YourMT3+ switches instruments mid-track (vocals β†’ violin β†’ guitar)
- **Solution**: Select target instrument from dropdown
- **Options**: Auto, Vocals, Guitar, Piano, Violin, Drums, Bass, Saxophone, Flute
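For reference, a plausible mapping from the dropdown options to MIDI program numbers. The vocals value matches the debug output shown later in this guide; the other numbers follow General MIDI conventions and are assumptions, not necessarily what the implementation uses:

```python
# Illustrative mapping from dropdown labels to MIDI program numbers.
# Only the vocals value (100) is confirmed by the debug log; the rest
# follow General MIDI numbering and may differ from the actual code.
INSTRUMENT_PROGRAMS = {
    "Auto": None,        # no conditioning; keep the model's own choices
    "Vocals": 100,       # singing-voice program seen in the debug output
    "Guitar": 24,        # GM: acoustic guitar (nylon)
    "Piano": 0,          # GM: acoustic grand piano
    "Violin": 40,
    "Drums": 128,        # drums live on their own channel; 128 is a common sentinel
    "Bass": 32,          # GM: acoustic bass
    "Saxophone": 65,     # GM: alto sax
    "Flute": 73,
}

print(INSTRUMENT_PROGRAMS["Vocals"])  # -> 100
```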
### How It Works
1. **Upload audio** or paste YouTube URL
2. **Select instrument** from dropdown menu
3. **Click Transcribe**
4. **Get focused transcription** without instrument confusion
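In code terms, the dropdown selection becomes the `instrument_hint` argument roughly like this (a sketch; the real `app.py` wiring may differ):

```python
# Sketch of how the dropdown value becomes an instrument hint
# (hypothetical helper; the real app.py wiring may differ).
def dropdown_to_hint(selection: str):
    """Map the UI dropdown value to the instrument_hint argument."""
    return None if selection == "Auto" else selection.lower()

print(dropdown_to_hint("Vocals"))  # -> vocals
print(dropdown_to_hint("Auto"))    # -> None
```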
## πŸ”§ Troubleshooting
### "Unknown event type: transcribe_singing"
**This is expected!** The error means your checkpoint was trained without special task tokens, which is normal. The system will:
1. Try task tokens (this may fail, and that's OK)
2. Fall back to post-processing filtering
3. Still produce a cleaner, instrument-focused transcription
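The fallback order can be sketched as follows; the helper names here are hypothetical stand-ins, not the real `model_helper` API:

```python
# Hedged sketch of the fallback order; these helpers are hypothetical
# stand-ins, not the real model_helper API.
def try_task_tokens(hint):
    # A checkpoint without special task tokens rejects the task string,
    # producing the "Unknown event type" message.
    raise ValueError(f"Unknown event type: transcribe_{hint}")

def filter_notes(notes, hint):
    # Post-processing fallback: filter/convert notes toward one instrument.
    return notes

def transcribe_with_fallback(notes, hint):
    try:
        return try_task_tokens(hint)       # step 1: may fail, that's OK
    except ValueError:
        return filter_notes(notes, hint)   # step 2: post-processing filter

print(transcribe_with_fallback(["note_a", "note_b"], "singing"))
```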
### Debug Output
Look for these messages in console:
```
=== TRANSCRIBE FUNCTION CALLED ===
Audio file: /path/to/audio.wav
Instrument hint: vocals
=== INSTRUMENT CONDITIONING ACTIVATED ===
Model Task Configuration Debug:
βœ“ Model has task_manager
Task name: mc13_full_plus_256
Available subtask prefixes: ['default']
=== APPLYING INSTRUMENT FILTER ===
Found instruments in transcription: {0: 45, 100: 123, 40: 12}
Primary instrument: 100 (68% of notes)
Target program for vocals: 100
Converted 57 notes to primary instrument 100
```
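The `Converted 57 notes` line follows directly from the count dictionary in that log; a minimal sketch of the arithmetic:

```python
# Counts from the log above: MIDI program -> number of notes.
counts = {0: 45, 100: 123, 40: 12}

primary = max(counts, key=counts.get)                           # most common program
converted = sum(n for prog, n in counts.items() if prog != primary)
print(primary, converted)  # -> 100 57
```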
### Common Issues
**1. Import Errors**
```bash
pip install torch torchaudio transformers gradio pytorch-lightning
```
**2. Model Not Found**
- Download model weights to `amt/logs/2024/`
- Check filename matches exactly
**3. No Audio Examples**
- Place test audio files in `examples/` folder
- Supported formats: .wav, .mp3
**4. Port Already in Use**
- Web interface runs on port 7860
- If busy, it will try 7861, 7862, etc.
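The port-probing behaviour (try 7860, then the next ports up) can be sketched with the standard library; Gradio handles this internally, so this is only illustrative:

```python
import socket

def find_free_port(start=7860, attempts=10):
    """Probe start, start+1, ... and return the first bindable port."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
                return port
            except OSError:
                continue  # port busy, try the next one
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")

print(find_free_port())
```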
## πŸ“Š Expected Results
### Before (Original YourMT3+)
- Vocals file β†’ outputs: vocals + violin + guitar tracks
- Saxophone solo β†’ incomplete transcription
- Flute solo β†’ single note only
### After (With Instrument Conditioning)
- Select "Vocals/Singing" β†’ clean vocal transcription only
- Select "Saxophone" β†’ complete saxophone solo
- Select "Flute" β†’ full flute transcription
## πŸ› οΈ Advanced Usage
### Command Line
```bash
python transcribe_cli.py audio.wav --instrument vocals --verbose
```
### Python API
```python
from model_helper import transcribe, load_model_checkpoint

# Load model (model_args must match your checkpoint's configuration)
model = load_model_checkpoint(args=model_args, device="cuda")

# Transcribe with instrument conditioning; audio_info describes the input file
midifile = transcribe(model, audio_info, instrument_hint="vocals")
```
### Confidence Tuning
- High confidence (0.8): Strict instrument filtering
- Low confidence (0.4): Allows more mixed content
- Auto-adjusts based on task token availability
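One way to read the confidence knob (an illustrative sketch, not the actual `model_helper.py` logic): a note's instrument is converted to the target whenever that instrument's overall share of the notes falls below the threshold, so a higher value converts more aggressively.

```python
# Illustrative sketch, not the actual model_helper.py logic: convert a
# note to the target instrument when its own instrument's share of all
# notes is below the confidence threshold.
def convert_to_target(instrument_share: float, confidence: float) -> bool:
    return instrument_share < confidence

# A secondary instrument holding 50% of the notes:
print(convert_to_target(0.5, 0.8))  # high confidence (strict)  -> True
print(convert_to_target(0.5, 0.4))  # low confidence (lenient)  -> False
```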
## πŸ“ Files Modified
- `app.py` - Added instrument dropdown to web interface
- `model_helper.py` - Enhanced transcription with conditioning
- `transcribe_cli.py` - New command-line interface
- `setup_local.py` - Local setup checker
- `test_local.py` - Quick functionality test
## 🎡 Enjoy Better Transcriptions!
No more instrument confusion - you now have full control over what gets transcribed! πŸŽ‰