# YourMT3+ Local Setup Guide
## πŸš€ Quick Start (Local Installation)
### 1. Install Dependencies
```bash
pip install torch torchaudio transformers gradio pytorch-lightning einops numpy librosa
```
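If you want to verify the install before moving on, a small sanity check (plain Python, nothing project-specific) confirms each package imports cleanly:

```python
import importlib

def check_imports(packages):
    """Return {package: importable} without raising on missing packages."""
    result = {}
    for pkg in packages:
        try:
            importlib.import_module(pkg)
            result[pkg] = True
        except ImportError:
            result[pkg] = False
    return result

status = check_imports(["torch", "torchaudio", "transformers", "gradio",
                        "pytorch_lightning", "einops", "numpy", "librosa"])
for pkg, ok in status.items():
    print(("ok: " if ok else "MISSING: ") + pkg)
```

Any `MISSING:` line means the corresponding `pip install` did not take effect in the current environment.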
### 2. Setup Model Weights
- Download YourMT3 model weights
- Place them in: `amt/logs/2024/`
- Expected default filename: `mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b36_nops@last.ckpt`
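A quick way to confirm the weights are in place before running the setup check (the path and filename below are the defaults from this guide; adjust if you renamed the file):

```python
from pathlib import Path

# Default checkpoint location from this guide.
ckpt = Path("amt/logs/2024") / (
    "mc13_256_g4_all_v7_mt3f_sqr_rms_moe_wf4_n8k2_silu_rope_rp_b36_nops@last.ckpt"
)

if ckpt.is_file():
    size_mb = ckpt.stat().st_size / 1e6
    print(f"Checkpoint found: {ckpt} ({size_mb:.0f} MB)")
else:
    print(f"Missing checkpoint: {ckpt}")
```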
### 3. Run Setup Check
```bash
cd /path/to/YourMT3
python setup_local.py
```
### 4. Quick Test
```bash
python test_local.py
```
### 5. Launch Web Interface
```bash
python app.py
```
Then open: http://127.0.0.1:7860
## 🎯 New Features
### Instrument Conditioning
- **Problem**: YourMT3+ switches instruments mid-track (vocals β†’ violin β†’ guitar)
- **Solution**: Select target instrument from dropdown
- **Options**: Auto, Vocals, Guitar, Piano, Violin, Drums, Bass, Saxophone, Flute
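For reference, a plausible mapping from the dropdown options to MIDI program numbers. The vocals value matches the debug output shown later in this guide; the other numbers follow General MIDI conventions and are assumptions, not necessarily what the implementation uses:

```python
# Illustrative mapping from dropdown labels to MIDI program numbers.
# Only the vocals value (100) is confirmed by the debug log; the rest
# follow General MIDI numbering and may differ from the actual code.
INSTRUMENT_PROGRAMS = {
    "Auto": None,        # no conditioning; keep the model's own choices
    "Vocals": 100,       # singing-voice program seen in the debug output
    "Guitar": 24,        # GM: acoustic guitar (nylon)
    "Piano": 0,          # GM: acoustic grand piano
    "Violin": 40,
    "Drums": 128,        # drums live on their own channel; 128 is a common sentinel
    "Bass": 32,          # GM: acoustic bass
    "Saxophone": 65,     # GM: alto sax
    "Flute": 73,
}

print(INSTRUMENT_PROGRAMS["Vocals"])  # -> 100
```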
### How It Works
1. **Upload audio** or paste YouTube URL
2. **Select instrument** from dropdown menu
3. **Click Transcribe**
4. **Get focused transcription** without instrument confusion
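In code terms, the dropdown selection becomes the `instrument_hint` argument roughly like this (a sketch; the real `app.py` wiring may differ):

```python
# Sketch of how the dropdown value becomes an instrument hint
# (hypothetical helper; the real app.py wiring may differ).
def dropdown_to_hint(selection: str):
    """Map the UI dropdown value to the instrument_hint argument."""
    return None if selection == "Auto" else selection.lower()

print(dropdown_to_hint("Vocals"))  # -> vocals
print(dropdown_to_hint("Auto"))    # -> None
```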
## πŸ”§ Troubleshooting
### "Unknown event type: transcribe_singing"
**This is expected!** The error means your checkpoint was trained without special task tokens, which is normal. The system will:
1. Try task tokens (this may fail, and that's OK)
2. Fall back to post-processing filtering
3. Still produce a cleaner, instrument-focused transcription
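The fallback order can be sketched as follows; the helper names here are hypothetical stand-ins, not the real `model_helper` API:

```python
# Hedged sketch of the fallback order; these helpers are hypothetical
# stand-ins, not the real model_helper API.
def try_task_tokens(hint):
    # A checkpoint without special task tokens rejects the task string,
    # producing the "Unknown event type" message.
    raise ValueError(f"Unknown event type: transcribe_{hint}")

def filter_notes(notes, hint):
    # Post-processing fallback: filter/convert notes toward one instrument.
    return notes

def transcribe_with_fallback(notes, hint):
    try:
        return try_task_tokens(hint)       # step 1: may fail, that's OK
    except ValueError:
        return filter_notes(notes, hint)   # step 2: post-processing filter

print(transcribe_with_fallback(["note_a", "note_b"], "singing"))
```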
### Debug Output
Look for these messages in console:
```
=== TRANSCRIBE FUNCTION CALLED ===
Audio file: /path/to/audio.wav
Instrument hint: vocals
=== INSTRUMENT CONDITIONING ACTIVATED ===
Model Task Configuration Debug:
βœ“ Model has task_manager
Task name: mc13_full_plus_256
Available subtask prefixes: ['default']
=== APPLYING INSTRUMENT FILTER ===
Found instruments in transcription: {0: 45, 100: 123, 40: 12}
Primary instrument: 100 (68% of notes)
Target program for vocals: 100
Converted 57 notes to primary instrument 100
```
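The `Converted 57 notes` line follows directly from the count dictionary in that log; a minimal sketch of the arithmetic:

```python
# Counts from the log above: MIDI program -> number of notes.
counts = {0: 45, 100: 123, 40: 12}

primary = max(counts, key=counts.get)                           # most common program
converted = sum(n for prog, n in counts.items() if prog != primary)
print(primary, converted)  # -> 100 57
```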
### Common Issues
**1. Import Errors**
```bash
pip install torch torchaudio transformers gradio pytorch-lightning
```
**2. Model Not Found**
- Download model weights to `amt/logs/2024/`
- Check filename matches exactly
**3. No Audio Examples**
- Place test audio files in `examples/` folder
- Supported formats: .wav, .mp3
**4. Port Already in Use**
- Web interface runs on port 7860
- If busy, it will try 7861, 7862, etc.
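The port-probing behaviour (try 7860, then the next ports up) can be sketched with the standard library; Gradio handles this internally, so this is only illustrative:

```python
import socket

def find_free_port(start=7860, attempts=10):
    """Probe start, start+1, ... and return the first bindable port."""
    for port in range(start, start + attempts):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
            try:
                s.bind(("127.0.0.1", port))
                return port
            except OSError:
                continue  # port busy, try the next one
    raise RuntimeError(f"no free port in {start}-{start + attempts - 1}")

print(find_free_port())
```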
## πŸ“Š Expected Results
### Before (Original YourMT3+)
- Vocals file β†’ outputs: vocals + violin + guitar tracks
- Saxophone solo β†’ incomplete transcription
- Flute solo β†’ single note only
### After (With Instrument Conditioning)
- Select "Vocals/Singing" β†’ clean vocal transcription only
- Select "Saxophone" β†’ complete saxophone solo
- Select "Flute" β†’ full flute transcription
## πŸ› οΈ Advanced Usage
### Command Line
```bash
python transcribe_cli.py audio.wav --instrument vocals --verbose
```
### Python API
```python
from model_helper import transcribe, load_model_checkpoint

# Load model (model_args must match your checkpoint's configuration)
model = load_model_checkpoint(args=model_args, device="cuda")

# Transcribe with instrument conditioning; audio_info describes the input file
midifile = transcribe(model, audio_info, instrument_hint="vocals")
```
### Confidence Tuning
- High confidence (0.8): Strict instrument filtering
- Low confidence (0.4): Allows more mixed content
- Auto-adjusts based on task token availability
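One way to read the confidence knob (an illustrative sketch, not the actual `model_helper.py` logic): a note's instrument is converted to the target whenever that instrument's overall share of the notes falls below the threshold, so a higher value converts more aggressively.

```python
# Illustrative sketch, not the actual model_helper.py logic: convert a
# note to the target instrument when its own instrument's share of all
# notes is below the confidence threshold.
def convert_to_target(instrument_share: float, confidence: float) -> bool:
    return instrument_share < confidence

# A secondary instrument holding 50% of the notes:
print(convert_to_target(0.5, 0.8))  # high confidence (strict)  -> True
print(convert_to_target(0.5, 0.4))  # low confidence (lenient)  -> False
```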
## πŸ“ Files Modified
- `app.py` - Added instrument dropdown to web interface
- `model_helper.py` - Enhanced transcription with conditioning
- `transcribe_cli.py` - New command-line interface
- `setup_local.py` - Local setup checker
- `test_local.py` - Quick functionality test
## 🎡 Enjoy Better Transcriptions!
No more instrument confusion - you now have full control over what gets transcribed! πŸŽ‰