Qwen3.5-27B to 40B Upscale
⚠️ BIG WARNING ⚠️
NOT TO BE USED AS IS, AND REQUIRES FINE-TUNING.
This upscaled model produces gibberish out of the box and currently has a default PPL of 500k.
Send me your support to help me feed the data beast! Also taking commissions for universe-specific models.
Support on Ko-fi
Model Description
This model is an interleaved upscale of Qwen3.5-27B to 40B. It expands the base architecture from 64 to 96 layers using an interleaved copying technique.
Upscaling Details:
- Layer Expansion: 64 to 96 layers.
- Copying Strategy: Layers were copied in groups of 4 to preserve the 3:1 linear-to-full-attention ratio.
- Added Noise: Noise was purposefully introduced during the upscaling process to aid in future fine-tuning recovery.
✦ Layers o_proj, down_proj, and out_proj were mapped with σ = 0.000625.
✦ The remaining layers were mapped with σ = 0.0025.
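The upscaling recipe above can be sketched in plain Python. This is a minimal illustration, not the actual conversion script: the choice of *which* groups of 4 to duplicate (here, every second group, spreading the 8 extra groups evenly through the stack) is an assumption, and `add_noise` operates on a flat list of floats rather than real checkpoint tensors.

```python
import random

def interleaved_layer_map(n_orig=64, n_new=96, group=4):
    """Source-layer index for each layer of the upscaled model.

    Layers are handled in whole groups of `group` so every duplicated
    block keeps the 3:1 linear-to-full-attention ratio intact.  The
    duplication pattern (every second group copied once) is a guess.
    """
    n_groups = n_orig // group            # 16 groups of 4 layers
    n_dup = (n_new - n_orig) // group     # 8 groups need a second copy
    stride = n_groups // n_dup            # duplicate 1 group in every 2
    mapping = []
    for g in range(n_groups):
        block = list(range(g * group, (g + 1) * group))
        mapping.extend(block)
        if g % stride == stride - 1:      # every other group gets copied
            mapping.extend(block)
    return mapping

# σ depends on which projection a tensor belongs to (values from above).
LOW_SIGMA_LAYERS = ("o_proj", "down_proj", "out_proj")

def noise_sigma(param_name):
    return 0.000625 if any(k in param_name for k in LOW_SIGMA_LAYERS) else 0.0025

def add_noise(weights, param_name, rng):
    """Perturb a flat list of weights with Gaussian noise of the right σ."""
    sigma = noise_sigma(param_name)
    return [w + rng.gauss(0.0, sigma) for w in weights]
```

For example, `interleaved_layer_map()` returns a 96-entry list in which every original layer appears at least once and duplicated groups sit next to their sources, so attention-type alignment within each group of 4 is preserved.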
Acknowledgements
- Credit to Qwen for the powerful Qwen3 architecture and for releasing their work openly.