Qwen3.5-27B to 40B Upscale

⚠️ BIG WARNING ⚠️

NOT TO BE USED AS IS, AND REQUIRES FINE-TUNING.

This upscaled model produces gibberish out of the box and currently has a baseline perplexity (PPL) of ~500k.

Send me your support to help me feed the data beast! I'm also taking commissions for universe-specific models.

Support on Ko-fi

Model Description

This model is an interleaved upscale of Qwen3.5-27B to 40B. It expands the base architecture from 64 to 96 layers using an interleaved copying technique.

Upscaling Details:

  • Layer Expansion: 64 to 96 layers.
  • Copying Strategy: Layers were copied in groups of 4 to preserve the 3:1 linear-to-full-attention ratio.
  • Added Noise: Noise was purposefully introduced during the upscaling process to aid in future fine-tuning recovery.
    ✦ The o_proj, down_proj, and out_proj weights were copied with noise σ = 0.000625.
    ✦ All remaining weights were copied with noise σ = 0.0025.
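The copying scheme above can be sketched as follows. This is a minimal illustration, not the actual upscaling script: the duplication schedule (duplicating every second group of 4) and the function names are assumptions; only the group size, layer counts, noise sigmas, and the low-noise projection list come from the card.

```python
import numpy as np

GROUP = 4                # 3 linear-attention layers + 1 full-attention layer
SIGMA_PROJ = 0.000625    # noise for o_proj / down_proj / out_proj weights
SIGMA_OTHER = 0.0025     # noise for all other weights
LOW_NOISE = {"o_proj", "down_proj", "out_proj"}

def plan_upscale(n_src=64, n_dst=96, group=GROUP):
    """Return the source-group index for each destination group.

    Duplicates every second group of 4 layers (a hypothetical schedule)
    so the 16 source groups become 24 destination groups, keeping the
    3:1 linear/full attention ratio intact inside every group.
    """
    src_groups = n_src // group          # 16
    extra = (n_dst - n_src) // group     # 8 groups need a duplicate
    mapping = []
    for g in range(src_groups):
        mapping.append(g)
        if g % 2 == 1 and extra > 0:     # duplicate odd-indexed groups
            mapping.append(g)
            extra -= 1
    return mapping

def noisy_copy(weight, name, rng):
    """Copy a weight matrix, adding the Gaussian noise described above."""
    sigma = SIGMA_PROJ if name in LOW_NOISE else SIGMA_OTHER
    return weight + rng.normal(0.0, sigma, size=weight.shape)

mapping = plan_upscale()
assert len(mapping) * GROUP == 96        # 24 groups of 4 → 96 layers
```

Keeping the group of 4 intact is the key constraint: duplicating individual layers instead would break the fixed linear/full attention interleave that the base architecture expects.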

Acknowledgements

  • Credit to Qwen for the powerful Qwen3 architecture and for releasing their work openly.