Qwen3.5-27B to 40B Upscale

⚠️ BIG WARNING ⚠️

NOT TO BE USED AS IS, AND REQUIRES FINE-TUNING.

This upscaled model produces gibberish out of the box and currently has a baseline perplexity (PPL) of ~500k.

Send me your support to help me feed the data beast! I'm also taking commissions for universe-specific models.

Support on Ko-fi

Model Description

This model is an interleaved upscale of Qwen3.5-27B to 40B. It expands the base architecture from 64 to 96 layers using an interleaved copying technique.

Upscaling Details:

  • Layer Expansion: 64 to 96 layers.
  • Copying Strategy: Layers were copied in groups of 4 to preserve the 3:1 linear-to-full-attention ratio.
  • Added Noise: Noise was purposefully introduced during the upscaling process to aid in future fine-tuning recovery.
    ✦ The o_proj, down_proj, and out_proj weights were copied with noise σ = 0.000625.
    ✦ All remaining weights were copied with noise σ = 0.0025.
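The copying scheme above can be sketched as follows. This is a minimal illustration, not the actual upscaling script: the duplication schedule (duplicating every second group of 4) and the function names are assumptions; only the group size, layer counts, noise sigmas, and the low-noise projection list come from the card.

```python
import numpy as np

GROUP = 4                # 3 linear-attention layers + 1 full-attention layer
SIGMA_PROJ = 0.000625    # noise for o_proj / down_proj / out_proj weights
SIGMA_OTHER = 0.0025     # noise for all other weights
LOW_NOISE = {"o_proj", "down_proj", "out_proj"}

def plan_upscale(n_src=64, n_dst=96, group=GROUP):
    """Return the source-group index for each destination group.

    Duplicates every second group of 4 layers (a hypothetical schedule)
    so the 16 source groups become 24 destination groups, keeping the
    3:1 linear/full attention ratio intact inside every group.
    """
    src_groups = n_src // group          # 16
    extra = (n_dst - n_src) // group     # 8 groups need a duplicate
    mapping = []
    for g in range(src_groups):
        mapping.append(g)
        if g % 2 == 1 and extra > 0:     # duplicate odd-indexed groups
            mapping.append(g)
            extra -= 1
    return mapping

def noisy_copy(weight, name, rng):
    """Copy a weight matrix, adding the Gaussian noise described above."""
    sigma = SIGMA_PROJ if name in LOW_NOISE else SIGMA_OTHER
    return weight + rng.normal(0.0, sigma, size=weight.shape)

mapping = plan_upscale()
assert len(mapping) * GROUP == 96        # 24 groups of 4 → 96 layers
```

Keeping the group of 4 intact is the key constraint: duplicating individual layers instead would break the fixed linear/full attention interleave that the base architecture expects.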

Acknowledgements

  • Credit to Qwen for the powerful Qwen3 architecture and for releasing their work openly.