RuntimeError: The size of tensor a (3072) must match the size of tensor b (6144) at non-singleton dimension 1

#5
by lianyouzao - opened

You must run with --disable-shared-experts-fusion in sglang, otherwise it will incorrectly attempt to fuse the BF16 shared expert.
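The thread does not say where the 3072 vs 6144 mismatch actually comes from. One plausible reading, which is an assumption and not confirmed here, is that it comes from the 2:1 ratio between the two sizes. NVFP4 packs two 4-bit values into each byte, so a packed weight's last dimension is half its logical size, while an unpacked BF16 shared-expert weight keeps the full size. The pure-Python sketch below only illustrates that arithmetic. It pairs it with a re-implementation of PyTorch's broadcasting size check, which produces an error message in the same format as the one reported above. `stored_last_dim`, `broadcast_shape` and the row count are hypothetical names and values for illustration. They are not SGLang or vLLM code, and this is not the real fused kernel.

```python
# Hypothetical illustration only: not SGLang/vLLM code, and the thread does not
# confirm that NVFP4 packing is the cause of the reported mismatch.

NVFP4_VALUES_PER_BYTE = 2  # two 4-bit values are packed into one uint8


def stored_last_dim(logical_dim: int, dtype: str) -> int:
    """Size of a weight's last dimension as stored (hypothetical helper)."""
    if dtype == "nvfp4":
        if logical_dim % NVFP4_VALUES_PER_BYTE:
            raise ValueError("NVFP4 packing needs an even logical dimension")
        return logical_dim // NVFP4_VALUES_PER_BYTE
    if dtype == "bf16":
        return logical_dim  # one element per value, no packing
    raise ValueError(f"unknown dtype: {dtype}")


def broadcast_shape(a_shape, b_shape):
    """Mimic PyTorch's broadcasting size check, walking trailing dims first."""
    ndim = max(len(a_shape), len(b_shape))
    result = [0] * ndim
    for i in range(ndim - 1, -1, -1):
        offset = ndim - 1 - i
        dim_a = len(a_shape) - 1 - offset
        dim_b = len(b_shape) - 1 - offset
        # Missing leading dimensions behave like size 1.
        size_a = a_shape[dim_a] if dim_a >= 0 else 1
        size_b = b_shape[dim_b] if dim_b >= 0 else 1
        if size_a != size_b and size_a != 1 and size_b != 1:
            raise RuntimeError(
                f"The size of tensor a ({size_a}) must match the size of "
                f"tensor b ({size_b}) at non-singleton dimension {i}"
            )
        result[i] = size_a if size_a != 1 else size_b
    return tuple(result)


if __name__ == "__main__":
    rows = 4  # hypothetical row count
    packed_fp4 = (rows, stored_last_dim(6144, "nvfp4"))  # (4, 3072)
    plain_bf16 = (rows, stored_last_dim(6144, "bf16"))   # (4, 6144)
    try:
        broadcast_shape(packed_fp4, plain_bf16)
    except RuntimeError as err:
        print(err)
```

Running it prints a message in the same format as the one in the thread title. That shows only how a packed FP4 tensor and an unpacked BF16 tensor of the same logical width could disagree by exactly a factor of two. Per the reply above, the SGLang fix is the `--disable-shared-experts-fusion` flag.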

I am running it with vLLM like this:

vllm serve /vllm-workspace/models/GLM-5.1-NVFP4 \
  --tensor-parallel-size 4 \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 3 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --enable-auto-tool-choice \
  --chat-template-content-format=string \
  --served-model-name glm-5.1
--served-model-name glm-5.1

and I am getting the same error message.

How did you solve it, @lianyouzao? Any recommendations, @lukealonso?

Using vLLM 0.19.1.

Same error on 8x H200:

vllm serve /vllm-workspace/models/GLM-5.1-NVFP4/ --tensor-parallel-size 8 --speculative-config.method mtp --speculative-config.num_speculative_tokens 3 --tool-call-parser glm47 --reasoning-parser glm45 --enable-auto-tool-choice --chat-template-content-format=string --served-model-name glm-5.1
