RuntimeError: The size of tensor a (3072) must match the size of tensor b (6144) at non-singleton dimension 1

by lianyouzao - opened 25 days ago

Discussion

lianyouzao

25 days ago

This comment has been hidden (marked as Resolved)

lukealonso

Owner 25 days ago

You must run with --disable-shared-experts-fusion in sglang; otherwise it will incorrectly attempt to fuse the BF16 shared expert.
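
For readers who want to see where the flag goes, here is a minimal dry-run sketch of an sglang launch. The model path and tensor-parallel size are placeholders assumed for illustration (they are not confirmed anywhere in this thread for sglang); only --disable-shared-experts-fusion comes from the comment above. The script just prints the command so it can be reviewed before running.

```shell
#!/usr/bin/env bash
# Sketch only. MODEL_PATH and TP_SIZE are hypothetical placeholders;
# adjust them to your own setup.
MODEL_PATH="${MODEL_PATH:-/path/to/GLM-5.1-NVFP4}"
TP_SIZE="${TP_SIZE:-8}"

# Build the launch command as an array so each argument stays intact.
cmd=(python3 -m sglang.launch_server
  --model-path "$MODEL_PATH"
  --tp "$TP_SIZE"
  --disable-shared-experts-fusion)  # flag recommended above

# Dry run: print the command instead of starting the server.
printf '%s ' "${cmd[@]}"
echo

# To actually launch, run: "${cmd[@]}"
```

Keeping the command in an array makes it easy to append further options without quoting problems.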

paolovic

11 days ago

I am running it with vllm like this:

vllm serve /vllm-workspace/models/GLM-5.1-NVFP4 \
  --tensor-parallel-size 4 \
  --speculative-config.method mtp \
  --speculative-config.num_speculative_tokens 3 \
  --tool-call-parser glm47 \
  --reasoning-parser glm45 \
  --enable-auto-tool-choice \
  --chat-template-content-format=string \
  --served-model-name glm-5.1

and I am getting the same error message.

How did you solve it, @lianyouzao? Any recommendation, @lukealonso?

paolovic

9 days ago

Using vllm 0.19.1

Same error on 8xH200:

vllm serve /vllm-workspace/models/GLM-5.1-NVFP4/ --tensor-parallel-size 8 --speculative-config.method mtp --speculative-config.num_speculative_tokens 3 --tool-call-parser glm47 --reasoning-parser glm45 --enable-auto-tool-choice --chat-template-content-format=string --served-model-name glm-5.1
