Observed the same memory leak causing OOM when handling multiple completion requests with a different video or picture per prompt

#6 opened by CHONGYOEYAT

Seeing the same issue as https://github.com/vllm-project/vllm/issues/28230: memory usage keeps growing and the server eventually OOMs when it handles multiple completion requests, each with a different video or picture in the prompt.

Service configuration (systemd unit excerpt):

Environment=CUDA_DEVICE_ORDER=PCI_BUS_ID
Environment=CUDA_VISIBLE_DEVICES=0,1,2,3
Environment=VLLM_SLEEP_WHEN_IDLE=1
ExecStart=/bin/bash -c 'source /home/aiserver/venv-vllm-nightly/bin/activate && vllm serve Qwen3-VL-235B-A22B-Instruct-FP8 --served-model-name llm_model --enable-expert-parallel --tensor-parallel-size 4 --max-model-len 262144 --gpu-memory-utilization 0.84 --trust_remote_code --host 0.0.0.0 --port 14501 --enable-auto-tool-choice --tool-call-parser hermes --mm-encoder-tp-mode data --async-scheduling --max_num_seqs=10 --mm-processor-cache-gb 5'
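For reference, a minimal reproduction sketch against the OpenAI-compatible endpoint configured above (host/port and `llm_model` come from the ExecStart line; the image URLs and request count are placeholders, not from the original report). Each request carries a different image, which is the pattern that triggers the growth:

```bash
#!/usr/bin/env bash
# Sketch only: send many chat completion requests, each with a distinct image.
# Placeholder URLs; substitute any set of different images or video frames.
for i in $(seq 1 100); do
  curl -s http://localhost:14501/v1/chat/completions \
    -H 'Content-Type: application/json' \
    -d "{
      \"model\": \"llm_model\",
      \"max_tokens\": 64,
      \"messages\": [{
        \"role\": \"user\",
        \"content\": [
          {\"type\": \"text\", \"text\": \"Describe this image.\"},
          {\"type\": \"image_url\", \"image_url\": {\"url\": \"https://example.com/frames/$i.jpg\"}}
        ]
      }]
    }" > /dev/null
done
```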

System environment:
GPUs: 4 × 6000 Pro Q-MAX 96GB
Driver version: 580.95.05
CUDA version: 12.9
vLLM version: 0.11.2
Python version: 3.12.11
OS: Ubuntu 22.04
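One way to confirm the leak while the loop above runs (my suggestion, not from the original post; the report does not say whether the growth is in host or GPU memory, so watching both is safest):

```bash
# Poll GPU memory per device and host memory every 5 seconds.
# With steady traffic, usage should plateau after warm-up; with this bug
# it keeps climbing until OOM.
watch -n 5 'nvidia-smi --query-gpu=index,memory.used --format=csv; free -h'
```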
