Inference Code

#2
by wangzijian - opened

Hey, great work! I'd like to know whether there is inference code available, like for onnx-community/chatterbox-ONNX? Thank you!!

ONNX Community org

Hi @wangzijian , yes, it's coming!

Thank You!!!!!!

Can we use GPU for inference?

ONNX Community org

@ozguntosun Hi, there is a link in the README to the conversion script. You could export with the CUDA device instead of CPU and use that for inference; it should be easy enough.
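To make the GPU suggestion concrete, here is a minimal sketch (not code from the repo) of loading an exported model with ONNX Runtime, preferring the CUDA execution provider when the installed build supports it and falling back to CPU otherwise. The model filename `t3.onnx` and the `pick_providers`/`load_session` helpers are illustrative assumptions; the provider names are the standard onnxruntime identifiers.

```python
# Sketch: run an exported ONNX model on GPU with a CPU fallback.
CUDA = "CUDAExecutionProvider"
CPU = "CPUExecutionProvider"

def pick_providers(available):
    """Prefer CUDA when the installed onnxruntime build supports it,
    always keeping CPU as a fallback."""
    return ([CUDA] if CUDA in available else []) + [CPU]

def load_session(model_path):
    # onnxruntime is imported lazily so the provider-selection logic
    # above can be exercised without a GPU build installed.
    import onnxruntime as ort
    return ort.InferenceSession(
        model_path,
        providers=pick_providers(ort.get_available_providers()),
    )

# Usage (assumes onnxruntime-gpu and the exported model are present):
# session = load_session("t3.onnx")
# outputs = session.run(None, inputs)
```

Keeping `CPUExecutionProvider` last in the list means the same script still runs on machines without a CUDA build.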

Thank you, I converted it for GPU and it's working. Another update is needed for the newly released model with the multilingual tokenizer; it brings big improvements for inference in non-English languages. We need to load the new t3 model and the new tokenizer (https://github.com/resemble-ai/chatterbox/blob/bf169fe5f518760cb0b6c6a6eba3f885e10fa86f/src/chatterbox/mtl_tts.py#L184). Can you help edit the conversion and inference code for this new pipeline?

ONNX Community org

Probably it's enough to just replace the tokenizer config; I'll take a look a bit later.

ONNX Community org

@ozguntosun I've updated the tokenizer, you could try again.

Just changing the tokenizer is not enough to improve audio quality — I tried this already. For the LLaMA backbone and inference, the EXAGGERATION_TOKEN is no longer required as a parameter; it now has a different processing logic. When I used the updated backbone from the latest model, I was able to achieve the quality I wanted.

ONNX Community org

There is also an updated backbone in the repo, at least from the recent version 1.4.0. What exactly are you trying to achieve, and where is the difference? What do you mean the exaggeration token is not required? We use it to control the intensity of the speech, and it is used in the original code to prepare the conditionals.
