Inference Code

#2
by wangzijian - opened

Hey, great work! I'd like to know whether there is inference code available, like for onnx-community/chatterbox-ONNX? Thank you!!

ONNX Community org

Hi @wangzijian , yes, it's coming!

Thank You!!!!!!

Can we use GPU for inference?

ONNX Community org

@ozguntosun Hi, there is a link in the README to the conversion script. You could export with the CUDA device instead of CPU and use that for inference; it should be easy enough.
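To make the GPU suggestion concrete, here is a minimal sketch (not code from the repo) of loading an exported model with ONNX Runtime, preferring the CUDA execution provider when the installed build supports it and falling back to CPU otherwise. The model filename `t3.onnx` and the `pick_providers`/`load_session` helpers are illustrative assumptions; the provider names are the standard onnxruntime identifiers.

```python
# Sketch: run an exported ONNX model on GPU with a CPU fallback.
CUDA = "CUDAExecutionProvider"
CPU = "CPUExecutionProvider"

def pick_providers(available):
    """Prefer CUDA when the installed onnxruntime build supports it,
    always keeping CPU as a fallback."""
    return ([CUDA] if CUDA in available else []) + [CPU]

def load_session(model_path):
    # onnxruntime is imported lazily so the provider-selection logic
    # above can be exercised without a GPU build installed.
    import onnxruntime as ort
    return ort.InferenceSession(
        model_path,
        providers=pick_providers(ort.get_available_providers()),
    )

# Usage (assumes onnxruntime-gpu and the exported model are present):
# session = load_session("t3.onnx")
# outputs = session.run(None, inputs)
```

Keeping `CPUExecutionProvider` last in the list means the same script still runs on machines without a CUDA build.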

Thank you, I converted it for GPU and it's working. Another update is needed for the newly released model with the multilingual tokenizer; it brings big improvements for inference in non-English languages. We need to load the new t3 model and the new tokenizer (https://github.com/resemble-ai/chatterbox/blob/bf169fe5f518760cb0b6c6a6eba3f885e10fa86f/src/chatterbox/mtl_tts.py#L184). Can you help edit the conversion and inference code for this new pipeline?

ONNX Community org

Probably it's enough to just replace the tokenizer config; I'll take a look a bit later.

ONNX Community org

@ozguntosun I've updated the tokenizer, you could try again.

Just changing the tokenizer is not enough to improve audio quality — I tried this already. For the LLaMA backbone and inference, the EXAGGERATION_TOKEN is no longer required as a parameter; it now has a different processing logic. When I used the updated backbone from the latest model, I was able to achieve the quality I wanted.

ONNX Community org

There is also an updated backbone in the repo, at least from the recent version 1.4.0. What exactly are you trying to achieve, and where is the difference? What do you mean the exaggeration token is not required? We use it to control the intensity of the speech, and it is used in the original code to prepare the conditionals.
