sample python usage request (#1, opened by drzraf)
I'm interested in this repository (I secretly wish q4 fits in 2GB of VRAM), but I have a couple of noob questions regarding usage:
- Using `optimum.onnxruntime.pipeline()`, there is no way that I know of to specify the quantization.
- Nor from `ORTModelForImageClassification.from_pretrained()`.
- By default, `model.onnx_data` is retrieved. There is `model_fp16.onnx_data` too, but no such thing for `int8` or `q4` (it's simply called `model_q4.onnx`).
- `from_pretrained("~/.cache/huggingface/hub/models--onnx-community--siglip2-so400m-patch16-512-ONNX")` fails with "can't infer the library", as if it needs the config to even load the pretrained model (??). I'm not sure what the right way is to load it locally (or is it the `config.json` that would be missing something?)
- ...but even then, it doesn't help when it comes to specifying the quantization, and the overall loading process in this specific situation still seems a bit opaque to me.
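To make the quantization question concrete, here is a sketch of the loading API I'm hoping exists. It is untested: the `file_name` kwarg is my guess at how to point `from_pretrained()` at a specific ONNX file, and `onnx_file_for()` is a hypothetical helper of mine based on the file names I see in the repo, not a library function.

```python
# Untested sketch. Assumptions: optimum's from_pretrained accepts a
# file_name kwarg to pick one of model.onnx / model_fp16.onnx /
# model_q4.onnx; onnx_file_for() is my own hypothetical helper.
from typing import Optional


def onnx_file_for(quantization: Optional[str]) -> str:
    """Map a quantization label to the file names I see in this repo."""
    return "model.onnx" if quantization is None else f"model_{quantization}.onnx"


def build_classifier(model_id: str, quantization: Optional[str] = None):
    """Hypothetical loader that selects a quantized variant explicitly."""
    # Imports kept local so the helper above is usable without optimum installed.
    from optimum.onnxruntime import ORTModelForImageClassification
    from transformers import AutoImageProcessor

    model = ORTModelForImageClassification.from_pretrained(
        model_id,
        file_name=onnx_file_for(quantization),  # e.g. model_q4.onnx
    )
    processor = AutoImageProcessor.from_pretrained(model_id)
    return model, processor
```

If something like this is the intended path, a one-line confirmation of the kwarg name would already unblock me.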
I also don't know whether `pipeline` (and batch/dataset processing) will even work.
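For the batch part, this is roughly what I would try. Again a sketch under assumptions: whether `pipeline` accepts a preloaded ORT model and processor like this, and whether `batch_size` behaves here, is exactly what I don't know; the task name and kwargs are my guesses.

```python
# Untested sketch; task name and kwargs are my guesses.

def classify_batch(model, processor, image_paths, labels):
    """Run a list of images through a zero-shot classification pipeline.

    Assumes transformers.pipeline accepts a preloaded ORT model; part of
    my question is whether this actually works with optimum models.
    """
    from transformers import pipeline

    clf = pipeline(
        "zero-shot-image-classification",
        model=model,
        image_processor=processor,
    )
    # batch_size is my guess at how batching is controlled at call time.
    return clf(image_paths, candidate_labels=labels, batch_size=8)
```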
Any example to bootstrap from would be greatly appreciated.
thank you!