sample python usage request

#1
by drzraf - opened

I'm interested in this repository (I secretly hope q4 fits in 2 GB of VRAM), but I have a couple of noob questions regarding usage:

  1. Using optimum.onnxruntime.pipeline(), there is no way I know of to specify the quantization.
  2. Nor from ORTModelForImageClassification.from_pretrained().
  3. By default, model.onnx_data is retrieved. There is a model_fp16.onnx_data too, but no such file for int8 or q4 (the q4 variant is simply called model_q4.onnx).
  4. from_pretrained("~/.cache/huggingface/hub/models--onnx-community--siglip2-so400m-patch16-512-ONNX") fails with "can't infer the library", as if it needs the config just to load the pretrained model (??). I'm not sure what the right way to load it locally is (or is it config.json that is missing something?).
  5. ... and even then, that doesn't help with specifying the quantization; the overall loading process in this particular situation still seems a bit opaque to me.
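To make points 1-4 concrete, here is roughly what I would expect to work, as a minimal sketch. I'm assuming the `file_name` and `subfolder` arguments of `ORTModel.from_pretrained()` behave as in recent optimum releases (selecting a specific .onnx graph inside the repo's `onnx/` subfolder), and that a local load needs to point at a directory actually containing config.json (e.g. a `snapshots/<revision>` directory inside the cache, not the `models--...` root) — both are guesses on my part:

```python
def onnx_file_for(quantization=None):
    """Map a quantization tag to the file name used in onnx-community repos.

    The repo ships variants side by side: model.onnx (fp32), model_fp16.onnx,
    model_int8.onnx, model_q4.onnx, ... Only the fp32/fp16 graphs come with a
    separate external-weights .onnx_data file.
    """
    return "model.onnx" if quantization is None else f"model_{quantization}.onnx"


def load_classifier(model_id, quantization="q4"):
    # Deferred import so the helper above stays dependency-free.
    from optimum.onnxruntime import ORTModelForImageClassification

    # Hypothetical usage: file_name + subfolder pick one specific .onnx
    # graph out of the onnx/ subfolder of the repo (or of a local snapshot
    # directory that contains config.json).
    return ORTModelForImageClassification.from_pretrained(
        model_id,
        file_name=onnx_file_for(quantization),
        subfolder="onnx",
    )
```

Is something along these lines the intended way to pick the q4 variant?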

I also don't know whether pipeline() (and batch/dataset processing) will even work.
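For the pipeline/batching part, my (unverified) understanding is that an ORTModel can be handed to a regular transformers pipeline, which then takes care of batching via `batch_size`. A sketch of what I have in mind — the `file_name`/`subfolder` arguments and the task name are assumptions, not something I've confirmed against this repo:

```python
def build_classifier(model_id, quantization="q4", batch_size=8):
    # Deferred imports: this is only a sketch of the intended wiring.
    from optimum.onnxruntime import ORTModelForImageClassification
    from transformers import AutoImageProcessor, pipeline

    # Assumed: pick one quantized graph out of the repo's onnx/ subfolder.
    model = ORTModelForImageClassification.from_pretrained(
        model_id,
        file_name=f"model_{quantization}.onnx",
        subfolder="onnx",
    )
    processor = AutoImageProcessor.from_pretrained(model_id)

    # batch_size should make the pipeline batch inputs when fed a dataset
    # or a list of images.
    return pipeline(
        "image-classification",
        model=model,
        image_processor=processor,
        batch_size=batch_size,
    )
```

Would iterating a datasets.Dataset through such a pipeline work with an ONNX backend, or does batching need to be done manually?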

Any example to bootstrap from would be greatly appreciated.

thank you!