# Qwen3-Reranker-0.6B
A multi-format version of Qwen/Qwen3-Reranker-0.6B, converted for deployment on Modal.com and other platforms.
## Model Information
| Property | Value |
|---|---|
| Source Model | Qwen/Qwen3-Reranker-0.6B |
| Formats | SafeTensors FP32 + ONNX FP32 + SafeTensors FP16 + ONNX INT8 |
| Task | reranker-llm |
| Trust Remote Code | True |
## Available Versions
- `safetensors-fp32/`: PyTorch FP32 (baseline, highest accuracy)
- `onnx-fp32/`: ONNX FP32 (portable, cross-platform)
- `safetensors-fp16/`: PyTorch FP16 (GPU inference, ~50% smaller)
- `onnx-int8/`: ONNX INT8 quantized (CPU inference, ~75% smaller)
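Each format lives in its own subfolder of the repository, so you can fetch just the one you need. A minimal sketch using `huggingface_hub.snapshot_download` (the `onnx-int8` pattern here is just an example; substitute any folder from the list above):

```python
from huggingface_hub import snapshot_download

# Download only the ONNX INT8 folder instead of the whole repository
local_dir = snapshot_download(
    repo_id="n24q02m/Qwen3-Reranker-0.6B",
    allow_patterns=["onnx-int8/*"],
)
print(local_dir)
```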
## Usage
### PyTorch (GPU - Modal.com)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# GPU inference with FP16 (recommended for Modal.com)
model = AutoModelForCausalLM.from_pretrained(
    "n24q02m/Qwen3-Reranker-0.6B",
    subfolder="safetensors-fp16",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).cuda()
tokenizer = AutoTokenizer.from_pretrained(
    "n24q02m/Qwen3-Reranker-0.6B",
    subfolder="safetensors-fp16",
    trust_remote_code=True,
)

# Inference
inputs = tokenizer("Hello world", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)
```
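The raw forward pass above just returns next-token logits. For actual reranking, the upstream Qwen/Qwen3-Reranker-0.6B model card frames the task as a yes/no relevance judgment on a query-document pair; the sketch below follows that idea with a simplified prompt (the exact template on the upstream card is authoritative):

```python
import torch

def rerank_score(query: str, document: str) -> float:
    # Simplified prompt; see Qwen/Qwen3-Reranker-0.6B for the exact template
    pair = (
        "<Instruct>: Given a web search query, retrieve relevant passages "
        f"that answer the query\n<Query>: {query}\n<Document>: {document}"
    )
    inputs = tokenizer(pair, return_tensors="pt").to("cuda")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token
    yes_id = tokenizer.convert_tokens_to_ids("yes")
    no_id = tokenizer.convert_tokens_to_ids("no")
    # Relevance = probability mass on "yes" relative to "no"
    return torch.softmax(logits[[yes_id, no_id]], dim=0)[0].item()

print(rerank_score("What is the capital of France?",
                   "Paris is the capital of France."))
```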
### ONNX Runtime (CPU)
```python
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# CPU inference with ONNX
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Reranker-0.6B", trust_remote_code=True)

# Download the quantized model file, then open an ONNX Runtime session on it
model_path = hf_hub_download(
    repo_id="n24q02m/Qwen3-Reranker-0.6B",
    filename="onnx-int8/model_quantized.onnx",
)
session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

# Inference: feed only the inputs the ONNX graph actually declares
inputs = tokenizer("Hello world", return_tensors="np")
feed_dict = {
    inp.name: inputs[inp.name].astype(np.int64)
    for inp in session.get_inputs()
    if inp.name in inputs
}
outputs = session.run(None, feed_dict)
```
## Notes
- SafeTensors FP16 is the primary format for GPU inference on Modal.com
- Load the tokenizer from the original model or from the same format folder
- ONNX INT8 is the CPU fallback format and has the smallest footprint
## License
Apache 2.0 (following the original model license)
## Credits
- Original model: Qwen/Qwen3-Reranker-0.6B
- Conversion: Optimum + PyTorch