Qwen3-Reranker-0.6B

Multi-format version of Qwen/Qwen3-Reranker-0.6B

Converted for deployment on Modal.com and other platforms.

Model Information

Property           Value
Source Model       Qwen/Qwen3-Reranker-0.6B
Formats            SafeTensors FP32 + ONNX FP32 + SafeTensors FP16 + ONNX INT8
Task               reranker-llm
Trust Remote Code  True

Available Versions

  • safetensors-fp32/: PyTorch FP32 (baseline, highest accuracy)
  • onnx-fp32/: ONNX FP32 (portable, cross-platform)
  • safetensors-fp16/: PyTorch FP16 (GPU inference, ~50% smaller)
  • onnx-int8/: ONNX INT8 quantized (CPU inference, ~75% smaller)
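
To pull down a single variant instead of the whole repo, huggingface_hub's snapshot_download can filter by pattern. A minimal sketch (the subfolder names above are the only assumption):

from huggingface_hub import snapshot_download

# Download only the FP16 variant; the other subfolders are skipped
local_dir = snapshot_download(
    repo_id="n24q02m/Qwen3-Reranker-0.6B",
    allow_patterns=["safetensors-fp16/*"],
)
print(local_dir)  # local cache path containing safetensors-fp16/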

Usage

PyTorch (GPU - Modal.com)

from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# GPU inference with FP16 (recommended for Modal.com)
# The FP16 weights live in a subfolder of the repo, so pass subfolder=
# instead of appending it to the repo id.
model = AutoModelForCausalLM.from_pretrained(
    "n24q02m/Qwen3-Reranker-0.6B",
    subfolder="safetensors-fp16",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).cuda()
tokenizer = AutoTokenizer.from_pretrained(
    "n24q02m/Qwen3-Reranker-0.6B",
    subfolder="safetensors-fp16",
    trust_remote_code=True,
)

# Inference
inputs = tokenizer("Hello world", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)
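
The forward pass above only returns raw logits. The Qwen3-Reranker models turn a query-document pair into a relevance score by comparing the logits of the "yes" and "no" tokens at the final position. A minimal sketch, using a simplified prompt rather than the full template from the upstream Qwen/Qwen3-Reranker-0.6B card:

# Sketch: convert last-token logits into a relevance score in [0, 1]
true_id = tokenizer.convert_tokens_to_ids("yes")
false_id = tokenizer.convert_tokens_to_ids("no")

pair = "Query: what is a panda? Document: The giant panda is a bear endemic to China. Answer:"
inputs = tokenizer(pair, return_tensors="pt").to("cuda")
with torch.no_grad():
    logits = model(**inputs).logits[:, -1, :]

# Softmax over just the "no"/"yes" logits; index 1 is P("yes")
score = torch.softmax(logits[:, [false_id, true_id]], dim=-1)[:, 1]
print(float(score))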

ONNX Runtime (CPU)

import numpy as np
import onnxruntime as ort
from transformers import AutoTokenizer

# CPU inference with ONNX
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Reranker-0.6B", trust_remote_code=True)
# ort.InferenceSession expects a local file: download onnx-int8/model_quantized.onnx
# from this repo first (e.g. with huggingface_hub)
session = ort.InferenceSession(
    "n24q02m/Qwen3-Reranker-0.6B/onnx-int8/model_quantized.onnx",
    providers=["CPUExecutionProvider"],
)

# Inference
inputs = tokenizer("Hello world", return_tensors="np")
input_names = [inp.name for inp in session.get_inputs()]
# Feed only the inputs the graph declares, cast to int64 as the export expects
feed_dict = {}
for name in input_names:
    if name in inputs:
        feed_dict[name] = inputs[name].astype(np.int64)
outputs = session.run(None, feed_dict)
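
The raw outputs list can be reduced to a score the same way as in the PyTorch example. A sketch, assuming the exported graph's first output is the LM logits with shape (batch, seq_len, vocab):

# Sketch: score from ONNX logits (assumes outputs[0] is the logits tensor)
logits = outputs[0][:, -1, :]
true_id = tokenizer.convert_tokens_to_ids("yes")
false_id = tokenizer.convert_tokens_to_ids("no")
pair_logits = logits[:, [false_id, true_id]]
# Numerically stable softmax over the two candidate tokens
exp = np.exp(pair_logits - pair_logits.max(axis=-1, keepdims=True))
score = (exp / exp.sum(axis=-1, keepdims=True))[:, 1]
print(score)  # P("yes") per batch item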

Notes

  1. SafeTensors FP16 is the primary format for GPU inference on Modal.com
  2. Load the tokenizer from the original model or from the same folder as the weights (see the sketch below)
  3. ONNX INT8 is the CPU fallback format with the smallest footprint
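
For note 2, a short sketch of the two equivalent ways to load the tokenizer (the subfolder variant assumes the tokenizer files were copied alongside the converted weights):

from transformers import AutoTokenizer

# From the original upstream model
tok = AutoTokenizer.from_pretrained("Qwen/Qwen3-Reranker-0.6B", trust_remote_code=True)

# Or from the same subfolder as the converted weights
tok = AutoTokenizer.from_pretrained(
    "n24q02m/Qwen3-Reranker-0.6B",
    subfolder="safetensors-fp16",
    trust_remote_code=True,
)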

License

Apache 2.0 (following the original model license)

Credits

Based on Qwen/Qwen3-Reranker-0.6B by the Qwen team.