# Qwen3-Reranker-0.6B
A multi-format version of Qwen/Qwen3-Reranker-0.6B, converted for deployment on Modal.com and other platforms.
## Model Information
| Property | Value |
|---|---|
| Source Model | Qwen/Qwen3-Reranker-0.6B |
| Formats | SafeTensors FP32 + ONNX FP32 + SafeTensors FP16 + ONNX INT8 |
| Task | reranker-llm |
| Trust Remote Code | True |
## Available Versions
- `safetensors-fp32/`: PyTorch FP32 (baseline, highest accuracy)
- `onnx-fp32/`: ONNX FP32 (portable, cross-platform)
- `safetensors-fp16/`: PyTorch FP16 (GPU inference, ~50% smaller)
- `onnx-int8/`: ONNX INT8 quantized (CPU inference, ~75% smaller)
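Each format lives in its own subfolder of the repository, so you can fetch just the one you need. A minimal sketch using `huggingface_hub.snapshot_download` (the `onnx-int8` pattern here is just an example; substitute any folder from the list above):

```python
from huggingface_hub import snapshot_download

# Download only the ONNX INT8 folder instead of the whole repository
local_dir = snapshot_download(
    repo_id="n24q02m/Qwen3-Reranker-0.6B",
    allow_patterns=["onnx-int8/*"],
)
print(local_dir)
```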
## Usage
### PyTorch (GPU - Modal.com)
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# GPU inference with FP16 (recommended for Modal.com)
model = AutoModelForCausalLM.from_pretrained(
    "n24q02m/Qwen3-Reranker-0.6B",
    subfolder="safetensors-fp16",
    torch_dtype=torch.float16,
    trust_remote_code=True,
).cuda()
tokenizer = AutoTokenizer.from_pretrained(
    "n24q02m/Qwen3-Reranker-0.6B",
    subfolder="safetensors-fp16",
    trust_remote_code=True,
)

# Inference
inputs = tokenizer("Hello world", return_tensors="pt").to("cuda")
with torch.no_grad():
    outputs = model(**inputs)
```
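The raw forward pass above just returns next-token logits. For actual reranking, the upstream Qwen/Qwen3-Reranker-0.6B model card frames the task as a yes/no relevance judgment on a query-document pair; the sketch below follows that idea with a simplified prompt (the exact template on the upstream card is authoritative):

```python
import torch

def rerank_score(query: str, document: str) -> float:
    # Simplified prompt; see Qwen/Qwen3-Reranker-0.6B for the exact template
    pair = (
        "<Instruct>: Given a web search query, retrieve relevant passages "
        f"that answer the query\n<Query>: {query}\n<Document>: {document}"
    )
    inputs = tokenizer(pair, return_tensors="pt").to("cuda")
    with torch.no_grad():
        logits = model(**inputs).logits[0, -1]  # logits for the next token
    yes_id = tokenizer.convert_tokens_to_ids("yes")
    no_id = tokenizer.convert_tokens_to_ids("no")
    # Relevance = probability mass on "yes" relative to "no"
    return torch.softmax(logits[[yes_id, no_id]], dim=0)[0].item()

print(rerank_score("What is the capital of France?",
                   "Paris is the capital of France."))
```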
### ONNX Runtime (CPU)
```python
import numpy as np
import onnxruntime as ort
from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

# CPU inference with ONNX
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-Reranker-0.6B", trust_remote_code=True)

# Download the quantized model file, then open an ONNX Runtime session on it
model_path = hf_hub_download(
    repo_id="n24q02m/Qwen3-Reranker-0.6B",
    filename="onnx-int8/model_quantized.onnx",
)
session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])

# Inference: feed only the inputs the ONNX graph actually declares
inputs = tokenizer("Hello world", return_tensors="np")
feed_dict = {
    inp.name: inputs[inp.name].astype(np.int64)
    for inp in session.get_inputs()
    if inp.name in inputs
}
outputs = session.run(None, feed_dict)
```
## Notes
- SafeTensors FP16 is the primary format for GPU inference on Modal.com
- Load the tokenizer from the original model or from the same format folder
- ONNX INT8 is the CPU fallback format and has the smallest footprint
## License
Apache 2.0 (following the original model license)
## Credits
- Original model: Qwen/Qwen3-Reranker-0.6B
- Conversion: Optimum + PyTorch