# Octen-Embedding-0.6B (FP16 ONNX)
FP16-converted ONNX of Octen/Octen-Embedding-0.6B, a Qwen3-derived 1024-dim retrieval embedding model with 32k context and last-token pooling.
1.2 GB (50 % of the FP32 memory footprint), and retrieval-quality-equivalent to FP32 under our parity gates.
## Quality
| Metric | Value | Threshold |
|---|---|---|
| cos_min vs PyTorch FP32 reference (6-text multilingual probe) | 1.000000 | ≥ 0.99 |
| cos_mean vs same | 1.000000 | – |
Validated under fastembed-rs' `cosine_parity` harness on `probe/ort-rc12` (ORT 1.24). The probe set covers EN/DE/ZH plus retrieval-style sentences; all per-row cosines are ≥ 0.999999.
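The harness itself lives in fastembed-rs; for reference, the parity metrics above reduce to row-wise cosine similarity between the two embedding matrices. A minimal numpy sketch (the function name `cosine_parity` here is illustrative, not the harness's actual code):

```python
import numpy as np

def cosine_parity(ref: np.ndarray, test: np.ndarray):
    """Row-wise cosine similarity between two embedding matrices.

    ref/test: (n_texts, dim) embeddings from the FP32 reference and the
    FP16 ONNX model. Returns (cos_min, cos_mean) over the probe set.
    """
    ref_n = ref / np.linalg.norm(ref, axis=1, keepdims=True)
    test_n = test / np.linalg.norm(test, axis=1, keepdims=True)
    cos = np.sum(ref_n * test_n, axis=1)
    return float(cos.min()), float(cos.mean())

# Toy check with a 6-text, 1024-dim probe shape: identical embeddings
# must trivially pass the cos_min >= 0.99 gate.
emb = np.random.default_rng(0).normal(size=(6, 1024)).astype(np.float32)
cos_min, cos_mean = cosine_parity(emb, emb.copy())
```

The gate passes when `cos_min ≥ 0.99`; `cos_mean` is reported for context only.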
## Files
| File | Size | Description |
|---|---|---|
| `model.fp16.onnx` | ~5 MB | ONNX header (external data) |
| `model.fp16.onnx.data` | ~1.2 GB | FP16 weights |
| `tokenizer.json`, `config.json`, `tokenizer_config.json`, `special_tokens_map.json` | small | tokenizer + model config |
## Conversion
Streaming FP32→FP16 conversion via `convert_fp16_streaming.py`, which bypasses the 2 GB protobuf serialization limit by writing external data directly, without building an intermediate Python-side proto.
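The script itself is not reproduced here; the core streaming idea can be sketched as chunked conversion of a raw weight blob, so no tensor larger than 2 GB ever has to live inside a protobuf message. Everything below (`stream_fp32_to_fp16`, the chunk size) is a hypothetical illustration, not the actual script:

```python
import numpy as np

CHUNK = 1 << 20  # elements per chunk; keeps peak memory small

def stream_fp32_to_fp16(src_path: str, dst_path: str) -> None:
    """Convert a raw FP32 weight blob to FP16 chunk-by-chunk.

    The FP16 bytes go straight to the external .data file; the small
    ONNX header would then reference them by offset/length.
    """
    src = np.memmap(src_path, dtype=np.float32, mode="r")
    with open(dst_path, "wb") as out:
        for start in range(0, src.size, CHUNK):
            out.write(src[start:start + CHUNK].astype(np.float16).tobytes())
```

In the real conversion, each initializer's external-data offset in `model.fp16.onnx` is updated to point at the newly written `model.fp16.onnx.data`.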
## Use via fastembed-rs
```rust
use fastembed::{EmbeddingModel, InitOptions, TextEmbedding};

let embedder = TextEmbedding::try_new(
    InitOptions::new(EmbeddingModel::OctenEmbedding0_6BFp16),
)?;
let vectors = embedder.embed(vec!["hello world"], None)?;
```
Pooling: last-token (applied automatically by fastembed-rs). For asymmetric retrieval, prepend `"Query: "` to queries and `"Document: "` to documents.
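Since fastembed-rs handles pooling for you, this is only for readers wiring up the raw ONNX outputs themselves: last-token pooling takes each sequence's final non-padding hidden state and L2-normalizes it. A minimal numpy sketch (function name and the right-padding assumption are illustrative):

```python
import numpy as np

def last_token_pool(hidden: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Select each sequence's last non-padding hidden state, then L2-normalize.

    hidden: (batch, seq, dim) final-layer states; mask: (batch, seq) of 0/1,
    assuming right-padded sequences.
    """
    last = mask.sum(axis=1) - 1                      # index of last real token
    pooled = hidden[np.arange(hidden.shape[0]), last]
    return pooled / np.linalg.norm(pooled, axis=1, keepdims=True)

# One sequence padded to length 4; only the first 2 tokens are real,
# so the pooled vector is the normalized hidden state at position 1.
h = np.random.default_rng(1).normal(size=(1, 4, 8))
m = np.array([[1, 1, 0, 0]])
vec = last_token_pool(h, m)
```

The resulting 1024-dim vectors (8-dim in this toy example) are unit-norm, so dot product equals cosine similarity at query time.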
## License
Apache 2.0, inherited from the base model.
## Model tree

Lineage for cstr/Octen-Embedding-0.6B-ONNX-FP16:

- Base model: Qwen/Qwen3-0.6B-Base
- Finetuned: Qwen/Qwen3-Embedding-0.6B
- Finetuned: Octen/Octen-Embedding-0.6B (converted here)