# Apuri-V3-0.6B-SFT-QAT

Finnish instruction-following model based on Qwen3-0.6B.

## Model Details

- Base: Qwen/Qwen3-0.6B
- Training: CPT (Finnish) → SFT with QAT
- Languages: Finnish, English
- Quantization: Q4_K_M with imatrix (379MB)

## Usage with Ollama

Create a Modelfile with the chat template:

```
FROM ./apuri-v3-0.6b-sft-qat-Q4_K_M.gguf
TEMPLATE """{{- if .System }}<|im_start|>system
{{ .System }}<|im_end|>
{{ end }}<|im_start|>user
{{ .Prompt }}<|im_end|>
<|im_start|>assistant
"""
PARAMETER temperature 0.6
PARAMETER stop <|im_end|>
PARAMETER num_ctx 2048
```

Then:

```shell
ollama create apuri-finnish -f Modelfile
ollama run apuri-finnish
```

## Usage with llama-cpp-python

```python
from llama_cpp import Llama

llm = Llama(
    model_path="apuri-v3-0.6b-sft-qat-Q4_K_M.gguf",
    n_ctx=2048,
    chat_format="chatml",  # Important: use the ChatML format
    n_gpu_layers=-1,
)

response = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Mikä on Suomen pääkaupunki?"}],
    max_tokens=100,
    temperature=0.6,
)
print(response["choices"][0]["message"]["content"])
# Output: Suomen pääkaupunki on Helsinki...
```

## Test Results

| Test | Result |
|---|---|
| Mikä on Suomen pääkaupunki? | Helsinki ✅ |
| Mikä on 15 + 27? | 42 ✅ |
| Listaa kolme Suomen kaupunkia | Helsinki, Turku, Tampere ✅ |
| 3 omenaa + 5 lisää = ? | 8 omenaa ✅ |

## Files

- `apuri-v3-0.6b-sft-qat-Q4_K_M.gguf` - Q4_K_M quantized (379MB) - Recommended
- `apuri-v3-0.6b-sft-qat-bf16.gguf` - BF16 full precision (1.2GB)

## Training

- CPT: 1M Finnish tokens (fineweb-edu, mc4-fi, news)
- SFT: 30K instructions (OpenHermes + Finnish OASST2/Capybara)
- QAT: int4 quantization-aware training
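
Quantization-aware training inserts a fake-quantize step into the forward pass so the weights learn to tolerate int4 rounding error. A minimal sketch of that core operation, symmetric int4 quantize/dequantize (pure-Python illustration under stated assumptions, not the actual training code used for this model):

```python
def fake_quant_int4(weights: list[float]) -> list[float]:
    """Simulate symmetric int4 quantization: snap each weight to the
    16-level grid {-8..7} * scale, then map back to floats
    (quantize -> dequantize)."""
    max_abs = max(abs(w) for w in weights)
    if max_abs == 0.0:
        return list(weights)
    scale = max_abs / 7.0  # symmetric: positive range clamps at 7
    quantized = [max(-8, min(7, round(w / scale))) for w in weights]
    return [q * scale for q in quantized]

print(fake_quant_int4([0.01, -0.5, 0.7, 0.02]))
```

In practice QAT frameworks apply this per weight group and use a straight-through estimator so gradients pass through the non-differentiable rounding step.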

## Limitations

- Small 0.6B model - best for simple tasks
- May have knowledge gaps in Finnish culture/history
- Works best with the chat template shown above