TemRust-SMOL-v5-1.5B
A 1.5B Rust-specialized coding assistant, fine-tuned via LoRA SFT on top of
Qwen/Qwen2.5-Coder-1.5B-Instruct using a curated 355-row Rust SFT mix
(263 real merged-PR file-pair fixes from popular Rust repos + 92 teacher-distilled
synthetic examples covering borrow/lifetime archetypes and test generation).
Benchmark — TemRust-* (n=37 hand-curated tasks, cargo-graded)
The benchmark contains four sub-evals (all hand-curated; all graded by running
cargo check, cargo test, or cargo run in a fresh tempdir per task —
no mocks):
- borrow (10): borrow-checker / lifetime / move errors
- issue (9): "fix this documented bug" (real GitHub issues)
- test (9): write passing #[test] cases for a given function
- type (9): type-system / trait-bound errors
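The grading loop described above (write the candidate into a fresh temporary directory, run cargo, pass on exit code 0) can be sketched roughly as follows. This is a minimal sketch and not the benchmark's actual harness: the function name `grade_in_fresh_crate`, the manifest contents, the `src/lib.rs` target, and the timeout are all assumptions made for illustration.

```python
import subprocess
import tempfile
from pathlib import Path

# Hypothetical minimal manifest; the benchmark's real Cargo.toml is not shown in this card.
CARGO_TOML = """[package]
name = "task"
version = "0.1.0"
edition = "2021"
"""


def grade_in_fresh_crate(source, cargo_args=("check",), target="src/lib.rs",
                         runner=subprocess.run, timeout=300):
    """Write `source` into a throwaway crate and return True if cargo exits with 0.

    `runner` is injectable so the function can be exercised without cargo installed.
    """
    with tempfile.TemporaryDirectory() as tmp:
        crate = Path(tmp)
        (crate / "Cargo.toml").write_text(CARGO_TOML)
        target_path = crate / target
        target_path.parent.mkdir(parents=True, exist_ok=True)
        target_path.write_text(source)
        # Run e.g. `cargo check` inside the temporary crate and inspect the exit code.
        result = runner(["cargo", *cargo_args], cwd=crate,
                        capture_output=True, text=True, timeout=timeout)
        return result.returncode == 0
```

With the default `runner`, this needs `cargo` on the PATH. Passing `cargo_args=("test",)` would grade by `cargo test` instead, and a `cargo run` task would need `target="src/main.rs"` with a `main` function.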
| sub-eval | this model | rate |
|---|---|---|
| borrow | 7/10 | 70.0% |
| issue | 7/9 | 77.8% |
| test | 4/9 | 44.4% |
| type | 7/9 | 77.8% |
| total | 25/37 | 67.6% |
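As a quick check on the table, the total is the pooled count over all 37 tasks rather than the mean of the four sub-eval rates. The short sketch below recomputes the rates from the counts in the table; the variable names are illustrative only.

```python
# (passed, total) per sub-eval, copied from the table above.
RESULTS = {
    "borrow": (7, 10),
    "issue": (7, 9),
    "test": (4, 9),
    "type": (7, 9),
}


def fmt_rate(passed, total):
    """Format a pass count as a percentage with one decimal place."""
    return f"{passed / total:.1%}"


# Pool the counts across sub-evals to get the headline number.
total_passed = sum(passed for passed, _ in RESULTS.values())
total_tasks = sum(total for _, total in RESULTS.values())

for name, (passed, total) in RESULTS.items():
    print(f"{name}: {passed}/{total} = {fmt_rate(passed, total)}")
print(f"total: {total_passed}/{total_tasks} = {fmt_rate(total_passed, total_tasks)}")
```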
Comparison to bases and other Tem-Rust versions
| Model | Class | Pass rate |
|---|---|---|
| Qwen3-1.7B-chat (untrained) | 1.7B | 35.1% |
| Qwen2.5-Coder-1.5B-Instruct (this base, untrained) | 1.5B | 51.4% |
| Tem-Rust v4 (Qwen3-1.7B-chat + LoRA) | 1.7B | 54.1% |
| TemRust-SMOL-v5-1.5B | 1.5B | 67.6% |
| Qwen2.5-Coder-3B-Instruct (untrained, 2× the params) | 3B | 73.0% |
| Tem-Rust v4 ∪ v5 ensemble + cargo check | 3.2B | 83.8% |
Usage
Quick fix-this-Rust-file pattern
# Load the tokenizer and the merged model from the Hugging Face Hub.
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tok = AutoTokenizer.from_pretrained("nagisanzeninz/TemRust-SMOL-v5-1.5B")
model = AutoModelForCausalLM.from_pretrained(
    "nagisanzeninz/TemRust-SMOL-v5-1.5B", torch_dtype=torch.bfloat16, device_map="auto"
)

SYSTEM = (
    "You are Tem-Rust, a Rust coding assistant. Return the complete fixed Rust "
    "file in a single ```rust code block. Do not include any other code blocks "
    "or explanations outside the block."
)

# Example input: does not compile because the returned reference has no lifetime annotation.
buggy_rust = '''
fn longest(x: &str, y: &str) -> &str {
    if x.len() > y.len() { x } else { y }
}
'''

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": f"```rust\n{buggy_rust}\n```"},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

# Greedy decoding (do_sample=False) for deterministic output.
out = model.generate(
    **inputs, max_new_tokens=2048, do_sample=False,
    pad_token_id=tok.eos_token_id,
)
# Decode only the newly generated tokens, skipping the prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
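Because the system prompt asks for the fixed file inside a single rust code block, the decoded reply usually needs one more step before it can be written to disk. The helper below is a hypothetical sketch of that step (the name `extract_rust_block` is not part of this model's tooling); it returns the body of the first rust-tagged fence, or `None` if the reply contains none.

```python
import re

# Build the fence marker programmatically to keep this snippet easy to embed.
FENCE = "`" * 3

# Match the first fence tagged `rust`; DOTALL lets `.` span newlines.
_RUST_BLOCK = re.compile(FENCE + r"rust[ \t]*\n(.*?)" + FENCE, re.DOTALL)


def extract_rust_block(response):
    """Return the contents of the first rust code fence in `response`, or None."""
    match = _RUST_BLOCK.search(response)
    return match.group(1) if match else None
```

Treating a missing fence as a failed generation, rather than guessing at the code boundaries, keeps malformed replies from reaching the compiler step.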
Recommended for production
Run the model's output through cargo check (and cargo test if your task
adds tests) before accepting it. Running both this model and Tem-Rust-v4 (1.7B)
and accepting whichever output passes cargo check raises the pass rate from
67.6% (25/37) to 83.8% (31/37), roughly halving the number of failures. This is
the v4 ∪ v5 ensemble row in the comparison table above.
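The accept-whichever-passes pattern can be sketched as a small selection loop. This is an illustration under assumptions, not the pipeline used to produce the ensemble number: `first_passing`, `generators`, and `passes_check` are hypothetical names, each generator is assumed to be a callable wrapping one model, and trying them in list order is a choice made here for simplicity.

```python
def first_passing(prompt, generators, passes_check):
    """Return the first candidate that passes the check, or None if all fail.

    generators:   callables mapping a prompt to candidate Rust source (or None).
    passes_check: callable returning True when the candidate compiles,
                  for example a wrapper around `cargo check`.
    """
    for generate in generators:
        candidate = generate(prompt)
        if candidate is not None and passes_check(candidate):
            return candidate  # stop early; later models are never called
    return None
```

Returning `None` when every candidate fails lets the caller surface the failure instead of silently accepting code that does not compile.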
Training
- Method: LoRA r=32, alpha=64, dropout=0.05 on q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, then merged into base for release
- Data: 355-row mix:
  - 263 real merged-PR file pairs (pre-fix → post-fix) crawled from 35+ popular Rust GitHub repos via the v3 issue crawler
  - 41 teacher-distilled coverage-style test examples (Qwen3-Coder-Next-FP8)
  - 51 teacher-fixed borrow/lifetime archetypes (canonical move-after-borrow, lifetime missing, &mut/& conflict, dangling reference, closure capture, etc.)
- Hyperparameters: 10 epochs, lr 2e-5 cosine, warmup 3%, batch 4 with grad_accum 2 (effective 8), bf16, gradient checkpointing on, packing=True, max_seq_len 4096; adamw_torch optimizer
- Compute: 1× RunPod H100 SXM5 80GB, ~20 min wall time
- Stack (pinned for torch==2.4.0 compatibility): transformers==4.45.2, peft==0.13.2, trl==0.11.4, accelerate==1.0.1, datasets==3.0.2
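For readers who want to reproduce the adapter setup, the LoRA settings listed above map onto `peft.LoraConfig` roughly as follows. This is a sketch, not the released training script: `task_type="CAUSAL_LM"` is an assumption (the card does not state it), and the helper name `build_lora_config` is hypothetical. The `peft` import is deferred so the constants can be inspected without the library installed.

```python
# Projection layers the adapter is applied to, as listed in the Method bullet.
TARGET_MODULES = [
    "q_proj", "k_proj", "v_proj", "o_proj",
    "gate_proj", "up_proj", "down_proj",
]

LORA_KWARGS = {
    "r": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.05,
    "target_modules": TARGET_MODULES,
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}

# With standard (non rank-stabilized) LoRA, the update is scaled by lora_alpha / r.
LORA_SCALING = LORA_KWARGS["lora_alpha"] / LORA_KWARGS["r"]

# Effective batch size on a single GPU: per-device batch times accumulation steps.
PER_DEVICE_BATCH = 4
GRAD_ACCUM = 2
EFFECTIVE_BATCH = PER_DEVICE_BATCH * GRAD_ACCUM


def build_lora_config():
    """Build the adapter config; requires `peft` to be installed."""
    from peft import LoraConfig
    return LoraConfig(**LORA_KWARGS)
```

After training, merging the adapter into the base weights for release is typically done with PEFT's `merge_and_unload()` on the wrapped model.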
Limitations
- Whole-file SFT format: files longer than 4096 tokens were truncated during training. Multi-file refactoring and large-codebase reasoning are out of scope.
- Distribution skew: the 37-task benchmark is hand-curated to balance borrow/issue/test/type, but real Rust code has much heavier issue-fix tails and much more boilerplate. Don't extrapolate the 67.6% headline to "Tem-Rust fixes 67.6% of all Rust bugs."
- No safety / RLHF post-training: standard helpful-instruction tuning only.
- Training is non-deterministic: the same hyperparameters and data on different H100 runs landed in the 21-23/37 range. The released checkpoint is one sample from this distribution.
Source pipeline
Full data + scripts + reproducibility: https://github.com/temm1e-labs/temrust
Citation: if you use this model, please cite the GitHub repo.