TemRust-SMOL-v5-1.5B

A 1.5B Rust-specialized coding assistant, fine-tuned via LoRA SFT on top of Qwen/Qwen2.5-Coder-1.5B-Instruct using a curated 355-row Rust SFT mix (263 real merged-PR file-pair fixes from popular Rust repos + 92 teacher-distilled synthetic examples covering borrow/lifetime archetypes and test generation).

Benchmark — TemRust-* (n=37 hand-curated tasks, cargo-graded)

The benchmark contains four sub-evals (all hand-curated; all graded by running cargo check, cargo test, or cargo run in a fresh tempdir per task — no mocks):

borrow (10): borrow-checker / lifetime / move errors
issue (9): "fix this documented bug" (real GitHub issues)
test (9): write passing #[test] cases for a given function
type (9): type-system / trait-bound errors

sub-eval	this model	rate
borrow	7/10	70.0%
issue	7/9	77.8%
test	4/9	44.4%
type	7/9	77.8%
total	25/37	67.6%
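The rates in the table follow directly from the per-task pass counts. As a quick sanity check, the sketch below recomputes them from the counts listed above; the names `results` and `pass_rate` are illustrative and not part of any released harness.

```python
# Pass counts copied from the table above: sub-eval -> (passed, total).
results = {
    "borrow": (7, 10),
    "issue": (7, 9),
    "test": (4, 9),
    "type": (7, 9),
}


def pass_rate(passed: int, total: int) -> float:
    """Return the pass rate as a percentage rounded to one decimal place."""
    return round(100 * passed / total, 1)


# Per-sub-eval rates, then the pooled total over all tasks.
rates = {name: pass_rate(p, t) for name, (p, t) in results.items()}
total_passed = sum(p for p, _ in results.values())
total_tasks = sum(t for _, t in results.values())
overall = pass_rate(total_passed, total_tasks)

for name, rate in rates.items():
    print(f"{name}: {rate}%")
print(f"total: {total_passed}/{total_tasks} = {overall}%")
```

Note that the total is pooled over all 37 tasks rather than averaged across the four sub-evals, so the 10-task borrow set weighs slightly more than the others.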

Comparison to bases and other Tem-Rust versions

Model	Class	Pass rate
Qwen3-1.7B-chat (no fine-tuning)	1.7B	35.1%
Qwen2.5-Coder-1.5B-Instruct (this base, no fine-tuning)	1.5B	51.4%
Tem-Rust v4 (Qwen3-1.7B-chat + LoRA)	1.7B	54.1%
TemRust-SMOL-v5-1.5B	1.5B	67.6%
Qwen2.5-Coder-3B-Instruct (no fine-tuning, 2× the params)	3B	73.0%
Tem-Rust v4 ∪ v5 ensemble + cargo check	3.2B	83.8%

Usage

Quick fix-this-Rust-file pattern

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tok = AutoTokenizer.from_pretrained("nagisanzeninz/TemRust-SMOL-v5-1.5B")
model = AutoModelForCausalLM.from_pretrained(
    "nagisanzeninz/TemRust-SMOL-v5-1.5B", torch_dtype=torch.bfloat16, device_map="auto"
)

SYSTEM = (
    "You are Tem-Rust, a Rust coding assistant. Return the complete fixed Rust "
    "file in a single ```rust code block. Do not include any other code blocks "
    "or explanations outside the block."
)

buggy_rust = '''
fn longest(x: &str, y: &str) -> &str {
    if x.len() > y.len() { x } else { y }
}
'''

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": f"```rust\n{buggy_rust}\n```"},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)
out = model.generate(
    # Greedy decoding: with do_sample=False the temperature is unused, so it is not passed.
    **inputs, max_new_tokens=2048, do_sample=False,
    pad_token_id=tok.eos_token_id,
)
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
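Because the system prompt asks for the fixed file inside a single rust code block, the decoded reply usually needs one more step before it can be written to disk. A minimal sketch of that extraction, assuming the model follows the prompt's format; `extract_rust_block` is a hypothetical helper, not something shipped with the model:

```python
import re
from typing import Optional

# Three backticks, built at runtime so this snippet can itself sit inside a fenced block.
FENCE = "`" * 3

# Matches the first fenced block tagged `rust` in a model reply (non-greedy body).
RUST_BLOCK = re.compile(FENCE + r"rust\s*\n(.*?)" + FENCE, re.DOTALL)


def extract_rust_block(reply: str) -> Optional[str]:
    """Return the contents of the first rust code block, or None if there is none."""
    match = RUST_BLOCK.search(reply)
    if match is None:
        return None
    return match.group(1).strip() + "\n"


# Example with a hand-written reply (not real model output).
reply = "Here is the fix:\n" + FENCE + "rust\nfn main() {}\n" + FENCE + "\n"
print(extract_rust_block(reply))
```

Returning None rather than the raw reply keeps a malformed answer from being saved as a Rust file by accident.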

Recommended for production

Run the model's output through cargo check (and cargo test if your task adds tests) before accepting it. Running both this model and Tem-Rust-v4 (1.7B) and accepting whichever output passes cargo check raises the pass rate from 67.6% to 83.8%; this is the v4 ∪ v5 ensemble row in the comparison table above.
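One way to wire up this gate is sketched below. It is an illustration under assumptions, not the harness used for the benchmark: the crate layout (a library crate with the candidate as src/lib.rs), the crate name, the timeout, and the helper names `cargo_check` and `first_passing` are all made up for the example, and cargo is assumed to be on PATH.

```python
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Iterable, Optional

# Minimal manifest for a throwaway crate (name and edition are assumptions).
CARGO_TOML = """[package]
name = "candidate"
version = "0.1.0"
edition = "2021"
"""


def cargo_check(source: str, timeout: int = 120) -> bool:
    """Write `source` as src/lib.rs in a fresh temp crate and run `cargo check`."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "src").mkdir()
        (root / "Cargo.toml").write_text(CARGO_TOML)
        (root / "src" / "lib.rs").write_text(source)
        try:
            proc = subprocess.run(
                ["cargo", "check", "--quiet"],
                cwd=root,
                capture_output=True,
                timeout=timeout,
            )
        except (FileNotFoundError, subprocess.TimeoutExpired):
            # cargo missing or the check hung: treat as a failed candidate.
            return False
        return proc.returncode == 0


def first_passing(
    candidates: Iterable[str],
    check: Callable[[str], bool] = cargo_check,
) -> Optional[str]:
    """Return the first candidate source that passes `check`, else None."""
    for source in candidates:
        if check(source):
            return source
    return None
```

For the ensemble, `candidates` would hold the extracted output of this model followed by that of Tem-Rust-v4. Passing `check` as a parameter makes it easy to swap in a stricter gate (for example one that also runs cargo test) or a stub during testing.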

Training

Method: LoRA r=32, alpha=64, dropout=0.05 on q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, then merged into base for release
Data: 355-row mix:
- 263 real merged-PR file pairs (pre-fix → post-fix) crawled from 35+ popular Rust GitHub repos via the v3 issue crawler
- 41 teacher-distilled coverage-style test examples (Qwen3-Coder-Next-FP8)
- 51 teacher-fixed borrow/lifetime archetypes (canonical move-after-borrow, lifetime missing, &mut/& conflict, dangling reference, closure capture, etc.)
Hyperparameters: 10 epochs, lr 2e-5 cosine, warmup 3%, batch 4 with grad_accum 2 (effective 8), bf16, gradient checkpointing on, packing=True, max_seq_len 4096; adamw_torch optimizer
Compute: 1× RunPod H100 SXM5 80GB, ~20 min wall time
Stack (pinned for torch==2.4.0 compatibility): transformers==4.45.2, peft==0.13.2, trl==0.11.4, accelerate==1.0.1, datasets==3.0.2
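For readers reproducing the setup, the adapter settings above map onto keyword arguments in the shape accepted by `peft.LoraConfig`. The sketch below only builds and inspects plain dictionaries, so it runs without the training stack installed; `task_type` is an assumption (the card does not state it), and the original training script may have been organized differently.

```python
# Adapter settings from the "Method" line, as kwargs for peft.LoraConfig(**lora_kwargs).
lora_kwargs = {
    "r": 32,
    "lora_alpha": 64,
    "lora_dropout": 0.05,
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    "task_type": "CAUSAL_LM",  # assumption: not stated in the card
}

# Standard LoRA scales the adapter update by lora_alpha / r.
lora_scaling = lora_kwargs["lora_alpha"] / lora_kwargs["r"]

# Batch settings from the "Hyperparameters" line.
per_device_batch = 4
grad_accum = 2
effective_batch = per_device_batch * grad_accum

print(f"LoRA scaling factor: {lora_scaling}")
print(f"effective batch size: {effective_batch}")
```

With packing enabled, the number of optimizer steps per epoch depends on how the 355 rows pack into 4096-token sequences, so it cannot be derived from the row count alone.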

Limitations

Whole-file SFT format: files longer than 4096 tokens are truncated during training. Multi-file refactoring or large-codebase reasoning is out of scope.
Distribution skew: the 37-task benchmark is hand-curated to balance borrow/issue/test/type, but real Rust code has much heavier issue-fix tails and much more boilerplate. Don't extrapolate the 67.6% headline to "Tem-Rust fixes 67.6% of all Rust bugs."
No safety / RLHF post-training: standard helpful-instruction tuning only.
Training is non-deterministic: the same hyperparameters and data on different H100 runs landed in the 21-23/37 range. The released checkpoint is one sample from this distribution.
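A 37-task benchmark also carries sizeable sampling noise on its own, separate from the run-to-run variation noted above. As a rough illustration, the sketch below computes a 95% Wilson score interval for 25/37; treating tasks as independent draws is a simplifying assumption, and this interval is not a figure reported by the authors.

```python
import math
from typing import Tuple


def wilson_interval(passed: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (z=1.96 gives about 95%)."""
    p = passed / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half, center + half


# Headline result from the benchmark table: 25 of 37 tasks passed.
low, high = wilson_interval(25, 37)
print(f"95% interval for 25/37: {low:.1%} to {high:.1%}")
```

The interval spans roughly 51% to 80%, which is why small differences between rows of the comparison table should be read with caution.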