TemRust-SMOL-v5-1.5B

A 1.5B Rust-specialized coding assistant, fine-tuned via LoRA SFT on top of Qwen/Qwen2.5-Coder-1.5B-Instruct using a curated 355-row Rust SFT mix (263 real merged-PR file-pair fixes from popular Rust repos + 92 teacher-distilled synthetic examples covering borrow/lifetime archetypes and test generation).

Benchmark — TemRust-* (n=37 hand-curated tasks, cargo-graded)

The benchmark contains four sub-evals (all hand-curated; all graded by running cargo check, cargo test, or cargo run in a fresh tempdir per task — no mocks):

  • borrow (10): borrow-checker / lifetime / move errors
  • issue (9): "fix this documented bug" (real GitHub issues)
  • test (9): write passing #[test] cases for a given function
  • type (9): type-system / trait-bound errors

Sub-eval   Passed/total   Pass rate
borrow     7/10           70.0%
issue      7/9            77.8%
test       4/9            44.4%
type       7/9            77.8%
total      25/37          67.6%
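
The per-task grading described above (a fresh temporary crate per task, real cargo, no mocks) can be sketched roughly as follows. This is an illustrative harness, not the benchmark's actual code: the Cargo.toml contents, the `grade_task` name, the `--quiet` flag, the default `src/main.rs` target, and the timeout are all assumptions.

```python
import subprocess
import tempfile
from pathlib import Path

# Hypothetical minimal manifest; the benchmark's real manifests are not shown in this card.
CARGO_TOML = """[package]
name = "task"
version = "0.1.0"
edition = "2021"

[dependencies]
"""


def grade_task(rust_source: str, subcommand: str = "check",
               crate_file: str = "src/main.rs", timeout: int = 300,
               runner=subprocess.run) -> bool:
    """Write one task into a fresh temporary crate and run `cargo <subcommand>`.

    Returns True when cargo exits with status 0. `runner` is injectable so the
    harness can be exercised without a Rust toolchain (an assumption, for testing).
    """
    if subcommand not in {"check", "test", "run"}:
        raise ValueError(f"unsupported cargo subcommand: {subcommand}")
    # A new temporary directory per task, removed automatically afterwards.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "Cargo.toml").write_text(CARGO_TOML)
        src = root / crate_file
        src.parent.mkdir(parents=True, exist_ok=True)
        src.write_text(rust_source)
        try:
            result = runner(
                ["cargo", subcommand, "--quiet"],
                cwd=root, capture_output=True, text=True, timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            return False  # a hung build, test, or run counts as a failure
        return result.returncode == 0
```

For a test-writing task, something like `grade_task(source, subcommand="test", crate_file="src/lib.rs")` would run the generated `#[test]` functions. The injectable `runner` exists only so the harness can be exercised without cargo installed.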

Comparison to bases and other Tem-Rust versions

Model                                                 Params   Pass rate (n=37)
Qwen3-1.7B-chat (untrained)                           1.7B     35.1%
Qwen2.5-Coder-1.5B-Instruct (this base, untrained)    1.5B     51.4%
Tem-Rust v4 (Qwen3-1.7B-chat + LoRA)                  1.7B     54.1%
TemRust-SMOL-v5-1.5B                                  1.5B     67.6%
Qwen2.5-Coder-3B-Instruct (untrained, 2× the params)  3B       73.0%
Tem-Rust v4 ∪ v5 ensemble + cargo check               3.2B     83.8%

Usage

Quick fix-this-Rust-file pattern

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

tok = AutoTokenizer.from_pretrained("nagisanzeninz/TemRust-SMOL-v5-1.5B")
model = AutoModelForCausalLM.from_pretrained(
    "nagisanzeninz/TemRust-SMOL-v5-1.5B", torch_dtype=torch.bfloat16, device_map="auto"
)

# Ask for the whole fixed file in exactly one fenced rust block.
SYSTEM = (
    "You are Tem-Rust, a Rust coding assistant. Return the complete fixed Rust "
    "file in a single ```rust code block. Do not include any other code blocks "
    "or explanations outside the block."
)

# Example input: missing lifetime specifier on the return type (rustc E0106).
buggy_rust = '''
fn longest(x: &str, y: &str) -> &str {
    if x.len() > y.len() { x } else { y }
}
'''

messages = [
    {"role": "system", "content": SYSTEM},
    {"role": "user", "content": f"```rust\n{buggy_rust}\n```"},
]
prompt = tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

# Greedy decoding; no temperature is passed because it is ignored when do_sample=False.
out = model.generate(
    **inputs, max_new_tokens=2048, do_sample=False,
    pad_token_id=tok.eos_token_id,
)
# Decode only the newly generated tokens, not the echoed prompt.
print(tok.decode(out[0][inputs["input_ids"].shape[1]:], skip_special_tokens=True))
```
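
Because the system prompt asks for exactly one fenced block tagged rust, a small helper can pull the file out of the decoded reply before compiling it. A minimal sketch, assuming the model honors the single-block format; `extract_rust_block` is a hypothetical helper, not part of transformers or the temrust repo.

```python
import re

# First fenced block tagged `rust`; DOTALL lets `.` span newlines,
# and the lazy `.*?` stops at the first closing fence.
RUST_BLOCK = re.compile(r"```rust[ \t]*\n(.*?)```", re.DOTALL)


def extract_rust_block(reply: str):
    """Return the body of the first rust-tagged fenced block in `reply`, or None."""
    match = RUST_BLOCK.search(reply)
    return match.group(1) if match else None
```

In the example above, you would keep the decoded string (instead of printing it) and pass it to `extract_rust_block`; a `None` result means the reply broke the requested format and should be rejected or retried.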

Recommended for production

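Run the model's output through cargo check (and cargo test if your task adds tests) before accepting it. Running both this model and Tem-Rust v4 (1.7B) and accepting whichever output passes cargo check is the "Tem-Rust v4 ∪ v5 ensemble + cargo check" row in the comparison above: it raises the pass rate from 67.6% to 83.8%, roughly halving the number of failed tasks (12/37 → 6/37).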
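
The accept-if-it-compiles loop can be sketched as below. This is a minimal sketch under assumptions: each candidate is a single self-contained file that uses only std, it is checked as `src/lib.rs` of a throwaway crate (so files without a `main` still compile), and the helper names and the v5-first ordering are illustrative, not taken from the temrust repo.

```python
import subprocess
import tempfile
from pathlib import Path

# Hypothetical throwaway manifest for compile-checking one file.
MANIFEST = '[package]\nname = "candidate"\nversion = "0.1.0"\nedition = "2021"\n'


def passes_cargo_check(rust_source: str, timeout: int = 300) -> bool:
    """Compile-check one candidate file as src/lib.rs (requires cargo on PATH)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "Cargo.toml").write_text(MANIFEST)
        (root / "src").mkdir()
        (root / "src" / "lib.rs").write_text(rust_source)
        try:
            proc = subprocess.run(["cargo", "check", "--quiet"], cwd=root,
                                  capture_output=True, text=True, timeout=timeout)
        except subprocess.TimeoutExpired:
            return False
        return proc.returncode == 0


def pick_first_passing(candidates, check=passes_cargo_check):
    """Return (model_name, source) for the first candidate that passes `check`, else None.

    `candidates` is an ordered list of (model_name, rust_source) pairs; a source
    of None (e.g. the reply had no rust block) is skipped.
    """
    for name, source in candidates:
        if source is not None and check(source):
            return name, source
    return None
```

Putting the v5 output first reflects its higher single-model score in the table; when both candidates compile, the order decides which file is returned. If your task adds tests, a second gate with `cargo test` would follow the same pattern.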

Training

  • Method: LoRA r=32, alpha=64, dropout=0.05 on q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, then merged into the base weights for release
  • Data: 355-row mix:
    • 263 real merged-PR file pairs (pre-fix → post-fix) crawled from 35+ popular Rust GitHub repos via the v3 issue crawler
    • 41 teacher-distilled coverage-style test examples (Qwen3-Coder-Next-FP8)
    • 51 teacher-fixed borrow/lifetime archetypes (canonical move-after-borrow, lifetime missing, &mut/& conflict, dangling reference, closure capture, etc.)
  • Hyperparameters: 10 epochs, lr 2e-5 cosine, warmup 3%, batch 4 with grad_accum 2 (effective 8), bf16, gradient checkpointing on, packing=True, max_seq_len 4096; adamw_torch optimizer
  • Compute: 1× RunPod H100 SXM5 80GB, ~20 min wall time
  • Stack (pinned for torch==2.4.0 compatibility): transformers==4.45.2, peft==0.13.2, trl==0.11.4, accelerate==1.0.1, datasets==3.0.2
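
For readers reproducing the run, the listed hyperparameters map onto peft's `LoraConfig` and trl's `SFTConfig` roughly as below. This is a sketch, not the repo's training script: `task_type`, the output directory, and the helper names are assumptions, and the actual script may set options (dataset fields, chat-template handling) that this card does not list.

```python
# Values from the Training list, kept as plain kwargs so they can be
# inspected without peft/trl installed.
LORA_KWARGS = dict(
    r=32,
    lora_alpha=64,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    task_type="CAUSAL_LM",  # assumption: not stated in the card
)

SFT_KWARGS = dict(
    output_dir="temrust-v5-lora",  # hypothetical output path
    num_train_epochs=10,
    learning_rate=2e-5,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    per_device_train_batch_size=4,
    gradient_accumulation_steps=2,
    bf16=True,
    gradient_checkpointing=True,
    packing=True,
    max_seq_length=4096,
    optim="adamw_torch",
)


def effective_batch_size(kwargs, num_gpus=1):
    """Per-device batch x accumulation steps x devices (4 x 2 x 1 = 8 here)."""
    return (kwargs["per_device_train_batch_size"]
            * kwargs["gradient_accumulation_steps"] * num_gpus)


def build_configs():
    """Build the config objects with the pinned stack (peft==0.13.2, trl==0.11.4).

    Imports are local so this module loads without those packages; calling it
    needs them installed, and bf16=True needs a bf16-capable GPU.
    """
    from peft import LoraConfig
    from trl import SFTConfig
    return LoraConfig(**LORA_KWARGS), SFTConfig(**SFT_KWARGS)
```

Keeping lora_alpha at twice r gives a LoRA scaling factor of alpha/r = 2, which is a common default choice for this rank.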

Limitations

  • Whole-file SFT format: training sequences longer than 4096 tokens were truncated. Multi-file refactoring and large-codebase reasoning are out of scope.
  • Distribution skew: the 37-task benchmark is hand-curated to balance borrow/issue/test/type tasks, but real-world Rust code has a much heavier tail of issue fixes and far more boilerplate. Don't extrapolate the 67.6% headline to "Tem-Rust fixes 67.6% of all Rust bugs."
  • No safety / RLHF post-training: standard helpful-instruction tuning only.
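  • Training is non-deterministic: rerunning the same hyperparameters on the same data on different H100s produced scores in the 21-23/37 range, while the released checkpoint scored 25/37. Treat the reported numbers as one sample from a run-to-run distribution, not a fixed property of the recipe.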
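
Because training used whole-file pairs capped at 4096 tokens, it can help to flag inputs that probably fall outside that window before trusting the output. A rough heuristic sketch, assuming the fixed file is about as long as the input, so a full training-style sequence is roughly twice the prompt; the helper name and the 2× factor are assumptions. It flags inputs outside the training distribution and is not a hard inference limit of the model.

```python
TRAIN_MAX_TOKENS = 4096  # max_seq_len from the Training section


def likely_beyond_training_window(tokenizer, messages, limit=TRAIN_MAX_TOKENS):
    """Heuristic: would prompt + a same-size rewritten file exceed the SFT window?

    `tokenizer` is any object with a Hugging Face-style `apply_chat_template`
    that returns a list of token ids when tokenize=True.
    """
    prompt_ids = tokenizer.apply_chat_template(
        messages, tokenize=True, add_generation_prompt=True
    )
    # Assumed: the reply (whole fixed file) is about as long as the prompt.
    return 2 * len(prompt_ids) > limit
```

With the tokenizer and messages from the usage example, `likely_beyond_training_window(tok, messages)` returns False for the short `longest` snippet.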

Source pipeline

Full data + scripts + reproducibility: https://github.com/temm1e-labs/temrust

Citation: if you use this model, please cite the GitHub repo.

