metadata
library_name: transformers
model_name: Qemma-redux
tags:
- generated_from_trainer
- sft
- trl
licence: license
license: osl-3.0
datasets:
- O1-OPEN/OpenO1-SFT
- yahma/alpaca-cleaned
- Jackrong/gpt-oss-120b-reasoning-STEM-5K
language:
- en
base_model:
- reaperdoesntknow/Qemma-sft
pipeline_tag: text-generation
Model Card for Qemma
Redux This Model underwent an additional merge between Qemma-sft and Qwen3-0.6B, in addition to adding Rope Scaling. Qemma is a HuggingFace-native hybrid model that merges Gemma-3 (1B) and Qwen-3 (0.6B) at the weight level (no adapters). Design: Gemma MLP/body + Qwen attention/head, projected and aligned to Gemma’s hidden size. The model is then SFT-tuned for stepwise reasoning. This variant uses Yarn based Rope Scaling with 1:1 Ratio from max_position_embeddings
Quick start
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch
model_id = "reaperdoesntknow/Qemma-redux"
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16).eval()
text = "I notice that the sum involves the absolute values of three linear expressions of x."
inputs = tokenizer(text, return_tensors="pt", max_length=64, padding='max_length', truncation=True)
inputs = {k: v.to(model.device) for k, v in inputs.items()}
with torch.no_grad():
model.eval()
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=True, min_length=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
What’s inside
- Architecture: Gemma-3 backbone (26 layers, hidden 1152, MLP 6912) with Qwen-style attention regrouped to Gemma’s 4×256 heads.
- Tokenizer: Gemma-3 tokenizer and chat template (see
chat_template.jinja). - Training: SFT for instruction following and stepwise reasoning.
Intended use & limitations
Use: research, instruction following, code/help, analysis, further SFT/RLHF. Limits: may hallucinate; not for safety-critical, medical, legal, or financial decisions. Follow dataset/model licenses.
Training procedure
- ~512 warm-start steps (Alpaca-style data)
- 256 Additional pretraining steps on (O1-OPEN/OpenO1-SFT)
- 128 SFT steps with (Jackrong/gpt-oss-120b-reasoning-STEM-5K)
- 256 SFT steps with (O1-OPEN/OpenO1-SFT)
Framework versions
- TRL: 0.25.0
- Transformers: 4.57.1
- Pytorch: 2.8.0+cpu
- Datasets: 4.4.1
- Tokenizers: 0.22.1
Citations
Cite TRL as:
@misc{vonwerra2022trl,
title = {{TRL: Transformer Reinforcement Learning}},
author = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
year = 2020,
journal = {GitHub repository},
publisher = {GitHub},
howpublished = {\url{https://github.com/huggingface/trl}}
}