ItsMaxNorm/DeepSeek-R1-Distill-SmolLM3-3B-GRPO

Text Generation

Generated from Trainer

DeepSeek-R1-Distill-SmolLM3-3B-GRPO / latest

Model save

5befd42 verified 6 months ago

14 Bytes

global_step460