# austral-grpo-merged-r1
This is a merge of pre-trained language models created using mergekit.
## Merge Details

### Merge Method
This model was merged using the passthrough merge method, with Delta-Vector/MS3.2-Austral-Winton plus the LoRA checkpoint at /home/dgxuser/workspace/Mango/verifiers/outputs/writing-judge-training-1e-5-power-2/checkpoint-40 (mergekit's `model+lora` syntax) as the base.
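For readers unfamiliar with the `model+lora` notation: mergekit applies the LoRA checkpoint to the base model's weights before the merge, and a passthrough merge then copies those weights through unchanged. The snippet below is a minimal illustrative sketch of that step using peft, assuming checkpoint-40 is a standard PEFT adapter; it is not the script used to build this model.

```python
# Illustrative only: fold a LoRA adapter into its base model with peft,
# roughly what mergekit does for a "model+lora" entry before the
# passthrough merge copies the resulting weights through unchanged.
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

base = AutoModelForCausalLM.from_pretrained(
    "Delta-Vector/MS3.2-Austral-Winton",
    torch_dtype=torch.bfloat16,
)

# GRPO checkpoint path from the merge configuration (assumed to be a PEFT adapter)
adapter_path = "/home/dgxuser/workspace/Mango/verifiers/outputs/writing-judge-training-1e-5-power-2/checkpoint-40"

model = PeftModel.from_pretrained(base, adapter_path)
merged = model.merge_and_unload()  # bake the adapter deltas into the base weights
merged.save_pretrained("./austral-grpo-merged-r1")
```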
### Models Merged

Only the base model listed above (Delta-Vector/MS3.2-Austral-Winton with the GRPO LoRA checkpoint applied) went into this merge; no additional models were combined.
### Configuration
The following YAML configuration was used to produce this model:
```yaml
base_model: Delta-Vector/MS3.2-Austral-Winton+/home/dgxuser/workspace/Mango/verifiers/outputs/writing-judge-training-1e-5-power-2/checkpoint-40
dtype: bfloat16
merge_method: passthrough
models:
  - model: Delta-Vector/MS3.2-Austral-Winton+/home/dgxuser/workspace/Mango/verifiers/outputs/writing-judge-training-1e-5-power-2/checkpoint-40
```
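A merge from a configuration like this can be reproduced either with the `mergekit-yaml` CLI (`mergekit-yaml config.yml ./output-model`) or from Python. The sketch below follows the Python usage documented in the mergekit README; the config path, output path, and option values are placeholders, not the exact settings used for this model.

```python
# Sketch of running a mergekit merge from Python, per the mergekit README.
# Paths and options here are placeholders for illustration.
import torch
import yaml
from mergekit.config import MergeConfiguration
from mergekit.merge import MergeOptions, run_merge

with open("config.yml", "r", encoding="utf-8") as fp:
    merge_config = MergeConfiguration.model_validate(yaml.safe_load(fp))

run_merge(
    merge_config,
    out_path="./austral-grpo-merged-r1",
    options=MergeOptions(
        cuda=torch.cuda.is_available(),
        copy_tokenizer=True,
        lazy_unpickle=False,
        low_cpu_memory=False,
    ),
)
```

The resulting output folder can then be loaded as usual with transformers' `AutoModelForCausalLM.from_pretrained`.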
## Model tree for NewEden/Austral-24b-GRPO

- Base model: mistralai/Mistral-Small-3.1-24B-Base-2503
- Finetuned: Gryphe/Codex-24B-Small-3.2
- Finetuned: Delta-Vector/MS3.2-Austral-24B-SFT
- Finetuned: Delta-Vector/MS3.2-Austral-24B-KTO
- Finetuned: Delta-Vector/MS3.2-Austral-Winton