About the uploaded model
- Developed by: prithivMLmods
- License: apache-2.0
- Finetuned from model: unsloth/meta-llama-3.1-8b-bnb-4bit
The model is still in the training phase; this is not the final version, and it may produce artifacts or perform poorly in some cases.
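For reference, the base checkpoint named above can be loaded with Unsloth's `FastLanguageModel` API. The snippet below is a minimal sketch rather than the uploader's exact script; the `max_seq_length` value is an assumed placeholder.

```python
# Minimal sketch: loading the 4-bit base model listed above with Unsloth.
from unsloth import FastLanguageModel

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/meta-llama-3.1-8b-bnb-4bit",  # base model for this finetune
    max_seq_length=2048,  # placeholder; the card only references a max_seq_length variable
    load_in_4bit=True,    # the base checkpoint is a bitsandbytes 4-bit quantization
)
```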
Trainer Configuration
| Parameter | Value |
|---|---|
| Model | `model` |
| Tokenizer | `tokenizer` |
| Train Dataset | `dataset` |
| Dataset Text Field | `text` |
| Max Sequence Length | `max_seq_length` |
| Dataset Number of Processes | 2 |
| Packing | False (can make training 5x faster for short sequences) |
| Training Arguments | |
| - Per Device Train Batch Size | 2 |
| - Gradient Accumulation Steps | 4 |
| - Warmup Steps | 5 |
| - Number of Train Epochs | 1 (set this for one full training run) |
| - Max Steps | 60 |
| - Learning Rate | 2e-4 |
| - FP16 | `not is_bfloat16_supported()` |
| - BF16 | `is_bfloat16_supported()` |
| - Logging Steps | 1 |
| - Optimizer | `adamw_8bit` |
| - Weight Decay | 0.01 |
| - LR Scheduler Type | `linear` |
| - Seed | 3407 |
| - Output Directory | `outputs` |
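This table corresponds to the standard Unsloth + TRL `SFTTrainer` setup. The sketch below shows how these values would be passed; `model`, `tokenizer`, `dataset`, and `max_seq_length` are assumed to be objects and variables created earlier in the training script, so this is illustrative rather than the uploader's exact code.

```python
# Sketch of the trainer configuration above, assuming the usual Unsloth notebook layout.
from trl import SFTTrainer
from transformers import TrainingArguments
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,                      # the Unsloth-loaded model
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    dataset_num_proc=2,
    packing=False,                    # can make training 5x faster for short sequences
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        num_train_epochs=1,           # set this for one full training run
        max_steps=60,                 # caps the run at 60 optimizer steps
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
    ),
)
trainer.train()
```

With a per-device batch size of 2 and 4 gradient-accumulation steps, the effective batch size is 8 per device, and `max_steps=60` ends the run after 60 optimizer steps regardless of `num_train_epochs`.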
This Llama model was trained 2x faster with Unsloth and Hugging Face's TRL library.