Update README.md
README.md
CHANGED
@@ -7,13 +7,17 @@ tags:
 - biology
 ---
 
-
-
-
-
-
-
-
-
-
-
+### Model Configuration
+
+| Parameter | Value | Description |
+|------------|--------|-------------|
+| **context_length** | 150 | Maximum sequence length for the input context. |
+| **depth** | 24 | Number of transformer layers. |
+| **diffusion_steps** | 500 | Number of diffusion steps during training. |
+| **hidden_size** | 1280 | Hidden dimension size in the transformer. |
+| **mlp_ratio** | 4 | Expansion ratio in the MLP block. |
+| **num_heads** | 16 | Number of attention heads. |
+| **task_name** | `pretrain` | Task type for model training. |
+| **tokenizer_name** | `pretrain` | Tokenizer used for model input. |
+| **vocab_ring_len** | 300 | Length of the circular vocabulary window. |
+| **vocab_size** | 3000 | Total vocabulary size. |
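The parameters in the Model Configuration table can be collected into a single config object, which also makes a few derived sizes explicit. The following is a minimal sketch, not the repository's own code: the `ModelConfig` class, its property names, and `approx_block_params` are hypothetical, and the parameter estimate assumes a standard transformer block (four attention projections plus a two-layer MLP, with biases, norms, embeddings, and any diffusion-conditioning layers ignored), which the README does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelConfig:
    """Hypothetical container; field values are copied from the README table."""

    context_length: int = 150
    depth: int = 24
    diffusion_steps: int = 500
    hidden_size: int = 1280
    mlp_ratio: int = 4
    num_heads: int = 16
    task_name: str = "pretrain"
    tokenizer_name: str = "pretrain"
    vocab_ring_len: int = 300
    vocab_size: int = 3000

    def __post_init__(self) -> None:
        # Multi-head attention splits hidden_size evenly across heads.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")

    @property
    def head_dim(self) -> int:
        # Per-head dimension: 1280 / 16 = 80 for the table values.
        return self.hidden_size // self.num_heads

    @property
    def mlp_hidden_size(self) -> int:
        # Inner MLP width: 1280 * 4 = 5120 for the table values.
        return self.hidden_size * self.mlp_ratio

    def approx_block_params(self) -> int:
        # Assumption: each layer has Q, K, V, and output projections
        # (4 * h^2 weights) plus an up and a down MLP projection
        # (2 * h * mlp_hidden weights). Everything else is ignored.
        attn = 4 * self.hidden_size ** 2
        mlp = 2 * self.hidden_size * self.mlp_hidden_size
        return self.depth * (attn + mlp)


config = ModelConfig()
print("head_dim:", config.head_dim)
print("mlp_hidden_size:", config.mlp_hidden_size)
print("approx. block parameters:", f"{config.approx_block_params():,}")
```

Under these assumptions the transformer blocks alone come to roughly 472M weights; the true count depends on architectural details the table does not list.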