---
license: mit
datasets:
- liuganghuggingface/demodiff_downstream
tags:
- chemistry
- biology
---

### Model Configuration
|
|
| Parameter | Value | Description |
|-----------|-------|-------------|
| **context_length** | 150 | Maximum sequence length for the input context. |
| **depth** | 24 | Number of transformer layers. |
| **diffusion_steps** | 500 | Number of diffusion steps during training. |
| **hidden_size** | 1280 | Hidden dimension of the transformer. |
| **mlp_ratio** | 4 | Expansion ratio of the MLP block. |
| **num_heads** | 16 | Number of attention heads. |
| **task_name** | `pretrain` | Task type used for model training. |
| **tokenizer_name** | `pretrain` | Tokenizer used for model input. |
| **vocab_ring_len** | 300 | Length of the circular vocabulary window. |
| **vocab_size** | 3000 | Total vocabulary size. |
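The table above can be captured as a plain config object. The sketch below is illustrative only: the `DemoDiffConfig` class name is hypothetical (the repository may define its own config format), but the field names and values mirror the table, and the derived per-head and MLP dimensions follow directly from `hidden_size`, `num_heads`, and `mlp_ratio`.

```python
from dataclasses import dataclass


@dataclass
class DemoDiffConfig:
    """Hypothetical container mirroring the configuration table above."""
    context_length: int = 150    # maximum input sequence length
    depth: int = 24              # number of transformer layers
    diffusion_steps: int = 500   # diffusion steps during training
    hidden_size: int = 1280      # transformer hidden dimension
    mlp_ratio: int = 4           # MLP expansion ratio
    num_heads: int = 16          # attention heads
    task_name: str = "pretrain"
    tokenizer_name: str = "pretrain"
    vocab_ring_len: int = 300    # circular vocabulary window length
    vocab_size: int = 3000       # total vocabulary size


cfg = DemoDiffConfig()

# Dimensions implied by the table:
head_dim = cfg.hidden_size // cfg.num_heads   # 1280 / 16 = 80 per head
mlp_hidden = cfg.hidden_size * cfg.mlp_ratio  # 1280 * 4 = 5120 inner MLP width
print(head_dim, mlp_hidden)
```

Note that `hidden_size` must be divisible by `num_heads` for standard multi-head attention, which holds here (1280 / 16 = 80).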
|
|