---
license: mit
datasets:
- liuganghuggingface/demodiff_downstream
tags:
- chemistry
- biology
---

### Model Configuration

| Parameter | Value | Description |
|-----------|-------|-------------|
| **context_length** | 150 | Maximum sequence length for the input context. |
| **depth** | 24 | Number of transformer layers. |
| **diffusion_steps** | 500 | Number of diffusion steps during training. |
| **hidden_size** | 1280 | Hidden dimension size in the transformer. |
| **mlp_ratio** | 4 | Expansion ratio in the MLP block. |
| **num_heads** | 16 | Number of attention heads. |
| **task_name** | `pretrain` | Task type for model training. |
| **tokenizer_name** | `pretrain` | Tokenizer used for model input. |
| **vocab_ring_len** | 300 | Length of the circular vocabulary window. |
| **vocab_size** | 3000 | Total vocabulary size. |
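The table above can be mirrored as a plain configuration object, which makes the parameters easy to load or validate programmatically. This is a minimal sketch: the `ModelConfig` class and its field names are hypothetical and may not match the actual training code, but the values come from the table.

```python
from dataclasses import dataclass


@dataclass
class ModelConfig:
    """Hypothetical container for the values in the configuration table."""
    context_length: int = 150   # maximum input sequence length
    depth: int = 24             # number of transformer layers
    diffusion_steps: int = 500  # diffusion steps during training
    hidden_size: int = 1280     # transformer hidden dimension
    mlp_ratio: int = 4          # MLP expansion ratio
    num_heads: int = 16         # attention heads
    task_name: str = "pretrain"
    tokenizer_name: str = "pretrain"
    vocab_ring_len: int = 300   # circular vocabulary window length
    vocab_size: int = 3000      # total vocabulary size


config = ModelConfig()

# One sanity check these numbers support: the hidden size must divide
# evenly across the attention heads (1280 / 16 = 80 per head).
head_dim = config.hidden_size // config.num_heads
assert config.hidden_size % config.num_heads == 0
print(head_dim)
```

Grouping the parameters this way also makes invariants (such as `hidden_size` being divisible by `num_heads`) explicit rather than implicit in the table.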