DemoDiff-0.7B / README.md
---
license: mit
datasets:
  - liuganghuggingface/demodiff_downstream
tags:
  - chemistry
  - biology
---

Model Configuration

| Parameter | Value | Description |
| --- | --- | --- |
| context_length | 150 | Maximum sequence length for the input context. |
| depth | 24 | Number of transformer layers. |
| diffusion_steps | 500 | Number of diffusion steps during training. |
| hidden_size | 1280 | Hidden dimension size in the transformer. |
| mlp_ratio | 4 | Expansion ratio in the MLP block. |
| num_heads | 16 | Number of attention heads. |
| task_name | pretrain | Task type for model training. |
| tokenizer_name | pretrain | Tokenizer used for model input. |
| vocab_ring_len | 300 | Length of the circular vocabulary window. |
| vocab_size | 3000 | Total vocabulary size. |
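
As a rough illustration of how these values relate, the sketch below collects them in a plain Python dataclass and derives a few quantities from them. `DemoDiffConfig` and `approx_block_params` are hypothetical names used only for this example, not part of the released code. The parameter estimate assumes standard transformer blocks (four attention projections plus a two-layer MLP) and ignores biases, embeddings, and normalization layers. The optional `adaln` term assumes a DiT-style modulation layer of size `6 * hidden_size` per block, which this card does not confirm.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DemoDiffConfig:
    """Hypothetical container mirroring the configuration table; not the official class."""

    context_length: int = 150
    depth: int = 24
    diffusion_steps: int = 500
    hidden_size: int = 1280
    mlp_ratio: int = 4
    num_heads: int = 16
    task_name: str = "pretrain"
    tokenizer_name: str = "pretrain"
    vocab_ring_len: int = 300
    vocab_size: int = 3000

    @property
    def head_dim(self) -> int:
        # Each attention head works on an equal slice of the hidden dimension.
        if self.hidden_size % self.num_heads != 0:
            raise ValueError("hidden_size must be divisible by num_heads")
        return self.hidden_size // self.num_heads

    @property
    def mlp_hidden_size(self) -> int:
        # Width of the inner MLP layer: hidden_size expanded by mlp_ratio.
        return self.hidden_size * self.mlp_ratio


def approx_block_params(cfg: DemoDiffConfig, adaln: bool = False) -> int:
    """Rough weight count for the transformer blocks only (no biases or embeddings)."""
    h = cfg.hidden_size
    attention = 4 * h * h                 # Q, K, V, and output projections
    mlp = 2 * h * cfg.mlp_hidden_size     # up- and down-projection
    # Assumption: DiT-style adaLN modulation (h -> 6h) per block; not stated in the card.
    modulation = 6 * h * h if adaln else 0
    return cfg.depth * (attention + mlp + modulation)


cfg = DemoDiffConfig()
print(cfg.head_dim)                           # 80
print(cfg.mlp_hidden_size)                    # 5120
print(approx_block_params(cfg))               # 471859200
print(approx_block_params(cfg, adaln=True))   # 707788800
```

Under the adaLN assumption the estimate lands near the 0.7B in the model name, but treat both figures as back-of-the-envelope numbers rather than an exact count.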