Transformers · Safetensors · dplm2 · custom_code

lhallee committed (verified) · Commit 36868f1 · 1 Parent(s): d86675f

Upload README.md with huggingface_hub

Files changed (1): README.md (+25 −2)

README.md CHANGED
@@ -39,8 +39,31 @@ with torch.no_grad():
  ## DPLM2 modality types
  DPLM2 infers `type_ids` automatically from `input_ids` and `attention_mask` when they are not provided.
 
- ## Attention backend
- `sdpa` is the default backend. Flex Attention is available by setting `config.attn_backend = "flex"` before loading.
+ ## Attention backends
+
+ `sdpa` (PyTorch Scaled Dot Product Attention) is the default.
+
+ | Backend | Key | Notes |
+ | :--- | :--- | :--- |
+ | PyTorch SDPA | `"sdpa"` | Default. Exact numerics, stable on all hardware. |
+ | Flash Attention | `"kernels_flash"` | Fastest on Ampere/Hopper GPUs. Requires `pip install flash-attn`. Outputs differ slightly from SDPA due to online softmax reordering, but differences are numerically harmless. |
+ | Flex Attention | `"flex"` | Skips padding tokens via block mask — faster on variable-length batches. Near-exact numerics. First use compiles a Triton kernel (30–120 s). Best combined with `torch.compile`. |
+ | Auto | `"auto"` | Picks the best available: `kernels_flash` → `flex` → `sdpa`. |
+
+ Set via config before loading, or change on the model after loading (DPLM2 propagates the change to all attention layers immediately):
+
+ ```python
+ from transformers import AutoConfig, AutoModel
+
+ # Option 1: set before loading
+ config = AutoConfig.from_pretrained("Synthyra/DPLM2-150M", trust_remote_code=True)
+ config.attn_backend = "flex"
+ model = AutoModel.from_pretrained("Synthyra/DPLM2-150M", config=config, trust_remote_code=True)
+
+ # Option 2: set after loading
+ model = AutoModel.from_pretrained("Synthyra/DPLM2-150M", trust_remote_code=True)
+ model.attn_backend = "flex"  # propagates to all attention layers in-place
+ ```
 
  ## Embed datasets
  All DPLM2 models inherit `EmbeddingMixin`, so you can call `model.embed_dataset(...)` directly.
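
The `"auto"` fallback order described in the backends table can be sketched as a small pure function. This is an illustrative sketch only — `pick_attn_backend` and its availability flags are hypothetical names, not part of the DPLM2 or Transformers API:

```python
def pick_attn_backend(flash_available: bool, flex_available: bool) -> str:
    """Resolve "auto" using the preference order from the table:
    kernels_flash -> flex -> sdpa (illustrative sketch, not DPLM2 code)."""
    if flash_available:
        # flash-attn installed and a supported GPU is present
        return "kernels_flash"
    if flex_available:
        # Flex Attention available (modern PyTorch with Triton)
        return "flex"
    # SDPA always works, on any hardware
    return "sdpa"

print(pick_attn_backend(flash_available=False, flex_available=True))  # flex
```

In practice an availability check would probe the environment (e.g. whether `flash_attn` imports and a CUDA device is present) rather than take flags, but the preference order is the same.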