Tags: Transformers · Safetensors · dplm2 · custom_code
lhallee committed (verified) · Commit bb85f2c · Parent: 36868f1

Upload README.md with huggingface_hub

Files changed (1): README.md (+1 −1)
README.md CHANGED
@@ -46,7 +46,7 @@ DPLM2 infers `type_ids` automatically from `input_ids` and `attention_mask` when
  | Backend | Key | Notes |
  | :--- | :--- | :--- |
  | PyTorch SDPA | `"sdpa"` | Default. Exact numerics, stable on all hardware. |
- | Flash Attention | `"kernels_flash"` | Fastest on Ampere/Hopper GPUs. Requires `pip install flash-attn`. Outputs differ slightly from SDPA due to online softmax reordering, but differences are numerically harmless. |
+ | Flash Attention | `"kernels_flash"` | Fastest on Ampere/Hopper GPUs. Requires `pip install kernels` (pre-built — no hours-long compilation). Outputs are not bitwise identical to SDPA due to online softmax reordering; differences are often small but not guaranteed to be inconsequential — use `"sdpa"` if exact numerics matter. |
  | Flex Attention | `"flex"` | Skips padding tokens via block mask — faster on variable-length batches. Near-exact numerics. First use compiles a Triton kernel (30–120 s). Best combined with `torch.compile`. |
  | Auto | `"auto"` | Picks the best available: `kernels_flash` → `flex` → `sdpa`. |
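
The `"auto"` fallback order documented in the table can be sketched as a simple availability check. This is an illustrative sketch, not DPLM2's actual implementation; the `pick_backend` helper and its `available` argument are hypothetical names:

```python
def pick_backend(available):
    """Return the first usable backend key in the documented preference
    order: kernels_flash -> flex -> sdpa.

    `available` is the set of backend keys usable on this machine
    (hypothetical — how availability is probed is not shown here).
    SDPA ships with PyTorch, so it serves as the final fallback.
    """
    for key in ("kernels_flash", "flex", "sdpa"):
        if key in available:
            return key
    return "sdpa"
```

For example, on a machine without `kernels` installed but with Flex Attention usable, `pick_backend({"flex", "sdpa"})` returns `"flex"`; on a bare install it falls back to `"sdpa"`.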