Update README.md
README.md CHANGED

@@ -469,7 +469,7 @@ Try it out by running the following snippet.
 > a FP8 triton kernel for fast accelerated matmuls
 > (`w8a8_block_fp8_matmul_triton`) will be used
 > without any degradation in accuracy. However, if you want to
-> run your model in BF16 see (#transformers-bf16)
+> run your model in BF16 see ([here](#transformers-bf16))
 
 Then load our tokenizer along with the model and generate:
 
@@ -516,6 +516,8 @@ decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
 print(decoded_output)
 ```
 
+</details>
+
 #### Transformers BF16
 
 Transformers allows you to automatically convert the checkpoint to Bfloat16. To do so, simply load the model as follows:
 
@@ -531,8 +533,6 @@ model = Mistral3ForConditionalGeneration.from_pretrained(
 )
 ```
 
-</details>
-
 ## License
 
 This model is licensed under the [Apache 2.0 License](https://www.apache.org/licenses/LICENSE-2.0.txt).
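For readers landing on this commit without the surrounding README: the "load our tokenizer along with the model and generate" snippet referenced in the first two hunks is outside the diff context, so only its last two lines are visible above. A minimal sketch of what that snippet plausibly contains, assuming a plain `AutoTokenizer` (the README may use a different tokenizer class) and with the repo id, prompt, and `max_new_tokens` as placeholders, not values taken from the README:

```python
from transformers import AutoTokenizer, Mistral3ForConditionalGeneration

# Hypothetical repo id -- substitute the actual checkpoint name.
model_id = "mistralai/<model-repo>"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = Mistral3ForConditionalGeneration.from_pretrained(model_id)

# Placeholder prompt; the README's real prompt sits outside the hunks shown.
tokenized = tokenizer("Give me a short introduction to LLMs.", return_tensors="pt").to(model.device)

output = model.generate(**tokenized, max_new_tokens=128)[0]

# Drop the prompt tokens so only newly generated text is decoded --
# this is the line visible in the second hunk's header above.
decoded_output = tokenizer.decode(output[len(tokenized["input_ids"][0]):])
print(decoded_output)
```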
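Likewise, the third hunk shows only the closing `)` of the load under `#### Transformers BF16`. From the hunk header (`model = Mistral3ForConditionalGeneration.from_pretrained(`), a minimal sketch of what that call plausibly contains, assuming the standard `torch_dtype` argument and a placeholder repo id:

```python
import torch
from transformers import Mistral3ForConditionalGeneration

# torch_dtype=torch.bfloat16 asks Transformers to cast the weights to BF16
# while loading, which is what "automatically convert the checkpoint to
# Bfloat16" refers to. The repo id is a placeholder.
model = Mistral3ForConditionalGeneration.from_pretrained(
    "mistralai/<model-repo>",
    torch_dtype=torch.bfloat16,
)
```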