---
library_name: transformers
tags:
- tokenizers
- sglang
license: other
license_name: grok-2
license_link: https://huggingface.co/xai-org/grok-2/blob/main/LICENSE
---
# Grok-2 Tokenizer

A 🤗-compatible version of the **Grok-2 tokenizer** (adapted from [xai-org/grok-2](https://huggingface.co/xai-org/grok-2)).

This means it can be used with Hugging Face libraries including [Transformers](https://github.com/huggingface/transformers),
[Tokenizers](https://github.com/huggingface/tokenizers), and [Transformers.js](https://github.com/xenova/transformers.js).
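
As a rough illustration of the standalone Tokenizers route, the sketch below loads the tokenizer with `tokenizers.Tokenizer.from_pretrained` and round-trips a string. It assumes the repository ships a `tokenizer.json` file that the `tokenizers` library can read; `load_with_tokenizers` and `round_trip` are hypothetical helper names, not part of any library.

```python
def load_with_tokenizers(repo_id: str = "alvarobartt/grok-2-tokenizer"):
    """Load the tokenizer with the standalone `tokenizers` library (needs network access)."""
    # Imported lazily so that `round_trip` below has no third-party dependency.
    from tokenizers import Tokenizer

    # Assumption: the repository contains a `tokenizer.json` file.
    return Tokenizer.from_pretrained(repo_id)


def round_trip(tokenizer, text: str) -> str:
    """Encode `text` and decode the resulting ids back into a string."""
    encoded = tokenizer.encode(text)
    # `tokenizers.Tokenizer.encode` returns an `Encoding` object exposing `.ids`,
    # whereas `transformers` tokenizers return a plain list of ids.
    ids = getattr(encoded, "ids", encoded)
    return tokenizer.decode(ids)
```

Calling `round_trip(load_with_tokenizers(), "What is Deep Learning?")` downloads the tokenizer from the Hub on first use. Whether the decoded string matches the input exactly depends on the special tokens and normalization the tokenizer applies.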
## Motivation

[xai-org/grok-2](https://huggingface.co/xai-org/grok-2) (also known as Grok 2.5) was recently released on the 🤗 Hub with native
[SGLang](https://github.com/sgl-project/sglang) support. However, the checkpoints on the Hub do not ship a Hugging Face compatible tokenizer; they ship a `tiktoken`-based
JSON export instead, which is [internally read and patched in SGLang](https://github.com/sgl-project/sglang/blob/fd71b11b1d96d385b09cb79c91a36f1f01293639/python/sglang/srt/tokenizer/tiktoken_tokenizer.py#L29-L108).

This repository contains the Hugging Face compatible export, so that users can easily experiment with the Grok-2 tokenizer.
It also lets SGLang load the tokenizer directly from the Hub, instead of requiring the model repository to be downloaded (and mounted) first so that
`--tokenizer-path` can point at a local file. Grok-2 can then be deployed as:
```bash
python3 -m sglang.launch_server --model-path xai-org/grok-2 --tokenizer-path alvarobartt/grok-2-tokenizer --tp-size 8 --quantization fp8 --attention-backend triton
```

Rather than the former 2-step process:

```bash
hf download xai-org/grok-2 --local-dir /local/grok-2
python3 -m sglang.launch_server --model-path /local/grok-2 --tokenizer-path /local/grok-2/tokenizer.tok.json --tp-size 8 --quantization fp8 --attention-backend triton
```
## Example

```py
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("alvarobartt/grok-2-tokenizer")

assert tokenizer.encode("Human: What is Deep Learning?<|separator|>\n\n") == [
    35406,
    186,
    2171,
    458,
    17454,
    14803,
    191,
    1,
    417,
]

assert (
    tokenizer.apply_chat_template(
        [{"role": "user", "content": "What is the capital of France?"}], tokenize=False
    )
    == "Human: What is the capital of France?<|separator|>\n\n"
)
```
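
To make the prompt format asserted in the example above easier to see, here is a minimal pure-Python sketch of it. It only mirrors the single case the example confirms (a `user` message rendered as `Human: ...` followed by `<|separator|>` and two newlines). `SEPARATOR` and `render_user_turns` are hypothetical names, joining several user turns this way is an assumption, and the authoritative template is the one stored with the tokenizer.

```python
SEPARATOR = "<|separator|>"


def render_user_turns(messages):
    """Render user messages in the 'Human: ...' format shown in the example."""
    parts = []
    for message in messages:
        if message["role"] != "user":
            # Other roles are not shown in the example, so this sketch refuses to guess.
            raise ValueError(f"unsupported role: {message['role']!r}")
        parts.append(f"Human: {message['content']}{SEPARATOR}\n\n")
    return "".join(parts)


prompt = render_user_turns(
    [{"role": "user", "content": "What is the capital of France?"}]
)
assert prompt == "Human: What is the capital of France?<|separator|>\n\n"
```

In practice `tokenizer.apply_chat_template` should be preferred, since it applies the real template; the sketch is only meant to show what the rendered string looks like.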
> [!NOTE]
> This repository has been inspired by earlier similar work by [Xenova](https://huggingface.co/Xenova) in [`Xenova/grok-1-tokenizer`](https://huggingface.co/Xenova/grok-1-tokenizer).