I’d first check the tokenizer files tbh.
True…
Hmm, in the simplest case, it’s possible that you called model.save_pretrained() but forgot to call tokenizer.save_pretrained(). This would result in a model folder missing only the tokenizer-related files, which would explain the symptoms. If you don’t get any warnings when converting the official Qwen 3.5, then this or a similar issue is likely the culprit.
However, since there are cases where Qwen 3.5 and GGUF do not work as expected, I think it’s best to suspect an issue specific to the Qwen 3.5 series. If converting models from other series to GGUF works fine, then a specific issue with this series is likely the cause.
convert_hf_to_gguf.py: “The BPE pre-tokenizer was not recognized” after fine-tuning Qwen3.5-4B
I would treat this as a tokenizer / GGUF metadata compatibility problem, not primarily a transformers problem.
Your traceback gets all the way to the vocabulary/tokenizer phase:
prepare_metadata()
set_vocab()
_set_vocab_gpt2()
get_vocab_base()
get_vocab_base_pre(tokenizer)
raise NotImplementedError(...)
That means convert_hf_to_gguf.py has already started processing the model and is now trying to encode the tokenizer contract into GGUF metadata. The failure happens because llama.cpp cannot recognize the BPE pre-tokenizer behavior loaded from your model folder.
The key line is:
chkhsh: 1444df51289cfa8063b96f0e62b1125440111bc79a52003ea14b6eac7016fd5f
That chkhsh is a tokenizer fingerprint. llama.cpp’s convert_hf_to_gguf_update.py generates hash-to-pre-tokenizer mappings for get_vocab_base_pre(). If your tokenizer produces a hash that is not in that mapping, the converter refuses to guess.
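Conceptually, the fingerprint is just a hash of how the tokenizer encodes a fixed battery of test text. A minimal illustration of the idea (the real test string and details live in convert_hf_to_gguf.py, so treat this as a sketch, not the actual implementation):

import hashlib
from transformers import AutoTokenizer

# Illustrative test text only; llama.cpp uses its own, much longer multilingual string.
test_text = "Hello world\n你好,世界\n🙂🚀 café naïve\ndef f(x):\n    return x + 1\n"

tok = AutoTokenizer.from_pretrained("<merged_model_dir>", trust_remote_code=True)
ids = tok.encode(test_text, add_special_tokens=False)

# Any change in pre-tokenization, vocab, or merges changes the token IDs, and therefore the hash.
print(hashlib.sha256(str(ids).encode()).hexdigest())

Anything that changes that encoding, even slightly, produces an unknown fingerprint and triggers exactly the error you are seeing.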
So the short version is:
The model weights may be fine. The folder you are converting contains a tokenizer configuration that your llama.cpp converter cannot map to a known GGUF pre-tokenizer type.
Why upgrading transformers did not fix it
Upgrading transformers helps when Transformers cannot load a model, config, or tokenizer. But this failure is inside llama.cpp’s conversion code.
The Hugging Face llama.cpp integration docs describe the conversion process as roughly:
- load config.json with AutoConfig,
- load tokenizer information with AutoTokenizer,
- select a converter class from the model architecture,
- map tensors,
- write GGUF weights, tokenizer metadata, and model metadata.
Your failure happens after the tokenizer is loaded, when llama.cpp tries to classify the tokenizer’s BPE pre-tokenization behavior.
So this is not simply:
Transformers is too old.
It is more like:
llama.cpp does not recognize the tokenizer behavior in <model_dir>.
That is also why running convert_hf_to_gguf_update.py may not help automatically. That script is mainly a converter-maintenance tool: it regenerates known pre-tokenizer hashes from models listed in the script. It does not magically repair a local fine-tuned folder whose tokenizer files changed or are incomplete.
Relevant source: convert_hf_to_gguf_update.py.
Background: what a “BPE pre-tokenizer” is
A tokenizer is not only a vocabulary file.
A simplified Hugging Face tokenizer pipeline is:
raw text
-> normalizer
-> pre-tokenizer
-> BPE model / merges
-> post-processor / special-token handling
-> token IDs
The Hugging Face Tokenizers docs describe the PreTokenizer as the component that splits text before the tokenizer model applies BPE/WordPiece/Unigram rules.
This matters because two tokenizers can have:
- the same vocabulary size,
- the same model architecture,
- similar-looking special tokens,
but still produce different token IDs if the pre-tokenizer differs.
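You can see which pre-tokenizer a folder actually declares, because the fast-tokenizer tokenizer.json stores it as plain JSON. A small sketch, assuming both folders contain a tokenizer.json:

import json
from pathlib import Path

for folder in ["<base_model_dir>", "<merged_model_dir>"]:
    data = json.loads((Path(folder) / "tokenizer.json").read_text(encoding="utf-8"))
    # The "pre_tokenizer" block describes how text is split before the BPE model runs.
    print(folder)
    print(json.dumps(data.get("pre_tokenizer"), indent=2, ensure_ascii=False))

If the two pre_tokenizer blocks differ, that alone can explain a changed fingerprint.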
Examples where pre-tokenization differences can matter:
"Hello world"
" Hello world"
"Hello\nworld"
"你好,世界"
"こんにちは世界"
"🙂🚀 café naïve"
"<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n"
If llama.cpp wrote the wrong tokenizer.ggml.pre metadata, the resulting GGUF could load but tokenize prompts differently from Transformers. That can cause bad output, broken Unicode handling, broken chat markers, or high perplexity. So llama.cpp stops instead of guessing.
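If you do end up with a GGUF at some point and want to confirm which pre-tokenizer got recorded, the gguf Python package installed by llama.cpp's requirements.txt ships a dump utility. The command name and output format can vary between versions, so treat this as a hint rather than gospel:

gguf-dump <some_model.gguf> | grep -i "tokenizer.ggml"

The tokenizer.ggml.pre value in that output is what llama.cpp will use to pre-tokenize prompts at runtime.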
Why Qwen3.5 makes this easier to hit
Qwen3.5 support in llama.cpp is relatively recent and commit-sensitive.
There are recent llama.cpp issues around Qwen3.5 conversion support, including reports of Qwen3_5ForCausalLM not being supported in some converter paths.
Your error is not exactly the same as those architecture errors, because your traceback reaches tokenizer handling. But the lesson is still important:
“Qwen-ish support exists” does not necessarily mean “my exact Qwen3.5 variant, my exact tokenizer files, and my exact llama.cpp commit are supported.”
Also, the official Qwen/Qwen3.5-4B repo contains several important tokenizer/config/processor files. The file list includes things like:
tokenizer.json
tokenizer_config.json
vocab.json
merges.txt
chat_template.jinja
preprocessor_config.json
video_preprocessor_config.json
config.json
See the repo file listing here: Qwen/Qwen3.5-4B/tree/main.
For Qwen3.5, I would treat tokenizer and processor files as part of the model contract, not as disposable side files.
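A quick way to spot support files that exist on the hub but are missing from your local folder is to compare file listings. A sketch, assuming Qwen/Qwen3.5-4B is the repo you trained from and that you have network access:

from pathlib import Path
from huggingface_hub import list_repo_files

repo_files = set(list_repo_files("Qwen/Qwen3.5-4B"))
local_files = {p.name for p in Path("<merged_model_dir>").iterdir() if p.is_file()}

# Weight shards will legitimately differ; focus on tokenizer/config/processor files.
missing = [f for f in sorted(repo_files - local_files) if not f.endswith(".safetensors")]
print("on the hub but missing locally:", missing)

Missing tokenizer or processor files in that list are exactly the kind of drift discussed below.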
Most likely causes, ranked
1. Your fine-tuned or merged folder has tokenizer drift
This is the most likely case if:
- the original base model converts with the same llama.cpp commit,
- your fine-tuned/merged folder fails,
- you did not intentionally add tokens,
- your training/export tool saved or regenerated tokenizer files.
Tokenizer drift means that files such as these differ from the base model:
tokenizer.json
tokenizer_config.json
vocab.json
merges.txt
special_tokens_map.json
added_tokens.json
chat_template.jinja
preprocessor_config.json
video_preprocessor_config.json
This can happen even if you never manually edited tokenizer files. Fine-tuning tools often call save_pretrained(), copy partial artifacts, rewrite tokenizer_config.json, alter chat templates, or omit files that the original base repo had.
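You can demonstrate this to yourself: simply round-tripping the base tokenizer through save_pretrained() with your current transformers version can already produce byte-different files, even though no behavior was intentionally changed. A sketch (the output directory is a placeholder):

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("<base_model_dir>", trust_remote_code=True)
# Hypothetical scratch directory; compare its files against <base_model_dir> afterwards.
tok.save_pretrained("<resaved_tokenizer_dir>")
# Re-serialization alone can reorder keys, move the chat template, or drop files
# such as vocab.json/merges.txt that the original repo shipped separately.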
If the tokenizer was not intentionally changed during training, the safest practical fix is often to copy the tokenizer-related files from the exact base model revision back into the merged folder.
2. Your llama.cpp checkout is too old for the exact Qwen3.5 path
If the original base model also fails with the same kind of error, then your fine-tune is probably not the main issue.
In that case, update llama.cpp itself, not just Python packages:
cd <llama_cpp_dir>
git pull --rebase
python -m pip install -U -r requirements.txt
python convert_hf_to_gguf_update.py
Then retry converting the base model.
Qwen3.5-related converter support has changed recently, so the exact llama.cpp commit matters.
3. You added or changed tokens during fine-tuning
If your training code did anything like:
tokenizer.add_tokens(...)
tokenizer.add_special_tokens(...)
model.resize_token_embeddings(len(tokenizer))
then copying base tokenizer files can be wrong.
Why? Because the model’s embedding matrix may now contain rows for new token IDs. If you overwrite the tokenizer with the base tokenizer, token IDs and embedding rows can disagree.
In that case, first verify:
len(tokenizer) == config.vocab_size == embedding rows
If these do not match, fix the merged Transformers folder before trying GGUF conversion.
4. You are mixing Qwen3.5 base / instruct / text-only / multimodal artifacts
Qwen3.5-4B does not follow the old-style, plain text-only repo layout. Some Qwen3.5 workflows involve multimodal files, processor configs, chat templates, or separate projector handling.
Be careful not to mix files from:
Qwen/Qwen3.5-4B
Qwen/Qwen3.5-4B-Base
an Unsloth Qwen3.5 repo
a text-only derivative
a LoRA adapter folder
a merged full model folder
a GGUF repo
Use tokenizer files from the exact model and revision you trained from, not from a “nearby” Qwen model.
The decisive diagnostic test
Before editing anything, test the original base model with the same llama.cpp commit.
Step 1: record your environment
cd <llama_cpp_dir>
git rev-parse HEAD
python --version
python -m pip show transformers tokenizers huggingface_hub gguf sentencepiece protobuf
Also record:
base model: <base_model_name>
base revision: <base_model_revision_or_unknown>
fine-tuning method: <lora_qlora_full_finetune>
merged folder: <merged_model_dir>
did you add tokens: <yes_or_no>
did you change chat_template: <yes_or_no>
target: <text_only_or_multimodal>
Step 2: download the exact base model
If the base was Qwen/Qwen3.5-4B:
hf download Qwen/Qwen3.5-4B \
--local-dir <base_model_dir> \
--include "*.safetensors" \
--include "*.json" \
--include "*.txt" \
--include "*.jinja"
If you know the exact revision you trained from, pin it:
hf download Qwen/Qwen3.5-4B \
--revision <base_model_revision> \
--local-dir <base_model_dir> \
--include "*.safetensors" \
--include "*.json" \
--include "*.txt" \
--include "*.jinja"
Step 3: try converting the base model
python <llama_cpp_dir>/convert_hf_to_gguf.py \
<base_model_dir> \
--outtype bf16 \
--outfile <base_model_dir>/base-bf16.gguf
Use BF16/F16 for debugging. Do not make your first target a 4-bit quant.
The normal Qwen flow is:
Transformers folder -> high-precision GGUF -> quantized GGUF
See the official Qwen llama.cpp quantization guide: llama.cpp - Qwen.
How to interpret the result
| Result | Meaning |
| --- | --- |
| Base model converts | llama.cpp probably supports the base tokenizer. Your fine-tuned/merged folder likely drifted. |
| Base model fails with the same BPE pre-tokenizer hash | Your llama.cpp checkout probably does not support that exact tokenizer state. |
| Base model fails with an architecture error | You likely need newer llama.cpp Qwen3.5 architecture support. |
| Base converts, fine-tuned model fails | Compare and probably restore tokenizer files, unless you added tokens. |
This test is the most important one.
Compare tokenizer files
Run this against the base folder and your merged/fine-tuned folder:
from pathlib import Path
import hashlib
base = Path("<base_model_dir>")
ft = Path("<merged_model_dir>")
files = [
"tokenizer.json",
"tokenizer_config.json",
"vocab.json",
"merges.txt",
"chat_template.jinja",
"special_tokens_map.json",
"added_tokens.json",
"config.json",
"processor_config.json",
"preprocessor_config.json",
"video_preprocessor_config.json",
]
def sha(p):
if not p.exists():
return "MISSING"
return hashlib.sha256(p.read_bytes()).hexdigest()
for name in files:
b = sha(base / name)
f = sha(ft / name)
print(f"{name:32} {'same' if b == f else 'DIFF'}")
print(f" base: {b}")
print(f" ft: {f}")
Suspicious results if you did not add tokens:
tokenizer.json DIFF
tokenizer_config.json DIFF
vocab.json MISSING
merges.txt MISSING
added_tokens.json added or changed
special_tokens_map.json changed
chat_template.jinja missing or changed
processor/preprocessor files missing
Check whether tokenization actually changed
Hashes are useful, but direct token-ID comparison is even more concrete.
from transformers import AutoTokenizer
base_tok = AutoTokenizer.from_pretrained("<base_model_dir>", trust_remote_code=True)
ft_tok = AutoTokenizer.from_pretrained("<merged_model_dir>", trust_remote_code=True)
tests = [
"Hello world",
" Hello world",
"Hello\nworld",
"a b c",
"你好,世界",
"こんにちは世界",
"🙂🚀 café naïve",
"<|im_start|>user\nHello<|im_end|>\n<|im_start|>assistant\n",
"def f(x):\n return x + 1",
]
for s in tests:
b = base_tok.encode(s, add_special_tokens=False)
f = ft_tok.encode(s, add_special_tokens=False)
print("\nTEXT:", repr(s))
print("same:", b == f)
if b != f:
print("base:", b[:100])
print("ft: ", f[:100])
If these differ and you did not intentionally change the tokenizer, that strongly points to tokenizer drift.
If you did not add tokens: likely fix
If all of this is true:
- base model converts,
- fine-tuned/merged model fails,
- you did not add tokens,
- tokenizer files differ,
then copy tokenizer/config support files from the exact base model revision into your merged folder.
cp <base_model_dir>/tokenizer.json <merged_model_dir>/
cp <base_model_dir>/tokenizer_config.json <merged_model_dir>/
cp <base_model_dir>/vocab.json <merged_model_dir>/
cp <base_model_dir>/merges.txt <merged_model_dir>/
cp <base_model_dir>/chat_template.jinja <merged_model_dir>/
cp <base_model_dir>/special_tokens_map.json <merged_model_dir>/ 2>/dev/null || true
cp <base_model_dir>/added_tokens.json <merged_model_dir>/ 2>/dev/null || true
cp <base_model_dir>/processor_config.json <merged_model_dir>/ 2>/dev/null || true
cp <base_model_dir>/preprocessor_config.json <merged_model_dir>/ 2>/dev/null || true
cp <base_model_dir>/video_preprocessor_config.json <merged_model_dir>/ 2>/dev/null || true
Then rerun conversion:
python <llama_cpp_dir>/convert_hf_to_gguf.py \
<merged_model_dir> \
--outtype bf16 \
--outfile <output_bf16_gguf>
After that succeeds, quantize:
<llama_cpp_dir>/build/bin/llama-quantize \
<output_bf16_gguf> \
<output_q4_k_m_gguf> \
Q4_K_M
This is the fix I would try first in your case, assuming no tokens were added.
If you added tokens: do not copy blindly
If you added tokens, check tokenizer/model consistency first:
from transformers import AutoTokenizer, AutoConfig, AutoModelForCausalLM
path = "<merged_model_dir>"
tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
cfg = AutoConfig.from_pretrained(path, trust_remote_code=True)
print("len(tokenizer):", len(tok))
print("config.vocab_size:", getattr(cfg, "vocab_size", None))
print("added vocab size:", len(tok.get_added_vocab()))
print("added vocab:", tok.get_added_vocab())
model = AutoModelForCausalLM.from_pretrained(
path,
torch_dtype="auto",
device_map="cpu",
trust_remote_code=True,
)
print("embedding rows:", model.get_input_embeddings().weight.shape[0])
if model.get_output_embeddings() is not None:
print("output rows:", model.get_output_embeddings().weight.shape[0])
You want:
len(tokenizer) == config.vocab_size == embedding rows
If that does not hold, fix the merged Transformers model first.
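The right repair depends on how the counts got out of sync. A minimal sketch for the common case where the tokenizer is the source of truth (caution: if there are more tokens than embedding rows, resize_token_embeddings creates freshly initialized rows for the extra tokens, which is only acceptable if those tokens were never actually trained):

from transformers import AutoModelForCausalLM, AutoTokenizer

path = "<merged_model_dir>"
tok = AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(path, torch_dtype="auto", trust_remote_code=True)

n_tok = len(tok)
n_emb = model.get_input_embeddings().weight.shape[0]
print("len(tokenizer):", n_tok, "embedding rows:", n_emb)

if n_tok != n_emb:
    # Align the embedding matrix (and config.vocab_size) with the tokenizer.
    model.resize_token_embeddings(n_tok)

# Hypothetical output directory; re-save both halves so the folder is consistent again.
model.save_pretrained("<fixed_model_dir>")
tok.save_pretrained("<fixed_model_dir>")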
If the tokenizer is intentionally modified and internally consistent, then llama.cpp may genuinely need support for that tokenizer fingerprint. In that case, copying the base tokenizer would hide the real issue and may break the model.
If the original base model also fails
If the base model fails too, stop debugging the fine-tuned folder. Use a fresh current llama.cpp checkout:
git clone https://github.com/ggml-org/llama.cpp <llama_cpp_clean_dir>
cd <llama_cpp_clean_dir>
python -m pip install -U -r requirements.txt
cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --build build --config Release
python convert_hf_to_gguf_update.py
Then retry:
python <llama_cpp_clean_dir>/convert_hf_to_gguf.py \
<base_model_dir> \
--outtype bf16 \
--outfile <base_model_dir>/base-bf16.gguf
If it still fails with the same chkhsh, then it is probably an upstream llama.cpp support issue for that exact tokenizer/model revision.
A good report should include:
base model: <base_model_name>
base revision: <base_model_revision>
fine-tuned model: <fine_tuned_model_or_local_only>
llama.cpp commit: <commit_hash>
python version: <python_version>
transformers version: <transformers_version>
tokenizers version: <tokenizers_version>
did you add tokens: <yes_or_no>
did you change chat_template: <yes_or_no>
did you merge LoRA: <yes_or_no>
target: <text_only_or_multimodal>
full converter command: <command>
chkhsh: 1444df51289cfa8063b96f0e62b1125440111bc79a52003ea14b6eac7016fd5f
Also include hashes:
sha256sum \
<merged_model_dir>/tokenizer.json \
<merged_model_dir>/tokenizer_config.json \
<merged_model_dir>/vocab.json \
<merged_model_dir>/merges.txt \
<merged_model_dir>/special_tokens_map.json \
<merged_model_dir>/added_tokens.json \
<merged_model_dir>/chat_template.jinja \
2>/dev/null
Why manual hash patching is risky
You may be tempted to edit convert_hf_to_gguf.py and add something like:
if chkhsh == "1444df51289cfa8063b96f0e62b1125440111bc79a52003ea14b6eac7016fd5f":
res = "qwen35"
or:
res = "qwen2"
I would not do that as the first fix.
The hash is only a fingerprint. The actual GGUF needs a correct tokenizer.ggml.pre value that llama.cpp can reproduce at runtime. If you map the hash to the wrong pre-tokenizer, the conversion may succeed but inference can be subtly broken.
This is worse than a clean failure.
Only consider a manual mapping if you can prove:
- the base tokenizer and fine-tuned tokenizer encode a broad set of test strings identically,
- the tokenizer JSON pre-tokenizer is equivalent to an existing llama.cpp pre-tokenizer,
- llama.cpp runtime tokenizer code supports that behavior,
- generated text and/or perplexity look sane after conversion.
Relevant source: convert_hf_to_gguf_update.py.
Conversion and quantization order
Do not debug this by jumping straight to Q4_K_M.
Use the standard two-step route:
Transformers model folder
-> high-precision GGUF: BF16/F16/F32
-> quantized GGUF: Q4_K_M, Q5_K_M, Q8_0, etc.
For Qwen models, the Qwen docs show converting first, often with --outtype bf16, then quantizing with llama-quantize. See: Qwen llama.cpp quantization guide.
Example:
python <llama_cpp_dir>/convert_hf_to_gguf.py \
<merged_model_dir> \
--outtype bf16 \
--outfile <model_bf16_gguf>
<llama_cpp_dir>/build/bin/llama-quantize \
<model_bf16_gguf> \
<model_q4_k_m_gguf> \
Q4_K_M
If quality matters, consider an importance matrix later, but only after the BF16/F16 GGUF conversion works.
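For reference, the importance-matrix route usually looks roughly like the following (tool and flag names as in recent llama.cpp builds; double-check them against your checkout):

# Build an importance matrix from representative text, then quantize using it.
<llama_cpp_dir>/build/bin/llama-imatrix \
  -m <model_bf16_gguf> \
  -f <calibration_text_file> \
  -o <imatrix_dat>

<llama_cpp_dir>/build/bin/llama-quantize \
  --imatrix <imatrix_dat> \
  <model_bf16_gguf> \
  <model_q4_k_m_gguf> \
  Q4_K_M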
My best guess for your case
Given the exact traceback and the fact that upgrading transformers plus running convert_hf_to_gguf_update.py did not fix it, my best guess is:
Your fine-tuned/merged Qwen3.5-4B folder has tokenizer drift or missing tokenizer-side files.
The fix I would try first is:
- Convert the original base model with the same llama.cpp commit.
- If the base converts, compare tokenizer files.
- If you did not add tokens, copy the exact base tokenizer/config/processor files into the merged folder.
- Convert to BF16/F16 GGUF.
- Quantize only after conversion succeeds.
If the original base model also fails, then this is probably not your fine-tune. It is more likely a llama.cpp support issue for that exact Qwen3.5 tokenizer/model revision.
Short checklist
- Record the llama.cpp commit.
- Record transformers, tokenizers, huggingface_hub, and Python versions.
- Confirm the exact base model and revision.
- Confirm whether tokens were added.
- Try converting the original base model.
- Compare tokenizer.json, tokenizer_config.json, vocab.json, merges.txt, chat_template.jinja, and special-token files.
- If no tokens were added, restore tokenizer files from the exact base model revision.
- Convert to BF16/F16 GGUF first.
- Quantize to Q4_K_M only after the high-precision GGUF conversion succeeds.
- Do not manually map the hash unless tokenizer equivalence and runtime support are verified.