Does Scout have NoPE layers or not?

#4
by jonaskuebler - opened

Hi folks, thanks for the great models.

There seems to be a discrepancy between the configs. In this model, "nope_layer_interval": 4 is set, and the conversion script would write that into the HF checkpoint: https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama4/convert_llama4_weights_to_hf.py#L237

On the other hand, the released HF checkpoint has "no_rope_layers": []: https://huggingface.co/meta-llama/Llama-4-Scout-17B-16E-Instruct/blob/main/config.json#L30

So the question is: does Scout have NoPE layers or not? Or is it safe to adjust the conversion script to put an empty list for no_rope_layers as well? https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama4/convert_llama4_weights_to_hf.py#L289

Okay, I think I am getting it now.

https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama4/configuration_llama4.py#L338-L343

So every fourth layer should use NoPE. I think the problem is then that the conversion script puts an integer, "no_rope_layers": 4, into the config (https://github.com/huggingface/transformers/blob/main/src/transformers/models/llama4/convert_llama4_weights_to_hf.py#L289), which gives an error.
But from what I understand, if an empty list is put there instead, the config falls back to the default (which is the every-4th-layer strategy).
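For context, this is roughly what that default amounts to (a minimal standalone sketch of the linked config logic, not the actual transformers code; the helper name is mine):

```python
# Minimal sketch (assuming the linked configuration_llama4.py logic) of how the
# default is built when no_rope_layers is empty: a per-layer mask where
# 1 = the layer uses RoPE and 0 = the layer is NoPE.
def default_no_rope_layers(num_hidden_layers: int, nope_layer_interval: int = 4) -> list[int]:
    return [
        int((layer_idx + 1) % nope_layer_interval != 0)
        for layer_idx in range(num_hidden_layers)
    ]

# Scout has num_hidden_layers = 48, so layers 4, 8, 12, ... (1-indexed) are NoPE:
print(default_no_rope_layers(48)[:8])  # [1, 1, 1, 0, 1, 1, 1, 0]
```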

So this is probably rather a fix for the conversion script.
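E.g. something like this could replace writing the raw integer (a hypothetical sketch, not the actual script code; the variable `original_config` and the key "n_layers" are assumptions about Meta's original params, not checked against the script):

```python
# Hypothetical fix sketch for the conversion script: expand the interval from
# Meta's original config into the explicit per-layer 0/1 mask, instead of
# writing the bare integer into no_rope_layers.
nope_interval = original_config["nope_layer_interval"]  # 4 for Scout
num_layers = original_config["n_layers"]                # assumed key name

no_rope_layers = [
    int((layer_idx + 1) % nope_interval != 0) for layer_idx in range(num_layers)
]
# Alternatively, writing no_rope_layers = [] would trigger the same default
# inside Llama4TextConfig, per the configuration lines linked above.
```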
