
GptOssDense


GptOssDense is a dense variant of the GptOss model architecture. Where GptOss uses a Mixture-of-Experts (MoE) layer with a learned router in each transformer block, GptOssDense replaces it with a single standard dense feed-forward network (FFN).

✅ Verified to work with trust_remote_code=True on stable transformers (v4.40+)

Model Architecture

  • Attention: Same as GptOss with sliding window attention and sink tokens
  • MLP: Dense FFN with GLU activation (instead of MoE with router)
  • Activation: Same GLU activation as GptOss experts: (up + 1) * gate * sigmoid(gate * alpha) where alpha=1.702
  • Normalization: RMSNorm
  • RoPE: YaRN (Yet another RoPE extensioN)
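
As a concrete illustration, the GLU activation above can be sketched in PyTorch. The clamping to a limit is an assumption based on the "SwiGLU limit: 7.0" entry in the configuration below:

```python
import torch

def gpt_oss_glu(gate: torch.Tensor, up: torch.Tensor,
                alpha: float = 1.702, limit: float = 7.0) -> torch.Tensor:
    """GLU activation from the architecture notes: (up + 1) * gate * sigmoid(gate * alpha).

    Clamping both branches to `limit` is an assumption based on the
    "SwiGLU limit: 7.0" configuration entry.
    """
    gate = gate.clamp(max=limit)           # cap the gate branch from above
    up = up.clamp(min=-limit, max=limit)   # cap the linear branch on both sides
    return (up + 1) * gate * torch.sigmoid(gate * alpha)
```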

Key Differences from GptOss

| Feature    | GptOss                         | GptOssDense              |
|------------|--------------------------------|--------------------------|
| MLP type   | Mixture-of-Experts             | Dense FFN                |
| Router     | Yes                            | No                       |
| Experts    | Multiple (128)                 | Single                   |
| Parameters | More (due to multiple experts) | Fewer                    |
| Inference  | Routes tokens to top-k experts | Single FFN for all tokens |
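
To make the contrast concrete, here is a minimal sketch of the dense MLP block, using the activation described above; the layer names and the clamping limit are assumptions, not the exact implementation:

```python
import torch
import torch.nn as nn

class DenseMLP(nn.Module):
    """Sketch of the dense FFN that replaces the MoE block: one FFN for all tokens."""

    def __init__(self, hidden_size: int = 2880, intermediate_size: int = 2880,
                 alpha: float = 1.702, limit: float = 7.0):
        super().__init__()
        self.gate_proj = nn.Linear(hidden_size, intermediate_size)
        self.up_proj = nn.Linear(hidden_size, intermediate_size)
        self.down_proj = nn.Linear(intermediate_size, hidden_size)
        self.alpha, self.limit = alpha, limit

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        gate = self.gate_proj(x).clamp(max=self.limit)
        up = self.up_proj(x).clamp(min=-self.limit, max=self.limit)
        # Same GLU activation as the GptOss experts, but with no router:
        return self.down_proj((up + 1) * gate * torch.sigmoid(gate * self.alpha))
```

Every token passes through this single FFN, so there is no routing step and no load-balancing objective during training.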

Usage

Quick Start - Random Initialization

Try the model with randomly initialized weights (outputs will be random):

from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
import torch

# Load config and tokenizer
config = AutoConfig.from_pretrained("marksverdhei/gpt-oss-dense", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("marksverdhei/gpt-oss-dense")

# Initialize model with random weights
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
model.eval()

# Generate text (will be random since model is not trained)
prompt = "Hello, how are you?"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=20,
        do_sample=True,
        temperature=1.0,
        top_k=50,
        pad_token_id=tokenizer.pad_token_id
    )

print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Example output: "Hello, how are you? pronunci bhithCiudadstdafxipseігlanders導 conveyoruviainn"
# (random tokens since model is not trained)

Loading Pre-trained Weights (when available)

Once model weights are uploaded to the repository:

from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model with weights
model = AutoModelForCausalLM.from_pretrained(
    "marksverdhei/gpt-oss-dense",
    trust_remote_code=True
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("marksverdhei/gpt-oss-dense")

# Generate text
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_length=50)
print(tokenizer.decode(outputs[0]))

With transformers fork

Using the marksverdhei/transformers fork where GptOssDense is registered:

# Install the fork first:
#   pip install git+https://github.com/marksverdhei/transformers.git
from transformers import GptOssDenseForCausalLM, GptOssDenseConfig

config = GptOssDenseConfig()
model = GptOssDenseForCausalLM(config)

Model Configuration

Matches openai/gpt-oss-20b configuration (dense variant):

  • Hidden size: 2880
  • Intermediate size: 2880
  • Number of layers: 24
  • Number of attention heads: 64
  • Number of key-value heads: 8
  • Head dimension: 64
  • Vocabulary size: 201,088
  • Max position embeddings: 131,072
  • Initial context length: 4,096
  • Sliding window: 128
  • RoPE type: YaRN with factor 32.0
  • SwiGLU limit: 7.0
  • Total parameters: ~2.4B
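
The ~2.4B figure can be sanity-checked from the numbers above. The sketch below counts only the major weight matrices (biases and norm weights omitted) and assumes untied input and output embeddings:

```python
hidden, intermediate, layers = 2880, 2880, 24
heads, kv_heads, head_dim = 64, 8, 64
vocab = 201_088

# Attention projections per layer (grouped-query attention with 8 KV heads)
attn = hidden * heads * head_dim          # q_proj
attn += 2 * hidden * kv_heads * head_dim  # k_proj and v_proj
attn += heads * head_dim * hidden         # o_proj

# Dense GLU MLP per layer: gate, up, and down projections
mlp = 2 * hidden * intermediate + intermediate * hidden

embeddings = vocab * hidden  # input embedding; counted again for the LM head
total = layers * (attn + mlp) + 2 * embeddings
print(f"~{total / 1e9:.2f}B parameters")  # ~2.39B
```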

License

Apache 2.0

Citation

If you use this model, please cite the original GptOss work and acknowledge this dense variant.
