# GptOssDense
GptOssDense is a dense variant of the GptOss model architecture. While GptOss uses a Mixture-of-Experts (MoE) approach with routing, GptOssDense replaces the MoE layer with a standard dense feedforward network (FFN).
✅ Verified to work with `trust_remote_code=True` on stable transformers (v4.40+)
## Model Architecture
- Attention: Same as GptOss with sliding window attention and sink tokens
- MLP: Dense FFN with GLU activation (instead of MoE with router)
- Activation: Same GLU activation as GptOss experts: `(up + 1) * gate * sigmoid(gate * alpha)` where `alpha = 1.702`
- Normalization: RMSNorm
- RoPE: YaRN (Yet another RoPE extensioN)
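The GLU activation above can be sketched in plain Python. This is a scalar illustration only; in the actual model, `gate` and `up` are the two halves of the MLP's projected hidden state, applied elementwise:

```python
import math

ALPHA = 1.702  # fixed constant from the activation formula above

def glu_activation(gate: float, up: float) -> float:
    """GptOss GLU: (up + 1) * gate * sigmoid(gate * alpha).

    Scalar sketch for illustration; the model applies this
    elementwise to the gate/up halves of the MLP projection.
    """
    sigmoid = 1.0 / (1.0 + math.exp(-gate * ALPHA))
    return (up + 1.0) * gate * sigmoid

print(round(glu_activation(1.0, 0.0), 4))  # ≈ 0.8458
print(glu_activation(0.0, 5.0))            # 0.0: a zero gate blocks the unit
```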
## Key Differences from GptOss
| Feature | GptOss | GptOssDense |
|---|---|---|
| MLP Type | Mixture-of-Experts | Dense FFN |
| Router | Yes | No |
| Experts | Multiple (128) | Single |
| Parameters | More (due to multiple experts) | Fewer |
| Inference | Routes tokens to top-k experts | Single FFN for all tokens |
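The inference row can be illustrated with a toy sketch. The "experts" below are stand-in scalar functions, not the real implementation: MoE scores each expert per token and dispatches to the top-k, while the dense variant applies one FFN to every token with no router:

```python
# Four toy "experts"; expert k multiplies its input by (k + 1).
experts = [lambda x, k=k: x * (k + 1) for k in range(4)]

def moe_forward(x, scores, top_k=2):
    """Route x to the top_k highest-scoring experts, average their outputs."""
    chosen = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:top_k]
    return sum(experts[i](x) for i in chosen) / top_k

def dense_forward(x):
    """Dense variant: the same single FFN for every token, no router."""
    return experts[0](x)

print(moe_forward(1.0, scores=[0.1, 0.9, 0.3, 0.8]))  # experts 1 and 3 → (2 + 4) / 2 = 3.0
print(dense_forward(1.0))                             # always expert 0 → 1.0
```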
## Usage

### Quick Start: Random Initialization
Try the model with randomly initialized weights (outputs will be random):
```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
import torch

# Load config and tokenizer
config = AutoConfig.from_pretrained("marksverdhei/gpt-oss-dense", trust_remote_code=True)
tokenizer = AutoTokenizer.from_pretrained("marksverdhei/gpt-oss-dense")

# Initialize model with random weights
model = AutoModelForCausalLM.from_config(config, trust_remote_code=True)
model.eval()

# Generate text (output will be random since the model is untrained)
prompt = "Hello, how are you?"
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():
    outputs = model.generate(
        inputs.input_ids,
        max_new_tokens=20,
        do_sample=True,
        temperature=1.0,
        top_k=50,
        pad_token_id=tokenizer.pad_token_id,
    )
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Example output: "Hello, how are you? pronunci bhithCiudadstdafxipseігlanders導 conveyoruviainn"
# (random tokens since the model is not trained)
```
### Loading Pre-trained Weights (when available)
Once model weights are uploaded to the repository:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Load model with weights
model = AutoModelForCausalLM.from_pretrained(
    "marksverdhei/gpt-oss-dense",
    trust_remote_code=True,
)

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("marksverdhei/gpt-oss-dense")

# Generate text
inputs = tokenizer("Hello, how are you?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0]))
```
### With the transformers fork

Using the marksverdhei/transformers fork, where GptOssDense is registered:

```shell
# Install the fork
pip install git+https://github.com/marksverdhei/transformers.git
```

```python
from transformers import GptOssDenseForCausalLM, GptOssDenseConfig

config = GptOssDenseConfig()
model = GptOssDenseForCausalLM(config)
```
## Model Configuration

Matches the `openai/gpt-oss-20b` configuration (dense variant):
- Hidden size: 2880
- Intermediate size: 2880
- Number of layers: 24
- Number of attention heads: 64
- Number of key-value heads: 8
- Head dimension: 64
- Vocabulary size: 201,088
- Max position embeddings: 131,072
- Initial context length: 4,096
- Sliding window: 128
- RoPE type: YaRN with factor 32.0
- SwiGLU limit: 7.0
- Total parameters: ~2.4B
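A back-of-the-envelope parameter count from the configuration above is consistent with the ~2.4B figure. This sketch omits biases, sink parameters, and norm weights, and assumes an untied output embedding:

```python
# Configuration values from the list above
hidden, inter, layers = 2880, 2880, 24
heads, kv_heads, head_dim = 64, 8, 64
vocab = 201_088

q_dim, kv_dim = heads * head_dim, kv_heads * head_dim

# Attention: q, k, v, and output projections per layer
attn = layers * (hidden * q_dim + 2 * hidden * kv_dim + q_dim * hidden)
# Dense MLP: gate, up, and down projections per layer
mlp = layers * (2 * hidden * inter + inter * hidden)
# Input embedding plus (assumed untied) lm_head
embed = vocab * hidden
lm_head = vocab * hidden

total = attn + mlp + embed + lm_head
print(f"{total / 1e9:.2f}B")  # ≈ 2.39B, consistent with the ~2.4B above
```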
## License
Apache 2.0
## Citation
If you use this model, please cite the original GptOss work and acknowledge this dense variant.