NeuronAI-Uzbek

NeuronAI-Uzbek is a Qwen3-family causal language model fine-tuned primarily for Uzbek, with English as a secondary language. This repository contains the model weights (safetensors shards), tokenizer files, and a chat template.

Model summary

Architecture: Qwen3ForCausalLM (decoder-only)
Dtype: bfloat16
Layers: 36
Hidden size: 2560
Attention heads: 32 (KV heads: 8)
Vocab size: 180,000
Max position embeddings: 40,960 (model config)
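To give a feel for what these numbers mean at inference time, the sketch below estimates the bfloat16 KV-cache footprint from the values above. The per-head dimension is not listed in this summary; HEAD_DIM = 128 is an assumption taken from the upstream Qwen3-4B configuration and should be checked against this repository's config.json.

```python
# Rough KV-cache size estimate from the model summary above.
NUM_LAYERS = 36
NUM_QUERY_HEADS = 32
NUM_KV_HEADS = 8
MAX_POSITIONS = 40_960
BYTES_PER_VALUE = 2  # bfloat16

# ASSUMPTION: not stated in this model card; 128 is the upstream Qwen3-4B
# value. Verify it against config.json before relying on the result.
HEAD_DIM = 128


def kv_cache_bytes(num_tokens: int, batch_size: int = 1) -> int:
    """Bytes needed to cache keys and values for num_tokens positions."""
    # Factor 2 covers one key tensor and one value tensor per layer.
    per_token = 2 * NUM_LAYERS * NUM_KV_HEADS * HEAD_DIM * BYTES_PER_VALUE
    return per_token * num_tokens * batch_size


queries_per_kv_head = NUM_QUERY_HEADS // NUM_KV_HEADS
per_token_kib = kv_cache_bytes(1) / 1024
full_context_gib = kv_cache_bytes(MAX_POSITIONS) / 1024**3

print(f"Query heads sharing each KV head: {queries_per_kv_head}")
print(f"KV cache per token: {per_token_kib:.0f} KiB")
print(f"KV cache at {MAX_POSITIONS} tokens: {full_context_gib:.2f} GiB")
```

Under that assumption, each cached token costs 144 KiB, so a single sequence at the full 40,960-token context needs roughly 5.6 GiB of cache in addition to the weights.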

Generation defaults
- temperature=0.6
- top_p=0.95
- top_k=20
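To illustrate how these three settings interact, here is a minimal pure-Python sketch of sampling one token with temperature scaling, then top-k, then top-p (nucleus) filtering. It is a simplified illustration of the general technique, not the transformers implementation; the function name and the toy logits are hypothetical.

```python
import math
import random


def sample_next_token(logits, temperature=0.6, top_k=20, top_p=0.95, rng=random):
    """Sample one token id from raw logits: temperature, then top-k, then top-p."""
    # 1) Temperature: values below 1.0 sharpen the distribution.
    scaled = [x / temperature for x in logits]

    # 2) Top-k: keep only the k highest-scoring token ids.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)
    ranked = ranked[:top_k]

    # Softmax over the surviving candidates (subtract the max for stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]

    # 3) Top-p: keep the smallest prefix of the ranked candidates whose
    #    cumulative probability reaches top_p.
    kept_ids, kept_probs, cumulative = [], [], 0.0
    for token_id, p in zip(ranked, probs):
        kept_ids.append(token_id)
        kept_probs.append(p)
        cumulative += p
        if cumulative >= top_p:
            break

    # choices() treats the weights as relative, so no renormalization is needed.
    return rng.choices(kept_ids, weights=kept_probs, k=1)[0]


toy_logits = [2.0, 1.0, 0.5, -1.0, -3.0]  # hypothetical 5-token vocabulary
print(sample_next_token(toy_logits, rng=random.Random(0)))
```

With these toy logits and the defaults above, the two least likely tokens fall outside the 0.95 nucleus and are never sampled.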

Note: This model uses the Qwen3 architecture, which transformers supports from version 4.51.0 onward; older versions do not recognize the qwen3 model type and fail to load it.

Training data (token counts)

This model was trained on a mixture of:

Uzbek: 1.2B tokens
English: 0.8B tokens

Total: 2.0B tokens.
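As a quick check of the mixture, the snippet below derives each language's share from the token counts listed above.

```python
# Token counts as reported in this section.
TOKENS = {"uzbek": 1_200_000_000, "english": 800_000_000}

total = sum(TOKENS.values())
shares = {lang: count / total for lang, count in TOKENS.items()}

for lang, share in shares.items():
    print(f"{lang}: {share:.0%}")
```

This works out to a 60/40 split between Uzbek and English.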

Training process

We trained NeuronAI-Uzbek in stages:

Data preparation
- Collected Uzbek- and English-language text.
- Cleaned and normalized text (deduplication/format normalization).
- Tokenized into a mixed Uzbek/English stream.
Model training / adaptation
- Continued training / adaptation on the mixed corpus (2.0B tokens total) to improve Uzbek capability while retaining English.
Supervised fine-tuning (SFT)
Export
- Exported weights to safetensors shards + index.
- Uploaded to Hugging Face.
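The data preparation stage above mentions deduplication and format normalization without giving details. The sketch below shows one common way such a step could look, assuming Unicode NFKC normalization, whitespace collapsing, and exact-match deduplication by hash. The function names and sample documents are hypothetical, and this is not necessarily the pipeline used for this model.

```python
import hashlib
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Apply Unicode NFKC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(documents):
    """Yield normalized documents, skipping empty ones and exact duplicates."""
    seen = set()
    for doc in documents:
        cleaned = normalize_text(doc)
        if not cleaned:
            continue
        # Hash the normalized text so the set stays small for long documents.
        digest = hashlib.sha256(cleaned.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            yield cleaned


raw_docs = [
    "Salom,   dunyo!",
    "Salom, dunyo!\n",
    "Hello, world!",
    "   ",
]
print(list(deduplicate(raw_docs)))  # ['Salom, dunyo!', 'Hello, world!']
```

Normalizing before hashing matters here: the first two documents differ only in whitespace and would otherwise both survive.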

Intended use

Primary: chat assistant for Uzbek, including general Q&A, drafting, summarization, translation (Uzbek↔English), and instruction following.
Secondary: English chat and general text generation.

Limitations and risks

The model can generate incorrect or hallucinated information.
It may reflect biases present in the training data.
It is not guaranteed safe for medical/legal/financial advice.
Uzbek language variants/dialects and domain-specific jargon may be weaker.

How to use

Requirements

transformers (4.51.0 or later, for Qwen3 support)
torch

Text generation (Transformers)

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "NeuronUz/NeuronAI-Uzbek"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

prompt = "Uzbek tilida qisqa va aniq qilib sun'iy intellekt nima ekanligini tushuntir."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

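# Sample with the generation defaults listed in the model summary.
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.6,
        top_p=0.95,
        top_k=20,
    )

# out[0] holds the prompt tokens followed by the generated tokens,
# so the decoded text starts with the prompt.
print(tokenizer.decode(out[0], skip_special_tokens=True))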
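For a decoder-only model, generate returns the prompt tokens followed by the newly generated ones, so the print call above echoes the prompt as well. If only the completion is wanted, one option is to slice off the prompt before decoding. The helper below is a hypothetical illustration (its name is not part of transformers) and works on any sliceable sequence, including a 1-D tensor such as out[0].

```python
def completion_only(output_ids, prompt_length):
    """Return only the ids that were generated after the prompt."""
    return output_ids[prompt_length:]


# Toy ids standing in for real token tensors.
prompt_ids = [101, 7, 42]
output_ids = prompt_ids + [9, 13, 2]
print(completion_only(output_ids, len(prompt_ids)))  # [9, 13, 2]
```

With the objects from the example above, this would be called as tokenizer.decode(completion_only(out[0], inputs["input_ids"].shape[-1]), skip_special_tokens=True).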

Chat formatting

This repository includes a chat_template.jinja. Some environments may not automatically load it into the tokenizer; if tokenizer.chat_template is empty, you can set it manually:

from pathlib import Path
from transformers import AutoTokenizer

repo_id = "NeuronUz/NeuronAI-Uzbek"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)

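# Fall back to the template file if the tokenizer did not load one.
# The path is relative to the current working directory.
if not getattr(tokenizer, "chat_template", None):
    tokenizer.chat_template = Path("chat_template.jinja").read_text(encoding="utf-8")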

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Uzbek tilida menga salom ber."},
]

text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(text)

If you are running in a notebook or environment where the template file is not present locally, download it from the repo first (or copy the template content directly).
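One way to handle the missing-file case is sketched below: read the template from a local directory when it exists, and otherwise fetch it from the Hub with hf_hub_download from the huggingface_hub package. The helper name load_chat_template and its local_dir parameter are hypothetical; the download branch assumes huggingface_hub is installed and that the file is named chat_template.jinja in the repository, as stated above.

```python
from pathlib import Path


def load_chat_template(repo_id, filename="chat_template.jinja", local_dir="."):
    """Read the chat template from local_dir, falling back to a Hub download."""
    local_path = Path(local_dir) / filename
    if local_path.is_file():
        return local_path.read_text(encoding="utf-8")

    # Imported lazily so the local-file branch works without huggingface_hub.
    from huggingface_hub import hf_hub_download

    # hf_hub_download returns the path of the file in the local cache.
    cached_path = hf_hub_download(repo_id=repo_id, filename=filename)
    return Path(cached_path).read_text(encoding="utf-8")
```

It could then be used as tokenizer.chat_template = load_chat_template("NeuronUz/NeuronAI-Uzbek") whenever tokenizer.chat_template is empty.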

Example prompts

Uzbek:
- "Quyidagi matnni xulosa qil: ..."
- "Menga Python'da fayl o'qish misolini ko'rsat."
- "Inglizchadan o'zbekchaga tarjima qil: ..."
English:
- "Explain gradient checkpointing in simple terms."
- "Summarize this document in bullet points: ..."

License

The license for this release is currently marked as other because the upstream/base and dataset licensing details are not fully specified in this repository.

Citation

If you use this model, please cite the repository:

@misc{neuronai_uzbek,
  title        = {NeuronAI-Uzbek},
  author       = {NeuronUz},
  howpublished = {\url{https://huggingface.co/NeuronUz/NeuronAI-Uzbek}},
  year         = {2025}
}

Model size: 4B params
Tensor type: BF16 (safetensors)
Base model: Qwen/Qwen3-4B (itself fine-tuned from Qwen/Qwen3-4B-Base)