NeuronAI-Uzbek
NeuronAI-Uzbek is a Qwen3-family causal language model fine-tuned as an assistant for Uzbek (primary language) and English. This repository contains the model weights (safetensors shards), tokenizer files, and a chat template.
Model summary
- Architecture: Qwen3ForCausalLM (decoder-only)
- Dtype: bfloat16
- Layers: 36
- Hidden size: 2560
- Attention heads: 32 (KV heads: 8)
- Vocab size: 180,000
- Max position embeddings: 40,960 (from the model config)
- Generation defaults: temperature=0.6, top_p=0.95, top_k=20
Note: Because this model belongs to the Qwen3 family, it requires a recent transformers release with Qwen3 support (4.51.0 or later).
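As a rough sizing aid, the architecture numbers above determine how much memory the key/value cache needs during generation. The sketch below assumes head_dim=128 (taken from typical Qwen3 configs, which set head_dim explicitly instead of deriving it as 2560/32; check this repository's config.json) and bf16 cache values (2 bytes each). Because only 8 KV heads are stored (grouped-query attention), the cache is 4× smaller than it would be with all 32 attention heads.

```python
def kv_cache_bytes(num_tokens, num_layers=36, num_kv_heads=8, head_dim=128,
                   bytes_per_value=2, batch_size=1):
    """Estimate KV-cache size in bytes.

    Defaults follow the model summary (36 layers, 8 KV heads); head_dim=128
    is an ASSUMPTION based on typical Qwen3 configs.
    """
    # Factor 2: each layer caches one key tensor and one value tensor.
    return (2 * num_layers * num_kv_heads * head_dim * bytes_per_value
            * num_tokens * batch_size)


per_token = kv_cache_bytes(1)
full_context = kv_cache_bytes(40_960)  # max position embeddings
print(f"{per_token} bytes/token, {full_context / 2**30:.3f} GiB at 40,960 tokens")
```

Under these assumptions a single sequence at full context needs about 5.6 GiB of cache in addition to the model weights; setting num_kv_heads=32 shows what the cache would cost without grouped-query attention.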
Training data (token counts)
This model was trained on a mixture of:
- Uzbek: 1.2B tokens
- English: 0.8B tokens
Total: 2.0B tokens (60% Uzbek, 40% English).
Training process
We trained NeuronAI-Uzbek in stages:
Data preparation
- Collected Uzbek- and English-language text.
- Cleaned and normalized text (deduplication/format normalization).
- Tokenized into a mixed Uzbek/English stream.
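The card does not describe the exact cleaning pipeline. The following is a hypothetical sketch of what "normalization + deduplication" can look like for Uzbek Latin text: Unicode NFC normalization, unifying the apostrophe variants used in o‘, g‘ and the tutuq belgisi (ʼ) into the ASCII ' seen in this card's example prompts, collapsing whitespace, and dropping exact duplicates by SHA-256 hash. The apostrophe mapping is an illustrative choice, not necessarily the authors'.

```python
import hashlib
import re
import unicodedata

# Apostrophe-like characters often mixed in Uzbek Latin text:
# U+2018/U+2019 curly quotes, U+02BB (o‘, g‘), U+02BC (ʼ), and backtick.
APOSTROPHES = re.compile("[\u2018\u2019\u02bb\u02bc`]")


def normalize(text: str) -> str:
    """Hypothetical normalization: NFC, unified apostrophes, collapsed whitespace."""
    text = unicodedata.normalize("NFC", text)
    text = APOSTROPHES.sub("'", text)
    return re.sub(r"\s+", " ", text).strip()


def dedup(docs):
    """Keep the first occurrence of each normalized document; drop empty ones."""
    seen, kept = set(), []
    for doc in docs:
        norm = normalize(doc)
        key = hashlib.sha256(norm.encode("utf-8")).hexdigest()
        if norm and key not in seen:
            seen.add(key)
            kept.append(norm)
    return kept


print(dedup(["Salom  dunyo", "Salom dunyo", "o\u2018zbek", "o'zbek", "   "]))
```

Exact hashing only catches identical documents after normalization; near-duplicate detection (for example MinHash) would be a separate step if the authors used one.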
Model training / adaptation
- Continued training / adaptation on the mixed corpus (2.0B tokens total) to improve Uzbek capability while retaining English.
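One common way to feed a mixed corpus to a causal LM is to interleave documents from both languages and pack the token stream into fixed-length blocks. The sketch below is hypothetical (the card does not say how mixing or packing was done): mix_and_pack, uz_weight, and eos_id are illustrative names, and uz_weight=0.6 simply mirrors the 1.2B/0.8B token split.

```python
import random
from collections import deque


def mix_and_pack(uz_docs, en_docs, block_size, uz_weight=0.6, eos_id=0, seed=0):
    """Interleave tokenized Uzbek/English documents and cut the stream into blocks.

    Hypothetical helper: uz_weight is the per-document probability of drawing
    an Uzbek document while both languages still have data left.
    """
    rng = random.Random(seed)
    uz, en = deque(uz_docs), deque(en_docs)
    stream, blocks = [], []
    while uz or en:
        if uz and en:
            source = uz if rng.random() < uz_weight else en
        else:
            source = uz or en  # one language is exhausted; drain the other
        stream.extend(source.popleft())
        stream.append(eos_id)  # document separator
        while len(stream) >= block_size:
            blocks.append(stream[:block_size])
            stream = stream[block_size:]
    return blocks  # a trailing partial block (< block_size tokens) is dropped


blocks = mix_and_pack([[1, 2, 3]] * 4, [[4, 5]] * 3, block_size=4)
print(blocks)
```

Because every document is eventually consumed, the overall language ratio equals the corpus ratio. The weight only controls how evenly the two languages are spread across training, and it is a per-document probability, so it approximates the token share only when documents have similar lengths.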
Supervised fine-tuning (SFT)
Export
- Exported weights to safetensors shards + index.
- Uploaded to Hugging Face.
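A sharded safetensors export comes with an index JSON whose "weight_map" maps each tensor name to the shard file that stores it. The sketch below groups tensors by shard so you can see which file holds what. The tensor and shard names in the example are hypothetical, and model.safetensors.index.json (the standard transformers file name) is assumed to be the index used here.

```python
import json
from collections import defaultdict


def shards_from_index(index_json: str) -> dict:
    """Group tensor names by shard file using a safetensors index's weight_map."""
    index = json.loads(index_json)
    by_shard = defaultdict(list)
    for tensor_name, shard_file in index["weight_map"].items():
        by_shard[shard_file].append(tensor_name)
    return {shard: sorted(names) for shard, names in sorted(by_shard.items())}


# Hypothetical two-shard index for illustration.
example = json.dumps({
    "metadata": {"total_size": 10},
    "weight_map": {
        "model.embed_tokens.weight": "model-00001-of-00002.safetensors",
        "model.norm.weight": "model-00002-of-00002.safetensors",
        "lm_head.weight": "model-00002-of-00002.safetensors",
    },
})
for shard, names in shards_from_index(example).items():
    print(shard, names)
```

For the real repository, the index file could be fetched with huggingface_hub.hf_hub_download(repo_id, "model.safetensors.index.json") and its contents passed to the same function.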
Intended use
- Primary: chat assistant for Uzbek, including general Q&A, drafting, summarization, translation between Uzbek and English, and instruction following.
- Secondary: English chat and general text generation.
Limitations and risks
- The model can generate incorrect or hallucinated information.
- It may reflect biases present in the training data.
- It is not guaranteed to be safe for medical, legal, or financial advice.
- Performance may be weaker on Uzbek dialects and regional variants, and on domain-specific jargon.
How to use
Requirements
- transformers (a recent release with Qwen3 support)
- torch
- accelerate (needed for device_map="auto")
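To confirm that the installed transformers is new enough before loading the model, a small stdlib check is enough. The 4.51.0 minimum is an assumption based on the upstream Qwen3 documentation, which reports that older versions fail with KeyError: 'qwen3'. The parser below is deliberately simple (it treats 4.51.0.dev0 as 4.51.0); packaging.version.Version is the more complete option.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Assumption: Qwen3 support first shipped in transformers 4.51.0.
MIN_TRANSFORMERS = (4, 51, 0)


def release_tuple(ver: str) -> tuple:
    """Leading numeric release parts, e.g. "4.52.0rc1" -> (4, 52, 0)."""
    parts = []
    for piece in ver.split(".")[:3]:
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def transformers_is_recent_enough(minimum=MIN_TRANSFORMERS):
    """Return (ok, installed_version); installed_version is None if not installed."""
    try:
        installed = version("transformers")
    except PackageNotFoundError:
        return False, None
    return release_tuple(installed) >= minimum, installed


if __name__ == "__main__":
    ok, installed = transformers_is_recent_enough()
    print(f"transformers {installed}: {'OK' if ok else 'please install or upgrade'}")
```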
Text generation (Transformers)
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "NeuronUz/NeuronAI-Uzbek"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the accelerate package
    trust_remote_code=True,
)

# "Briefly and clearly explain, in Uzbek, what artificial intelligence is."
prompt = "Uzbek tilida qisqa va aniq qilib sun'iy intellekt nima ekanligini tushuntir."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.6,
        top_p=0.95,
        top_k=20,
    )

# generate() returns prompt + continuation; decode only the new tokens.
new_tokens = out[0][inputs["input_ids"].shape[-1]:]
print(tokenizer.decode(new_tokens, skip_special_tokens=True))
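The sampling arguments above (temperature=0.6, top_k=20, top_p=0.95) are the card's generation defaults. To make their effect concrete, here is a simplified pure-Python sketch of temperature scaling, top-k filtering, and top-p (nucleus) filtering on a single logits vector. transformers implements these as separate logits warpers on tensors; this is an illustration of the technique, not its code.

```python
import math
import random


def filter_logits(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Return {token_id: probability} after temperature, top-k and top-p filtering."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    # 1) Temperature: dividing by a value < 1.0 sharpens the distribution.
    scaled = [x / temperature for x in logits]
    # 2) Top-k: keep the k highest-scoring token ids, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]
    # Softmax over the survivors (subtract the max for numerical stability).
    best = scaled[ranked[0]]
    weights = {i: math.exp(scaled[i] - best) for i in ranked}
    total = sum(weights.values())
    probs = {i: w / total for i, w in weights.items()}
    # 3) Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cumulative = {}, 0.0
    for i in ranked:
        kept[i] = probs[i]
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    # Renormalize so the kept probabilities sum to 1.
    mass = sum(kept.values())
    return {i: p / mass for i, p in kept.items()}


def sample_next_token(logits, rng, **kwargs):
    """Draw one token id from the filtered distribution."""
    probs = filter_logits(logits, **kwargs)
    ids = list(probs)
    return rng.choices(ids, weights=[probs[i] for i in ids], k=1)[0]


print(filter_logits([2.0, 1.0, 0.5, -3.0]))
print(sample_next_token([2.0, 1.0, 0.5, -3.0], random.Random(0)))
```

Lowering temperature concentrates probability on the top tokens, top_k caps how many candidates survive, and top_p then trims the low-probability tail. Together they trade diversity for coherence.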
Chat formatting
This repository includes a chat_template.jinja. Some environments may not automatically load it into the tokenizer; if tokenizer.chat_template is empty, you can set it manually:
from pathlib import Path
from transformers import AutoTokenizer
repo_id = "NeuronUz/NeuronAI-Uzbek"
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
if not getattr(tokenizer, "chat_template", None):
    # Fall back to the template file shipped in this repository
    tokenizer.chat_template = Path("chat_template.jinja").read_text(encoding="utf-8")
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Uzbek tilida menga salom ber."},  # "Greet me in Uzbek."
]
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(text)
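If you are running in a notebook or another environment where chat_template.jinja is not present locally, download it from the repo first (for example with huggingface_hub.hf_hub_download(repo_id, "chat_template.jinja"), which returns the local path of the cached file) or copy the template content directly.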
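To show roughly what apply_chat_template produces, here is a stdlib sketch that renders messages in the ChatML layout used by upstream Qwen chat models. That this repository's chat_template.jinja uses the same layout is an assumption. The real Qwen3 template also handles tool calls and reasoning ("thinking") blocks, so the output of print(text) above is authoritative. render_chatml is a hypothetical helper, not part of transformers.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Hypothetical ChatML-style rendering (ASSUMED format, see lead-in)."""
    parts = []
    for message in messages:
        if message["role"] not in {"system", "user", "assistant"}:
            raise ValueError(f"unexpected role: {message['role']!r}")
        parts.append(f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n")
    if add_generation_prompt:
        # Open an assistant turn so the model continues as the assistant.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


print(render_chatml([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Uzbek tilida menga salom ber."},
]))
```

Comparing this output with print(text) is a quick way to confirm which format the shipped template actually uses.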
Example prompts
Uzbek:
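- "Quyidagi matnni xulosa qil: ..." (Summarize the following text: ...)
- "Menga Python'da fayl o'qish misolini ko'rsat." (Show me an example of reading a file in Python.)
- "Inglizchadan o'zbekchaga tarjima qil: ..." (Translate from English to Uzbek: ...)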
English:
- "Explain gradient checkpointing in simple terms."
- "Summarize this document in bullet points: ..."
License
The license for this release is currently marked as "other" because the licensing details of the upstream base model and the training datasets are not fully specified in this repository.
Citation
If you use this model, please cite the repository:
@misc{neuronai_uzbek,
title = {NeuronAI-Uzbek},
author = {NeuronUz},
howpublished = {\url{https://huggingface.co/NeuronUz/NeuronAI-Uzbek}},
year = {2025}
}