NeuronAI-Uzbek

NeuronAI-Uzbek is a Qwen3-family causal language model fine-tuned to be helpful for Uzbek (primary) and English. This repository contains model weights (safetensors shards), tokenizer files, and a chat template.

Model summary

  • Architecture: Qwen3ForCausalLM (decoder-only)
  • Dtype: bfloat16
  • Layers: 36
  • Hidden size: 2560
  • Attention heads: 32 (KV heads: 8)
  • Vocab size: 180,000
  • Max position embeddings: 40,960 (model config)
  • Generation defaults
    • temperature=0.6
    • top_p=0.95
    • top_k=20
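
The figures above are enough for a rough estimate of key/value cache memory during generation. The sketch below is illustrative only: the helper name kv_cache_bytes is hypothetical, and the head dimension of 128 is an assumption taken from the usual Qwen3 configuration, since this card does not list it. Check config.json before relying on the numbers.

```python
def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, bytes_per_value=2):
    """Bytes needed to cache keys and values for one sequence across all layers."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Values from the model summary.
NUM_LAYERS = 36
NUM_HEADS = 32
NUM_KV_HEADS = 8
MAX_POSITIONS = 40_960

# ASSUMPTION: head_dim is not stated in this card; 128 is the usual Qwen3 value.
HEAD_DIM = 128

# Grouped-query attention: several query heads share one KV head.
queries_per_kv_head = NUM_HEADS // NUM_KV_HEADS

# bfloat16 stores each value in 2 bytes (the default bytes_per_value).
per_token_bytes = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, seq_len=1)
full_context_bytes = kv_cache_bytes(NUM_LAYERS, NUM_KV_HEADS, HEAD_DIM, seq_len=MAX_POSITIONS)

print(f"query heads per KV head: {queries_per_kv_head}")
print(f"KV cache per token: {per_token_bytes / 1024:.0f} KiB")
print(f"KV cache at {MAX_POSITIONS} tokens: {full_context_bytes / 2**30:.3f} GiB")
```

Under these assumptions the cache costs 144 KiB per token, or about 5.6 GiB for a single sequence at the full 40,960-token context, in addition to the weights themselves.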

Note: This model is from the Qwen3 family, which transformers supports from version 4.51.0 onward. Older versions do not recognize the qwen3 model type and fail to load it.

Training data (token counts)

This model was trained on a mixture of:

  • Uzbek: 1.2B tokens
  • English: 0.8B tokens

Total: 2.0B tokens.
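
As a quick check of the mixture, the snippet below computes each language's share of the corpus from the token counts above. The variable names are illustrative.

```python
# Token counts as stated in this card.
token_counts = {
    "Uzbek": 1_200_000_000,
    "English": 800_000_000,
}

total_tokens = sum(token_counts.values())

# Share of the corpus contributed by each language.
shares = {lang: count / total_tokens for lang, count in token_counts.items()}

for lang, share in shares.items():
    print(f"{lang}: {share:.0%} of {total_tokens / 1e9:.1f}B tokens")
```

This works out to a 60/40 split between Uzbek and English.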

Training process

We trained NeuronAI-Uzbek in stages:

  1. Data preparation

    • Collected Uzbek- and English-language text.
    • Cleaned and normalized text (deduplication/format normalization).
    • Tokenized into a mixed Uzbek/English stream.
  2. Model training / adaptation

    • Continued training / adaptation on the mixed corpus (2.0B tokens total) to improve Uzbek capability while retaining English.
  3. Supervised fine-tuning (SFT)

  4. Export

    • Exported weights to safetensors shards + index.
    • Uploaded to Hugging Face.
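
The cleaning step in stage 1 is described only at a high level. The following is a minimal sketch of what format normalization and exact deduplication could look like, not the pipeline actually used for this model. The function names, the choice of NFKC normalization, and the case-insensitive hashing are all assumptions made for illustration.

```python
import hashlib
import re
import unicodedata


def normalize_text(text: str) -> str:
    """Apply Unicode NFKC normalization and collapse runs of whitespace."""
    text = unicodedata.normalize("NFKC", text)
    text = re.sub(r"\s+", " ", text)
    return text.strip()


def deduplicate(docs):
    """Return normalized documents with empty entries and exact duplicates removed.

    Duplicates are detected case-insensitively by hashing the normalized text;
    the first occurrence of each document is kept.
    """
    seen = set()
    unique = []
    for doc in docs:
        cleaned = normalize_text(doc)
        if not cleaned:
            continue  # skip documents that are empty after cleaning
        digest = hashlib.sha256(cleaned.casefold().encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(cleaned)
    return unique


corpus = ["Salom,  dunyo!", "salom, dunyo!\n", "Hello world", "   "]
print(deduplicate(corpus))
```

Exact-match hashing only catches identical documents; near-duplicate detection (for example MinHash) would be a separate step.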

Intended use

  • Primary: chat assistant for Uzbek, including general Q&A, drafting, summarization, translation (Uzbek↔English), and instruction following.
  • Secondary: English chat and general text generation.

Limitations and risks

  • The model can generate incorrect or hallucinated information.
  • It may reflect biases present in the training data.
  • Its output should not be relied on for medical, legal, or financial advice.
  • Uzbek language variants/dialects and domain-specific jargon may be weaker.

How to use

Requirements

  • transformers >= 4.51.0
  • torch
  • accelerate (required by device_map="auto" in the example below)
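
To fail early with a clear message, you can check the installed transformers version before loading the model. This is a sketch: the helper names are hypothetical, and the 4.51.0 threshold is the first transformers release with Qwen3 support. The parser ignores pre-release suffixes such as ".dev0", so treat it as a coarse check.

```python
import re
from importlib import metadata

# First transformers release that includes the Qwen3 architecture.
MIN_TRANSFORMERS = (4, 51, 0)


def parse_version(version: str):
    """Parse the leading 'major.minor[.patch]' part of a version string."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    # A missing patch component is treated as 0.
    return tuple(int(part or 0) for part in match.groups())


def supports_qwen3(version: str) -> bool:
    """Return True if the given transformers version is new enough for Qwen3."""
    return parse_version(version) >= MIN_TRANSFORMERS


try:
    installed = metadata.version("transformers")
    print(f"transformers {installed}: Qwen3 supported = {supports_qwen3(installed)}")
except metadata.PackageNotFoundError:
    print("transformers is not installed")
```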

Text generation (Transformers)

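import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "NeuronUz/NeuronAI-Uzbek"

# Load the tokenizer and the bfloat16 weights.
# device_map="auto" places the model on the available device(s) and needs accelerate.
tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    repo_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Prompt in Uzbek: "Explain briefly and clearly in Uzbek what artificial intelligence is."
prompt = "Uzbek tilida qisqa va aniq qilib sun'iy intellekt nima ekanligini tushuntir."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Sample with the generation defaults listed in the model summary.
with torch.no_grad():
    out = model.generate(
        **inputs,
        max_new_tokens=256,
        do_sample=True,
        temperature=0.6,
        top_p=0.95,
        top_k=20,
    )

# The decoded text contains the prompt followed by the generated continuation.
print(tokenizer.decode(out[0], skip_special_tokens=True))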
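
To make the three sampling arguments concrete, here is a simplified, pure-Python sketch of how temperature, top-k, and top-p reshape a next-token distribution before a token is drawn. It illustrates the idea only and is not the transformers implementation; the function name filtered_distribution and the toy logits are made up for this example.

```python
import math
import random


def filtered_distribution(logits, temperature=0.6, top_k=20, top_p=0.95):
    """Return {token_index: probability} after temperature, top-k and top-p filtering."""
    # 1. Temperature: values below 1.0 sharpen the distribution.
    scaled = [value / temperature for value in logits]

    # 2. Top-k: keep only the k highest-scoring tokens, best first.
    ranked = sorted(range(len(scaled)), key=lambda i: scaled[i], reverse=True)[:top_k]

    # Softmax over the survivors (subtract the maximum for numerical stability).
    peak = scaled[ranked[0]]
    weights = [math.exp(scaled[i] - peak) for i in ranked]
    total = sum(weights)
    probs = [w / total for w in weights]

    # 3. Top-p: keep the smallest prefix whose cumulative probability reaches top_p.
    kept = []
    cumulative = 0.0
    for index, prob in zip(ranked, probs):
        kept.append((index, prob))
        cumulative += prob
        if cumulative >= top_p:
            break

    # Renormalize so the remaining probabilities sum to 1.
    norm = sum(prob for _, prob in kept)
    return {index: prob / norm for index, prob in kept}


toy_logits = [2.0, 1.0, 0.0, -1.0]  # made-up scores for a 4-token vocabulary
dist = filtered_distribution(toy_logits, temperature=0.6, top_k=3, top_p=0.95)
next_token = random.choices(list(dist), weights=list(dist.values()), k=1)[0]
print(dist, next_token)
```

Lower temperature and smaller top_k or top_p all concentrate probability on the most likely tokens, which trades diversity for more conservative output.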

Chat formatting

This repository includes a chat_template.jinja. Some environments may not automatically load it into the tokenizer; if tokenizer.chat_template is empty, you can set it manually:

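from pathlib import Path

from huggingface_hub import hf_hub_download
from transformers import AutoTokenizer

repo_id = "NeuronUz/NeuronAI-Uzbek"

tokenizer = AutoTokenizer.from_pretrained(repo_id, trust_remote_code=True)

# Fall back to the template file shipped in the repo if the tokenizer did not load one.
if not getattr(tokenizer, "chat_template", None):
    template_path = hf_hub_download(repo_id=repo_id, filename="chat_template.jinja")
    tokenizer.chat_template = Path(template_path).read_text(encoding="utf-8")

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    # User message in Uzbek: "Greet me in Uzbek."
    {"role": "user", "content": "Uzbek tilida menga salom ber."},
]

# Render the conversation as a prompt string that ends with the assistant turn opener.
text = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
print(text)

hf_hub_download fetches the template into the local Hugging Face cache, so the snippet also works in notebooks and other environments without a local checkout of the repository. Alternatively, copy the template content directly into tokenizer.chat_template.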
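
For orientation, the sketch below shows the general shape of a ChatML-style prompt, the format used by upstream Qwen3 chat templates. It is an assumption that the bundled chat_template.jinja follows the same layout; the real template is authoritative and may add further elements (for example tool-call or reasoning blocks). The function render_chatml is hypothetical and is not part of transformers.

```python
def render_chatml(messages, add_generation_prompt=True):
    """Render messages in ChatML style: <|im_start|>role\\ncontent<|im_end|>."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        for message in messages
    ]
    if add_generation_prompt:
        # Open the assistant turn so the model continues from here.
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Uzbek tilida menga salom ber."},
]

print(render_chatml(messages))
```

Comparing this output with the result of tokenizer.apply_chat_template is a quick way to see what the bundled template adds or changes.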

Example prompts

  • Uzbek:

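    • "Quyidagi matnni xulosa qil: ..." (Summarize the following text: ...)
    • "Menga Python'da fayl o'qish misolini ko'rsat." (Show me an example of reading a file in Python.)
    • "Inglizchadan o'zbekchaga tarjima qil: ..." (Translate from English to Uzbek: ...)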
  • English:

    • "Explain gradient checkpointing in simple terms."
    • "Summarize this document in bullet points: ..."

License

The license for this release is currently marked as other because the licensing details of the upstream base model and the training datasets are not fully specified in this repository.

Citation

If you use this model, please cite the repository:

@misc{neuronai_uzbek,
  title        = {NeuronAI-Uzbek},
  author       = {NeuronUz},
  howpublished = {\url{https://huggingface.co/NeuronUz/NeuronAI-Uzbek}},
  year         = {2025}
}

Model tree for NeuronUz/NeuronAI-Uzbek

  • Base model: Qwen/Qwen3-4B-Base
  • Fine-tuned from: Qwen/Qwen3-4B