Instructions to use mykor/Konan-LLM-OND-gguf with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- llama-cpp-python
How to use mykor/Konan-LLM-OND-gguf with llama-cpp-python:
# !pip install llama-cpp-python
from llama_cpp import Llama

llm = Llama.from_pretrained(
    repo_id="mykor/Konan-LLM-OND-gguf",
    filename="Konan-LLM-OND-BF16.gguf",
)
# messages must be a list of role/content dicts, not a bare string.
llm.create_chat_completion(
    messages=[
        # "What is the capital of South Korea?"
        {"role": "user", "content": "대한민국 수도는?"}
    ]
)
- Notebooks
- Google Colab
- Kaggle
- Local Apps
- llama.cpp
How to use mykor/Konan-LLM-OND-gguf with llama.cpp:
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf mykor/Konan-LLM-OND-gguf:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf mykor/Konan-LLM-OND-gguf:Q4_K_M
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf mykor/Konan-LLM-OND-gguf:Q4_K_M
# Run inference directly in the terminal:
llama-cli -hf mykor/Konan-LLM-OND-gguf:Q4_K_M
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf mykor/Konan-LLM-OND-gguf:Q4_K_M
# Run inference directly in the terminal:
./llama-cli -hf mykor/Konan-LLM-OND-gguf:Q4_K_M
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf mykor/Konan-LLM-OND-gguf:Q4_K_M
# Run inference directly in the terminal:
./build/bin/llama-cli -hf mykor/Konan-LLM-OND-gguf:Q4_K_M
Use Docker
docker model run hf.co/mykor/Konan-LLM-OND-gguf:Q4_K_M
- LM Studio
- Jan
- Ollama
How to use mykor/Konan-LLM-OND-gguf with Ollama:
ollama run hf.co/mykor/Konan-LLM-OND-gguf:Q4_K_M
- Unsloth Studio
How to use mykor/Konan-LLM-OND-gguf with Unsloth Studio:
Install Unsloth Studio (macOS, Linux, WSL)
curl -fsSL https://unsloth.ai/install.sh | sh
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for mykor/Konan-LLM-OND-gguf to start chatting
Install Unsloth Studio (Windows)
irm https://unsloth.ai/install.ps1 | iex
# Run unsloth studio
unsloth studio -H 0.0.0.0 -p 8888
# Then open http://localhost:8888 in your browser
# Search for mykor/Konan-LLM-OND-gguf to start chatting
Using HuggingFace Spaces for Unsloth
# No setup required
# Open https://huggingface.co/spaces/unsloth/studio in your browser
# Search for mykor/Konan-LLM-OND-gguf to start chatting
- Docker Model Runner
How to use mykor/Konan-LLM-OND-gguf with Docker Model Runner:
docker model run hf.co/mykor/Konan-LLM-OND-gguf:Q4_K_M
- Lemonade
How to use mykor/Konan-LLM-OND-gguf with Lemonade:
Pull the model
# Download Lemonade from https://lemonade-server.ai/
lemonade pull mykor/Konan-LLM-OND-gguf:Q4_K_M
Run and chat with the model
lemonade run user.Konan-LLM-OND-gguf-Q4_K_M
List all available models
lemonade list
Konan-LLM-OND
Overview
Konan-LLM-OND, a large language model from Konan Technology Inc., is based on Qwen3-4B-Base. It has been specifically optimized for the Korean language through vocabulary expansion, continual pre-training, and instruction tuning to enhance performance and efficiency.
- Languages: Primarily Korean, with support for English.
- Key Features:
- Expanded Korean Vocabulary: The model's vocabulary has been expanded with additional Korean tokens to improve tokenization efficiency. As a result, Konan-LLM-OND is approximately 30% more token-efficient with Korean input than Qwen3, leading to greater cost-effectiveness and processing speed.
- Continual Pre-training: The model underwent continual pre-training on a large-scale Korean corpus using the expanded vocabulary. This process strengthened its fundamental understanding of Korean and its Korean text generation.
- Supervised Fine-Tuning (SFT): The model was fine-tuned on a high-quality Korean instruction dataset to improve its ability to understand and execute a wide variety of real-world tasks.
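One way to read the "approximately 30% more token-efficient" figure is as a relative reduction in token count for the same Korean text. The sketch below computes that reduction from two token counts. The helper name `token_savings` and the sample counts are illustrative assumptions, not measurements from this model card; with both tokenizers loaded, the counts could come from `len(tokenizer.encode(text))`.

```python
def token_savings(baseline_tokens: int, candidate_tokens: int) -> float:
    """Return the fractional reduction in token count relative to the baseline."""
    if baseline_tokens <= 0:
        raise ValueError("baseline_tokens must be positive")
    return (baseline_tokens - candidate_tokens) / baseline_tokens


# Hypothetical counts for one Korean passage (not measured values).
baseline = 100   # tokens under the baseline tokenizer
candidate = 70   # tokens under the expanded-vocabulary tokenizer
print(f"{token_savings(baseline, candidate):.0%} fewer tokens")
```

Under this reading, fewer tokens for the same text means fewer decoding steps and a smaller share of the context window consumed, which is where the cost and speed benefits would come from.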
Benchmark Results
Model Performance (< 5B)
| Model | Model size | KMMLU (Korean) | HRM8K (Korean) | Ko-IFEval (Korean) | MMLU (English) | GSM8K (English) | IFEval (English) |
|---|---|---|---|---|---|---|---|
| Konan-LLM-OND | 4.0B | **50.6** | **46.4** | 68.4 | **68.8** | **86.8** | 73.3 |
| EXAONE-3.5-2.4B-Instruct | 2.4B | 44.2 | 31.8 | 60.5 | 59.1 | 81.5 | 77.7 |
| kanana-1.5-2.1b-instruct-2505 | 2.1B | 32.7 | 27.2 | 56.0 | 52.9 | 68.8 | 64.6 |
| Midm-2.0-Mini-Instruct | 2.3B | 42.4 | 36.2 | 66.8 | 57.4 | 74.8 | 68.3 |
| Qwen3-4B (w/o reasoning) | 4.0B | -(*) | 37.5 | 68.4 | -(*) | 83.9 | **80.0** |
| gemma-3-4b-it | 4.3B | 38.7 | 32.7 | **69.2** | 59.1 | 82.2 | 78.3 |
Model Performance (≥ 7B)
| Model | Model size | KMMLU (Korean) | HRM8K (Korean) | Ko-IFEval (Korean) | MMLU (English) | GSM8K (English) | IFEval (English) |
|---|---|---|---|---|---|---|---|
| Konan-LLM-OND | 4.0B | 50.6 | **46.4** | 68.4 | 68.8 | 86.8 | 73.3 |
| A.X-4.0-Light | 7.2B | **55.3** | 44.6 | 71.5 | **70.6** | 87.3 | 81.3 |
| EXAONE-3.5-7.8B-Instruct | 7.8B | 48.0 | 39.3 | 66.8 | 66.8 | **91.4** | 79.9 |
| kanana-1.5-8b-instruct-2505 | 8.0B | 40.4 | 35.5 | 71.1 | 63.1 | 79.3 | 76.8 |
| Midm-2.0-Base-Instruct | 11.5B | 54.2 | 46.0 | **75.0** | 70.2 | 88.9 | 79.7 |
| Qwen3-8B (w/o reasoning) | 8.1B | -(*) | 40.0 | 70.9 | -(*) | 84.0 | **82.8** |
Note:
- The highest scores are shown in bold.
- (*) Qwen3 models often failed to strictly follow the required answer format in the few-shot setting, which made the scores unreliable. After correcting the evaluation pipeline, we will update the scores.
Benchmark Setup
All benchmarks were executed using the following standardized environment.
- Evaluation Framework: lm-evaluation-harness v0.4.9
- Runtime & Hardware: All models were served with vLLM v0.9.1 on a single NVIDIA GPU.
- Inference Mode: For every benchmark, we invoked the chat_completions API, and scores were computed solely from the generated responses.
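To illustrate what scoring "solely from the generated responses" involves, here is a minimal sketch of a client for an OpenAI-style chat completions endpoint, using only the standard library. The helper names (`build_chat_payload`, `extract_reply`, `chat`), the `max_tokens` default, and the greedy `temperature` setting are assumptions for illustration; the evaluation harness's actual request parameters are not given in this card.

```python
import json
import urllib.request


def build_chat_payload(model: str, user_text: str, max_tokens: int = 256) -> dict:
    """Build the JSON body for an OpenAI-style /v1/chat/completions request."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": user_text}],
        "max_tokens": max_tokens,
        "temperature": 0.0,  # greedy decoding; an assumption, not stated in the setup
    }


def extract_reply(response: dict) -> str:
    """Pull the generated text out of a chat-completions response body."""
    return response["choices"][0]["message"]["content"]


def chat(base_url: str, payload: dict) -> str:
    """POST the payload to the server and return the generated text."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as reply:
        return extract_reply(json.load(reply))
```

With a vLLM server running locally on its default port, a call such as `chat("http://localhost:8000", build_chat_payload("konantech/Konan-LLM-OND", "대한민국 수도는?"))` would return the text that a scorer then compares against the reference answer.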
Metric Adjustments
- KMMLU was evaluated using the "kmmlu_direct" task in the lm-evaluation-harness.
- MMLU was run with the same configuration as "kmmlu_direct".
- Ko-IFEval was evaluated using the original IFEval protocol, with the dataset sourced from allganize/IFEval-Ko.
Evaluation Protocol
| Benchmark | Scoring Method | Few-shot |
|---|---|---|
| KMMLU | exact_match | 5-shot |
| HRM8K | mean of hrm8k_gsm8k, hrm8k_ksm, hrm8k_math, hrm8k_mmmlu, hrm8k_omni_math | 5-shot |
| Ko-IFEval | mean of prompt_level_strict_acc, inst_level_strict_acc, prompt_level_loose_acc, inst_level_loose_acc | 0-shot |
| MMLU | exact_match | 5-shot |
| GSM8K | exact_match & flexible-extract | 5-shot |
| IFEval | mean of prompt_level_strict_acc, inst_level_strict_acc, prompt_level_loose_acc, inst_level_loose_acc | 0-shot |
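The composite scores in the table (HRM8K, Ko-IFEval, IFEval) are described as means over several sub-metrics. The sketch below shows one way to compute them, assuming an unweighted arithmetic mean and a flat dictionary of per-metric results. The function name `composite_score` and the sample numbers are hypothetical and are not results from this card.

```python
from statistics import mean

# Sub-metric names as listed in the Evaluation Protocol table.
IFEVAL_METRICS = (
    "prompt_level_strict_acc",
    "inst_level_strict_acc",
    "prompt_level_loose_acc",
    "inst_level_loose_acc",
)
HRM8K_SUBTASKS = (
    "hrm8k_gsm8k",
    "hrm8k_ksm",
    "hrm8k_math",
    "hrm8k_mmmlu",
    "hrm8k_omni_math",
)


def composite_score(results: dict, keys) -> float:
    """Unweighted mean of the named sub-metrics (the weighting is an assumption)."""
    missing = [key for key in keys if key not in results]
    if missing:
        raise KeyError(f"missing sub-metrics: {missing}")
    return mean(results[key] for key in keys)


# Hypothetical per-metric accuracies in percent, for illustration only.
sample = {
    "prompt_level_strict_acc": 60.0,
    "inst_level_strict_acc": 70.0,
    "prompt_level_loose_acc": 65.0,
    "inst_level_loose_acc": 75.0,
}
print(composite_score(sample, IFEVAL_METRICS))  # 67.5
```

Raising on a missing sub-metric, rather than averaging whatever is present, keeps a partially failed run from silently producing an inflated or deflated composite.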
Quickstart
Konan-LLM-OND is supported in transformers v4.52.0 and later.
pip install "transformers>=4.52.0"
The code example below shows you how to get the model to generate content based on given inputs.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "konantech/Konan-LLM-OND"

# Load the weights in bfloat16 and place them on the available device(s).
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)
model.eval()
tokenizer = AutoTokenizer.from_pretrained(model_name)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    # "What is the capital of South Korea?"
    {"role": "user", "content": "대한민국 수도는?"},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(
        input_ids,
        max_new_tokens=64,
        do_sample=False,
    )

# Decode only the newly generated tokens, skipping the prompt.
len_input_prompt = len(input_ids[0])
response = tokenizer.decode(output[0][len_input_prompt:], skip_special_tokens=True)
print(response)
# 대한민국 수도는 서울입니다. ("The capital of South Korea is Seoul.")
Citation
@misc{Konan-LLM-OND-2025,
  author = {Konan Technology Inc.},
  title = {Konan-LLM-OND},
  year = {2025},
  url = {https://huggingface.co/konantech/Konan-LLM-OND}
}