How to use from
llama.cpp
Install from brew
brew install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Yehor/kulyk-en-uk:Q8_0
# Run inference directly in the terminal:
llama-cli -hf Yehor/kulyk-en-uk:Q8_0
Install from WinGet (Windows)
winget install llama.cpp
# Start a local OpenAI-compatible server with a web UI:
llama-server -hf Yehor/kulyk-en-uk:Q8_0
# Run inference directly in the terminal:
llama-cli -hf Yehor/kulyk-en-uk:Q8_0
Use pre-built binary
# Download pre-built binary from:
# https://github.com/ggerganov/llama.cpp/releases
# Start a local OpenAI-compatible server with a web UI:
./llama-server -hf Yehor/kulyk-en-uk:Q8_0
# Run inference directly in the terminal:
./llama-cli -hf Yehor/kulyk-en-uk:Q8_0
Build from source code
git clone https://github.com/ggerganov/llama.cpp.git
cd llama.cpp
cmake -B build
cmake --build build -j --target llama-server llama-cli
# Start a local OpenAI-compatible server with a web UI:
./build/bin/llama-server -hf Yehor/kulyk-en-uk:Q8_0
# Run inference directly in the terminal:
./build/bin/llama-cli -hf Yehor/kulyk-en-uk:Q8_0
Use Docker
docker model run hf.co/Yehor/kulyk-en-uk:Q8_0
Quick Links

A lightweight model to do machine translation from English to Ukrainian based on recently published LFM2 model. Use demo to test it.

Also, there's another model: kulyk-uk-en

Run with Docker (CPU):

docker run -p 3000:3000 --rm ghcr.io/egorsmkv/kulyk-rust:latest

Run using Apptainer (CUDA):

  1. Run it using shell:
apptainer shell --nv ./kulyk.sif

Apptainer> /opt/entrypoints/kulyk --verbose --n-len 1024 --model-path-ue /project/models/kulyk-uk-en.gguf --model-path-eu /project/models/kulyk-en-uk.gguf
  1. Run it as a webservice:
apptainer instance start --nv ./kulyk.sif kulyk-ws

# go to http://localhost:3000

Facts:

  • Fine-tuned with 40M samples (filtered by quality metric) from ~53.5M for 1.4 epochs
  • 354M params
  • Requires 1 GB of RAM to run with bf16
  • BLEU on FLORES-200: 27.24
  • Tokens per second: 229.93 (bs=1), 1664.40 (bs=10), 8392.48 (bs=64)
  • License: lfm1.0

Info:

  • Model name is inherited from name of Sergiy Kulyk who was chargรฉ d'affaires of Ukraine in the United States

Training Info:

  • Learning Rate: 3e-5
  • Learning Rate scheduler type: cosine
  • Warmup Ratio: 0.05
  • Max length: 2048
  • Batch Size: 10
  • packed=True
  • Sentences <= 1000 chars
  • Gradient accumulation steps: 4
  • Used Flash Attention 2
  • Time for epoch: 32 hours
  • 2 cards of NVIDIA RTX 3090 Ti (24G)
  • accelerate with DeepSpeed, offloading into CPU
  • Memory usage: 22.212GB-22.458GB
  • torch 2.7.1

Acknowledgements:

  • Serhiy Stetskovych for providing compute to train this model
  • lang-uk members for their compilation of different MT datasets
Downloads last month
63
Safetensors
Model size
0.4B params
Tensor type
BF16
ยท
Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for Yehor/kulyk-en-uk

Quantized
(33)
this model
Finetunes
1 model
Quantizations
3 models

Dataset used to train Yehor/kulyk-en-uk

Spaces using Yehor/kulyk-en-uk 3

Evaluation results