gg-hf-g (gg-hf-g)

posted an update 4 days ago

Post

3208

100,000+ models trained with Unsloth have now been open-sourced on 🤗Hugging Face! 🦥

Here are the most popular ones you can run local:
1. TeichAI - GLM-4.7-Flash distilled from Claude 4.5 Opus (high)
2. Zed - Qwen Coder 7B fine-tuned for stronger coding
3. DavidAU - Llama-3.3-8B distilled from Claude 4.5 Opus (high)
4. huihui - gpt-oss made “abliberated”

Links to models:
1. TeichAI: TeichAI/GLM-4.7-Flash-Claude-Opus-4.5-High-Reasoning-Distill-GGUF
2. Zed: zed-industries/zeta
3. DavidAU: DavidAU/Llama3.3-8B-Instruct-Thinking-Claude-4.5-Opus-High-Reasoning
4. huihui: huihui-ai/Huihui-gpt-oss-20b-BF16-abliterated

See all the 100K latest models fine-tuned with Unsloth here: https://huggingface.co/models?other=u

1 reply

·

mrpeerat

authored a paper 4 days ago

SEA-Guard: Culturally Grounded Multilingual Safeguard for Southeast Asia

Paper • 2602.01618 • Published 26 days ago • 2

danielhanchen

posted an update 8 days ago

Post

2618

We collabed with HF on showing how you can use HF Jobs and Unsloth! https://huggingface.co/blog/unsloth-jobs

danielhanchen

posted an update 11 days ago

Post

5621

You can now run Qwen3.5 locally! 💜
Qwen3.5-397B-A17B is an open MoE vision reasoning LLM for agentic coding & chat. It performs on par with Gemini 3 Pro, Claude Opus 4.5 & GPT-5.2.

GGUF: unsloth/Qwen3.5-397B-A17B-GGUF
Run Dynamic 3-bit on a 192GB Mac for 20 tokens/s.

Guide: https://unsloth.ai/docs/models/qwen3.5

9 replies

·

danielhanchen

posted an update 12 days ago

Post

8420

You can now run MiniMax-2.5 locally! 🚀
At 230B parameters, MiniMax-2.5 is the strongest LLM under 700B params, delivering SOTA agentic coding & chat.

Run Dynamic 3/4-bit on a 128GB Mac for 20 tokens/s.
Guide: https://unsloth.ai/docs/models/minimax-2.5
GGUF: unsloth/MiniMax-M2.5-GGUF

1 reply

·

danielhanchen

posted an update 17 days ago

Post

5152

We collaborated with Hugging Face to enable you to train MoE models 12× faster with 35% less VRAM via our new Triton kernels (no accuracy loss). 🤗

Train gpt-oss locally on 12.8GB VRAM with our free notebooks: https://unsloth.ai/docs/new/faster-moe

1 reply

·

danielhanchen

posted an update 22 days ago

Post

3686

We created a tool-calling guide for local LLMs!

Learn how to use any open model like Qwen3-Coder-Next and GLM-4.7-Flash for function calling.

Guide: https://unsloth.ai/docs/basics/tool-calling-guide-for-local-llms

We provide hands-on examples for: story writing, Python execution, terminal tool calls, maths and more.

7 replies

·

mrpeerat

authored a paper 23 days ago

SEA-SafeguardBench: Evaluating AI Safety in SEA Languages and Cultures

Paper • 2512.05501 • Published Dec 5, 2025 • 1

danielhanchen

posted an update 24 days ago

Post

3753

Qwen releases Qwen3-Coder-Next! 💜 Run the locally on 46GB RAM or less.

Thhe model excels at agentic coding & local use. With 256K context, it delivers similar performance to models with 10-20× more active parameters.

GGUF: unsloth/Qwen3-Coder-Next-GGUF
Guide: https://unsloth.ai/docs/models/qwen3-coder-next

10 replies

·

mrpeerat

submitted a paper to Daily Papers 25 days ago

SEA-Guard: Culturally Grounded Multilingual Safeguard for Southeast Asia

Paper • 2602.01618 • Published 26 days ago • 2

alvarobartt

posted an update about 1 month ago

Post

3078

💥 hf-mem v0.4.1 now also estimates KV cache memory requirements for any context length and batch size with the --experimental flag!

uvx hf-mem --model-id ... --experimental will automatically pull the required information from the Hugging Face Hub to include the KV cache estimation, when applicable.

💡 Alternatively, you can also set the --max-model-len, --batch-size and --kv-cache-dtype arguments (à la vLLM) manually if preferred.

1 reply

·

danielhanchen

posted an update about 1 month ago

Post

3439

You can now run Kimi K2.5 locally! 🔥

We shrank the 1T model to 240GB (-60%) via Dynamic 1-bit.
Get >40 tok/s on 242GB or 622GB VRAM/RAM for near full precision.

GGUF: unsloth/Kimi-K2.5-GGUF

Guide: https://unsloth.ai/docs/models/kimi-k2.5

7 replies

·

mlabonne

authored 2 papers about 1 month ago

LFM2 Technical Report

Paper • 2511.23404 • Published Nov 28, 2025 • 53

Zero-Overhead Introspection for Adaptive Test-Time Compute

Paper • 2512.01457 • Published Dec 1, 2025 • 2

danielhanchen

posted an update about 1 month ago

Post

2619

You can now fine-tune embedding models in our free Unsloth notebook! 🤗

Fine-tuning embedding models improves retrieval & RAG by aligning vectors to your domain-specific notion of similarity, improving search, clustering, and recommendations on your data.

⭐ Blog + Notebooks: https://unsloth.ai/docs/new/embedding-finetuning

Unsloth trains embedding models 1.8-3.3x faster with 20% less VRAM, 2x longer context & no accuracy loss vs. FA2 setups.

We'd like to thank Hugging Face and Unsloth contributor: electroglyph for making this possible!

3 replies

·

danielhanchen

posted an update about 1 month ago

Post

2630

Run GLM-4.7-Flash locally on your device with 24GB RAM!🔥

It's the best performing 30B model on SWE-Bench and GPQA. With 200K context, it excels at coding, agents, chat & reasoning.

GGUF: unsloth/GLM-4.7-Flash-GGUF

Guide: https://unsloth.ai/docs/models/glm-4.7-flash

danielhanchen

posted an update about 1 month ago

Post

2863

You can now do reinforcement learning training with 7× longer context and no accuracy loss, via our new batching algorithms.

Long reasoning chains in RL are costly, but now we enable you to train gpt-oss with GRPO & reach 380K context on a 192GB GPU.

Blog: https://unsloth.ai/docs/new/grpo-long-context

RyanMullins

authored a paper about 1 month ago

TranslateGemma Technical Report

Paper • 2601.09012 • Published Jan 13 • 20

mlabonne

posted an update about 2 months ago

Post

6927

New family of 1B models just dropped!

> LiquidAI/LFM2.5-1.2B-Base: 10T → 28T tokens
> LiquidAI/LFM2.5-1.2B-Instruct: new large-scale multi-stage RL
> LiquidAI/LFM2.5-1.2B-JP: our most polite model
> LiquidAI/LFM2.5-VL-1.6B: multi-image multilingual
> LiquidAI/LFM2.5-Audio-1.5B: 8x times faster, no quality loss

Super proud of this release 🤗

3 replies

·

pcuenq

posted an update about 2 months ago

Post

3538

👉 What happened in AI in 2025? 👈

We prepared the 2025 version of the HF AI Timeline Grid, highlighting open vs API-based model releases, and allowing you to browse and filter by access, modality, and release type!

Play with it here:
2025-ai-timeline/2025-ai-timeline

Here's my personal quarterly TL;DR:

1️⃣ Q1 — Learning to Reason
Deepseek not only releases a top-notch reasoning model, but shows how to train them and compete with closed frontier models. OpenAI debuts Deep Research.

Significant milestones: DeepSeek R1 & R1-Zero, Qwen 2.5 VL, OpenAI Deep Research, Gemini 2.5 Pro (experimental)

2️⃣ Q2 — Multimodality and Coding
More LLMs embrace multimodality by default, and there's a surge in coding agents. Strong vision, audio, and generative models emerge.

Significant milestones: Llama 4, Qwen 3, Imagen 4, OpenAI Codex, Google Jules, Claude 4

3️⃣ Q3 — "Gold" rush, OpenAI opens up, the community goes bananas
Flagship models get gold in Math olympiads and hard benchmarks. OpenAI releases strong open source models and Google releases the much anticipated nano-banana for image generation and editing. Agentic workflows become commonplace.

Significant milestones: Gemini and OpenAI IMO Gold, gpt-oss, Gemini 2.5 Flash Image, Grok 4, Claude Sonnet 4.5

4️⃣ Q4 — Mistral returns, leaderboard hill-climbing
Mistral is back with updated model families. All labs release impressive models to wrap up the year!

Significant milestones: Claude Opus 4.5, DeepSeek Math V2, FLUX 2, GPT 5.1, Kimi K2 Thinking, Nano Banana Pro, GLM 4.7, Gemini 3, Mistral 3, MiniMax M2.1 🤯

Credits
🙏 NHLOCAL for the source data https://github.com/NHLOCAL/AiTimeline

🫡 @reach-vb for the original idea, design and recipe

🙌 @ariG23498 and yours truly for compiling and verifying the 2025 edition

🥳 Here's to 2026, wishing it becomes the best year ever for open releases and on-device-first use-cases! 🥂

2 replies

·

gg-hf-g

AI & ML interests

Recent Activity

SEA-Guard: Culturally Grounded Multilingual Safeguard for Southeast Asia

SEA-SafeguardBench: Evaluating AI Safety in SEA Languages and Cultures

SEA-Guard: Culturally Grounded Multilingual Safeguard for Southeast Asia

LFM2 Technical Report

Zero-Overhead Introspection for Adaptive Test-Time Compute

TranslateGemma Technical Report

AI & ML interests

Recent Activity

Team members 136

gg-hf-g's activity