DistilQwen

updated about 3 hours ago

H100 BF16. 30B→1.7B/0.6B TKD. Three teachers. 15 models + DISC paper. 10K+ downloads. DOI: 10.57967/hf/8165 & 10.57967/hf/8194
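The description above does not expand "TKD" or state the training objective. As a generic illustration of teacher-to-student logit distillation only (not the author's method), the sketch below computes a temperature-softened KL divergence between a teacher and a student distribution in plain Python. The function names, the temperature value, and the example logits are all assumptions chosen for illustration.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2.

    Generic Hinton-style soft-target loss; an assumption for illustration,
    not the objective used to train the models in this collection.
    """
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical next-token logits over a three-token vocabulary.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.2]
print(distillation_loss(teacher, student))
```

The loss is zero when the student reproduces the teacher's distribution exactly and grows as the two diverge. A higher temperature exposes more of the teacher's probability mass on non-top tokens.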

reaperdoesntknow/Qwen3-1.7B-Distilled-30B-A3B

Text Generation • 2B • Updated about 3 hours ago • 566

Note First in the DistilQwen chain. Foundation for all downstream models.
reaperdoesntknow/Qwen3-1.7B-Distilled-30B-A3B-SFT-GGUF

Text Generation • 2B • Updated about 3 hours ago • 2.8k

Note Edge deployment of the full Instruct pipeline. Apache 2.0.
reaperdoesntknow/Qwen3-1.7B-Distilled-30B-A3B-SFT

2B • Updated about 3 hours ago • 311

Note Instruct distillation + SFT. Full precision. BF16 H100.
reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B

Text Generation • 0.8B • Updated about 3 hours ago • 597

Note 50× compression: 30B → 0.6B. Smallest in the distil family.
reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT

Text Generation • 0.8B • Updated about 3 hours ago • 692 • 2

Note Thinking teacher + SFT at 0.6B. Extended deliberation traces.
reaperdoesntknow/Qwen3-0.6B-Distilled-30B-A3B-Thinking-SFT-GGUF

Text Generation • 0.8B • Updated about 3 hours ago • 2.75k

Note Thinking-SFT at 0.6B quantized. Runs on anything.
reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT

Text Generation • 2B • Updated about 3 hours ago • 584 • 1

Note Coder teacher produces uniquely structured distributions.
reaperdoesntknow/Qwen3-1.7B-Coder-Distilled-SFT-GGUF

Text Generation • 2B • Updated about 3 hours ago • 3.46k • 1

Note Pair with Thinking variant for comparative analysis.
reaperdoesntknow/DistilQwen3-1.7B-uncensored

Text Generation • 2B • Updated about 3 hours ago • 550

Note Foundation for research applications requiring unfiltered output.
reaperdoesntknow/TopologicalQwen

Text Generation • 2B • Updated about 3 hours ago • 592

Note Topology-aware distillation from 30B-Thinking on physics CoT.
reaperdoesntknow/DiStil-Qwen3-1.7B-uncensored

2B • Updated about 3 hours ago • 320 • 1

Note DISC-informed distillation. Uncensored. Research-focused.
reaperdoesntknow/Disctil-Qwen3-1.7B

Text Generation • 2B • Updated about 3 hours ago • 552

Note DISC-refined. Discrepancy-aware training produces cleaner signal.
reaperdoesntknow/DistilQwen3-1.7B-uncensored-GGUF

2B • Updated about 3 hours ago • 2.51k • 1

Note Uncensored base quantized. mradermacher also quantized — 411 downloads.
reaperdoesntknow/Qwen3-1.7B-Thinking-Distil

Text Generation • 2B • Updated about 3 hours ago • 594 • 1

Note Thinking teacher distillation. Highest downloads in the collection.
reaperdoesntknow/LFM2.5-1.2B-Distilled-SFT

Text Generation • 1B • Updated about 3 hours ago • 510

Note Proves TKD works across architecture families, not just within Qwen.
reaperdoesntknow/Discrepancy_Calculus

Updated about 3 hours ago

Note Continuous Thought Dynamics — mathematical backbone of DualMind.
