Danny
TheDrunkenSnail
AI & ML interests: None yet
Recent Activity
liked a model 9 days ago: Sao10K/Lmao_life_updates
liked a model 4 months ago: openai/gpt-oss-20b
reacted to aiconta's post 9 days ago
reacted to AtAndDev's post with 🔥 4 months ago
Post
Qwen 3 Coder is a personal attack on K2, and I love it.
It achieves near-SOTA on LCB without having reasoning.
Finally people are understanding that reasoning isn't necessary for high benches...
Qwen ftw!
DECENTRALIZE DECENTRALIZE DECENTRALIZE
reacted to Wauplin's post with 🔥 5 months ago
Post
Say hello to hf: a faster, friendlier Hugging Face CLI ✨
We are glad to announce a long-awaited quality-of-life improvement: the Hugging Face CLI has been officially renamed from huggingface-cli to hf!
So... why this change?
Typing huggingface-cli constantly gets old fast. More importantly, the CLI's command structure became messy as new features were added over time (upload, download, cache management, repo management, etc.). Renaming the CLI is a chance to reorganize commands into a clearer, more consistent format.
We decided not to reinvent the wheel and instead follow a well-known CLI pattern: hf <resource> <action>. Isn't hf auth login easier to type and remember?
The full rationale, implementation details, and migration notes are in the blog post: https://huggingface.co/blog/hf-cli
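To give a feel for the new scheme, here are a few before/after examples. Only hf auth login is quoted in the post itself; the others are illustrative guesses that follow the same hf <resource> <action> pattern, and the linked blog post has the authoritative mapping:

huggingface-cli login     ->  hf auth login
huggingface-cli whoami    ->  hf auth whoami
huggingface-cli download  ->  hf download
huggingface-cli upload    ->  hf upload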
reacted to AdinaY's post 5 months ago
Post
KAT-V1 🔥 an LLM that tackles overthinking by switching between reasoning and direct answers, by Kuaishou.
Kwaipilot/KAT-V1-40B
✨ 40B
✨ Step-SRPO: smarter reasoning control via RL
✨ MTP + Distillation: efficient training, lower cost
reacted to blaise-tk's post 5 months ago
Post
A few months ago, I shared that I was building with @deeivihh something like "the Steam for open source apps"...
Today, I'm excited to announce that Dione is now open source and live in public beta!
Our mission is simple: make it easier to discover, use, and contribute to open source applications.
GitHub: https://github.com/dioneapp/dioneapp
💬 Join the community: https://discord.gg/JDFJp33vrM
Want to give it a try? I'd love your feedback!
reacted to drwlf's post with ❤️🤗 6 months ago
Post
Having an insanely good medical LLM is pointless if it won't answer your questions!
So we've made two notebooks for abliterating any model, to get a model that will actually help you!
The notebooks use @mlabonne's abliteration logic and datasets!
Feel free to use them, and happy training!
https://github.com/dralexlup/LLM-Abliteration
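For readers curious what "abliterating" means mechanically: the usual recipe (as popularized by @mlabonne) estimates a "refusal direction" from paired activations and projects it out of the model's weights. The sketch below is a toy illustration of that idea with random stand-in tensors, not the notebooks' actual code:

import torch

def refusal_direction(refused: torch.Tensor, answered: torch.Tensor) -> torch.Tensor:
    # refused/answered: (n_prompts, d_model) residual activations from one layer,
    # collected on prompts the model refuses vs. prompts it answers.
    d = refused.mean(0) - answered.mean(0)
    return d / d.norm()

def ablate(weight: torch.Tensor, direction: torch.Tensor) -> torch.Tensor:
    # Remove the direction from the weight's output space: W <- (I - r r^T) W,
    # so the layer can no longer write along the refusal direction.
    r = direction.unsqueeze(1)  # (d_model, 1)
    return weight - r @ (r.T @ weight)

# Toy usage; only the shapes matter here.
d_model = 64
r = refusal_direction(torch.randn(32, d_model) + 0.5, torch.randn(32, d_model))
W = torch.randn(d_model, d_model)  # stand-in for e.g. an MLP down-projection
W_ablated = ablate(W, r)
print(torch.allclose(r @ W_ablated, torch.zeros(d_model), atol=1e-4))  # True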
upvoted a paper 7 months ago
reacted to RiverZ's post with 🤗 7 months ago
Post
🔥 We're thrilled to share some exciting news about ICEdit! Currently, the ICEdit app (RiverZ/ICEdit) has soared to second place on the weekly Hugging Face Spaces trend list, just trailing Qwen3. What's more, it also holds second position on the overall Spaces trend list. This achievement wouldn't have been possible without your incredible support and love. A huge thank you to each and every one of you ❤️!
The ICEdit community has been incredibly active, and we've seen a plethora of amazing ComfyUI workflows being shared. For instance, with the help of ComfyUI-nunchaku, you can run ICEdit locally with just 4GB of VRAM. This makes it much more accessible for those with limited hardware resources.
If you're interested in the detailed information, please head over to our repository. We highly encourage you to give these workflows a try and explore the creative possibilities that ICEdit offers.
GitHub repo: https://github.com/River-Zhang/ICEdit
Hugging Face Space: RiverZ/ICEdit
reacted to eaddario's post 9 months ago
Post
Squeezing out tensor bits?
I have been tinkering with quantization and pruning to reduce model sizes. So far, I've had modest success in producing, on average, 8% smaller versions with negligible loss of quality, and I think further reductions in the 10-15% range are realistic, but I've come across a behaviour I wasn't expecting!
Part of the process I'm following consists of quantizing the embedding and output layers aggressively. Since the embedding layer is more about lookup than complex computation, the relative distances between embedding vectors are usually preserved well enough, making this layer fairly robust to quantization. So far, so good.
The output layer, on the other hand, maps the final hidden state to the vocabulary logits, and therefore small changes in these logits could lead to a different probability distribution over the vocabulary, resulting in incorrect word predictions, or so I thought.
Surprisingly, I'm finding that even at Q2_K the loss of overall capability is minimal. Was this to be expected, or am I missing something?
I have published a version with all the test results if you want to give it a try: eaddario/DeepSeek-R1-Distill-Qwen-7B-GGUF
I'll upload other models as time allows.
Any ideas / clarifications / suggestions are very much welcome!
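One intuition for why the output layer survives: greedy decoding only changes when the quantization error is comparable to the gap between the top logits, and for a confident model that gap is usually much larger than the per-weight rounding error. The toy simulation below makes this concrete with synthetic Gaussian logits and additive Gaussian noise (both are assumptions for illustration, not measured Q2_K error statistics):

import torch

torch.manual_seed(0)
n_positions, vocab = 256, 32000

# Synthetic stand-ins for final-layer logits and quantization error.
logits = torch.randn(n_positions, vocab) * 3.0
noisy = logits + torch.randn(n_positions, vocab) * 0.1

# How often the greedy (top-1) token survives the perturbation.
top1_agreement = (logits.argmax(-1) == noisy.argmax(-1)).float().mean()

# How much the full next-token distribution moves: mean KL(p || q).
log_p, log_q = logits.log_softmax(-1), noisy.log_softmax(-1)
mean_kl = (log_p.exp() * (log_p - log_q)).sum(-1).mean()

print(f"top-1 agreement: {top1_agreement:.3f}, mean KL: {mean_kl:.5f}")

Shrinking the noise scale relative to the logit spread drives the agreement toward 1.0, which would be consistent with coarse output-layer quantization leaving most top-1 predictions intact.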