WhenceFade

https://mufan.me

WhenMelancholy

AI & ML interests

Generative Models.

Recent Activity

updated a dataset about 2 months ago

WhenceFade/dataset-mix-cached

upvoted a paper about 18 hours ago

WorldPlay: Towards Long-Term Geometric Consistency for Real-Time Interactive World Modeling

Paper • 2512.14614 • Published 30 days ago • 69

upvoted a paper 15 days ago

Coupling Experts and Routers in Mixture-of-Experts via an Auxiliary Loss

Paper • 2512.23447 • Published 17 days ago • 94

upvoted an article 4 months ago

Open-R1: a fully open reproduction of DeepSeek-R1

Article • Jan 28, 2025 • 887

upvoted a paper 5 months ago

Complex Logical Instruction Generation

Paper • 2508.09125 • Published Aug 12, 2025 • 40

upvoted a paper 6 months ago

Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination

Paper • 2507.10532 • Published Jul 14, 2025 • 89

upvoted an article 8 months ago

How to train a Language Model with Megatron-LM

Article • Sep 7, 2022

upvoted a collection 8 months ago

Qwen3

Collection

84 items • Updated 15 days ago • 1.57k

upvoted a paper 9 months ago

Do PhD-level LLMs Truly Grasp Elementary Addition? Probing Rule Learning vs. Memorization in Large Language Models

Paper • 2504.05262 • Published Apr 7, 2025 • 11

upvoted an article 10 months ago

How to generate text: using different decoding methods for language generation with Transformers

Article • Mar 1, 2020 • 283

upvoted a paper 10 months ago

GRNFormer: A Biologically-Guided Framework for Integrating Gene Regulatory Networks into RNA Foundation Models

Paper • 2503.01682 • Published Mar 3, 2025 • 1
