UltraRAG: A Modular and Automated Toolkit for Adaptive Retrieval-Augmented Generation Paper • 2504.08761 • Published Mar 31, 2025 • 7
Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts Paper • 2409.16040 • Published Sep 24, 2024 • 16
ReMoE: Fully Differentiable Mixture-of-Experts with ReLU Routing Paper • 2412.14711 • Published Dec 19, 2024 • 16
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published Oct 8, 2024 • 111
Article: Introducing RWKV - An RNN with the advantages of a transformer BlinkDL, Hazzzardous, sgugger, ybelkada • May 15, 2023 • 25
TransformerFAM: Feedback attention is working memory Paper • 2404.09173 • Published Apr 14, 2024 • 43
A Time Series is Worth 64 Words: Long-term Forecasting with Transformers Paper • 2211.14730 • Published Nov 27, 2022 • 3
Article: Patch Time Series Transformer in Hugging Face namctin, wmgifford, ajati, vijaye12, kashif • Feb 1, 2024 • 14