Cerebras REAP Collection Sparse MoE models compressed using the REAP (Router-weighted Expert Activation Pruning) method • 19 items • Updated 18 days ago • 78
Article MLA: Redefining KV-Cache Through Low-Rank Projections and On-Demand Decompression Feb 4, 2025 • 18
Compressing KV Cache for Long-Context LLM Inference with Inter-Layer Attention Similarity Paper • 2412.02252 • Published Dec 3, 2024 • 2
TransMLA: Multi-head Latent Attention Is All You Need Paper • 2502.07864 • Published Feb 11, 2025 • 57
Kimi k1.5: Scaling Reinforcement Learning with LLMs Paper • 2501.12599 • Published Jan 22, 2025 • 126
Hibiki fr-en Collection Hibiki is a model for streaming speech translation, which can run on device! See https://github.com/kyutai-labs/hibiki. • 7 items • Updated 14 days ago • 55