Can You Run It? LLM version • Space • Determine GPU requirements for running large language models
Towards Efficient Generative Large Language Model Serving: A Survey from Algorithms to Systems • Paper • 2312.15234 • Published Dec 23, 2023
EfficientQAT: Efficient Quantization-Aware Training for Large Language Models • Paper • 2407.11062 • Published Jul 10, 2024
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters • Paper • 2408.03314 • Published Aug 6, 2024
Transformer Calculator • Space • Calculate memory, parameters, and FLOPs for transformer models
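As a rough illustration of the kind of first-order estimate these calculator Spaces perform, the sketch below computes the GPU memory needed just to hold a model's weights at a given precision. This is a minimal assumption-laden example (the `weight_memory_gb` helper and the bytes-per-parameter table are illustrative, not taken from either Space), and it deliberately ignores KV cache, activations, and framework overhead, which add to the real requirement.

```python
# Illustrative sketch: weights-only GPU memory estimate for an LLM.
# Ignores KV cache, activations, and runtime overhead (an assumption,
# not the method used by the linked Spaces).
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "int4": 0.5}

def weight_memory_gb(n_params: float, dtype: str = "fp16") -> float:
    """Gigabytes needed to store the weights alone at the given precision."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9

print(weight_memory_gb(7e9, "fp16"))  # a 7B-parameter model in fp16 -> 14.0 GB
print(weight_memory_gb(7e9, "int4"))  # the same model 4-bit quantized -> 3.5 GB
```

The same arithmetic explains why quantization-aware training (as in EfficientQAT above) matters for serving: dropping from fp16 to 4-bit cuts weight memory by 4x, often the difference between fitting on a single consumer GPU or not.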