quant
SqueezeLLM: Dense-and-Sparse Quantization (arXiv:2306.07629)
Norm Tweaking: High-performance Low-bit Quantization of Large Language Models (arXiv:2309.02784)
Extreme Compression of Large Language Models via Additive Quantization (arXiv:2401.06118)
BiLLM: Pushing the Limit of Post-Training Quantization for LLMs (arXiv:2402.04291)
OneBit: Towards Extremely Low-bit Large Language Models (arXiv:2402.11295)
APTQ: Attention-aware Post-Training Mixed-Precision Quantization for Large Language Models (arXiv:2402.14866)
EasyQuant: An Efficient Data-free Quantization Algorithm for LLMs (arXiv:2403.02775)
GPTVQ: The Blessing of Dimensionality for LLM Quantization (arXiv:2402.15319)
COMQ: A Backpropagation-Free Algorithm for Post-Training Quantization (arXiv:2403.07134)
OWQ: Outlier-Aware Weight Quantization for Efficient Fine-Tuning and Inference of Large Language Models (arXiv:2306.02272)
QuantEase: Optimization-based Quantization for Language Models (arXiv:2309.01885)
SliceGPT: Compress Large Language Models by Deleting Rows and Columns (arXiv:2401.15024)
decoupleQ: Towards 2-bit Post-Training Uniform Quantization via decoupling Parameters into Integer and Floating Points (arXiv:2404.12759)
SpinQuant: LLM quantization with learned rotations (arXiv:2405.16406)
Rotation and Permutation for Advanced Outlier Management and Efficient Quantization of LLMs (arXiv:2406.01721)
Attention-aware Post-training Quantization without Backpropagation (arXiv:2406.13474)
Accuracy is Not All You Need (arXiv:2407.09141)
FlatQuant: Flatness Matters for LLM Quantization (arXiv:2410.09426)