RedHatAI/Llama-3.3-70B-Instruct-quantized.w8a8
Text Generation • 71B • 15.8k • 14
OpenSource and AI
SNLP: Layer-Parallel Inference via Structured Newton Corrections
S2D2: Fast Decoding for Diffusion LLMs via Training-Free Self-Speculation