Team-ACE/ToolACE
Viewer • Updated • 11.3k • 11.7k • 178
Fine-tuned EAGLE-3 speculative decoding draft head for tool-calling workloads.
| Metric (c=1) | BF16 baseline | + EAGLE3 FT | Speedup |
|---|---|---|---|
| E2EL p50 | 323.9 ms | 175.4 ms | 1.85x |
| Output tok/s | 150.4 | 271.3 | 1.80x |
| TTFT p50 | 15.5 ms | 18.7 ms | 0.83x |
EAGLE-3 is lossless: output is identical to the target model by construction. No quality loss.
vllm serve kenkaneki/Qwen3-8B-ToolACE \
--speculative-config '{"model":"kenkaneki/Qwen3-8B-ToolACE-speculator.eagle3","num_speculative_tokens":3,"method":"eagle3"}' \
--enable-auto-tool-choice --tool-call-parser hermes
python scripts/finetune_eagle.py
Hardware: NVIDIA H100 80GB