Qwen3-8B ToolACE EAGLE-3 Speculator

Fine-tuned EAGLE-3 speculative decoding draft head for tool-calling workloads.

Base

Results (H100, ToolACE prompts)

Metric (c=1) BF16 baseline + EAGLE3 FT Speedup
E2EL p50 323.9 ms 175.4 ms 1.85x
Output tok/s 150.4 271.3 1.80x
TTFT p50 15.5 ms 18.7 ms 0.83x

EAGLE-3 is lossless: output is identical to the target model by construction. No quality loss.

Validation accuracy

  • Token 0: 82.2% (vs 61.9% trained from scratch on ToolACE-only)
  • Token 1: 63.6% (vs 38.7%)
  • Token 2: 49.2% (vs 24.0%)

Usage with vLLM

vllm serve kenkaneki/Qwen3-8B-ToolACE \
  --speculative-config '{"model":"kenkaneki/Qwen3-8B-ToolACE-speculator.eagle3","num_speculative_tokens":3,"method":"eagle3"}' \
  --enable-auto-tool-choice --tool-call-parser hermes

Training

python scripts/finetune_eagle.py

Hardware: NVIDIA H100 80GB

Code: github.com/aimedvedevq/toolaceqwen

Downloads last month
5
Safetensors
Model size
1B params
Tensor type
I64
·
BF16
·
BOOL
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for kenkaneki/Qwen3-8B-ToolACE-speculator.eagle3

Finetuned
Qwen/Qwen3-8B
Finetuned
(1)
this model

Dataset used to train kenkaneki/Qwen3-8B-ToolACE-speculator.eagle3

Collection including kenkaneki/Qwen3-8B-ToolACE-speculator.eagle3