Qwen3-8B ToolACE EAGLE-3 Speculator

Fine-tuned EAGLE-3 speculative decoding draft head for tool-calling workloads.

Base

Target model: kenkaneki/Qwen3-8B-ToolACE
Draft base: RedHatAI/Qwen3-8B-speculator.eagle3
Fine-tuning data: Team-ACE/ToolACE (3000 samples, 1 epoch)
Framework: speculators

Results (H100, ToolACE prompts)

Metric (c=1)	BF16 baseline	+ EAGLE3 FT	Speedup
E2EL p50	323.9 ms	175.4 ms	1.85x
Output tok/s	150.4	271.3	1.80x
TTFT p50	15.5 ms	18.7 ms	0.83x

EAGLE-3 is lossless: output is identical to the target model by construction. No quality loss.

Validation accuracy

Token 0: 82.2% (vs 61.9% trained from scratch on ToolACE-only)
Token 1: 63.6% (vs 38.7%)
Token 2: 49.2% (vs 24.0%)

Usage with vLLM

vllm serve kenkaneki/Qwen3-8B-ToolACE \
  --speculative-config '{"model":"kenkaneki/Qwen3-8B-ToolACE-speculator.eagle3","num_speculative_tokens":3,"method":"eagle3"}' \
  --enable-auto-tool-choice --tool-call-parser hermes

Training

python scripts/finetune_eagle.py

Hardware: NVIDIA H100 80GB

Code: github.com/aimedvedevq/toolaceqwen

Downloads last month: 5

Safetensors

Model size

1B params

Tensor type

I64

BF16

BOOL

Inference Providers NEW

This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for kenkaneki/Qwen3-8B-ToolACE-speculator.eagle3

Base model

Qwen/Qwen3-8B-Base

Finetuned

Qwen/Qwen3-8B

Finetuned

kenkaneki/Qwen3-8B-ToolACE

Finetuned

(1)

this model

Dataset used to train kenkaneki/Qwen3-8B-ToolACE-speculator.eagle3

Collection including kenkaneki/Qwen3-8B-ToolACE-speculator.eagle3

toolася

Collection

toolcalling sft+grpo+specdecoding • 3 items • Updated Mar 19