@stas on Hugging Face: "After many months of intense work the Snowflake AI Research team is happy to…"

posted an update 1 day ago

After many months of intense work the Snowflake AI Research team is happy to present to you the new open source project: Arctic RL

https://snowflake.com/en/blog/engineering/arctic-rl-open-source-backend/

- Arctic RL integrates with VeRL and SkyRL today; enable ZoRRo with one config flag, no code changes required
- ZoRRo delivers up to 6x actor-update acceleration and a 3.5x end-to-end training speedup, reducing Arctic-Text2SQL-R2 training from ~5 days to ~36 hours on 32 H200 GPUs
- Arctic-Text2SQL-R2 achieved higher accuracy scores (48.7) than Gemini 3.1 Pro (47.9) and Claude 4.7 (47.3) on Snowflake's evaluated enterprise SQL benchmark under the tested conditions
- Two open source recipes ship with this release: a text-to-SQL recipe that improved BIRD dev accuracy from 59.92% to 70.35%, and a multi-hop QA recipe that improved average accuracy from 69.6% to 72.3%

dipankarsarkar

about 20 hours ago

The 3.5x end-to-end number is the part people skim past, and it is the whole story.

A text-to-SQL model edging Gemini 3.1 Pro is not an architecture win, it is a faster-iteration win. 5 days down to 36 hours means ~3x more experiments per week, and that compounds into the accuracy gap.
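The "~3x more experiments per week" figure can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes "~5 days" means exactly 120 hours of wall-clock time and that runs execute back to back with no idle gaps, neither of which the post states.

```python
HOURS_PER_WEEK = 7 * 24

baseline_hours = 5 * 24  # "~5 days" from the post, assumed to be exactly 120 h
zorro_hours = 36         # "~36 hours" from the post

# Ratio of wall-clock times, and how many full runs fit into one week.
speedup = baseline_hours / zorro_hours
runs_before = HOURS_PER_WEEK / baseline_hours
runs_after = HOURS_PER_WEEK / zorro_hours

print(f"speedup: {speedup:.2f}x")
print(f"runs per week: {runs_before:.2f} -> {runs_after:.2f}")
```

Under these assumptions the ratio comes out a little above 3x, which is consistent with the rounded numbers quoted in the post.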

The "one config flag, no code changes" line is what makes it real. Most RL speedups die because integrating them burns more eng time than they save.

Where does ZoRRo's 6x actor-update speedup actually come from? Overlapping rollout generation with the optimizer step, or the actor/learner weight-sync?

stas

about 16 hours ago

Thank you for the kind words, Dipankar.

The lion's share of the speedup comes from prompt deduplication during generation and training.

dipankarsarkar

about 16 hours ago

Prompt dedup. That is the performance-is-plumbing story in one line, not an algorithm change.

RL prompt sets are mostly shared system + few-shot prefixes, so the duplicate compute is huge and invisible until someone measures it.

Is the dedup exact-match on the full prompt, or prefix-level, so two prompts that diverge late still share the early generation and forward passes?
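To make the exact-match versus prefix-level distinction concrete, here is a toy token count. It is an illustration only, not how Arctic RL or ZoRRo measures anything: `token_counts` and the example prompts are hypothetical, and prompts are modeled as plain lists of string tokens.

```python
def token_counts(prompts):
    """Return (total, exact_unique, prefix_unique) prompt-token counts.

    total:         tokens processed with no sharing at all
    exact_unique:  tokens left if identical prompts are processed once
    prefix_unique: tokens left if any shared leading run is processed once
                   (equal to the number of distinct non-empty prefixes)
    """
    total = sum(len(p) for p in prompts)
    exact_unique = sum(len(p) for p in {tuple(p) for p in prompts})
    prefixes = {tuple(p[:i]) for p in prompts for i in range(1, len(p) + 1)}
    return total, exact_unique, len(prefixes)


# Hypothetical prompts: a shared system prefix followed by a short question.
system = ["You", "are", "a", "SQL", "assistant", "."]
prompts = [
    system + ["List", "all", "users"],
    system + ["List", "all", "users"],    # exact duplicate of the first
    system + ["Count", "all", "orders"],  # diverges after the system prefix
]

total, exact_unique, prefix_unique = token_counts(prompts)
print(total, exact_unique, prefix_unique)  # 27 18 12
```

In this toy case exact matching removes one of the three prompts, while prefix-level sharing would also reuse the six system tokens of the third prompt.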

stas

about 15 hours ago

At the moment it's exact-match; partial matching will be supported later.
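A minimal sketch of what exact-match prompt deduplication can look like in general. This is not Arctic RL's implementation: `run_deduplicated` and `expensive_fn` are hypothetical names, and `expensive_fn` stands in for whatever per-prompt work is shared (for example, the forward pass over the prompt tokens). In real RL training the sampled responses differ per rollout, so only the prompt-side work can be shared; the sketch models just that shared part.

```python
from collections import defaultdict


def run_deduplicated(prompts, expensive_fn):
    """Call expensive_fn once per distinct prompt, then scatter results back.

    Returns (results, n_calls), where results[i] corresponds to prompts[i].
    """
    # Group the batch positions of each distinct prompt (exact string match).
    positions = defaultdict(list)
    for i, prompt in enumerate(prompts):
        positions[prompt].append(i)

    results = [None] * len(prompts)
    for prompt, idxs in positions.items():
        value = expensive_fn(prompt)  # computed once per unique prompt
        for i in idxs:
            results[i] = value        # reused for every duplicate
    return results, len(positions)


# Hypothetical batch: each prompt is repeated once per sampled rollout.
batch = ["Q1"] * 4 + ["Q2"] * 4
calls = []


def fake_prompt_pass(prompt):
    calls.append(prompt)
    return len(prompt)


results, n_calls = run_deduplicated(batch, fake_prompt_pass)
print(n_calls, len(batch))  # 2 8
```

With a group of four rollouts per prompt, the shared work runs twice instead of eight times, and the output order still matches the input batch.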
