stas 
posted an update 1 day ago
After many months of intense work, the Snowflake AI Research team is happy to present to you the new open source project: Arctic RL

https://snowflake.com/en/blog/engineering/arctic-rl-open-source-backend/

- Arctic RL integrates with VeRL and SkyRL today; enable ZoRRo with one config flag, no code changes required
- ZoRRo delivers up to 6x actor-update acceleration and a 3.5x end-to-end training speedup, reducing Arctic-Text2SQL-R2 training from ~5 days to ~36 hours on 32 H200 GPUs
- Arctic-Text2SQL-R2 achieved higher accuracy scores (48.7) than Gemini 3.1 Pro (47.9) and Claude 4.7 (47.3) on Snowflake's evaluated enterprise SQL benchmark under the tested conditions
- Two open source recipes ship with this release: a text-to-SQL recipe that improved BIRD dev accuracy from 59.92% to 70.35%, and a multi-hop QA recipe that improved average accuracy from 69.6% to 72.3%

The 3.5x end-to-end number is the part people skim past, and it is the whole story.

A text-to-SQL model edging Gemini 3.1 Pro is not an architecture win; it is a faster-iteration win. Going from ~5 days to ~36 hours means ~3x more experiments per week, and that compounds into the accuracy gap.
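
The "~3x more experiments per week" figure can be checked with quick arithmetic. This is a back-of-the-envelope sketch using only the ~5 days and ~36 hours quoted in the post; it assumes runs are executed back to back with no idle time, and the helper name `runs_per_week` is made up for illustration.

```python
# Figures from the post: ~5 days per run before, ~36 hours per run after.
HOURS_PER_WEEK = 7 * 24


def runs_per_week(hours_per_run: float) -> float:
    """Number of back-to-back full training runs that fit in one week."""
    return HOURS_PER_WEEK / hours_per_run


baseline_hours = 5 * 24  # ~5 days
faster_hours = 36        # ~36 hours

baseline_runs = runs_per_week(baseline_hours)
faster_runs = runs_per_week(faster_hours)
iteration_gain = faster_runs / baseline_runs

print(f"{baseline_runs:.2f} -> {faster_runs:.2f} runs/week ({iteration_gain:.2f}x)")
# prints "1.40 -> 4.67 runs/week (3.33x)"
```

The ratio is simply 120 / 36, so the rounded inputs give roughly 3.3x, in line with the "~3x" estimate and close to the reported 3.5x end-to-end speedup.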

The "one config flag, no code changes" line is what makes it real. Most RL speedups die because integrating them burns more eng time than they save.

Where does ZoRRo's 6x actor-update speedup actually come from? Overlapping rollout generation with the optimizer step, or the actor/learner weight-sync?

·

Thank you for the kind words, Dipankar

The lion's share of the speedup comes from prompt deduplication during generation and training.

Prompt dedup. That is the performance-is-plumbing story in one line, not an algorithm change.

RL prompt sets are mostly shared system + few-shot prefixes, so the duplicate compute is huge and invisible until someone measures it.

Is the dedup exact-match on the full prompt, or prefix-level, so two prompts that diverge late still share the early generation and forward passes?

·

At the moment it's exact-match only; partial matching will be supported later as well.
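
To make the exact-match idea concrete, here is a minimal sketch of grouping a rollout batch by identical prompt token sequences. It is not Arctic RL's or ZoRRo's actual implementation: the function names, the token IDs, and the assumption that each prompt is sampled several times per batch are all hypothetical, chosen only to show how much prompt-side work is repeated.

```python
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple


def group_by_prompt(prompts: Sequence[Sequence[int]]) -> Dict[Tuple[int, ...], List[int]]:
    """Map each distinct prompt (as a token tuple) to the sample indices that use it."""
    groups: Dict[Tuple[int, ...], List[int]] = defaultdict(list)
    for idx, tokens in enumerate(prompts):
        groups[tuple(tokens)].append(idx)  # exact match: the whole sequence is the key
    return dict(groups)


def prompt_token_counts(prompts: Sequence[Sequence[int]]) -> Tuple[int, int]:
    """Return (prompt tokens without dedup, prompt tokens with exact-match dedup)."""
    total = sum(len(p) for p in prompts)
    unique = sum(len(key) for key in group_by_prompt(prompts))
    return total, unique


# Hypothetical batch: two prompts, each sampled 4 times.
prompt_a = [1, 2, 3, 4, 5, 6]
prompt_b = [1, 2, 3, 9, 9]  # shares the prefix [1, 2, 3] with prompt_a
batch = [prompt_a] * 4 + [prompt_b] * 4

groups = group_by_prompt(batch)
total, unique = prompt_token_counts(batch)
print(len(groups), total, unique)
# prints "2 44 11"
```

In this toy batch, exact matching cuts the prompt tokens to process from 44 to 11. The shared `[1, 2, 3]` prefix between the two prompts is still counted twice, which is the extra saving that partial (prefix-level) matching would target.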