Agent trajectories vs synthetic data: error recovery patterns in production

#13
by O96a - opened

Training on 425K curated agentic trajectories from Claude Opus 4.6 and GPT-5.x scaffolding is a compelling approach: the model learns read-before-write patterns and LSP diagnostic responses from real agent traces rather than from synthetic task descriptions.

The Terminal-Bench improvement (+61% over base Qwen3.5-9B) suggests these patterns transfer well. A few questions for production deployment:

  1. The error recovery behavior β€” have you tested recovery rates when the model encounters novel error types not in the training trajectories? In my experience with LangGraph agents, models often struggle with unseen LSP errors that don't match their training distribution.

  2. The minimal edit diffs vs full rewrites β€” this is exactly what production code agents need. Have you measured the token savings on typical edit operations? For 262K context, edit efficiency directly impacts cost.

  3. For the 425K trajectory dataset β€” what's the breakdown between successful vs failed trajectories? Learning from failed attempts (with proper scaffolding) often improves robustness, but can also propagate bad patterns if not filtered.
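On point 2, a rough way to ballpark the savings is to compare the size of a unified diff against a full-file rewrite. A minimal sketch using only stdlib `difflib`, with character count as a crude proxy for tokens (the file contents and the proxy itself are my assumptions, not anything from the model card):

```python
import difflib

def edit_savings(before: str, after: str) -> float:
    """Fraction of output characters saved by emitting a unified diff
    instead of rewriting the whole file (rough proxy for token savings)."""
    diff = "".join(difflib.unified_diff(
        before.splitlines(keepends=True),
        after.splitlines(keepends=True),
        fromfile="a.py", tofile="b.py",
    ))
    return 1.0 - len(diff) / len(after)

# Hypothetical one-line fix in a 40-line module.
before = "\n".join(f"def f{i}(): return {i}" for i in range(40)) + "\n"
after = before.replace("return 7\n", "return 70\n")

print(f"{edit_savings(before, after):.0%} fewer output characters than a full rewrite")
```

Even on a toy file this shows a large reduction; at 262K context the gap between a few-line hunk and a multi-thousand-line rewrite dominates per-edit cost.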

Impressive GPQA Diamond results (83.8% pass@1). Looking forward to testing against agent orchestration benchmarks like BFCL.

For teams integrating OmniCoder: the Apache 2.0 license and GGUF availability make this a strong candidate for local coding agents where frontier model APIs aren't feasible.
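On the GGUF side, one cheap pre-flight check before handing a downloaded checkpoint to a local runtime is to verify the 4-byte `GGUF` magic at the start of the file (that magic comes from the GGUF spec; the file path below is a hypothetical placeholder, not an actual release artifact):

```python
import os

def looks_like_gguf(path: str) -> bool:
    """GGUF files begin with the magic bytes b'GGUF'; reject anything else
    (e.g. an HTML error page saved by a failed download)."""
    with open(path, "rb") as f:
        return f.read(4) == b"GGUF"

path = "models/omnicoder-q4_k_m.gguf"  # hypothetical local path
if os.path.exists(path) and looks_like_gguf(path):
    print("magic bytes check out; safe to hand to the runtime")
```

This catches the common failure mode of a truncated or HTML-masquerading download before the runtime produces a confusing load error.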
