mradermacher/Qwen3-14B-ARPO-DeepSearch-GGUF Reinforcement Learning • 15B • Updated Aug 12, 2025 • 101 • 5