We aim to improve agent reinforcement learning along three axes: stability (S), efficiency (E), and generalization (G).