ReLIFT is a training method that interleaves RL with online fine-tuning (FT), achieving superior performance and efficiency compared to using RL or SFT alone.
RoadMa
RoadQAQ
Recent Activity
Updated a dataset about 24 hours ago: RoadQAQ/sft_for_rl
Published a dataset about 24 hours ago: RoadQAQ/sft_for_rl
Updated a collection 3 months ago: Data for DataFlex