---
license: mit
tags:
- RLinf
language:
- en
metrics:
- accuracy
base_model:
- RLinf/RLinf-OpenVLAOFT-LIBERO-130-Base-Lora
pipeline_tag: reinforcement-learning
model-index:
- name: RLinf-OpenVLAOFT-LIBERO-130
results:
  - task:
      type: VLA
    dataset:
      type: libero_130
      name: libero_130
    metrics:
      - type: accuracy
        value: 97.85
---
# RLinf: Reinforcement Learning Infrastructure for Agentic AI
[RLinf](https://github.com/RLinf/RLinf) is a flexible and scalable open-source infrastructure designed for post-training foundation models (LLMs, VLMs, VLAs) via reinforcement learning. The 'inf' in RLinf stands for Infrastructure, highlighting its role as a robust backbone for next-generation training. It also stands for Infinite, symbolizing the system’s support for open-ended learning, continuous generalization, and limitless possibilities in intelligence development.
## Model Description
The RLinf-openvlaoft-libero series is trained from RLinf/RLinf-OpenVLAOFT-LIBERO-xxx-Base-Lora (covering libero-90 and libero-130) and Haozhan72/Openvla-oft-SFT-libero-xxx-traj1 (covering libero-10, libero-object, libero-goal, and libero-spatial), using the same base models and training datasets as verl. Training with RLinf yields SOTA performance.
We apply a mask so that only valid action tokens contribute to the loss, and compute a token-level loss based on the Group Relative Policy Optimization (GRPO) advantage function, in order to improve the model's spatial reasoning, object generalization, instruction generalization, and long-horizon task performance.
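A minimal sketch of this masked, token-level GRPO objective is shown below. This is our illustration, not RLinf's actual implementation; all tensor names are hypothetical.

```python
import torch

def grpo_advantages(rewards: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Group-relative advantages: normalize episode rewards within one
    group of rollouts sampled for the same task. rewards: (G,)."""
    return (rewards - rewards.mean()) / (rewards.std() + eps)

def masked_token_grpo_loss(logps, old_logps, advantages, action_mask, clip_eps=0.2):
    """logps, old_logps: (B, T) per-token log-probs under the new/old policy.
    advantages: (B,) per-rollout advantages, broadcast to every token.
    action_mask: (B, T), 1 for valid action tokens, 0 elsewhere."""
    action_mask = action_mask.float()
    ratio = torch.exp(logps - old_logps)   # per-token importance ratio
    adv = advantages.unsqueeze(-1)         # (B, 1), broadcast over T
    surrogate = torch.min(
        ratio * adv,
        torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv,
    )
    # Average the clipped objective over valid action tokens only.
    return -(surrogate * action_mask).sum() / action_mask.sum().clamp_min(1.0)
```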
## Evaluation and Results
We trained six models using RLinf (a sketch of applying the recommended sampling settings follows the list):
- [RLinf-OpenVLAOFT-GRPO-LIBERO-90](https://huggingface.co/RLinf/RLinf-OpenVLAOFT-GRPO-LIBERO-90) Model (based on [RLinf/RLinf-OpenVLAOFT-LIBERO-90-Base-Lora](https://huggingface.co/RLinf/RLinf-OpenVLAOFT-LIBERO-90-Base-Lora))
- Recommended sampling settings: `temperature = 1.6`, `top_p = 1.0`
- [RLinf-OpenVLAOFT-LIBERO-130](https://huggingface.co/RLinf/RLinf-OpenVLAOFT-LIBERO-130) Model (based on [RLinf/RLinf-OpenVLAOFT-LIBERO-130-Base-Lora](https://huggingface.co/RLinf/RLinf-OpenVLAOFT-LIBERO-130-Base-Lora))
- Recommended sampling settings: `temperature = 1.6`, `top_p = 1.0`
- [RLinf-OpenVLAOFT-GRPO-LIBERO-object](https://huggingface.co/RLinf/RLinf-OpenVLAOFT-GRPO-LIBERO-object) Model (based on [Haozhan72/Openvla-oft-SFT-libero-object-traj1](https://huggingface.co/Haozhan72/Openvla-oft-SFT-libero-object-traj1))
- Recommended sampling settings: `temperature = 1.6`, `top_p = 1.0`
- [RLinf-OpenVLAOFT-GRPO-LIBERO-spatial](https://huggingface.co/RLinf/RLinf-OpenVLAOFT-GRPO-LIBERO-spatial) Model (based on [Haozhan72/Openvla-oft-SFT-libero-spatial-traj1](https://huggingface.co/Haozhan72/Openvla-oft-SFT-libero-spatial-traj1))
- Recommended sampling settings: `temperature = 1.6`, `top_p = 1.0`
- [RLinf-OpenVLAOFT-GRPO-LIBERO-goal](https://huggingface.co/RLinf/RLinf-OpenVLAOFT-GRPO-LIBERO-goal) Model (based on [Haozhan72/Openvla-oft-SFT-libero-goal-traj1](https://huggingface.co/Haozhan72/Openvla-oft-SFT-libero-goal-traj1))
- Recommended sampling settings: `temperature = 1.6`, `top_p = 1.0`
- [RLinf-OpenVLAOFT-GRPO-LIBERO-long](https://huggingface.co/RLinf/RLinf-OpenVLAOFT-GRPO-LIBERO-long) Model (based on [Haozhan72/Openvla-oft-SFT-libero10-traj1](https://huggingface.co/Haozhan72/Openvla-oft-SFT-libero10-traj1))
- Recommended sampling settings: `temperature = 1.6`, `top_p = 1.0`
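
All RL-trained checkpoints share the same recommended sampling settings. If you drive generation through Hugging Face `transformers` (an assumption; RLinf's rollout section of the YAML config exposes its own sampling fields), they map to a `GenerationConfig` like this:

```python
from transformers import GenerationConfig

# Recommended sampling settings for the RL-trained checkpoints.
sampling = GenerationConfig(
    do_sample=True,   # stochastic decoding, as used in RL evaluation
    temperature=1.6,
    top_p=1.0,
)
```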
### Benchmark Results
The SFT models for LIBERO-90 and LIBERO-130 were trained by us following the training recipe from [OpenVLA-OFT](https://github.com/moojink/openvla-oft/blob/main/vla-scripts/finetune.py); the other SFT models come from [SimpleVLA-RL](https://huggingface.co/collections/Haozhan72/simplevla-rl-6833311430cd9df52aeb1f86).
> We evaluate each model according to its training configuration, with libero_seed = 0, evaluating 500 episodes each for the Object, Spatial, Goal, and Long suites, 4,500 episodes for LIBERO-90, and 6,500 episodes for LIBERO-130.
> For the SFT-trained models (the LoRA base checkpoints), we set do_sample = False.
> For the RL-trained models, we set do_sample = True and temperature = 1.6, run with rollout_epoch = 2, and report the final result as the average of the two runs.
| Model | Object | Spatial | Goal | Long | 90 | Average |
| ------------------ | ------ | ------- | ----- | ----- | ------- |------- |
| SFT models | 28.83 | 52.22 | 49.40 | 14.92 | 79.28 | 66.07 |
| Trained with RLinf | [97.68](https://huggingface.co/RLinf/RLinf-OpenVLAOFT-GRPO-LIBERO-object) | [94.76](https://huggingface.co/RLinf/RLinf-OpenVLAOFT-GRPO-LIBERO-spatial) | [93.96](https://huggingface.co/RLinf/RLinf-OpenVLAOFT-GRPO-LIBERO-goal) | [90.93](https://huggingface.co/RLinf/RLinf-OpenVLAOFT-GRPO-LIBERO-long) | [96.44](https://huggingface.co/RLinf/RLinf-OpenVLAOFT-GRPO-LIBERO-90) | 95.79 |
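
The Average column is consistent with an episode-weighted mean of the per-suite success rates (500 episodes each for Object, Spatial, Goal, and Long; 4,500 for LIBERO-90), which can be checked directly:

```python
# Sanity check (our reading of the table, not stated in the card):
rates    = [97.68, 94.76, 93.96, 90.93, 96.44]  # Object, Spatial, Goal, Long, 90
episodes = [500, 500, 500, 500, 4500]           # evaluation episodes per suite

average = sum(r * e for r, e in zip(rates, episodes)) / sum(episodes)
print(round(average, 2))  # 95.79, matching the table
```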
In addition, we trained [one model](https://huggingface.co/RLinf/RLinf-OpenVLAOFT-LIBERO-130) (the libero-130 model) on all LIBERO tasks.
| libero-130 model | Object | Spatial | Goal | Long | 90 | 130 (all) |
| ------------------ | ------ | ------- | ----- | ----- | ------- | ------- |
| SFT model | 50.20 | 51.61 | 49.40 | 11.90 | 42.67 | 42.09 |
| Trained with RLinf | 99.60 | 98.69 | 98.09 | 93.45 | 98.02 | 97.85 |
## How to Use
Please integrate the provided model with the [RLinf](https://github.com/RLinf/RLinf) codebase. To do so, modify the following parameters in the configuration file ``examples/embodiment/config/libero_10_grpo_openvlaoft.yaml``:
- Set ``rollout.model.model_path``, ``actor.model.model_path``, and ``actor.tokenizer.tokenizer_model`` to the path of the model checkpoint.
Note: If you intend to evaluate the model directly, make sure to set ``actor.model.is_lora`` to ``false``.
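For illustration, the relevant fields might look like the sketch below. The nesting is inferred from the dotted parameter names above and the checkpoint path is a placeholder; consult the actual YAML file for the full structure.

```yaml
rollout:
  model:
    model_path: /path/to/RLinf-OpenVLAOFT-LIBERO-130   # placeholder checkpoint path

actor:
  model:
    model_path: /path/to/RLinf-OpenVLAOFT-LIBERO-130   # same checkpoint
    is_lora: false   # set to false when evaluating the model directly
  tokenizer:
    tokenizer_model: /path/to/RLinf-OpenVLAOFT-LIBERO-130
```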
## License
This code repository and the model weights are licensed under the MIT License.