NitroGen: An Open Foundation Model for Generalist Gaming Agents

NitroGen is a unified vision-to-action foundation model designed to play video games directly from raw frames: it maps RGB video frames to gamepad actions. It is a generalist agent trained via large-scale behavior cloning on 40,000 hours of gameplay across more than 1,000 games.

NitroGen works best on games designed for gamepad controls (e.g., action, platformer, and racing games) and is less effective on games that rely heavily on mouse and keyboard (e.g., RTS, MOBA).

Sample Usage

Installation

To use NitroGen, clone and install the repository:

git clone https://github.com/MineDojo/NitroGen.git
cd NitroGen
pip install -e .

Inference

  1. Download the checkpoint from Hugging Face:
hf download nvidia/NitroGen ng.pt
  2. Start the inference server:
python scripts/serve.py <path_to_ng.pt>
  3. Run the agent on the game of your choice (currently supports Windows games):
python scripts/play.py --process '<game_executable_name>.exe'
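
The server and agent steps above can also be scripted. The following is a minimal sketch, not part of the NitroGen repository: the helper names `build_commands` and `run_session`, the `startup_delay` wait, and the placeholder values `ng.pt` and `game.exe` are assumptions made for illustration. It uses only the two command lines shown above and makes no claim about how the server signals readiness.

```python
import shlex
import subprocess
import time


def build_commands(checkpoint_path, game_executable):
    """Return the server and agent command lines as argument lists.

    Both commands mirror the ones in the steps above; nothing else is assumed
    about the scripts' options.
    """
    serve_cmd = ["python", "scripts/serve.py", checkpoint_path]
    play_cmd = ["python", "scripts/play.py", "--process", game_executable]
    return serve_cmd, play_cmd


def run_session(checkpoint_path, game_executable, startup_delay=10.0):
    """Start the server, run the agent, then stop the server.

    ASSUMPTION: a fixed sleep is enough for the server to come up. The real
    startup time is not documented here, so adjust or replace this wait.
    """
    serve_cmd, play_cmd = build_commands(checkpoint_path, game_executable)
    server = subprocess.Popen(serve_cmd)
    try:
        time.sleep(startup_delay)
        subprocess.run(play_cmd, check=True)
    finally:
        server.terminate()
        server.wait()


if __name__ == "__main__":
    # Hypothetical placeholder values; substitute your own paths and names.
    serve_cmd, play_cmd = build_commands("ng.pt", "game.exe")
    print(shlex.join(serve_cmd))
    print(shlex.join(play_cmd))
```

Run from the root of the cloned repository, calling `run_session` keeps the server alive only for the duration of one agent run, which makes it easier to avoid leaving a stray server process behind.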

Model Details

  • Architecture: Vision Transformer (SigLip2) + Diffusion Matching Transformer (DiT).
  • Parameters: $4.93 \times 10^8$ (about 493 million).
  • Inputs: 256x256 RGB images.
  • Outputs: Gamepad actions (21x16 shape: two 2D continuous vectors for joysticks, 17 binary buttons).
  • Training: Behavior cloning on 40,000 hours of internet-scale gameplay videos.
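
To make the output description concrete: two 2D joystick vectors contribute 2 × 2 = 4 values and the 17 buttons contribute 17 more, giving 21 values per action. The sketch below illustrates that arithmetic in plain Python. It rests on several assumptions that the text above does not confirm: that the second axis of size 16 indexes consecutive time steps, that joystick values come before button values, and that a button counts as pressed above a 0.5 threshold. The names `split_action` and `column` are hypothetical helpers, not part of NitroGen.

```python
# Layout constants taken from the model details above.
NUM_JOYSTICKS = 2
JOYSTICK_DIMS = 2
NUM_BUTTONS = 17
ACTION_DIM = NUM_JOYSTICKS * JOYSTICK_DIMS + NUM_BUTTONS  # 4 + 17 = 21

# ASSUMPTION: the second axis of the 21x16 output indexes time steps.
NUM_STEPS = 16


def column(output, step):
    """Extract one step from a 21x16 nested list (rows = action values)."""
    return [row[step] for row in output]


def split_action(action):
    """Split one 21-value action into two joystick vectors and 17 buttons.

    ASSUMPTION: values are ordered as left stick (x, y), right stick (x, y),
    then buttons, and a button is pressed when its value exceeds 0.5.
    """
    if len(action) != ACTION_DIM:
        raise ValueError(f"expected {ACTION_DIM} values, got {len(action)}")
    left_stick = tuple(action[0:2])
    right_stick = tuple(action[2:4])
    buttons = [value > 0.5 for value in action[4:]]
    return left_stick, right_stick, buttons


if __name__ == "__main__":
    # Build a dummy all-zero output, then push the left stick and one button
    # at step 3.
    output = [[0.0] * NUM_STEPS for _ in range(ACTION_DIM)]
    output[0][3] = 1.0
    output[4][3] = 1.0
    left, right, buttons = split_action(column(output, 3))
    print(left, right, sum(buttons))
```

The actual ordering and thresholding used by the model may differ, so check the repository before relying on this layout.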

Citation

If you find NitroGen useful in your research, please cite:

@misc{magne2026nitrogen,
      title={NitroGen: An Open Foundation Model for Generalist Gaming Agents}, 
      author={Loïc Magne and Anas Awadalla and Guanzhi Wang and Yinzhen Xu and Joshua Belofsky and Fengyuan Hu and Joohwan Kim and Ludwig Schmidt and Georgia Gkioxari and Jan Kautz and Yisong Yue and Yejin Choi and Yuke Zhu and Linxi "Jim" Fan},
      year={2026},
      eprint={2601.02427},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2601.02427}, 
}

License

Governing Terms: NVIDIA License. The model uses a SigLip2 backbone, which is licensed under Apache 2.0.
