## PromptHMR Official implementation for the paper (CVPR25): \ **PromptHMR: Promptable Human Mesh Recovery** [Yufu Wang](https://yufu-wang.github.io), [Yu Sun](https://www.yusun.work), [Priyanka Patel](https://pixelite1201.github.io), [Kostas Daniilidis](https://www.cis.upenn.edu/~kostas/), [Michael J. Black](https://ps.is.mpg.de/person/black), [Muhammed Kocabas](https://ps.is.mpg.de/person/mkocabas)\ [[Project Page](https://yufu-wang.github.io/phmr-page)] [[Arxiv](https://arxiv.org/abs/2504.06397)] https://github.com/user-attachments/assets/72bcaed3-e3ab-4fe6-8a60-dba853fd4883 https://github.com/user-attachments/assets/f88c3eb2-5c00-4922-92d9-51b10b6e22ba https://github.com/user-attachments/assets/2b4bcee2-2163-4ecc-a5a4-1a3f921b913d ## Installation 1. Clone this project. ```Bash git clone https://github.com/yufu-wang/phmr_dev ``` 2. Run installation script to create a conda environment and install requirements. We provide two options: either torch==2.4.0+cu121 (pass `--pt_version=2.4`) or torch==2.6.0+cu126 (pass `--pt_version=2.6`). (Optional) If you want to run the world-coordinate multi-human video pipeline, you will need additional thirdparty packages. ```Bash Usage: scripts/install.sh --pt_version [--world-video=] Options: --pt_version PyTorch version to install (2.4 or 2.6) --world-video Download required wheels for world-coordinate multi-human video (default: false) --help Show this help message Examples: scripts/install.sh --pt_version=2.4 scripts/install.sh --pt_version=2.6 scripts/install.sh --pt_version=2.4 --world-video=true scripts/install.sh --pt_version=2.6 --world-video=false ``` ## Prepare data Run the following commands to download all models and checkpoints into the `data/` directory. The first command will prompt you to register and log in to access each version of SMPL. ```Bash # SMPLX family models bash scripts/fetch_smplx.sh # Checkpoints and annotations bash scripts/fetch_data.sh ``` ## Demos **For monocular reconstruction**, the demo saves results in a new folder named after the input image. It also uses Viser to visualize the results—open the output link in a browser to view them in 3D. If the browser cannot connect, please troubleshoot Viser (e.g., port forwarding is required if you're running on a remote server). ```bash # 1. Single view reconstruction python scripts/demo_phmr.py --image data/examples/example_1.jpg --gravity_align ``` **For world-coordinate video reconstruction**, first install the precompiled wheels as described in our installation guide. After installation, run the example commands below—results will be visualized using Viser. If you're working with a long sequence containing many people, consider using `--viser_total` to limit the number of frames visualized, or `--viser_subsample` to subsample frames. For other hyperparameters, refer to `pipeline/config.yaml`. ```bash # 2. Video world-coordinate reconstruction # Example 1: simple example python scripts/demo_video.py --input_video data/examples/boxing.mp4 # Example 2: with static camera python scripts/demo_video.py --input_video data/examples/dance_1.mp4 --static_camera --viser_subsample 4 # Example 3: moving camera python scripts/demo_video.py --input_video data/examples/dance_2.mp4 --viser_subsample 3 ``` Note that this script will output MCS and GLB files. You can drag and drop the MCS file to [https://me.meshcapade.com/editor](https://me.meshcapade.com/editor) to view the results. You can import the GLB file into the Blender editor to visualize human and camera motion. **Note on Viser**: If you're running the demo on a server and viewing it through a local browser, the meshes are streamed to the browser's memory, which can be slow for long sequences with many people. A potential workaround is to download the `results.pkl` file and modify the demo to load and visualize the results locally. ## Evaluation Please update the dataset directory in `data_config.py`, and then run the following command for pose and shape evaluation. ```bash # Available datasets: EMDB, 3DPW_TEST, HI4D_TEST, RICH_TEST python scripts/eval_phmr.py --dataset EMDB ``` ## Training Due to licensing agreements, we currently do not plan to release the training code. For details related to training, please refer to the paper and its supplementary materials. ## Acknowledgements We benefit greatly from the following open source works, from which we adapted parts of our code. - [SAM](https://github.com/facebookresearch/segment-anything): promptable architecture - [MultiHMR](https://github.com/naver/multi-hmr) & [BEV](https://www.yusun.work/BEV/BEV.html): multi-person baseline - [GVHMR](https://github.com/zju3dv/GVHMR): video head design - [CamHMR](https://github.com/pixelite1201/CameraHMR): annotations - [BUDDI](https://github.com/muelea/buddi): two-person interation - [viser](https://github.com/nerfstudio-project/viser) & [Gloss](https://github.com/Meshcapade/gloss): visualization In addition, the pipeline includes [Detectron2](https://github.com/facebookresearch/detectron2), [SAM2](https://github.com/facebookresearch/sam2), [DROID-SLAM](https://github.com/princeton-vl/DROID-SLAM), [Metric3D](https://github.com/YvanYin/Metric3D), [ViTPose](https://github.com/ViTAE-Transformer/ViTPose) and [SPEC](https://github.com/mkocabas/SPEC). ## Citation ```bibtex @article{wang2025prompthmr, title={PromptHMR: Promptable Human Mesh Recovery}, author={Wang, Yufu and Sun, Yu and Patel, Priyanka and Daniilidis, Kostas and Black, Michael J and Kocabas, Muhammed}, journal={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition}, year={2025} } ```