| | --- |
| | tags: |
| | - Human Mesh Recovery |
| | - Human Pose and Shape Estimation |
| | - Multi-Person Mesh Recovery |
| | arxiv: '2411.19824' |
| | license: apache-2.0 |
| | --- |
| | |
| | # SAT-HMR |
| |
|
| | Official [PyTorch](https://pytorch.org/) implementation of our paper:
| |
|
| | <h3 align="center">SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens <br> (CVPR 2025)</h3> |
| |
|
| | <h4 align="center" style="text-decoration: none;"> |
| | <a href="https://github.com/ChiSu001/" target="_blank"><b>Chi Su</b></a>
| | ,
| | <a href="https://shirleymaxx.github.io/" target="_blank"><b>Xiaoxuan Ma</b></a>
| | ,
| | <a href="https://scholar.google.com/citations?user=DoUvUz4AAAAJ&hl=en" target="_blank"><b>Jiajun Su</b></a>
| | ,
| | <a href="https://cfcs.pku.edu.cn/english/people/faculty/yizhouwang/index.htm" target="_blank"><b>Yizhou Wang</b></a>
| |
|
| | </h4> |
| |
|
| | <h3 align="center"> |
| | <a href="https://arxiv.org/abs/2411.19824" target="_blank">Paper</a> | 
| | <a href="https://ChiSu001.github.io/SAT-HMR" target="_blank">Project Page</a> | 
| | <a href="https://youtu.be/wLfNrDYFAns" target="_blank">Video</a> | 
| | <a href="https://github.com/ChiSu001/SAT-HMR" target="_blank">GitHub</a>
| | </h3> |
| |
|
| | <!-- <div align="center"> |
| | <img src="figures/results.png" width="70%"> |
| | <img src="figures/results_3d.gif" width="29%"> |
| | </div> --> |
| |
|
| |
|
| | <!-- <h3> Overview of SAT-HMR </h3> --> |
| |
|
| | <p align="center"> |
| | <img src="figures/pipeline.png"/> |
| | </p> |
| |
|
| | <!-- <p align="center"> |
| | <img src="figures/pipeline.png" style="height: 300px; object-fit: cover;"/> |
| | </p> --> |
| |
|
| | ## Installation |
| |
|
| | We tested with Python 3.11, PyTorch 2.4.1, and CUDA 12.1.
| |
|
| | 1. Create a conda environment. |
| | ```bash |
| | conda create -n sathmr python=3.11 -y |
| | conda activate sathmr |
| | ``` |
| |
|
| | 2. Install [PyTorch](https://pytorch.org/) and [xFormers](https://github.com/facebookresearch/xformers). |
| | ```bash |
| | # Install PyTorch. We recommend following the official instructions (https://pytorch.org/) and adapting the CUDA version to your system.
| | conda install pytorch==2.4.1 torchvision==0.19.1 torchaudio==2.4.1 pytorch-cuda=12.1 -c pytorch -c nvidia |
| | |
| | # Install xFormers. We recommend following the official instructions (https://github.com/facebookresearch/xformers) and adapting the CUDA version to your system.
| | pip install -U xformers==0.0.28.post1 --index-url https://download.pytorch.org/whl/cu121 |
| | ``` |
| |
|
| | 3. Install other dependencies. |
| | ```bash |
| | pip install -r requirements.txt |
| | ``` |
| |
|
| | 4. You may need to modify the `chumpy` package to avoid errors. For detailed instructions, see [this guide](docs/fix_chumpy.md).
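
After completing the steps above, a quick check of the installed versions can catch mismatches early. The following is a minimal sketch that uses only the Python standard library; the helper name `report_environment` is hypothetical and not part of this repo.

```python
import importlib.metadata
import importlib.util
import sys


def report_environment(packages=("torch", "torchvision", "xformers")):
    """Return {name: installed version or None} without importing the packages."""
    report = {"python": "{}.{}".format(*sys.version_info[:2])}
    for name in packages:
        if importlib.util.find_spec(name) is None:
            report[name] = None  # package is not installed in this environment
            continue
        try:
            report[name] = importlib.metadata.version(name)
        except importlib.metadata.PackageNotFoundError:
            report[name] = "unknown"  # importable, but no distribution metadata
    return report


if __name__ == "__main__":
    # Versions tested by the authors: Python 3.11, torch 2.4.1,
    # torchvision 0.19.1, xformers 0.0.28.post1.
    for name, version in report_environment().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

If `torch` is reported as installed, `python -c "import torch; print(torch.cuda.is_available())"` additionally tells you whether PyTorch can see a CUDA device.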
| |
|
| | ## Download Models & Weights |
| |
|
| | 1. Download SMPL-related weights. |
| | - Download `basicModel_f_lbs_10_207_0_v1.0.0.pkl`, `basicModel_m_lbs_10_207_0_v1.0.0.pkl`, and `basicModel_neutral_lbs_10_207_0_v1.0.0.pkl` from [here](https://smpl.is.tue.mpg.de/) (female & male) and [here](http://smplify.is.tue.mpg.de/) (neutral) to `${Project}/weights/smpl_data/smpl`. Rename them to `SMPL_FEMALE.pkl`, `SMPL_MALE.pkl`, and `SMPL_NEUTRAL.pkl`, respectively.
| | - Download the remaining files from [Google Drive](https://drive.google.com/drive/folders/1wmd_pjmmDn3eSl3TLgProgZgCQZgtZIC?usp=sharing) and put them in `${Project}/weights/smpl_data/smpl`.
| |
|
| | 2. Download DINOv2 pretrained weights from [their official repository](https://github.com/facebookresearch/dinov2?tab=readme-ov-file#pretrained-models). We use `ViT-B/14 distilled (without registers)`. Put `dinov2_vitb14_pretrain.pth` in `${Project}/weights/dinov2`. These weights are used to initialize our encoder. **You can skip this step if you are not going to train SAT-HMR.**
| |
|
| | 3. Download pretrained weights for inference and evaluation from [Google Drive](https://drive.google.com/file/d/12tGbqcrJ8YACcrfi5qslZNEciIHxcScZ/view?usp=sharing) or [🤗HuggingFace](https://huggingface.co/ChiSu001/SAT-HMR/blob/main/weights/sat_hmr/sat_644.pth). Put the file in `${Project}/weights/sat_hmr`.
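
The SMPL renaming in step 1 can also be scripted. This is a minimal sketch, assuming the three downloaded `.pkl` files already sit in the SMPL folder; the helper name `rename_smpl_files` is hypothetical and not part of this repo.

```python
from pathlib import Path

# Mapping taken from step 1: downloaded file name -> expected file name.
SMPL_RENAMES = {
    "basicModel_f_lbs_10_207_0_v1.0.0.pkl": "SMPL_FEMALE.pkl",
    "basicModel_m_lbs_10_207_0_v1.0.0.pkl": "SMPL_MALE.pkl",
    "basicModel_neutral_lbs_10_207_0_v1.0.0.pkl": "SMPL_NEUTRAL.pkl",
}


def rename_smpl_files(smpl_dir):
    """Rename downloaded SMPL files in place and return the new names."""
    smpl_dir = Path(smpl_dir)
    renamed = []
    for old_name, new_name in SMPL_RENAMES.items():
        src = smpl_dir / old_name
        dst = smpl_dir / new_name
        # Skip files that are absent, and never overwrite an existing target.
        if src.is_file() and not dst.exists():
            src.rename(dst)
            renamed.append(new_name)
    return renamed
```

For example, `rename_smpl_files("weights/smpl_data/smpl")` run from the project root returns the names of the files it renamed and leaves everything else untouched.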
| |
|
| | The `weights` directory structure should now look like this:
| |
|
| | ``` |
| | ${Project}
| | |-- weights
| |     |-- dinov2
| |     |   `-- dinov2_vitb14_pretrain.pth
| |     |-- sat_hmr
| |     |   `-- sat_644.pth
| |     `-- smpl_data
| |         `-- smpl
| |             |-- body_verts_smpl.npy
| |             |-- J_regressor_h36m_correct.npy
| |             |-- SMPL_FEMALE.pkl
| |             |-- SMPL_MALE.pkl
| |             |-- smpl_mean_params.npz
| |             `-- SMPL_NEUTRAL.pkl
| | ``` |
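
To confirm the layout before running anything, you could use a small check like the one below. It is a sketch only: the file list is copied from the tree above, and the helper name `missing_weights` is hypothetical and not part of this repo.

```python
from pathlib import Path

# Files needed for inference and evaluation, relative to ${Project}.
INFERENCE_FILES = [
    "weights/sat_hmr/sat_644.pth",
    "weights/smpl_data/smpl/body_verts_smpl.npy",
    "weights/smpl_data/smpl/J_regressor_h36m_correct.npy",
    "weights/smpl_data/smpl/SMPL_FEMALE.pkl",
    "weights/smpl_data/smpl/SMPL_MALE.pkl",
    "weights/smpl_data/smpl/smpl_mean_params.npz",
    "weights/smpl_data/smpl/SMPL_NEUTRAL.pkl",
]
# The DINOv2 weights are only needed when training SAT-HMR.
TRAINING_FILES = ["weights/dinov2/dinov2_vitb14_pretrain.pth"]


def missing_weights(project_root, training=False):
    """Return the expected files that do not exist under project_root."""
    root = Path(project_root)
    expected = INFERENCE_FILES + (TRAINING_FILES if training else [])
    return [rel for rel in expected if not (root / rel).is_file()]


if __name__ == "__main__":
    missing = missing_weights(".", training=False)
    print("All weights found." if not missing else "Missing:\n  " + "\n  ".join(missing))
```

Run it from the project root; pass `training=True` to also require the DINOv2 checkpoint.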
| |
|
| | ## Inference on Images |
| | <h4> Inference with 1 GPU</h4> |
| |
|
| | We provide some demo images in `${Project}/demo`. You can run SAT-HMR on all images on a single GPU via: |
| |
|
| |
|
| | ```bash |
| | python main.py --mode infer --cfg demo |
| | ``` |
| |
|
| | Results with overlaid meshes will be saved in `${Project}/demo_results`.
| |
|
| | You can specify your own inference configuration by modifying `${Project}/configs/run/demo.yaml`:
| |
|
| | - `input_dir` specifies the input image folder. |
| | - `output_dir` specifies the output folder. |
| | - `conf_thresh` specifies a list of confidence thresholds used for detection. SAT-HMR runs inference once for each threshold in the list.
| | - `infer_batch_size` specifies the batch size used for inference (on a single GPU). |
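
To illustrate what a list of confidence thresholds means in practice, here is a conceptual sketch. It is not the code used in this repo: the detection format, the scores, the threshold values, and the `>=` comparison are all assumptions made for illustration.

```python
def filter_by_confidence(detections, conf_thresh):
    """Return {threshold: detections whose score is at least that threshold}."""
    return {
        thresh: [det for det in detections if det["score"] >= thresh]
        for thresh in conf_thresh
    }


# Hypothetical detections for one image (illustrative values only).
detections = [
    {"person_id": 0, "score": 0.92},
    {"person_id": 1, "score": 0.41},
    {"person_id": 2, "score": 0.18},
]

results = filter_by_confidence(detections, conf_thresh=[0.3, 0.5])
for thresh, kept in results.items():
    print(thresh, [det["person_id"] for det in kept])
# prints:
# 0.3 [0, 1]
# 0.5 [0]
```

A lower threshold keeps more people at the risk of false positives; listing several thresholds lets you compare the results side by side.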
| |
|
| | <h4> Inference with Multiple GPUs</h4> |
| |
|
| | You can also try distributed inference on multiple GPUs if your input folder contains a large number of images. |
| | SAT-HMR uses [🤗 Accelerate](https://huggingface.co/docs/accelerate/index) to launch distributed runs, so you may first need to configure it for your system's distributed setup. To do so, run the following command and answer the prompts:
| |
|
| | ```bash |
| | accelerate config |
| | ``` |
| |
|
| | Then run: |
| | ```bash |
| | accelerate launch main.py --mode infer --cfg demo |
| | ``` |
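
Conceptually, distributed inference gives each process its own share of the input images. The sketch below shows one simple way to split a file list (round-robin by process index). It is an illustration only and does not claim to match how 🤗 Accelerate or SAT-HMR actually partitions the data; the function name and file names are hypothetical.

```python
def shard_for_process(items, process_index, num_processes):
    """Return the items assigned to one process using a round-robin split."""
    return items[process_index::num_processes]


# Hypothetical input list; real file names depend on your input folder.
image_paths = [f"img_{i}.jpg" for i in range(5)]

for rank in range(2):
    print(rank, shard_for_process(image_paths, rank, 2))
# prints:
# 0 ['img_0.jpg', 'img_2.jpg', 'img_4.jpg']
# 1 ['img_1.jpg', 'img_3.jpg']
```

Every image lands in exactly one shard, so the shards together cover the whole folder without overlap.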
| |
|
| | <!-- ## Datasets Preparation |
| |
|
| | Coming soon. |
| |
|
| | ## Training and Evaluation |
| |
|
| | Coming soon. --> |
| |
|
| | ## Citing |
| |
|
| | If you find this code useful for your research, please consider citing our paper: |
| | ```bibtex |
| | @InProceedings{Su_2025_CVPR, |
| | author = {Su, Chi and Ma, Xiaoxuan and Su, Jiajun and Wang, Yizhou}, |
| | title = {SAT-HMR: Real-Time Multi-Person 3D Mesh Estimation via Scale-Adaptive Tokens}, |
| | booktitle = {Proceedings of the Computer Vision and Pattern Recognition Conference (CVPR)}, |
| | month = {June}, |
| | year = {2025}, |
| | pages = {16796-16806} |
| | } |
| | ``` |
| |
|
| | ## Acknowledgement |
| | This repo builds on the excellent work of [DINOv2](https://github.com/facebookresearch/dinov2), [DAB-DETR](https://github.com/IDEA-Research/DAB-DETR), [DINO](https://github.com/IDEA-Research/DINO), and [🤗 Accelerate](https://huggingface.co/docs/accelerate/index). We thank the authors of these great projects.