# Wavelet Policy

> 📄 **Abstract:**
> Most imitation learning policies treat the problem as a time-series prediction task, directly mapping high-dimensional observations (such as visual input and proprioception) into action space. While time-series methods focus on spatial-domain modeling, they often overlook the inherent temporal patterns in action sequences. To address this, we recast imitation learning policies in the frequency domain and propose **Wavelet Policy**. Our approach applies discrete wavelet transforms (WT) for feature preprocessing, then uses a Single-Encoder-Multiple-Decoder (SE2MD) architecture to extract multi-scale frequency-domain features. To further enrich feature mapping and boost capacity, we insert a Learnable Frequency-Domain Filter (LFDF) after each frequency decoder, improving robustness under varying visual conditions. Experiments show that Wavelet Policy outperforms state-of-the-art end-to-end methods by over 10% across four challenging robotic-arm tasks while keeping model size comparable. In long-horizon settings, its performance degrades more gracefully as task complexity increases. The code will be released publicly.
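The multi-scale decomposition at the heart of the method can be illustrated with a single-level Haar wavelet transform in plain NumPy. This is an illustrative sketch only, not the repository's implementation (which uses the vendored `pytorch_wavelets` package): it splits a 1-D action trajectory into a coarse low-frequency band and a fine high-frequency band, and shows the transform is lossless.

```python
import numpy as np

def haar_dwt(x):
    """Single-level Haar DWT: split a signal into a low-frequency
    approximation band and a high-frequency detail band."""
    x = np.asarray(x, dtype=float)
    approx = (x[0::2] + x[1::2]) / np.sqrt(2)  # coarse trend of the sequence
    detail = (x[0::2] - x[1::2]) / np.sqrt(2)  # fast-changing component
    return approx, detail

def haar_idwt(approx, detail):
    """Inverse transform: reconstructs the original signal exactly."""
    x = np.empty(2 * len(approx))
    x[0::2] = (approx + detail) / np.sqrt(2)
    x[1::2] = (approx - detail) / np.sqrt(2)
    return x

# A toy 1-D "action trajectory" of even length
traj = np.array([0.0, 0.1, 0.4, 0.5, 1.0, 0.9, 0.2, 0.1])
a, d = haar_dwt(traj)
assert np.allclose(haar_idwt(a, d), traj)  # lossless round trip
```

Applying `haar_dwt` recursively to the approximation band yields the multi-scale pyramid that the SE2MD decoders operate on, one decoder per frequency band.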
---
## 🌟 Striving for a Simple and Efficient Embodied Intelligence Model

---
## 💻 System Requirements
| Component | Requirement |
| ---------------- | ---------------------------------------- |
| Operating System | Ubuntu 20.04 |
| GPU Memory | Training: ≥ 4 GB; Inference: ≥ 2 GB |
| Disk Space | 100–200 GB (datasets) |
| GPU Support | NVIDIA GPU with CUDA support recommended |
> 📌 For detailed hardware specs, see the “Experimental Setup” section in the paper.
---
## ⚙️ Environment Configuration
We recommend using a `conda` environment. Quick install steps:
```bash
git clone https://github.com/lurenjia384/Wavelet_Policy
cd Wavelet_Policy
# Create conda environment
conda create -n Wavelet_Policy python=3.7.16 -y
conda activate Wavelet_Policy
# Install dependencies
pip install -r requirements.txt
```
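After installation, a stdlib-only sanity check can confirm that the key packages resolved inside the new environment. The package names below are assumptions based on a typical PyTorch + wavelet stack, not a copy of `requirements.txt`; adjust them to match that file.

```python
import importlib.util

def check_deps(packages):
    """Return {package: importable} without importing anything heavy."""
    return {p: importlib.util.find_spec(p) is not None for p in packages}

# Assumed core dependencies; edit to mirror requirements.txt
print(check_deps(("torch", "torchvision", "numpy", "pytorch_wavelets")))
```

Any `False` entry means the corresponding package failed to install and `pip install -r requirements.txt` should be re-run inside the activated environment.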
---
## 📁 Project Structure
```bash
Wavelet_Policy/
├── images               # Images for GitHub display
├── pre_model            # Pretrained models
├── log                  # Log files
├── assets               # Robot fixtures or CAD assets
├── pytorch_wavelets     # Wavelet transform utilities
├── vid_path             # Saved inference videos
├── infer.py             # Inference script
├── model.py             # Network definitions
├── utils.py             # Configuration and helper code
├── requirements.txt     # Python dependencies
├── LICENSE
└── README.md
```
---
## 📥 Pre-trained Model Download
Pre-trained weights and configurations are available on Hugging Face:
🔗 [WaveletPolicy-base](https://huggingface.co/lurenjia384/wavelet_policy_model)
| Cameras | Dataset | Params (M) | Download |
| :-----: | :----------------: | :--------: | :------------------------------------------------------------------------------------------------------------------------------- |
| One | Transfer Cube | 17.22 | [best\_model\_11.pt](https://huggingface.co/lurenjia384/wavelet_policy_model/resolve/main/task_1/best_model_11.pt?download=true) |
| Two | Transfer Cube | — | coming soon |
| One | Bimanual Insertion | 17.22 | coming soon |
| Two | Bimanual Insertion | — | coming soon |
| One | Transfer Plus | 17.22 | [best\_model\_13.pt](https://huggingface.co/lurenjia384/wavelet_policy_model/resolve/main/task_3/best_model_13.pt?download=true) |
| Two | Transfer Plus | — | coming soon |
| One | Stack Two Blocks | 17.22 | coming soon |
| Two | Stack Two Blocks | — | coming soon |
After downloading, place the model files under:
```
Wavelet_Policy/
└── pre_model
```
You can also load models directly with `huggingface_hub` (no need for `--netdir` or `--stats_path`):
```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="lurenjia384/wavelet_policy_model",
    filename="task_3/best_model_13.pt",
)
stats_path = hf_hub_download(
    repo_id="lurenjia384/wavelet_policy_model",
    filename="task_3/task_3.pkl",
)
```
---
## 🚀 Quick Start
**Run inference:**
```bash
python infer.py --task_name {task_name} \
--stats_path {data_pkl_path.pkl} \
--netdir {model_path.pt} \
--no_visualization {0|1}
```
**Example:** Transfer Plus task, with visualization enabled:
```bash
python infer.py --task_name sim_transfer_cube_scripted_plus --no_visualization 0
```
If you run into difficulties, refer to the demonstration video: [Quick-Start Demo](https://youtu.be/WnUJzu8MQBk).
Other valid values for `--task_name` are:
* `sim_transfer_cube_scripted` → Transfer Cube
* `sim_insertion_scripted` → Bimanual Insertion
* `Put` → Stack Two Blocks
**Train the model:**
Coming soon…
---
## 📊 Experimental Results
**Table:** Success rates (%) of Wavelet Policy vs. five baselines across four tasks and three stages
| Model | TC-1 | TC-2 | TC-3 | BI-1 | BI-2 | BI-3 | TP-1 | TP-2 | TP-3 | ST-1 | ST-2 | ST-3 |
| --------- | -----: | -----: | -----: | -----: | -----: | -----: | -----: | -----: | -----: | -----: | -----: | -----: |
| DP (DDIM) | 9 | 6 | 4 | 4 | 3 | 1 | 2 | 1 | 1 | 1 | 1 | 1 |
| ACT | 98 | 96 | 94 | 81 | 73 | 68 | 66 | 57 | 57 | 85 | 67 | 50 |
| NL-ACT | 94 | 91 | 90 | 83 | 74 | 70 | 62 | 55 | 55 | 82 | 65 | 48 |
| HACT-Vq | 98 | 98 | 97 | 87 | 82 | 76 | **79** | 68 | 68 | 90 | 76 | 55 |
| InterACT | **98** | 88 | 82 | **88** | 78 | 44 | — | — | — | — | — | — |
| Ours | 98 | **98** | **97** | 87 | **82** | **78** | 78 | **70** | **70** | **96** | **79** | **59** |
> **Note:** Bold entries denote the best performance in each column.
> `—` indicates no experiment for that method.
> `TC`: Transfer Cube; `BI`: Bimanual Insertion; `TP`: Transfer Plus; `ST`: Stack Two Blocks.
> The number after each task name indicates the stage.
> See the paper for full experimental details.