# Wavelet Policy

**Imitation Policy Learning in the Frequency Domain with Wavelet Transforms**

Quick-Start Demo · Paper on arXiv · Presentation Video

**Abstract:** Most imitation learning policies treat the problem as a time-series prediction task, directly mapping high-dimensional observations (such as visual input and proprioception) into action space. While time-series methods focus on spatial-domain modeling, they often overlook inherent temporal patterns in action sequences. To address this, we recast imitation learning in the frequency domain and propose Wavelet Policy. Our approach applies discrete wavelet transforms (WT) for feature preprocessing, then uses a Single-Encoder-Multiple-Decoder (SE2MD) architecture to extract multi-scale frequency-domain features. To further enrich feature mapping and boost model capacity, we insert a Learnable Frequency-Domain Filter (LFDF) after each frequency decoder, improving robustness under varying visual conditions. Experiments show that Wavelet Policy outperforms state-of-the-art end-to-end methods by over 10% across four challenging robotic-arm tasks while keeping model size comparable. In long-horizon settings, its performance degrades more gracefully as task complexity increases. The code will be released publicly.
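To make the frequency-domain idea concrete, below is a minimal, illustrative sketch of the two ingredients named above: a multi-level discrete wavelet decomposition of action sequences (using the `pytorch_wavelets` package vendored in this repo) followed by a toy learnable per-sub-band filter. The `ToyFrequencyFilter` class, the 14-dimensional action space, and all tensor shapes are assumptions for illustration only; the actual SE2MD network and LFDF live in `model.py`.

```python
import torch
import torch.nn as nn
from pytorch_wavelets import DWT1DForward, DWT1DInverse  # vendored in this repo

class ToyFrequencyFilter(nn.Module):
    """Toy stand-in for a learnable frequency-domain filter: one learnable
    per-channel gain for the lowpass band and for each highpass (detail)
    band. The paper's actual LFDF may be richer; see model.py."""
    def __init__(self, channels: int, levels: int):
        super().__init__()
        self.gains = nn.ParameterList(
            [nn.Parameter(torch.ones(1, channels, 1)) for _ in range(levels + 1)]
        )

    def forward(self, lowpass, highpass):
        lowpass = lowpass * self.gains[0]
        highpass = [h * g for h, g in zip(highpass, self.gains[1:])]
        return lowpass, highpass

# Decompose a batch of action sequences into one coarse band + 3 detail bands.
dwt = DWT1DForward(wave="db4", J=3)    # 3-level Daubechies-4 DWT
idwt = DWT1DInverse(wave="db4")
actions = torch.randn(8, 14, 64)       # (batch, action dims, horizon) -- assumed shapes
lowpass, highpass = dwt(actions)

filt = ToyFrequencyFilter(channels=14, levels=3)
recon = idwt(filt(lowpass, highpass))  # boundary padding may lengthen the output
print(recon[..., :actions.shape[-1]].shape)  # torch.Size([8, 14, 64])
```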
*Striving for a Simple and Efficient Embodied Intelligence Model*

## System Requirements
| Component | Requirement |
|---|---|
| Operating System | Ubuntu 20.04 |
| GPU Memory | Training: ≥ 4 GB; Inference: ≥ 2 GB |
| Disk Space | 100–200 GB (datasets) |
| GPU Support | NVIDIA GPU with CUDA support recommended |
For detailed hardware specs, see the “Experimental Setup” section in the paper.
## Environment Configuration
We recommend using a conda environment. Quick install steps:
```bash
git clone https://github.com/lurenjia384/Wavelet_Policy
cd Wavelet_Policy

# Create and activate the conda environment
conda create -n Wavelet_Policy python=3.7.16 -y
conda activate Wavelet_Policy

# Install dependencies
pip install -r requirements.txt
```
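After installation, a quick sanity check (assuming PyTorch is pulled in by `requirements.txt`) is to import the core dependencies and run a tiny wavelet transform:

```python
# Quick post-install sanity check (assumes torch comes from requirements.txt).
import torch
from pytorch_wavelets import DWT1DForward  # wavelet utilities vendored in this repo

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
yl, yh = DWT1DForward(wave="db4", J=2)(torch.randn(1, 2, 32))
print("DWT OK:", tuple(yl.shape), [tuple(h.shape) for h in yh])
```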
## Project Structure
```
Wavelet_Policy/
├── images/              # Images for GitHub display
├── pre_model/           # Pretrained models
├── log/                 # Log files
├── assets/              # Robot fixtures or CAD assets
├── pytorch_wavelets/    # Wavelet transform utilities
├── vid_path/            # Saved inference videos
├── infer.py             # Inference script
├── model.py             # Network definitions
├── utils.py             # Configuration and helper code
├── requirements.txt     # Python dependencies
├── LICENSE
└── README.md
```
## Pre-trained Model Download
Pre-trained weights and configurations are available on Hugging Face:
[WaveletPolicy-base](https://huggingface.co/lurenjia384/wavelet_policy_model)
| Cameras | Dataset | Params (M) | Download |
|---|---|---|---|
| One | Transfer Cube | 17.22 | best_model_11.pt |
| Two | Transfer Cube | – | coming soon |
| One | Bimanual Insertion | 17.22 | coming soon |
| Two | Bimanual Insertion | – | coming soon |
| One | Transfer Plus | 17.22 | best_model_13.pt |
| Two | Transfer Plus | – | coming soon |
| One | Stack Two Blocks | 17.22 | coming soon |
| Two | Stack Two Blocks | – | coming soon |
After downloading, place the model files under:

```
Wavelet_Policy/
└── pre_model/
```
You can also load models directly with `huggingface_hub` (no need for `--netdir` or `--stats_path`):
```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="lurenjia384/wavelet_policy_model",
    filename="task_3/best_model_13.pt",
)
stats_path = hf_hub_download(
    repo_id="lurenjia384/wavelet_policy_model",
    filename="task_3/task_3.pkl",
)
```
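Continuing from the snippet above, the downloaded files can then be opened like an ordinary PyTorch checkpoint and pickle file. This is only a sketch: `infer.py` performs the real loading, and the exact checkpoint layout here is an assumption.

```python
# Illustrative only: assumes the .pt file is a torch checkpoint and the
# .pkl file holds dataset normalization statistics (see infer.py for actual usage).
import pickle
import torch

checkpoint = torch.load(model_path, map_location="cpu")
with open(stats_path, "rb") as f:
    stats = pickle.load(f)
print(type(checkpoint), type(stats))
```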
## Quick Start
Run inference:
```bash
python infer.py --task_name {task_name} \
                --stats_path {data_pkl_path.pkl} \
                --netdir {model_path.pt} \
                --no_visualization {0|1}
```
Example: Transfer Plus task, with visualization enabled:
```bash
python infer.py --task_name sim_transfer_cube_scripted_plus --no_visualization 0
```
If you still run into difficulties, refer to the Quick-Start Demo video linked at the top of this README.
Other valid values for `--task_name`:

- `sim_transfer_cube_scripted` – Transfer Cube
- `sim_insertion_scripted` – Bimanual Insertion
- `Put` – Stack Two Blocks
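To evaluate several tasks in one go, a hypothetical batch runner along these lines works; it simply shells out to `infer.py` once per task and assumes the script's default weight/stats paths, so pass `--netdir`/`--stats_path` explicitly if yours differ.

```python
# Hypothetical convenience wrapper: runs infer.py once per documented task name.
import subprocess

TASKS = [
    "sim_transfer_cube_scripted",       # Transfer Cube
    "sim_insertion_scripted",           # Bimanual Insertion
    "sim_transfer_cube_scripted_plus",  # Transfer Plus
    "Put",                              # Stack Two Blocks
]

for task in TASKS:
    subprocess.run(
        ["python", "infer.py", "--task_name", task, "--no_visualization", "1"],
        check=True,  # stop on the first failing task
    )
```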
Train the model: Coming soon…
## Experimental Results
Table: Success rates (%) of Wavelet Policy vs. five baselines across four tasks and three stages
| Model | TC-1 | TC-2 | TC-3 | BI-1 | BI-2 | BI-3 | TP-1 | TP-2 | TP-3 | ST-1 | ST-2 | ST-3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DP (DDIM) | 9 | 6 | 4 | 4 | 3 | 1 | 2 | 1 | 1 | 1 | 1 | 1 |
| ACT | **98** | 96 | 94 | 81 | 73 | 68 | 66 | 57 | 57 | 85 | 67 | 50 |
| NL-ACT | 94 | 91 | 90 | 83 | 74 | 70 | 62 | 55 | 55 | 82 | 65 | 48 |
| HACT-Vq | **98** | **98** | **97** | 87 | **82** | 76 | **79** | 68 | 68 | 90 | 76 | 55 |
| InterACT | **98** | 88 | 82 | **88** | 78 | 44 | – | – | – | – | – | – |
| Ours | **98** | **98** | **97** | 87 | **82** | **78** | 78 | **70** | **70** | **96** | **79** | **59** |
Note: Bold entries denote the best performance in each column; “–” indicates no experiment for that method. TC: Transfer Cube; BI: Bimanual Insertion; TP: Transfer Plus; ST: Stack Two Blocks. The number after each task name indicates the stage. See the paper for full experimental details.