# Wavelet Policy

**Imitation Policy Learning in the Frequency Domain with Wavelet Transforms**

Quick-Start Demo · Paper on arXiv · Presentation Video

**Abstract:** Most imitation learning policies treat the problem as a time-series prediction task, directly mapping high-dimensional observations (such as visual input and proprioception) into action space. While time-series methods focus on spatial-domain modeling, they often overlook inherent temporal patterns in action sequences. To address this, we recast imitation learning in the frequency domain and propose Wavelet Policy. Our approach applies discrete wavelet transforms (WT) for feature preprocessing, then uses a Single-Encoder-Multiple-Decoder (SE2MD) architecture to extract multi-scale frequency-domain features. To further enrich feature mapping and boost model capacity, we insert a Learnable Frequency-Domain Filter (LFDF) after each frequency decoder, improving robustness under varying visual conditions. Experiments show that Wavelet Policy outperforms state-of-the-art end-to-end methods by over 10% across four challenging robotic-arm tasks while keeping model size comparable. In long-horizon settings, its performance degrades more gracefully as task complexity increases. The code will be released publicly.
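To make the frequency-domain idea concrete, below is a minimal, illustrative sketch of the two ingredients named above: a multi-level discrete wavelet decomposition of action sequences (using the `pytorch_wavelets` package vendored in this repo) followed by a toy learnable per-sub-band filter. The `ToyFrequencyFilter` class, the 14-dimensional action space, and all tensor shapes are assumptions for illustration only; the actual SE2MD network and LFDF live in `model.py`.

```python
import torch
import torch.nn as nn
from pytorch_wavelets import DWT1DForward, DWT1DInverse  # vendored in this repo

class ToyFrequencyFilter(nn.Module):
    """Toy stand-in for a learnable frequency-domain filter: one learnable
    per-channel gain for the lowpass band and for each highpass (detail)
    band. The paper's actual LFDF may be richer; see model.py."""
    def __init__(self, channels: int, levels: int):
        super().__init__()
        self.gains = nn.ParameterList(
            [nn.Parameter(torch.ones(1, channels, 1)) for _ in range(levels + 1)]
        )

    def forward(self, lowpass, highpass):
        lowpass = lowpass * self.gains[0]
        highpass = [h * g for h, g in zip(highpass, self.gains[1:])]
        return lowpass, highpass

# Decompose a batch of action sequences into one coarse band + 3 detail bands.
dwt = DWT1DForward(wave="db4", J=3)    # 3-level Daubechies-4 DWT
idwt = DWT1DInverse(wave="db4")
actions = torch.randn(8, 14, 64)       # (batch, action dims, horizon) -- assumed shapes
lowpass, highpass = dwt(actions)

filt = ToyFrequencyFilter(channels=14, levels=3)
recon = idwt(filt(lowpass, highpass))  # boundary padding may lengthen the output
print(recon[..., :actions.shape[-1]].shape)  # torch.Size([8, 14, 64])
```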
*Striving for a Simple and Efficient Embodied Intelligence Model*

## System Requirements
| Component | Requirement |
|---|---|
| Operating System | Ubuntu 20.04 |
| GPU Memory | Training: ≥ 4 GB; Inference: ≥ 2 GB |
| Disk Space | 100–200 GB (datasets) |
| GPU Support | NVIDIA GPU with CUDA support recommended |
For detailed hardware specs, see the “Experimental Setup” section in the paper.
## Environment Configuration
We recommend using a conda environment. Quick install steps:
```bash
git clone https://github.com/lurenjia384/Wavelet_Policy
cd Wavelet_Policy

# Create and activate the conda environment
conda create -n Wavelet_Policy python=3.7.16 -y
conda activate Wavelet_Policy

# Install dependencies
pip install -r requirements.txt
```
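After installation, a quick sanity check (assuming PyTorch is pulled in by `requirements.txt`) is to import the core dependencies and run a tiny wavelet transform:

```python
# Quick post-install sanity check (assumes torch comes from requirements.txt).
import torch
from pytorch_wavelets import DWT1DForward  # wavelet utilities vendored in this repo

print("torch:", torch.__version__, "| CUDA available:", torch.cuda.is_available())
yl, yh = DWT1DForward(wave="db4", J=2)(torch.randn(1, 2, 32))
print("DWT OK:", tuple(yl.shape), [tuple(h.shape) for h in yh])
```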
## Project Structure
```
Wavelet_Policy/
├── images/              # Images for GitHub display
├── pre_model/           # Pretrained models
├── log/                 # Log files
├── assets/              # Robot fixtures or CAD assets
├── pytorch_wavelets/    # Wavelet transform utilities
├── vid_path/            # Saved inference videos
├── infer.py             # Inference script
├── model.py             # Network definitions
├── utils.py             # Configuration and helper code
├── requirements.txt     # Python dependencies
├── LICENSE
└── README.md
```
## Pre-trained Model Download
Pre-trained weights and configurations are available on Hugging Face:
[WaveletPolicy-base](https://huggingface.co/lurenjia384/wavelet_policy_model)
| Cameras | Dataset | Params (M) | Download |
|---|---|---|---|
| One | Transfer Cube | 17.22 | best_model_11.pt |
| Two | Transfer Cube | – | coming soon |
| One | Bimanual Insertion | 17.22 | coming soon |
| Two | Bimanual Insertion | – | coming soon |
| One | Transfer Plus | 17.22 | best_model_13.pt |
| Two | Transfer Plus | – | coming soon |
| One | Stack Two Blocks | 17.22 | coming soon |
| Two | Stack Two Blocks | – | coming soon |
After downloading, place the model files under:

```
Wavelet_Policy/
└── pre_model/
```
You can also load models directly with `huggingface_hub` (no need for `--netdir` or `--stats_path`):
```python
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(
    repo_id="lurenjia384/wavelet_policy_model",
    filename="task_3/best_model_13.pt",
)
stats_path = hf_hub_download(
    repo_id="lurenjia384/wavelet_policy_model",
    filename="task_3/task_3.pkl",
)
```
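Continuing from the snippet above, the downloaded files can then be opened like an ordinary PyTorch checkpoint and pickle file. This is only a sketch: `infer.py` performs the real loading, and the exact checkpoint layout here is an assumption.

```python
# Illustrative only: assumes the .pt file is a torch checkpoint and the
# .pkl file holds dataset normalization statistics (see infer.py for actual usage).
import pickle
import torch

checkpoint = torch.load(model_path, map_location="cpu")
with open(stats_path, "rb") as f:
    stats = pickle.load(f)
print(type(checkpoint), type(stats))
```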
## Quick Start
Run inference:
```bash
python infer.py --task_name {task_name} \
                --stats_path {data_pkl_path.pkl} \
                --netdir {model_path.pt} \
                --no_visualization {0|1}
```
Example: Transfer Plus task, with visualization enabled:
```bash
python infer.py --task_name sim_transfer_cube_scripted_plus --no_visualization 0
```
If you still run into difficulties, refer to the Quick-Start Demo video linked at the top of this README.
Other valid values for `--task_name`:

- `sim_transfer_cube_scripted` – Transfer Cube
- `sim_insertion_scripted` – Bimanual Insertion
- `Put` – Stack Two Blocks
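To evaluate several tasks in one go, a hypothetical batch runner along these lines works; it simply shells out to `infer.py` once per task and assumes the script's default weight/stats paths, so pass `--netdir`/`--stats_path` explicitly if yours differ.

```python
# Hypothetical convenience wrapper: runs infer.py once per documented task name.
import subprocess

TASKS = [
    "sim_transfer_cube_scripted",       # Transfer Cube
    "sim_insertion_scripted",           # Bimanual Insertion
    "sim_transfer_cube_scripted_plus",  # Transfer Plus
    "Put",                              # Stack Two Blocks
]

for task in TASKS:
    subprocess.run(
        ["python", "infer.py", "--task_name", task, "--no_visualization", "1"],
        check=True,  # stop on the first failing task
    )
```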
Train the model: Coming soon…
## Experimental Results
Table: Success rates (%) of Wavelet Policy vs. five baselines across four tasks and three stages
| Model | TC-1 | TC-2 | TC-3 | BI-1 | BI-2 | BI-3 | TP-1 | TP-2 | TP-3 | ST-1 | ST-2 | ST-3 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DP (DDIM) | 9 | 6 | 4 | 4 | 3 | 1 | 2 | 1 | 1 | 1 | 1 | 1 |
| ACT | **98** | 96 | 94 | 81 | 73 | 68 | 66 | 57 | 57 | 85 | 67 | 50 |
| NL-ACT | 94 | 91 | 90 | 83 | 74 | 70 | 62 | 55 | 55 | 82 | 65 | 48 |
| HACT-Vq | **98** | **98** | **97** | 87 | **82** | 76 | **79** | 68 | 68 | 90 | 76 | 55 |
| InterACT | **98** | 88 | 82 | **88** | 78 | 44 | – | – | – | – | – | – |
| Ours | **98** | **98** | **97** | 87 | **82** | **78** | 78 | **70** | **70** | **96** | **79** | **59** |
Note: Bold entries denote the best performance in each column; “–” indicates no experiment for that method. TC: Transfer Cube; BI: Bimanual Insertion; TP: Transfer Plus; ST: Stack Two Blocks. The number after each task name indicates the stage. See the paper for full experimental details.