ACT for Push-T (Baseline Benchmark)
Research Purpose
Important Note: This model was trained primarily for academic comparison: evaluating the performance difference between ACT and Diffusion Policy under identical training conditions (using the lerobot/pusht dataset). It is a benchmark experiment designed to analyze how different algorithms learn this specific manipulation task, not to produce a highly successful practical model.
Summary: This model represents the ACT (Action Chunking with Transformers) baseline trained on the Push-T task. It serves as a comparative benchmark for our research on Diffusion Policies. Despite 200k steps of training, ACT struggled to model the multimodal action distribution required for high-precision alignment in this task.
- Task: Push-T (Simulated)
- Algorithm: ACT (Action Chunking with Transformers)
- Training Steps: 200,000
- Author: Graduate Student, UESTC (University of Electronic Science and Technology of China)
Benchmark Results (Baseline)
This model establishes the baseline performance. Unlike Diffusion Policy, ACT tends to average out multimodal action possibilities, leading to "stiff" behavior or failure to perform fine-grained adjustments at the boundaries.
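As a toy illustration of this averaging effect (hypothetical numbers, not taken from the dataset or the trained model), consider demonstrations that push the T-block around either its left or its right side:

```python
import numpy as np

# Hypothetical 2D push directions from two demonstration modes.
left_mode = np.array([-1.0, 0.5])   # push around the left side of the block
right_mode = np.array([1.0, 0.5])   # push around the right side of the block

# A deterministic regression policy trained with an L1/L2 objective tends
# toward the conditional mean of the demonstrated actions.
averaged_action = (left_mode + right_mode) / 2
print(averaged_action)  # [0.  0.5] -> pushes straight into the block,
                        # a motion that appears in neither demonstration mode
```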
Evaluation Metrics (50 Episodes)
| Metric | Value | Interpretation | Status |
|---|---|---|---|
| Success Rate | 0.0% | Failed to meet the strict >95% overlap criterion. | Failed |
| Avg Max Reward | 0.51 | Partially covers the target (~50%), but lacks precision. | Partial |
| Avg Sum Reward | 55.48 | Trajectories are valid but often stall or drift. | Suboptimal |
Analysis: While the model learned the general reaching and pushing motion (Reward > 0.5), it consistently failed the final stage of the task. This highlights ACT's limitation in handling tasks requiring high-precision correction from multimodal demonstrations compared to Generative Policies.
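For context, the three metrics above can be aggregated from per-episode reward traces roughly as follows. This is a minimal sketch assuming the PushT coverage reward in [0, 1] and a 0.95 success threshold mirroring the >95% overlap criterion; it is not the exact lerobot evaluation code.

```python
import numpy as np

def aggregate_eval_metrics(episode_rewards, success_threshold=0.95):
    """Aggregate per-episode reward traces into the three reported metrics.

    episode_rewards: one list of per-step coverage rewards (each in [0, 1]) per episode.
    Sketch of the bookkeeping only, not lerobot's evaluation script.
    """
    max_rewards = [max(r) for r in episode_rewards]
    sum_rewards = [sum(r) for r in episode_rewards]
    successes = [m >= success_threshold for m in max_rewards]
    return {
        "success_rate_pct": 100.0 * float(np.mean(successes)),  # reported: 0.0
        "avg_max_reward": float(np.mean(max_rewards)),          # reported: 0.51
        "avg_sum_reward": float(np.mean(sum_rewards)),          # reported: 55.48
    }
```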
Model Details
| Parameter | Description |
|---|---|
| Architecture | ResNet18 (Backbone) + Transformer Encoder-Decoder |
| Action Chunking | 100 steps |
| VAE Enabled | Yes (Latent Dim: 32) |
| Input | Single Camera (84x84) + Agent Position |
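If you want to poke at the checkpoint outside the evaluation script, a minimal loading-and-inference sketch looks roughly like this. Assumptions: the import path and observation keys vary across LeRobot versions, the dummy tensors stand in for real environment observations, and Lemon-03/pusht_ACT_PushT_test is the Hub checkpoint used in the evaluation command further below.

```python
import torch

# Import path for recent LeRobot releases; older versions expose the same class
# under lerobot.common.policies.act.modeling_act instead.
from lerobot.policies.act.modeling_act import ACTPolicy

policy = ACTPolicy.from_pretrained("Lemon-03/pusht_ACT_PushT_test")
policy.eval()
policy.reset()  # clear the internal action-chunk queue before a new episode

# Dummy PushT observation: one RGB frame plus the 2D agent position.
# Keys and shapes are assumed from the lerobot/pusht dataset and the table above;
# adjust them to whatever your environment wrapper actually emits.
obs = {
    "observation.image": torch.rand(1, 3, 84, 84),
    "observation.state": torch.rand(1, 2),
}

with torch.inference_mode():
    action = policy.select_action(obs)

print(action.shape)  # expected: (1, 2), the next (x, y) target popped from the chunk
```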
Training Configuration
For reproducibility, here are the key parameters used during the training session; a PyTorch sketch of how they map onto an optimizer setup follows the list.
- Batch Size: 64
- Optimizer: AdamW (lr=2e-5)
- Scheduler: Constant
- Vision: ResNet18 (Pretrained ImageNet)
- Precision: Mixed Precision (AMP) enabled
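The optimizer and precision settings above translate into PyTorch roughly as in the sketch below. This is a generic training-step skeleton with a stand-in model and loss, not the lerobot training loop; ACT's actual objective is an L1 reconstruction loss plus kl_weight times the VAE KL term.

```python
import torch

# Stand-ins for the real ACT policy and data; only the optimizer/AMP wiring
# reflects the configuration listed above. Assumes a CUDA device is available.
model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5, weight_decay=2e-4)
scaler = torch.cuda.amp.GradScaler()  # mixed precision (AMP) enabled
# Scheduler is constant, so no LR scheduler object is needed.

def training_step(batch_x, batch_y):
    optimizer.zero_grad()
    with torch.autocast(device_type="cuda", dtype=torch.float16):
        loss = torch.nn.functional.l1_loss(model(batch_x), batch_y)
    scaler.scale(loss).backward()
    scaler.step(optimizer)
    scaler.update()
    return loss.item()
```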
Original Training Command (My Training Mode)
```bash
python -m lerobot.scripts.lerobot_train \
  --config_path act_pusht.yaml \
  --dataset.repo_id lerobot/pusht \
  --job_name aloha_sim_insertion_human_ACT_PushT \
  --wandb.enable true \
  --policy.repo_id Lemon-03/ACT_PushT_test
```
act_pusht.yaml
```yaml
# @package _global_

# Basic Settings
seed: 100000
job_name: ACT-PushT
steps: 200000
eval_freq: 10000
save_freq: 50000
log_freq: 250
batch_size: 64

# Dataset
dataset:
  repo_id: lerobot/pusht

# Evaluation
eval:
  n_episodes: 50
  batch_size: 8

# Environment
env:
  type: pusht
  task: PushT-v0
  fps: 10

# Policy Configuration
policy:
  type: act

  # Vision Backbone
  vision_backbone: resnet18
  pretrained_backbone_weights: ResNet18_Weights.IMAGENET1K_V1
  replace_final_stride_with_dilation: false

  # Transformer Params
  pre_norm: false
  dim_model: 512
  n_heads: 8
  dim_feedforward: 3200
  feedforward_activation: relu
  n_encoder_layers: 4
  n_decoder_layers: 1

  # VAE Params
  use_vae: true
  latent_dim: 32
  n_vae_encoder_layers: 4

  # Action Chunking
  chunk_size: 100
  n_action_steps: 100
  n_obs_steps: 1

  # Training & Loss
  dropout: 0.1
  kl_weight: 10.0

  # Optimizer
  optimizer_lr: 2e-5
  optimizer_lr_backbone: 2e-5
  optimizer_weight_decay: 2e-4
  use_amp: true
```
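One property of this configuration worth keeping in mind when reading the results: with chunk_size = n_action_steps = 100 and the environment running at 10 fps, each predicted chunk is executed fully open-loop before the policy observes again, which may compound the precision issues discussed above. A quick back-of-the-envelope check:

```python
chunk_size = 100       # actions predicted per forward pass
n_action_steps = 100   # actions executed before the next prediction
fps = 10               # PushT control frequency from the config

open_loop_seconds = n_action_steps / fps
print(f"{open_loop_seconds:.0f} s of open-loop execution per chunk")  # -> 10 s
```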
Evaluate (My Evaluation Mode)
Run the following command in your terminal to evaluate a local training checkpoint for 50 episodes and save the visualization videos:
```bash
python -m lerobot.scripts.lerobot_eval \
  --policy.type act \
  --policy.pretrained_path outputs/train/2025-12-02/00-28-32_pusht_ACT_PushT/checkpoints/last/pretrained_model \
  --eval.n_episodes 50 \
  --eval.batch_size 10 \
  --env.type pusht \
  --env.task PushT-v0
```
To evaluate the checkpoint published on the Hub instead, run:
```bash
python -m lerobot.scripts.lerobot_eval \
  --policy.type act \
  --policy.pretrained_path Lemon-03/pusht_ACT_PushT_test \
  --eval.n_episodes 50 \
  --eval.batch_size 10 \
  --env.type pusht \
  --env.task PushT-v0
```