πŸ€– ACT for Push-T (Baseline Benchmark)


🎯 Research Purpose

Important Note: This model was trained primarily for academic comparison: evaluating the performance difference between ACT and Diffusion Policy under identical training conditions (on the lerobot/pusht dataset). It is a benchmark experiment designed to analyze how different algorithms learn this specific manipulation task, not an attempt to train a highly successful practical model.

Summary: This model represents the ACT (Action Chunking with Transformers) baseline trained on the Push-T task. It serves as a comparative benchmark for our research on Diffusion Policies. Despite 200k steps of training, ACT struggled to model the multimodal action distribution required for high-precision alignment in this task.

  • 🧩 Task: Push-T (Simulated)
  • 🧠 Algorithm: ACT (Action Chunking with Transformers)
  • πŸ”„ Training Steps: 200,000
  • πŸŽ“ Author: Graduate Student, UESTC (University of Electronic Science and Technology of China)

πŸ”¬ Benchmark Results (Baseline)

This model establishes the baseline performance. Unlike Diffusion Policy, ACT tends to average out multimodal action possibilities, leading to "stiff" behavior or failure to perform fine-grained adjustments at the boundaries.
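The "averaging" failure mode above can be illustrated with a toy example (not from the model card): when the same state admits two valid actions, a deterministic regressor trained with an L1/L2 loss converges toward a central statistic of the targets, producing a compromise action that neither demonstration mode actually takes.

```python
import numpy as np

# Toy bimodal demonstrations: from the same state, demonstrators push
# either left (-1) or right (+1) of the block with equal probability.
rng = np.random.default_rng(0)
modes = rng.choice([-1.0, 1.0], size=1000)
demo_actions = modes + rng.normal(scale=0.05, size=1000)

# A deterministic policy fit with an L2 loss converges to the conditional
# mean of the targets, which here lands between the two modes.
l2_prediction = demo_actions.mean()

print(round(float(l2_prediction), 2))  # near 0.0: an action neither mode demonstrates
```

This is the intuition behind the comparison: a generative policy can sample from either mode, while a point-estimate policy collapses them.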

πŸ“Š Evaluation Metrics (50 Episodes)

| Metric | Value | Interpretation | Status |
|---|---|---|---|
| Success Rate | 0.0% | Failed to meet the strict >95% overlap criterion. | ❌ |
| Avg Max Reward | 0.51 | Partially covers the target (~50%), but lacks precision. | 🚧 |
| Avg Sum Reward | 55.48 | Trajectories are valid but often stall or drift. | πŸ“‰ |

Analysis: While the model learned the general reaching and pushing motion (Reward > 0.5), it consistently failed the final stage of the task. This highlights ACT's limitation in handling tasks requiring high-precision correction from multimodal demonstrations compared to Generative Policies.
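One plausible reading of how the three metrics relate, sketched with invented per-step rewards (the real Push-T env returns target-coverage values in [0, 1]; the episode values below are illustrative only):

```python
# Hypothetical per-step coverage rewards for two short episodes.
episodes = [
    [0.1, 0.3, 0.5, 0.52, 0.51],
    [0.2, 0.4, 0.49, 0.5, 0.5],
]

# Avg Max Reward: best coverage reached in each episode, averaged.
avg_max = sum(max(ep) for ep in episodes) / len(episodes)
# Avg Sum Reward: total reward accumulated per episode, averaged.
avg_sum = sum(sum(ep) for ep in episodes) / len(episodes)
# Success: an episode counts only if coverage ever exceeds the 0.95 threshold.
success_rate = sum(max(ep) > 0.95 for ep in episodes) / len(episodes)

print(avg_max, avg_sum, success_rate)
```

Under this reading, a policy can plateau around 0.5 coverage in every episode (Avg Max Reward ≈ 0.51) while never registering a single success, which matches the pattern in the table above.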


βš™οΈ Model Details

| Parameter | Description |
|---|---|
| Architecture | ResNet18 (Backbone) + Transformer Encoder-Decoder |
| Action Chunking | 100 steps |
| VAE Enabled | Yes (Latent Dim: 32) |
| Input | Single Camera (84x84) + Agent Position |
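With chunk_size and n_action_steps both set to 100 (see the config below), the policy predicts 100 actions from one observation and executes the whole chunk open-loop before observing again. A minimal control-loop sketch of that scheme, with stub policy and environment (all names are illustrative, not LeRobot API):

```python
# Chunked open-loop control: one policy call per 100 environment steps.
CHUNK_SIZE = 100
EPISODE_STEPS = 300

def predict_chunk(obs):
    """Stand-in for the ACT forward pass: one observation -> 100 actions."""
    return [obs] * CHUNK_SIZE  # placeholder actions

replans = 0
queue = []
obs = 0.0
for t in range(EPISODE_STEPS):
    if not queue:                  # action queue exhausted -> query the policy
        queue = list(predict_chunk(obs))
        replans += 1
    action = queue.pop(0)
    obs = obs                      # placeholder environment step

print(replans)  # 300 steps / 100-step chunks -> 3 policy calls
```

Executing full chunks open-loop is one candidate explanation for the "stiff" boundary behavior noted above: within a chunk, the policy cannot react to small alignment errors.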

πŸ”§ Training Configuration

For reproducibility, here are the key parameters used during the training session.

  • Batch Size: 64
  • Optimizer: AdamW (lr=2e-5)
  • Scheduler: Constant
  • Vision: ResNet18 (Pretrained ImageNet)
  • Precision: Mixed Precision (AMP) enabled

Original Training Command

python -m lerobot.scripts.lerobot_train \
  --config_path act_pusht.yaml \
  --dataset.repo_id lerobot/pusht \
  --job_name aloha_sim_insertion_human_ACT_PushT \
  --wandb.enable true \
  --policy.repo_id Lemon-03/ACT_PushT_test

act_pusht.yaml

# @package _global_

# Basic Settings
seed: 100000
job_name: ACT-PushT
steps: 200000
eval_freq: 10000
save_freq: 50000
log_freq: 250
batch_size: 64

# Dataset
dataset:
  repo_id: lerobot/pusht

# Evaluation
eval:
  n_episodes: 50
  batch_size: 8

# Environment
env:
  type: pusht
  task: PushT-v0
  fps: 10

# Policy Configuration
policy:
  type: act

  # Vision Backbone
  vision_backbone: resnet18
  pretrained_backbone_weights: ResNet18_Weights.IMAGENET1K_V1
  replace_final_stride_with_dilation: false
  
  # Transformer Params
  pre_norm: false
  dim_model: 512
  n_heads: 8
  dim_feedforward: 3200
  feedforward_activation: relu
  n_encoder_layers: 4
  n_decoder_layers: 1
  
  # VAE Params
  use_vae: true
  latent_dim: 32
  n_vae_encoder_layers: 4

  # Action Chunking
  chunk_size: 100
  n_action_steps: 100
  n_obs_steps: 1

  # Training & Loss
  dropout: 0.1
  kl_weight: 10.0
  
  # Optimizer
  optimizer_lr: 2e-5
  optimizer_lr_backbone: 2e-5
  optimizer_weight_decay: 2e-4
  
  use_amp: true

πŸš€ Evaluation

To evaluate a local training checkpoint for 50 episodes and save the visualization videos, run the following command in your terminal:

python -m lerobot.scripts.lerobot_eval \
  --policy.type act \
  --policy.pretrained_path outputs/train/2025-12-02/00-28-32_pusht_ACT_PushT/checkpoints/last/pretrained_model \
  --eval.n_episodes 50 \
  --eval.batch_size 10 \
  --env.type pusht \
  --env.task PushT-v0

To evaluate this model directly from the Hugging Face Hub, run the following command:

python -m lerobot.scripts.lerobot_eval \
  --policy.type act \
  --policy.pretrained_path Lemon-03/pusht_ACT_PushT_test \
  --eval.n_episodes 50 \
  --eval.batch_size 10 \
  --env.type pusht \
  --env.task PushT-v0

Dataset used to train Lemon-03/ACT_PushT_test: lerobot/pusht