πŸ› How to Debug Your HF Space

Your Situation

  • βœ… Deployed successfully
  • ⏳ Took a long time to respond
  • ❌ Finally showed an error


🎯 Step-by-Step Debugging

Step 1: Run Local Diagnosis (30 seconds)

# Check your HF Space status
python debug_hf_space.py

This will tell you:

  • βœ… If Space is running
  • βœ… What hardware it's using (CPU vs GPU)
  • βœ… If model files are uploaded
  • βœ… Common issues
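
If you don't have debug_hf_space.py at hand, the same status check can be sketched with the huggingface_hub client. This is a minimal sketch, not the actual script: the `summarize` and `check_space` helpers are assumptions made up for illustration; only `HfApi.get_space_runtime` comes from the library.

```python
def summarize(stage, hardware):
    """Turn raw Space runtime fields into a one-line status."""
    hw = hardware or "cpu-basic (free tier)"
    if stage == "RUNNING":
        return f"Space is running on {hw}"
    return f"Space is NOT running (stage: {stage}, hardware: {hw})"

def check_space(repo_id):
    """Query the Hub for a Space's current stage and hardware."""
    from huggingface_hub import HfApi  # pip install huggingface_hub
    runtime = HfApi().get_space_runtime(repo_id)
    return summarize(str(runtime.stage), runtime.hardware)

# Example (needs network access and a valid Space id):
# print(check_space("nocapdev/my-gradio-momask"))
```

If the stage is anything other than RUNNING (e.g. BUILD_ERROR or RUNTIME_ERROR), head straight to the Logs tab in Step 2.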

Step 2: Get the Actual Error (MOST IMPORTANT)

Go to your Space and copy the error:

  1. Visit: https://huggingface.co/spaces/nocapdev/my-gradio-momask
  2. Click: "Logs" tab (top right)
  3. Scroll to the bottom
  4. Copy the last 30-50 lines

What to look for:

  • Lines with ERROR or Exception
  • Lines with Traceback
  • The very last error message
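
Scanning 50 log lines by eye is error-prone. A small filter like this (a sketch, not part of the repo) pulls out just the suspicious lines from a pasted log:

```python
def find_error_lines(log_text,
                     keywords=("ERROR", "Error", "Exception", "Traceback", "Killed")):
    """Return (line_number, line) pairs for log lines that look like failures."""
    hits = []
    for i, line in enumerate(log_text.splitlines(), start=1):
        if any(word in line for word in keywords):
            hits.append((i, line.strip()))
    return hits

# Paste your copied log into `logs`:
logs = """Loading models...
Traceback (most recent call last):
FileNotFoundError: [Errno 2] No such file or directory
"""
for lineno, line in find_error_lines(logs):
    print(f"line {lineno}: {line}")
```

The last hit is usually the root cause; match it against the patterns in Step 3.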

Step 3: Common Error Patterns

Error A: "Model checkpoints not found"

ERROR: Model checkpoints not found!
Looking for: ./checkpoints
FileNotFoundError: [Errno 2] No such file or directory

Cause: Model files weren't uploaded to the HF Space
Solution: Upload the checkpoints (see below)

Error B: "CUDA out of memory"

RuntimeError: CUDA out of memory
torch.cuda.OutOfMemoryError

Cause: Model too large for GPU RAM
Solution: Use a larger GPU or optimize the model

Error C: "Killed" or "SIGKILL"

Killed
Process finished with exit code 137

Cause: Out of RAM (CPU memory)
Solution: Upgrade the Space's RAM or reduce the app's memory usage
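
Exit code 137 isn't arbitrary: on Unix, a code above 128 means the process died from signal (code − 128). 137 − 128 = 9, i.e. SIGKILL, which is what the Linux OOM killer sends. You can verify the arithmetic:

```python
import signal

exit_code = 137
sig_number = exit_code - 128  # Unix convention: 128 + fatal signal number
print(sig_number == signal.SIGKILL)  # SIGKILL is signal 9 -> prints True
```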

Error D: Stuck at "Generating motion tokens..."

[1/4] Generating motion tokens...
[No more output for 20+ minutes]

Cause: Using CPU (very slow, but not an error!)
Solution: Wait 20-30 minutes OR upgrade to a GPU


πŸ”§ Solutions for Common Issues

Solution 1: Upload Model Checkpoints

If error shows: Model checkpoints not found

Option A: Upload via Git (for files <10GB)

# Clone your Space
git clone https://huggingface.co/spaces/nocapdev/my-gradio-momask
cd my-gradio-momask

# Install Git LFS (one time)
git lfs install

# Track large files
git lfs track "checkpoints/**/*.tar"
git lfs track "checkpoints/**/*.pth"
git lfs track "checkpoints/**/*.npy"

# Copy your checkpoints
# FROM: C:\Users\purva\OneDrive\Desktop\momaskhg\checkpoints
# TO: current directory
cp -r /path/to/checkpoints ./

# Commit and push
git add .gitattributes
git add checkpoints/
git commit -m "Add model checkpoints"
git push

Option B: Upload via HF Web UI

  1. Go to: https://huggingface.co/spaces/nocapdev/my-gradio-momask/tree/main
  2. Click "Add file" β†’ "Upload files"
  3. Drag your checkpoints/ folder
  4. Click "Commit"

Note: This works for files <50MB. For larger files, use Git LFS.

Option C: Host Models Separately

Upload models to HF Model Hub, then download in app.py:

import os

from huggingface_hub import snapshot_download

# Add to app.py before initializing the generator
if not os.path.exists('./checkpoints'):
    print("Downloading models from HF Hub...")
    snapshot_download(
        repo_id="YOUR_USERNAME/momask-models",
        local_dir="./checkpoints"
    )

Solution 2: Upgrade Hardware (for speed)

If using CPU and it's too slow:

  1. Go to: https://huggingface.co/spaces/nocapdev/my-gradio-momask/settings
  2. Scroll to "Hardware"
  3. Select:
    • T4 small (~$0.60/hour) - Good for this app
    • A10G small (~$3/hour) - Faster
  4. Click "Save"
  5. Wait for rebuild (~2 minutes)
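
To sanity-check what the upgrade costs per run, a rough estimate helps. This is a sketch only: the ~2 minutes per generation on a T4 is an assumed figure, so substitute what you actually observe.

```python
def cost_per_generation(hourly_rate_usd, minutes_per_generation):
    """Rough cost of one motion generation at a given hourly hardware rate."""
    return hourly_rate_usd * minutes_per_generation / 60.0

# T4 small at ~$0.60/hour, assuming ~2 minutes per generation
print(f"${cost_per_generation(0.60, 2):.3f} per generation")  # prints $0.020 per generation
```

Remember you're billed while the Space is up, not per generation, so pause or set a sleep timeout when idle.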

Solution 3: Test Locally First

Before debugging on HF, test locally:

# 1. Test your setup
python test_local.py

# 2. Run app locally
python app.py

# 3. Visit http://localhost:7860
# 4. Try a prompt
# 5. Check terminal for errors

If it works locally but fails on HF:

  • Models probably not uploaded to HF Space
  • Or HF Space using different Python/package versions
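
To rule out the version mismatch, print your local versions and compare them with what the Space reports at startup. A minimal sketch (the package list here is an example; check whatever your requirements.txt pins):

```python
import sys
from importlib.metadata import version, PackageNotFoundError

def report_versions(packages=("gradio", "torch", "numpy")):
    """Return a small report of the Python and package versions in this env."""
    lines = [f"python {sys.version.split()[0]}"]
    for pkg in packages:
        try:
            lines.append(f"{pkg} {version(pkg)}")
        except PackageNotFoundError:
            lines.append(f"{pkg} NOT INSTALLED")
    return "\n".join(lines)

print(report_versions())
```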

πŸ“‹ Debugging Checklist

Run through this checklist:

βœ… Pre-deployment

  • python test_local.py passes
  • App works locally at http://localhost:7860
  • Models in ./checkpoints/ directory
  • python pre_deploy_check.py shows 8/8 PASS

βœ… Post-deployment

  • Space shows "Running" status
  • Logs show "Using device: cpu/cuda"
  • Logs show "Models loaded successfully!"
  • No error messages in logs

βœ… During generation

  • Logs show "[1/4] Generating motion tokens..."
  • Logs show progress through [2/4], [3/4], [4/4]
  • No "Killed" or "SIGKILL" messages

🎯 Quick Diagnosis Commands

# Check HF Space status
python debug_hf_space.py

# Test local setup
python test_local.py

# Validate before deploy
python pre_deploy_check.py

# Deploy with latest fixes
python deploy.py

πŸ“Š Expected Logs (Healthy Run)

Startup (should see this):

Using device: cpu  (or cuda)
Loading models...
βœ“ VQ model loaded
βœ“ Transformer loaded
βœ“ Residual model loaded
βœ“ Length estimator loaded
Models loaded successfully!
Running on local URL: http://0.0.0.0:7860

During generation (should see this):

======================================================================
Generating motion for: 'a person walks forward'
======================================================================
[1/4] Generating motion tokens...
βœ“ Generated 80 frames
[2/4] Converting to BVH format...
βœ“ BVH conversion complete
[3/4] Rendering video...
βœ“ Video saved to ./gradio_outputs/motion_12345.mp4
[4/4] Complete!
======================================================================

πŸ†˜ Still Stuck?

Share these with me:

  1. Output from:

    python debug_hf_space.py
    
  2. Last 50 lines from HF Space Logs

    • Go to Logs tab
    • Copy from bottom
    • Include any ERROR or Traceback
  3. What you see in the browser

    • Screenshot of the error
    • Or copy the error message

Then I can give you the exact fix!


πŸ’‘ Most Likely Issues (90% of cases)

  1. CPU is slow (not an error!)

    • Logs show: "Using device: cpu"
    • Solution: Wait 20 mins OR upgrade to GPU
  2. Models not uploaded

    • Logs show: "Model checkpoints not found"
    • Solution: Upload checkpoints to HF Space
  3. Out of memory

    • Logs show: "Killed" or "SIGKILL"
    • Solution: Upgrade to more RAM

Run python debug_hf_space.py first - it will identify which one!