πŸ› How to Debug Your HF Space

Your Situation

  • βœ… Deployed successfully
  • ⏳ Took a long time to respond
  • ❌ Finally showed an error


🎯 Step-by-Step Debugging

Step 1: Run Local Diagnosis (30 seconds)

# Check your HF Space status
python debug_hf_space.py

This will tell you:

  • βœ… If Space is running
  • βœ… What hardware it's using (CPU vs GPU)
  • βœ… If model files are uploaded
  • βœ… Common issues
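
If you don't have debug_hf_space.py at hand, the same status check can be sketched with the huggingface_hub client. This is a minimal sketch, not the actual script: the `summarize` and `check_space` helpers are assumptions made up for illustration; only `HfApi.get_space_runtime` comes from the library.

```python
def summarize(stage, hardware):
    """Turn raw Space runtime fields into a one-line status."""
    hw = hardware or "cpu-basic (free tier)"
    if stage == "RUNNING":
        return f"Space is running on {hw}"
    return f"Space is NOT running (stage: {stage}, hardware: {hw})"

def check_space(repo_id):
    """Query the Hub for a Space's current stage and hardware."""
    from huggingface_hub import HfApi  # pip install huggingface_hub
    runtime = HfApi().get_space_runtime(repo_id)
    return summarize(str(runtime.stage), runtime.hardware)

# Example (needs network access and a valid Space id):
# print(check_space("nocapdev/my-gradio-momask"))
```

If the stage is anything other than RUNNING (e.g. BUILD_ERROR or RUNTIME_ERROR), head straight to the Logs tab in Step 2.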

Step 2: Get the Actual Error (MOST IMPORTANT)

Go to your Space and copy the error:

  1. Visit: https://huggingface.co/spaces/nocapdev/my-gradio-momask
  2. Click: "Logs" tab (top right)
  3. Scroll to the bottom
  4. Copy the last 30-50 lines

What to look for:

  • Lines with ERROR or Exception
  • Lines with Traceback
  • The very last error message
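
Scanning 50 log lines by eye is error-prone. A small filter like this (a sketch, not part of the repo) pulls out just the suspicious lines from a pasted log:

```python
def find_error_lines(log_text,
                     keywords=("ERROR", "Error", "Exception", "Traceback", "Killed")):
    """Return (line_number, line) pairs for log lines that look like failures."""
    hits = []
    for i, line in enumerate(log_text.splitlines(), start=1):
        if any(word in line for word in keywords):
            hits.append((i, line.strip()))
    return hits

# Paste your copied log into `logs`:
logs = """Loading models...
Traceback (most recent call last):
FileNotFoundError: [Errno 2] No such file or directory
"""
for lineno, line in find_error_lines(logs):
    print(f"line {lineno}: {line}")
```

The last hit is usually the root cause; match it against the patterns in Step 3.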

Step 3: Common Error Patterns

Error A: "Model checkpoints not found"

ERROR: Model checkpoints not found!
Looking for: ./checkpoints
FileNotFoundError: [Errno 2] No such file or directory

Cause: Model files weren't uploaded to the HF Space
Solution: Upload the checkpoints (see below)

Error B: "CUDA out of memory"

RuntimeError: CUDA out of memory
torch.cuda.OutOfMemoryError

Cause: Model too large for GPU RAM
Solution: Use a larger GPU or optimize the model

Error C: "Killed" or "SIGKILL"

Killed
Process finished with exit code 137

Cause: Out of RAM (CPU memory)
Solution: Upgrade the Space's RAM or reduce the app's memory usage
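
Exit code 137 isn't arbitrary: on Unix, a code above 128 means the process died from signal (code − 128). 137 − 128 = 9, i.e. SIGKILL, which is what the Linux OOM killer sends. You can verify the arithmetic:

```python
import signal

exit_code = 137
sig_number = exit_code - 128  # Unix convention: 128 + fatal signal number
print(sig_number == signal.SIGKILL)  # SIGKILL is signal 9 -> prints True
```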

Error D: Stuck at "Generating motion tokens..."

[1/4] Generating motion tokens...
[No more output for 20+ minutes]

Cause: Using CPU (very slow, but not an error!)
Solution: Wait 20-30 minutes OR upgrade to a GPU


πŸ”§ Solutions for Common Issues

Solution 1: Upload Model Checkpoints

If error shows: Model checkpoints not found

Option A: Upload via Git (for files <10GB)

# Clone your Space
git clone https://huggingface.co/spaces/nocapdev/my-gradio-momask
cd my-gradio-momask

# Install Git LFS (one time)
git lfs install

# Track large files
git lfs track "checkpoints/**/*.tar"
git lfs track "checkpoints/**/*.pth"
git lfs track "checkpoints/**/*.npy"

# Copy your checkpoints
# FROM: C:\Users\purva\OneDrive\Desktop\momaskhg\checkpoints
# TO: current directory
cp -r /path/to/checkpoints ./

# Commit and push
git add .gitattributes
git add checkpoints/
git commit -m "Add model checkpoints"
git push

Option B: Upload via HF Web UI

  1. Go to: https://huggingface.co/spaces/nocapdev/my-gradio-momask/tree/main
  2. Click "Add file" β†’ "Upload files"
  3. Drag your checkpoints/ folder
  4. Click "Commit"

Note: This works for files <50MB. For larger files, use Git LFS.

Option C: Host Models Separately

Upload models to HF Model Hub, then download in app.py:

import os

from huggingface_hub import snapshot_download

# Add to app.py before initializing the generator
if not os.path.exists('./checkpoints'):
    print("Downloading models from HF Hub...")
    snapshot_download(
        repo_id="YOUR_USERNAME/momask-models",
        local_dir="./checkpoints"
    )

Solution 2: Upgrade Hardware (for speed)

If using CPU and it's too slow:

  1. Go to: https://huggingface.co/spaces/nocapdev/my-gradio-momask/settings
  2. Scroll to "Hardware"
  3. Select:
    • T4 small (~$0.60/hour) - Good for this app
    • A10G small (~$3/hour) - Faster
  4. Click "Save"
  5. Wait for rebuild (~2 minutes)
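
To sanity-check what the upgrade costs per run, a rough estimate helps. This is a sketch only: the ~2 minutes per generation on a T4 is an assumed figure, so substitute what you actually observe.

```python
def cost_per_generation(hourly_rate_usd, minutes_per_generation):
    """Rough cost of one motion generation at a given hourly hardware rate."""
    return hourly_rate_usd * minutes_per_generation / 60.0

# T4 small at ~$0.60/hour, assuming ~2 minutes per generation
print(f"${cost_per_generation(0.60, 2):.3f} per generation")  # prints $0.020 per generation
```

Remember you're billed while the Space is up, not per generation, so pause or set a sleep timeout when idle.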

Solution 3: Test Locally First

Before debugging on HF, test locally:

# 1. Test your setup
python test_local.py

# 2. Run app locally
python app.py

# 3. Visit http://localhost:7860
# 4. Try a prompt
# 5. Check terminal for errors

If it works locally but fails on HF:

  • Models probably not uploaded to HF Space
  • Or HF Space using different Python/package versions
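
To rule out the version mismatch, print your local versions and compare them with what the Space reports at startup. A minimal sketch (the package list here is an example; check whatever your requirements.txt pins):

```python
import sys
from importlib.metadata import version, PackageNotFoundError

def report_versions(packages=("gradio", "torch", "numpy")):
    """Return a small report of the Python and package versions in this env."""
    lines = [f"python {sys.version.split()[0]}"]
    for pkg in packages:
        try:
            lines.append(f"{pkg} {version(pkg)}")
        except PackageNotFoundError:
            lines.append(f"{pkg} NOT INSTALLED")
    return "\n".join(lines)

print(report_versions())
```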

πŸ“‹ Debugging Checklist

Run through this checklist:

βœ… Pre-deployment

  • python test_local.py passes
  • App works locally at http://localhost:7860
  • Models in ./checkpoints/ directory
  • python pre_deploy_check.py shows 8/8 PASS

βœ… Post-deployment

  • Space shows "Running" status
  • Logs show "Using device: cpu/cuda"
  • Logs show "Models loaded successfully!"
  • No error messages in logs

βœ… During generation

  • Logs show "[1/4] Generating motion tokens..."
  • Logs show progress through [2/4], [3/4], [4/4]
  • No "Killed" or "SIGKILL" messages

🎯 Quick Diagnosis Commands

# Check HF Space status
python debug_hf_space.py

# Test local setup
python test_local.py

# Validate before deploy
python pre_deploy_check.py

# Deploy with latest fixes
python deploy.py

πŸ“Š Expected Logs (Healthy Run)

Startup (should see this):

Using device: cpu  (or cuda)
Loading models...
βœ“ VQ model loaded
βœ“ Transformer loaded
βœ“ Residual model loaded
βœ“ Length estimator loaded
Models loaded successfully!
Running on local URL: http://0.0.0.0:7860

During generation (should see this):

======================================================================
Generating motion for: 'a person walks forward'
======================================================================
[1/4] Generating motion tokens...
βœ“ Generated 80 frames
[2/4] Converting to BVH format...
βœ“ BVH conversion complete
[3/4] Rendering video...
βœ“ Video saved to ./gradio_outputs/motion_12345.mp4
[4/4] Complete!
======================================================================

πŸ†˜ Still Stuck?

Share these with me:

  1. Output from:

    python debug_hf_space.py
    
  2. Last 50 lines from HF Space Logs

    • Go to Logs tab
    • Copy from bottom
    • Include any ERROR or Traceback
  3. What you see in the browser

    • Screenshot of the error
    • Or copy the error message

Then I can give you the exact fix!


πŸ’‘ Most Likely Issues (90% of cases)

  1. CPU is slow (not an error!)

    • Logs show: "Using device: cpu"
    • Solution: Wait 20 mins OR upgrade to GPU
  2. Models not uploaded

    • Logs show: "Model checkpoints not found"
    • Solution: Upload checkpoints to HF Space
  3. Out of memory

    • Logs show: "Killed" or "SIGKILL"
    • Solution: Upgrade to more RAM

Run python debug_hf_space.py first - it will identify which one!