🔧 Troubleshooting: Slow/Hanging Generation
Your Issue
✅ Deployed successfully. ⚠️ App loads, but generation takes forever or hangs.
📋 Step 1: Check HF Space Logs
How to Access Logs:
- Go to: https://huggingface.co/spaces/nocapdev/my-gradio-momask
- Click "Logs" tab (top right)
- Look for these specific messages:
🎯 Common Issues & Solutions
Issue A: Models Not Found
Look for in logs:
ERROR: Model checkpoints not found!
Looking for: ./checkpoints
Why: The checkpoints/ directory wasn't uploaded to HF Spaces
Solutions:
1. Upload models to the Space directly (if <10 GB total):

```shell
# In your local directory
git clone https://huggingface.co/spaces/nocapdev/my-gradio-momask
cd my-gradio-momask
cp -r /path/to/checkpoints ./
git add checkpoints/
git commit -m "Add model checkpoints"
git push
```

2. Use Git LFS for large files (recommended):

```shell
git lfs install
git lfs track "checkpoints/**/*.tar"
git lfs track "checkpoints/**/*.pth"
git add .gitattributes
git add checkpoints/
git commit -m "Add models with LFS"
git push
```

3. Host on the HF Model Hub (best for very large files):
- Upload checkpoints to HF Model Hub
- Modify app.py to download on startup
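The download-on-startup hook for option 3 can be sketched as below. This is a minimal sketch, not the repo's actual code: the repo id `nocapdev/momask-checkpoints` and the helper name `ensure_checkpoints` are hypothetical placeholders, and it assumes the `huggingface_hub` package is listed in requirements.txt.

```python
import os

def ensure_checkpoints(local_dir="./checkpoints"):
    """Download model checkpoints from the HF Model Hub if they are missing."""
    if os.path.isdir(local_dir) and os.listdir(local_dir):
        return local_dir  # already present, skip the download
    # Imported lazily so the app can still start and print a clear error
    # if huggingface_hub is missing from requirements.txt
    from huggingface_hub import snapshot_download
    return snapshot_download(
        repo_id="nocapdev/momask-checkpoints",  # hypothetical repo id
        local_dir=local_dir,
    )

# Call this near the top of app.py, before any model is loaded:
# ensure_checkpoints()
```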
Issue B: Using CPU Instead of GPU
Look for in logs:
Using device: cpu
Why: The HF Spaces free tier runs on CPU only, and these models are very slow on CPU.
Impact: Generation can take 5-30 minutes on CPU vs 10-30 seconds on GPU
Solutions:
Upgrade to GPU Space (costs money):
- Go to Space Settings
- Change hardware to T4 GPU (~$0.60/hour)
- Or use A10G for faster inference
Optimize for CPU (free but slower):
- Reduce `time_steps` from 18 to 10
- Use smaller batches
- Add timeout warnings

Use CPU optimizations. Add to app.py:

```python
# Limit CPU threads (oversubscription slows shared Spaces hardware)
torch.set_num_threads(4)
# Allow lower-precision float32 matmuls for speed
torch.set_float32_matmul_precision('medium')
```
Issue C: Out of Memory
Look for in logs:
Killed
SIGKILL
OutOfMemoryError
Why: Models too large for available RAM
Solutions:
- Upgrade Space hardware (HF Space Settings)
- Reduce model size:
- Use FP16 instead of FP32
- Reduce batch sizes
- Add memory monitoring
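The "memory monitoring" bullet can be as simple as logging resident set size around the heavy steps. A minimal stdlib sketch (Unix-only, since it uses `resource`; the model name in the comments is illustrative). The FP16 switch itself is just `.half()` on each loaded torch model:

```python
import resource

def rss_mib():
    """Peak resident set size of this process in MiB (Linux reports KiB)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

# Sprinkle around model loading / generation, e.g.:
# print(f"[mem] before load: {rss_mib():.0f} MiB")
# vq_model = vq_model.half()   # FP16 roughly halves weight memory
# print(f"[mem] after load:  {rss_mib():.0f} MiB")
```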
Issue D: Stuck During Generation
Look for in logs:
[1/4] Generating motion tokens...
[Nothing else appears]
Why:
- CPU inference is very slow (can take 10-20 minutes)
- Infinite loop in model
- Process timeout
Solutions:
- Wait longer - CPU generation can take 10-30 minutes!
- Check if it's actually running:
- Look for CPU usage in HF Space metrics
- Add timeout:
```python
import signal

def timeout_handler(signum, frame):
    raise TimeoutError("Generation timed out")

# Register the handler, then arm a 10-minute alarm
signal.signal(signal.SIGALRM, timeout_handler)
signal.alarm(600)
```
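The same idea packaged as a reusable context manager (a sketch; note that the alarm only fires if a handler is registered for `SIGALRM`, and this only works in the main thread on Unix):

```python
import signal
from contextlib import contextmanager

@contextmanager
def time_limit(seconds):
    """Raise TimeoutError if the enclosed block runs longer than `seconds`."""
    def handler(signum, frame):
        raise TimeoutError(f"Timed out after {seconds}s")
    previous = signal.signal(signal.SIGALRM, handler)
    signal.alarm(seconds)
    try:
        yield
    finally:
        signal.alarm(0)                          # cancel the pending alarm
        signal.signal(signal.SIGALRM, previous)  # restore the prior handler

# Usage in app.py (generate_motion is illustrative):
# with time_limit(600):
#     motion = generate_motion(prompt)
```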
📊 What You Should See in Logs
Healthy Startup:
Using device: cuda # or cpu
Loading models...
✓ VQ model loaded
✓ Transformer loaded
✓ Residual model loaded
✓ Length estimator loaded
Models loaded successfully!
Running on local URL: http://0.0.0.0:7860
Healthy Generation (with my updates):
======================================================================
Generating motion for: 'a person walks forward'
======================================================================
[1/4] Generating motion tokens...
✓ Generated 80 frames
[2/4] Converting to BVH format...
✓ BVH conversion complete
[3/4] Rendering video...
✓ Video saved to ./gradio_outputs/motion_12345.mp4
[4/4] Complete!
======================================================================
Unhealthy - Models Missing:
ERROR: Model checkpoints not found!
Looking for: ./checkpoints
The model files are not included in this Space.
Unhealthy - Error During Init:
ERROR during initialization:
======================================================================
FileNotFoundError: [Errno 2] No such file or directory: './checkpoints/...'
🚀 Quick Fix: Redeploy with Better Logging
I've updated app.py with:
- ✅ Auto CPU/GPU detection
- ✅ Better error messages
- ✅ Progress indicators
- ✅ Graceful failure if models are missing
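The auto-detection amounts to a couple of lines near the top of app.py; a minimal sketch (the ImportError guard is defensive, since app.py itself requires torch):

```python
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"  # fall back when torch is unavailable

print(f"Using device: {device}")
```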
To deploy the update:
python deploy.py
💡 Immediate Actions
Action 1: Check Logs NOW
- Go to Logs tab on your Space
- Copy the last 50 lines
- Look for any ERROR messages
- Share them if you need help
Action 2: Verify Models
```shell
# On your local machine, check model size
ls -lh checkpoints/
# If very large (>5GB), you'll need Git LFS
```
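The same check in Python, if you'd rather compute the total programmatically (`dir_size_gb` is a hypothetical helper, not part of the repo):

```python
import os

def dir_size_gb(path="./checkpoints"):
    """Total size of all files under `path`, in GB."""
    total = 0
    for root, _, files in os.walk(path):
        for name in files:
            total += os.path.getsize(os.path.join(root, name))
    return total / 1e9

# if dir_size_gb() > 5:
#     print("Use Git LFS for these files")
```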
Action 3: Expected Timings
| Hardware | Generation Time |
|---|---|
| Free CPU | 10-30 minutes ⚠️ |
| T4 GPU | 20-60 seconds ✅ |
| A10G GPU | 10-30 seconds ✅ |
If using free tier: Be patient! First generation takes longer.
🎯 Next Steps
- Check logs - Most important!
- Redeploy updated app.py - Better error handling
- Share log output - So I can help debug
To redeploy:
python deploy.py
Then check logs again to see the new detailed output!
📤 Share These from Logs:
Copy and share:
- Lines showing "Using device: X"
- Any lines with "ERROR" or "FAIL"
- Last 20 lines when you submitted a prompt
- Any "Killed" or "SIGKILL" messages
This will help identify the exact issue!