# 🔧 HF Spaces Cache Permission Fix
## ❌ **Problem:**
```
ERROR:app:Failed to load model: [Errno 13] Permission denied: '/.cache'
```
Model downloads fail because the default cache location resolves to `/.cache` at the filesystem root, which the container user on HF Spaces cannot write to.
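Why the path is `/.cache` rather than a home directory: libraries that default to `~/.cache` expand `~` from `$HOME`. Assuming `$HOME` is `/` inside the container (a common default when the runtime user has no home directory), the expansion collapses to the root. The minimal sketch below reproduces this with Python's own `posixpath.expanduser`; `default_cache_root` is a hypothetical helper for illustration only.

```python
import os
import posixpath


def default_cache_root(home: str) -> str:
    """Expand "~/.cache" the way POSIX Python does for a given $HOME (hypothetical helper)."""
    previous = os.environ.get("HOME")
    os.environ["HOME"] = home
    try:
        return posixpath.expanduser("~/.cache")
    finally:
        # Restore the caller's environment
        if previous is None:
            os.environ.pop("HOME", None)
        else:
            os.environ["HOME"] = previous


print(default_cache_root("/"))           # /.cache  (root-owned, not writable for a normal user)
print(default_cache_root("/home/user"))  # /home/user/.cache
```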
## ✅ **Solution Applied:**
### 1. **Fixed Cache Directory in app.py**
- ✅ Set custom cache directory: `/tmp/torch_cache`
- ✅ Added proper permissions handling
- ✅ Set `OMP_NUM_THREADS` to a valid integer (`2`)
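One way the "permissions handling" above could look, sketched under the assumption that falling back to the system temp directory is acceptable: create the preferred cache directory, confirm it is writable, and otherwise fall back. `ensure_cache_dir` is a hypothetical name, not a function from the app.

```python
import os
import tempfile


def ensure_cache_dir(preferred: str, subdir: str = "torch_cache") -> str:
    """Return a writable cache directory, creating it if needed (hypothetical helper)."""
    candidates = [preferred, os.path.join(tempfile.gettempdir(), subdir)]
    for path in candidates:
        try:
            os.makedirs(path, exist_ok=True)
        except OSError:
            continue  # e.g. [Errno 13] Permission denied: try the next candidate
        # Writing files into a directory needs both write and search (execute) permission
        if os.access(path, os.W_OK | os.X_OK):
            return path
    raise PermissionError(f"No writable cache directory among: {candidates}")


cache_dir = ensure_cache_dir("/tmp/torch_cache")
print(cache_dir)
```

The returned path can then be passed as `model_dir=` to `torch.hub.load_state_dict_from_url`.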
### 2. **Updated Dockerfile**
- ✅ Set environment variables so all caches live under `/tmp`
- ✅ Pre-create cache directories
- ✅ Set `OMP_NUM_THREADS` to a valid integer
### 3. **Key Changes Made:**
#### **app.py Changes:**
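```python
import os

# Set environment variables *before* importing torch (or huggingface_hub) so they take effect
os.environ["OMP_NUM_THREADS"] = "2"  # Valid integer
os.environ["TORCH_HOME"] = "/tmp/torch"
os.environ["HF_HOME"] = "/tmp/huggingface"

import torch

model_url = "https://huggingface.co/Etadingrui/diamond-1B/resolve/main/agent_epoch_00003.pt"
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")  # Illustrative device choice
cache_dir = "/tmp/torch_cache"  # Writable custom cache dir
os.makedirs(cache_dir, exist_ok=True)

# Download the checkpoint into the writable cache dir instead of ~/.cache
state_dict = torch.hub.load_state_dict_from_url(
    model_url,
    map_location=device,
    model_dir=cache_dir,  # Custom cache dir
    check_hash=False,  # Skip hash check (this is also the default)
)
```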
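The `OMP_NUM_THREADS` warning appears when the variable holds something other than a positive integer. If you can't control what the platform injects, one option (a sketch, not necessarily what `app.py` does) is to validate the value at the very top of `app.py`, before `import torch`, and replace it with a safe default. `sanitize_omp_num_threads` and the default of `2` are illustrative.

```python
import os


def sanitize_omp_num_threads(default: int = 2) -> str:
    """Ensure OMP_NUM_THREADS holds a single positive integer (hypothetical helper).

    Call this at the top of app.py, before importing torch: the OpenMP
    runtime reads the variable when torch loads it.
    """
    raw = os.environ.get("OMP_NUM_THREADS", "").strip()
    # OpenMP also accepts comma-separated lists for nested parallelism;
    # this sketch deliberately treats anything but one positive integer as invalid.
    valid = raw.isascii() and raw.isdigit() and int(raw) > 0
    if not valid:
        os.environ["OMP_NUM_THREADS"] = str(default)
    return os.environ["OMP_NUM_THREADS"]


print(sanitize_omp_num_threads())
```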
#### **Dockerfile Changes:**
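```dockerfile
ENV OMP_NUM_THREADS=2
ENV TORCH_HOME=/tmp/torch
ENV HF_HOME=/tmp/huggingface
# Deprecated in recent transformers releases in favor of HF_HOME
ENV TRANSFORMERS_CACHE=/tmp/transformers
# Spaces runs the container as a non-root user (UID 1000), so directories
# created as root at build time must be made writable explicitly
RUN mkdir -p /tmp/torch /tmp/huggingface /tmp/transformers \
    && chmod -R 777 /tmp/torch /tmp/huggingface /tmp/transformers
```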
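To confirm at runtime that the `ENV` lines took effect and that the directories are writable by the user the Space runs as, a small startup check can log each cache variable. This is a sketch; `check_cache_env` and the `"app"` logger name are illustrative, and the variable list simply mirrors the Dockerfile above.

```python
import logging
import os

logger = logging.getLogger("app")

CACHE_VARS = ("TORCH_HOME", "HF_HOME", "TRANSFORMERS_CACHE")


def check_cache_env(names: tuple = CACHE_VARS) -> dict:
    """Map each cache env var to (path, writable); path is None if unset (hypothetical helper)."""
    report = {}
    for name in names:
        path = os.environ.get(name)
        writable = bool(path) and os.path.isdir(path) and os.access(path, os.W_OK | os.X_OK)
        report[name] = (path, writable)
        # Log unwritable or missing cache dirs as warnings so they stand out in Space logs
        level = logging.INFO if writable else logging.WARNING
        logger.log(level, "%s=%s writable=%s", name, path, writable)
    return report


logging.basicConfig(level=logging.INFO)
check_cache_env()
```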
## 🚀 **Expected Results:**
- ✅ No more "Permission denied: /.cache" errors
- ✅ No more "Invalid value for environment variable OMP_NUM_THREADS" warnings
- ✅ Model downloads succeed on HF Spaces
- ✅ App starts correctly and the game area responds to clicks
## 📋 **To Deploy:**
1. **Commit the changes**: `git add . && git commit -m "Fix HF Spaces cache permissions"`
2. **Push to HF Spaces**: `git push`
3. **Monitor logs**: Check that download succeeds without permission errors
4. **Test**: Click the game area; it should now respond.
## 🔍 **Log Messages to Look For:**
### ✅ **Success:**
```
INFO:app:Loading state dict from https://huggingface.co/Etadingrui/diamond-1B/resolve/main/agent_epoch_00003.pt
INFO:app:State dict loaded, applying to agent...
INFO:app:Model has actor_critic weights: False
INFO:app:Actor-critic model exists but has no trained weights - using dummy mode!
INFO:app:WebPlayEnv set to human control mode (no trained weights)
INFO:app:Models initialized successfully!
```
### ❌ **If Still Failing:**
```
ERROR:app:Failed to load model: [Errno 13] Permission denied
```
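When monitoring the Space logs, the two outcomes above can be told apart by a couple of fixed substrings. A minimal sketch, assuming the log text has already been copied or fetched into a string; `classify_startup_log` is hypothetical and matches only the messages shown in this document.

```python
def classify_startup_log(log_text: str) -> str:
    """Classify a startup log as 'success', 'permission_error', or 'unknown' (hypothetical helper)."""
    # Check for the failure first: a permission error should win even if later lines look fine
    if "[Errno 13] Permission denied" in log_text:
        return "permission_error"
    if "Models initialized successfully!" in log_text:
        return "success"
    return "unknown"


sample = """INFO:app:State dict loaded, applying to agent...
INFO:app:Models initialized successfully!"""
print(classify_startup_log(sample))  # success
```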
## 🎯 **What This Fixes:**
1. ✅ **Model downloading** - now uses the writable `/tmp` directory
2. ✅ **Environment variables** - `OMP_NUM_THREADS` is a valid integer
3. ✅ **Game clicking** - works once the model loads, even without trained actor_critic weights
4. ✅ **HF Spaces compatibility** - all caches point to writable locations instead of `/.cache`
The app should now start and run correctly on HF Spaces! 🎉