	# 🔧 HF Spaces Cache Permission Fix

	## ❌ Problem:
	```
	ERROR:app:Failed to load model: [Errno 13] Permission denied: '/.cache'
	```

	HF Spaces containers can't write to the root `/.cache` directory, causing model downloads to fail.
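The path in the error follows from how cache locations are commonly resolved: when no explicit cache variable is set, libraries fall back to `$XDG_CACHE_HOME` or `~/.cache`, and a home directory of `/` turns that into `/.cache`. The sketch below mirrors that fallback order for illustration only. `resolve_torch_cache` is a hypothetical helper, not a PyTorch API, and the assumption that the Space's home directory is `/` is inferred from the error message rather than confirmed.

```python
import os


def resolve_torch_cache(env):
    """Illustrative fallback: TORCH_HOME, then XDG_CACHE_HOME/torch, then HOME/.cache/torch."""
    if env.get("TORCH_HOME"):
        return env["TORCH_HOME"]
    xdg = env.get("XDG_CACHE_HOME")
    if not xdg:
        # Assumption for this sketch: treat a missing HOME as "/"
        xdg = os.path.join(env.get("HOME", "/"), ".cache")
    return os.path.join(xdg, "torch")


print(resolve_torch_cache({"HOME": "/"}))                              # /.cache/torch
print(resolve_torch_cache({"HOME": "/", "TORCH_HOME": "/tmp/torch"}))  # /tmp/torch
```

Setting `TORCH_HOME` (or passing `model_dir` explicitly) short-circuits the fallback, which is why the fix does not depend on what the home directory is.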

	## ✅ Solution Applied:

	### 1. Fixed Cache Directory in app.py
	- ✅ Set custom cache directory: `/tmp/torch_cache`
	- ✅ Added proper permissions handling
	- ✅ Set `OMP_NUM_THREADS` to a valid integer (`2`)
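One way to handle permissions defensively is to probe for a writable directory at startup instead of assuming one. This is a minimal sketch, not the code in app.py: `first_writable_dir` is a hypothetical helper, and the fallback to the system temp directory is an assumption.

```python
import os
import tempfile


def first_writable_dir(candidates):
    """Return the first candidate directory that can be created and written to."""
    for path in candidates:
        try:
            os.makedirs(path, exist_ok=True)
            # Creating a real temp file is a more reliable probe than os.access()
            with tempfile.TemporaryFile(dir=path):
                pass
            return path
        except OSError:
            # Covers PermissionError, read-only filesystems, and invalid paths
            continue
    raise RuntimeError("no writable cache directory found")


cache_dir = first_writable_dir(["/tmp/torch_cache", tempfile.gettempdir()])
print(f"Using cache directory: {cache_dir}")
```

The returned path can then be passed as `model_dir` so the download never targets a read-only location.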

	### 2. Updated Dockerfile
	- ✅ Set environment variables to use `/tmp` for caches
	- ✅ Pre-create cache directories
	- ✅ Set `OMP_NUM_THREADS` to a valid integer (`2`)

	### 3. Key Changes Made:

	#### app.py Changes:
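	```python
	import os

	# Set these before importing torch so they are in place when it loads
	os.environ["OMP_NUM_THREADS"] = "2"  # Must be a valid integer
	os.environ["TORCH_HOME"] = "/tmp/torch"
	os.environ["HF_HOME"] = "/tmp/huggingface"

	import torch

	cache_dir = "/tmp/torch_cache"  # Writable on HF Spaces
	os.makedirs(cache_dir, exist_ok=True)

	# model_url and device are defined elsewhere in app.py
	state_dict = torch.hub.load_state_dict_from_url(
	    model_url,
	    map_location=device,
	    model_dir=cache_dir,  # Custom cache dir instead of the default under ~/.cache
	    check_hash=False,     # Default; the filename carries no hash suffix to verify
	)
	```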
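Hard-coding `"2"` works, but the value can also be validated at startup so that a malformed setting injected by the environment is replaced rather than passed through. The sketch below is an assumption about how that could look. `sanitize_omp_threads` is a hypothetical helper, and for simplicity it treats anything other than a single positive integer as invalid, including the comma-separated lists that OpenMP accepts for nested parallelism.

```python
import os


def sanitize_omp_threads(env, default=2):
    """Ensure OMP_NUM_THREADS holds a positive integer string; return the value used."""
    raw = env.get("OMP_NUM_THREADS", "")
    try:
        threads = int(raw)
    except ValueError:
        threads = 0  # Non-numeric or empty values fall through to the default
    if threads < 1:
        threads = default
    env["OMP_NUM_THREADS"] = str(threads)
    return threads


# Call this before importing torch so the OpenMP runtime sees the corrected value
sanitize_omp_threads(os.environ)
```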

	#### Dockerfile Changes:
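	```dockerfile
	ENV OMP_NUM_THREADS=2
	ENV TORCH_HOME=/tmp/torch
	ENV HF_HOME=/tmp/huggingface
	# Deprecated in recent transformers releases in favor of HF_HOME; kept for older versions
	ENV TRANSFORMERS_CACHE=/tmp/transformers

	# Pre-create the cache directories and keep them writable if the app runs as a non-root user
	RUN mkdir -p /tmp/torch /tmp/huggingface /tmp/transformers \
	    && chmod -R 777 /tmp/torch /tmp/huggingface /tmp/transformers
	```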

	## 🚀 Expected Results:
	- ✅ No more "Permission denied: /.cache" errors
	- ✅ No more "Invalid value for environment variable OMP_NUM_THREADS" warnings
	- ✅ Model downloads work properly on HF Spaces
	- ✅ App starts correctly and clicking works

	## 📋 To Deploy:
	1. Commit the changes: `git add . && git commit -m "Fix HF Spaces cache permissions"`
	2. Push to HF Spaces: `git push`
	3. Monitor logs: check that the model download completes without permission errors
	4. Test: click the game area; it should now respond

	## 🔍 Log Messages to Look For:
	### ✅ Success:
	```
	INFO:app:Loading state dict from https://huggingface.co/Etadingrui/diamond-1B/resolve/main/agent_epoch_00003.pt
	INFO:app:State dict loaded, applying to agent...
	INFO:app:Model has actor_critic weights: False
	INFO:app:Actor-critic model exists but has no trained weights - using dummy mode!
	INFO:app:WebPlayEnv set to human control mode (no trained weights)
	INFO:app:Models initialized successfully!
	```

	### ❌ If Still Failing:
	```
	ERROR:app:Failed to load model: [Errno 13] Permission denied
	```
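The two outcomes above can also be checked mechanically, for example when scanning a saved copy of the Space logs. This is a minimal sketch that matches only on the message fragments listed in this document; `check_startup_log` is a hypothetical helper, and how the log lines are obtained is left open.

```python
def check_startup_log(lines):
    """Classify startup log lines using the success and failure messages listed above."""
    status = "unknown"
    for line in lines:
        if "Permission denied" in line:
            # Any permission error means the cache fix did not take effect
            return "permission_error"
        if "Models initialized successfully" in line:
            status = "ok"
    return status


sample = [
    "INFO:app:State dict loaded, applying to agent...",
    "INFO:app:Models initialized successfully!",
]
print(check_startup_log(sample))  # ok
```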

	## 🎯 What This Fixes:
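	1. ✅ Model downloading - now uses the writable `/tmp` directory (contents of `/tmp` are not kept across restarts, so expect the model to be downloaded again after each one)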
	2. ✅ Environment variables - OMP_NUM_THREADS is valid
	3. ✅ Game clicking - works after model loads (even without actor_critic)
	4. ✅ HF Spaces compatibility - all caches live under a path the container can write to

	With these changes, the app should start and load its model on HF Spaces without cache permission errors. 🎉