Committed by GitHub Actions
Commit 3e8073f · 1 parent: 0d29386

🚀 Deploy embedder from GitHub Actions - 2025-10-27 22:54:05

Files changed (3)
  1. README.md +276 -0
  2. embedder.py +61 -1
  3. requirements.txt +6 -6
README.md ADDED
@@ -0,0 +1,276 @@
---
title: MobileCLIP2 Embedder
emoji: 🖼️
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
---

# MobileCLIP2-S2 Embedding Service

ONNX-optimized FastAPI service for generating 512-dimensional image embeddings using Apple's MobileCLIP2-S2.

## Features

- **Fast**: ONNX Runtime CPU optimizations
- **Memory Efficient**: <2GB RAM footprint
- **Batch Processing**: Up to 10 images per request
- **RESTful API**: Simple HTTP endpoints

## API Usage

### Single Image

```bash
curl -X POST "https://YOUR_SPACE_URL/embed" \
  -F "file=@image.jpg"
```

**Response:**
```json
{
  "embedding": [0.123, -0.456, ...],  // 512 floats
  "model": "MobileCLIP-S2",
  "inference_time_ms": 123.45
}
```

### Batch Processing

```bash
curl -X POST "https://YOUR_SPACE_URL/embed/batch" \
  -F "files=@image1.jpg" \
  -F "files=@image2.jpg"
```

**Response:**
```json
{
  "embeddings": [[0.123, ...], [0.456, ...]],
  "count": 2,
  "total_time_ms": 234.56,
  "model": "MobileCLIP-S2"
}
```

### Health Check

```bash
curl "https://YOUR_SPACE_URL/"
```

**Response:**
```json
{
  "status": "healthy",
  "model": "MobileCLIP-S2",
  "device": "cpu",
  "onnx_optimized": true
}
```

### Model Info

```bash
curl "https://YOUR_SPACE_URL/info"
```

**Response:**
```json
{
  "model": "MobileCLIP-S2",
  "embedding_dim": 512,
  "onnx_optimized": true,
  "max_image_size_mb": 10,
  "max_batch_size": 10,
  "image_size": 256
}
```

## Model Details

- **Model**: MobileCLIP2-S2 (Apple)
- **Paper**: [MobileCLIP2: Improving Multi-Modal Reinforced Training](http://arxiv.org/abs/2508.20691)
- **Embedding Dimension**: 512
- **Input Size**: 256×256
- **Optimization**: ONNX Runtime CPU
- **Normalization**: L2-normalized outputs

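Because the service returns L2-normalized vectors, cosine similarity between two embeddings is just their dot product. A minimal sketch using the `/embed` endpoint above (the Space URL and file names are placeholders):

```python
import numpy as np
import requests

# Call the /embed endpoint for one image and return its 512-d vector.
# "https://YOUR_SPACE_URL" and the file names below are placeholders.
def embed(path: str) -> np.ndarray:
    with open(path, "rb") as f:
        r = requests.post("https://YOUR_SPACE_URL/embed", files={"file": f})
    r.raise_for_status()
    return np.array(r.json()["embedding"], dtype=np.float32)

a = embed("photo_a.jpg")
b = embed("photo_b.jpg")

# The service L2-normalizes its outputs, so the dot product is already
# the cosine similarity; no extra normalization step is needed.
similarity = float(np.dot(a, b))
print(f"cosine similarity: {similarity:.4f}")
```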
## Local Development

### Prerequisites

- Python 3.11+
- Docker & Docker Compose (optional)

### Setup

1. **Install dependencies for model conversion:**

   ```bash
   cd huggingface_embedder
   pip install torch open_clip_torch ml-mobileclip
   ```

2. **Convert the model to ONNX (one-time):**

   ```bash
   python model_converter.py --output models
   ```

   This will create:
   - `models/mobileclip_s2_visual.onnx` (ONNX model)
   - `models/preprocess_config.json` (preprocessing config)

3. **Install runtime dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

4. **Run locally:**

   ```bash
   uvicorn embedder:app --reload --port 7860
   ```

5. **Test the API:**

   ```bash
   # Health check
   curl http://localhost:7860/

   # Generate embedding
   curl -X POST http://localhost:7860/embed \
     -F "file=@test_image.jpg"
   ```

### Docker

```bash
# Build and run
docker compose up

# Test
curl -X POST http://localhost:8001/embed \
  -F "file=@test_image.jpg"
```

## HuggingFace Spaces Deployment

### Initial Setup

1. **Create a new Space:**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Select **Docker** as the SDK
   - Set `app_port` to **7860**

2. **Add a GitHub secret:**
   - Go to your GitHub repo Settings → Secrets
   - Add `HUGGINGFACE_ACCESS_TOKEN` with your HF token

3. **Deploy:**

   ```bash
   # Just push to the main branch!
   git push origin main
   ```

**That's it!** The model is automatically downloaded from the HuggingFace Hub (`apple/MobileCLIP-S2`) and converted to ONNX during the Docker build.

The Space builds and deploys automatically (the first build takes 5-10 minutes).

### Using GitHub Actions for Sync

See [Managing Spaces with GitHub Actions](https://huggingface.co/docs/hub/spaces-github-actions) for automatic sync from your GitHub repo.

## Performance

### Metrics (CPU: 2 cores, 2GB RAM)

- **Single Inference**: ~100-200ms
- **Batch (10 images)**: ~800-1200ms
- **Memory Usage**: <1.5GB
- **Throughput**: ~6-10 images/second

### Memory Optimization

The ONNX model uses roughly 50-70% less RAM than PyTorch:

- **PyTorch**: ~2.5GB RAM
- **ONNX (FP32)**: ~800MB RAM
- **ONNX (INT8)**: ~400MB RAM (use the `--quantize` flag)

## Error Handling

| Status | Description |
|--------|-------------|
| 200 | Success |
| 400 | Invalid file type or format |
| 413 | File too large (>10MB) |
| 500 | Inference error |

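Clients can branch on these codes; for example, a small helper around `/embed` (illustrative sketch, not part of the service):

```python
import requests

# Illustrative helper that maps the documented status codes to exceptions
# ("https://YOUR_SPACE_URL" is a placeholder for the deployed Space URL).
def embed_image(path: str) -> list[float]:
    with open(path, "rb") as f:
        r = requests.post("https://YOUR_SPACE_URL/embed", files={"file": f})
    if r.status_code == 400:
        raise ValueError("invalid file type or format")
    if r.status_code == 413:
        raise ValueError("file too large (>10MB)")
    r.raise_for_status()  # surfaces 500 inference errors
    return r.json()["embedding"]
```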
## Limitations

- **Max image size**: 10MB per file
- **Max batch size**: 10 images per request
- **Supported formats**: JPEG, PNG, WebP
- **No GPU**: CPU-only inference (sufficient for most use cases)

## Integration Example

### Python

```python
import requests

# Single image
with open("photo.jpg", "rb") as f:
    response = requests.post(
        "https://YOUR_SPACE_URL/embed",
        files={"file": f}
    )

embedding = response.json()["embedding"]
print(f"Embedding shape: {len(embedding)}")  # 512
```

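The batch endpoint works the same way; repeat the `files` form field once per image. A minimal sketch (placeholder file names):

```python
import requests

# Repeat the "files" form field once per image (placeholder file names).
paths = ["photo1.jpg", "photo2.jpg"]
files = [("files", open(p, "rb")) for p in paths]
try:
    response = requests.post("https://YOUR_SPACE_URL/embed/batch", files=files)
    response.raise_for_status()
    data = response.json()
finally:
    for _, fh in files:
        fh.close()

print(f"{data['count']} embeddings in {data['total_time_ms']} ms")
```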
### JavaScript

```javascript
const formData = new FormData();
formData.append('file', imageFile);

const response = await fetch('https://YOUR_SPACE_URL/embed', {
  method: 'POST',
  body: formData
});

const data = await response.json();
console.log('Embedding:', data.embedding);
```

## License

- **Code**: MIT License
- **Model**: [Apple AMLR License](https://huggingface.co/apple/MobileCLIP-S2)

## Citation

```bibtex
@article{mobileclip2,
  title={MobileCLIP2: Improving Multi-Modal Reinforced Training},
  author={Faghri, Fartash and Vasu, Pavan Kumar Anasosalu and Koc, Cem and Shankar, Vaishaal and Toshev, Alexander T and Tuzel, Oncel and Pouransari, Hadi},
  journal={Transactions on Machine Learning Research},
  year={2025}
}
```

## Support

For issues or questions:
- HuggingFace Spaces: https://huggingface.co/docs/hub/spaces
- Model: https://huggingface.co/apple/MobileCLIP-S2
- ONNX Runtime: https://onnxruntime.ai/
embedder.py CHANGED
@@ -10,7 +10,7 @@ from fastapi import FastAPI, File, UploadFile, HTTPException, status
  from fastapi.responses import JSONResponse
  from PIL import Image
  from pydantic import BaseModel, Field
- from open_clip import create_model_and_transforms
+ from open_clip import create_model_and_transforms, get_tokenizer
  from mobileclip.modules.common.mobileone import reparameterize_model


@@ -38,6 +38,19 @@ class BatchEmbeddingResponse(BaseModel):
      model: str


+ class TextEmbeddingRequest(BaseModel):
+     """Text embedding request."""
+     text: str = Field(..., min_length=1, max_length=1000)
+
+
+ class TextEmbeddingResponse(BaseModel):
+     """Text embedding response."""
+     embedding: List[float] = Field(..., min_length=512, max_length=512)
+     model: str
+     inference_time_ms: float
+     text: str
+
+
  class HealthResponse(BaseModel):
      """Health check response."""
      status: str
@@ -304,6 +317,53 @@ async def generate_batch_embeddings(files: List[UploadFile] = File(...)):
      )


+ @app.post("/embed/text", response_model=TextEmbeddingResponse)
+ async def generate_text_embedding(request: TextEmbeddingRequest):
+     """
+     Generate embedding for text query.
+
+     Args:
+         request: Text to embed
+
+     Returns:
+         512-dimensional embedding for the text
+
+     Raises:
+         500: Inference error
+     """
+     start_time = time.time()
+
+     try:
+         # Tokenize text
+         tokenizer = get_tokenizer(MODEL_NAME)
+         text_tokens = tokenizer([request.text])
+         text_tokens = text_tokens.to(device)
+
+         # Run inference
+         with torch.no_grad():
+             text_embedding = model.encode_text(text_tokens)
+             text_embedding = normalize_embedding(text_embedding)
+
+         # Convert to numpy and then to list
+         embedding = text_embedding.cpu().numpy()[0]
+
+         # Calculate time
+         inference_time = (time.time() - start_time) * 1000
+
+         return TextEmbeddingResponse(
+             embedding=embedding.tolist(),
+             model=MODEL_NAME,
+             inference_time_ms=round(inference_time, 2),
+             text=request.text
+         )
+
+     except Exception as e:
+         raise HTTPException(
+             status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+             detail=f"Text inference failed: {str(e)}"
+         )
+
+
  # --- Main ---
  if __name__ == "__main__":
      import uvicorn
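For reference, the `/embed/text` route added above expects a JSON body with a `text` field and returns a 512-dimensional embedding. A minimal client call could look like this (sketch only; the Space URL is a placeholder):

```python
import requests

# POST a JSON body to the new /embed/text endpoint
# ("https://YOUR_SPACE_URL" is a placeholder for the deployed Space URL).
response = requests.post(
    "https://YOUR_SPACE_URL/embed/text",
    json={"text": "a photo of a dog on a beach"},
)
response.raise_for_status()

data = response.json()
print(len(data["embedding"]), "dims,", data["inference_time_ms"], "ms")
```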
requirements.txt CHANGED
@@ -1,9 +1,9 @@
- fastapi
- uvicorn[standard]
- python-multipart
- pillow
- numpy
- pydantic
+ fastapi==0.120.1
+ uvicorn[standard]==0.38.0
+ python-multipart==0.0.20
+ pillow==12.0.0
+ numpy==2.3.4
+ pydantic==2.12.3
  torch
  open_clip_torch
  ml-mobileclip @ git+https://github.com/apple/ml-mobileclip.git