---
title: MobileCLIP2 Embedder
emoji: 🖼️
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
---

# MobileCLIP2-S2 Embedding Service
ONNX-optimized FastAPI service for generating 512-dimensional image embeddings using Apple's MobileCLIP2-S2.
## Features
- Fast: ONNX Runtime CPU optimizations
- Memory Efficient: <2GB RAM footprint
- Batch Processing: Up to 10 images per request
- RESTful API: Simple HTTP endpoints
## API Usage

### Single Image
```bash
curl -X POST "https://YOUR_SPACE_URL/embed" \
  -F "file=@image.jpg"
```
Response:
```json
{
  "embedding": [0.123, -0.456, ...],  // 512 floats
  "model": "MobileCLIP-S2",
  "inference_time_ms": 123.45
}
```
### Batch Processing
```bash
curl -X POST "https://YOUR_SPACE_URL/embed/batch" \
  -F "files=@image1.jpg" \
  -F "files=@image2.jpg"
```
Response:
```json
{
  "embeddings": [[0.123, ...], [0.456, ...]],
  "count": 2,
  "total_time_ms": 234.56,
  "model": "MobileCLIP-S2"
}
```
### Health Check

```bash
curl "https://YOUR_SPACE_URL/"
```
Response:
```json
{
  "status": "healthy",
  "model": "MobileCLIP-S2",
  "device": "cpu",
  "onnx_optimized": true
}
```
### Model Info

```bash
curl "https://YOUR_SPACE_URL/info"
```
Response:
```json
{
  "model": "MobileCLIP-S2",
  "embedding_dim": 512,
  "onnx_optimized": true,
  "max_image_size_mb": 10,
  "max_batch_size": 10,
  "image_size": 256
}
```
## Model Details
- Model: MobileCLIP2-S2 (Apple)
- Paper: MobileCLIP2: Improving Multi-Modal Reinforced Training
- Embedding Dimension: 512
- Input Size: 256×256
- Optimization: ONNX Runtime CPU
- Normalization: L2 normalized outputs
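
Because the outputs are L2 normalized, cosine similarity between two embeddings reduces to a plain dot product. A minimal sketch against the `/embed` endpoint (the image filenames and `YOUR_SPACE_URL` are placeholders):

```python
# Compare two images via the /embed endpoint.
# The service returns L2-normalized 512-d vectors, so cosine
# similarity is just a dot product.
import requests
import numpy as np

BASE_URL = "https://YOUR_SPACE_URL"

def embed(path: str) -> np.ndarray:
    with open(path, "rb") as f:
        r = requests.post(f"{BASE_URL}/embed", files={"file": f})
    r.raise_for_status()
    return np.asarray(r.json()["embedding"], dtype=np.float32)

a, b = embed("cat1.jpg"), embed("cat2.jpg")
print(a.shape)                   # (512,)
print(float(np.linalg.norm(a)))  # ~1.0 (L2 normalized)
print(float(a @ b))              # cosine similarity in [-1, 1]
```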
## Local Development

### Prerequisites
- Python 3.11+
- Docker & Docker Compose (optional)
### Setup
- Install dependencies for model conversion:

  ```bash
  cd huggingface_embedder
  pip install torch open_clip_torch ml-mobileclip
  ```
- Convert model to ONNX (one-time):

  ```bash
  python model_converter.py --output models
  ```

  This will create:

  - `models/mobileclip_s2_visual.onnx` (ONNX model)
  - `models/preprocess_config.json` (preprocessing config)

  A quick sanity check of the exported model is sketched after this list.
- Install runtime dependencies:

  ```bash
  pip install -r requirements.txt
  ```
- Run locally:

  ```bash
  uvicorn embedder:app --reload --port 7860
  ```
- Test the API:

  ```bash
  # Health check
  curl http://localhost:7860/

  # Generate embedding
  curl -X POST http://localhost:7860/embed \
    -F "file=@test_image.jpg"
  ```
### Docker

```bash
# Build and run
docker compose up

# Test
curl -X POST http://localhost:8001/embed \
  -F "file=@test_image.jpg"
```
## HuggingFace Spaces Deployment

### Initial Setup
Create a new Space:

- Go to https://huggingface.co/spaces
- Click "Create new Space"
- Select Docker as the SDK
- Set `app_port` to 7860
Add a GitHub secret:

- Go to your GitHub repo Settings → Secrets
- Add `HUGGINGFACE_ACCESS_TOKEN` with your HF token
Deploy:

```bash
# Just push to main branch!
git push origin main
```
That's it! The model will be automatically downloaded from HuggingFace Hub (apple/MobileCLIP-S2) and converted to ONNX during the Docker build.
The Space will then build and deploy automatically (the first build takes 5-10 minutes).
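
Once the build finishes, the health endpoint can be polled to confirm the Space is serving. A minimal sketch (`YOUR_SPACE_URL` is the same placeholder used throughout):

```python
# Poll the health endpoint until the service reports "healthy".
import time
import requests

BASE_URL = "https://YOUR_SPACE_URL"

for _ in range(30):  # ~5 minutes at 10 s intervals
    try:
        status = requests.get(f"{BASE_URL}/", timeout=10).json().get("status")
        if status == "healthy":
            print("Space is up")
            break
    except (requests.RequestException, ValueError):
        pass  # still building or waking up
    time.sleep(10)
else:
    print("Space did not become healthy in time")
```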
### Using GitHub Actions for Sync
See Managing Spaces with GitHub Actions for automatic sync from your GitHub repo.
## Performance

### Metrics (CPU: 2 cores, 2GB RAM)
- Single Inference: ~100-200ms
- Batch (10 images): ~800-1200ms
- Memory Usage: <1.5GB
- Throughput: ~6-10 images/second
### Memory Optimization
The ONNX model uses ~50-70% less RAM compared to PyTorch:
- PyTorch: ~2.5GB RAM
- ONNX (FP32): ~800MB RAM
- ONNX (INT8): ~400MB RAM (use the `--quantize` flag)
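
The `--quantize` flag belongs to `model_converter.py`. As a rough illustration of the underlying technique, dynamic INT8 quantization of an already exported model can also be done with ONNX Runtime's own tooling; the output filename below is just an example:

```python
# Dynamic INT8 quantization of the exported visual encoder.
# This is a sketch of the general technique using onnxruntime's
# quantization utilities, not necessarily what model_converter.py does.
from onnxruntime.quantization import QuantType, quantize_dynamic

quantize_dynamic(
    model_input="models/mobileclip_s2_visual.onnx",
    model_output="models/mobileclip_s2_visual.int8.onnx",
    weight_type=QuantType.QInt8,  # quantize weights to signed 8-bit
)
```

The quantized file can then be loaded with the same `InferenceSession` call shown earlier.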
## Error Handling
| Status | Description |
|---|---|
| 200 | Success |
| 400 | Invalid file type or format |
| 413 | File too large (>10MB) |
| 500 | Inference error |
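
Client code can branch on these codes directly; a minimal sketch with `requests` (the error messages are illustrative, only the status codes above are documented):

```python
# Handle the documented status codes when requesting an embedding.
import requests

def get_embedding(path: str, base_url: str = "https://YOUR_SPACE_URL"):
    with open(path, "rb") as f:
        resp = requests.post(f"{base_url}/embed", files={"file": f})

    if resp.status_code == 200:
        return resp.json()["embedding"]
    if resp.status_code == 400:
        raise ValueError(f"Invalid file type or format: {path}")
    if resp.status_code == 413:
        raise ValueError(f"File too large (>10MB): {path}")
    # 500 or anything unexpected
    resp.raise_for_status()
```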
## Limitations
- Max image size: 10MB per file
- Max batch size: 10 images per request
- Supported formats: JPEG, PNG, WebP
- No GPU: CPU-only inference (sufficient for most use cases)
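
These limits can be enforced client-side before uploading. A minimal standard-library sketch that validates files and splits them into batches of at most 10:

```python
# Pre-flight checks mirroring the service limits: <=10MB per file,
# JPEG/PNG/WebP only, and at most 10 images per batch request.
import os

MAX_SIZE_MB = 10
MAX_BATCH = 10
ALLOWED_EXTENSIONS = {".jpg", ".jpeg", ".png", ".webp"}

def validate(paths: list[str]) -> list[list[str]]:
    """Validate files and split them into batches of at most 10."""
    for path in paths:
        ext = os.path.splitext(path)[1].lower()
        if ext not in ALLOWED_EXTENSIONS:
            raise ValueError(f"Unsupported format: {path}")
        if os.path.getsize(path) > MAX_SIZE_MB * 1024 * 1024:
            raise ValueError(f"File exceeds {MAX_SIZE_MB}MB: {path}")
    return [paths[i:i + MAX_BATCH] for i in range(0, len(paths), MAX_BATCH)]
```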
## Integration Example

### Python
```python
import requests

# Single image
with open("photo.jpg", "rb") as f:
    response = requests.post(
        "https://YOUR_SPACE_URL/embed",
        files={"file": f}
    )

embedding = response.json()["embedding"]
print(f"Embedding shape: {len(embedding)}")  # 512
```
### JavaScript
```javascript
const formData = new FormData();
formData.append('file', imageFile);

const response = await fetch('https://YOUR_SPACE_URL/embed', {
  method: 'POST',
  body: formData
});

const data = await response.json();
console.log('Embedding:', data.embedding);
```
## License
- Code: MIT License
- Model: Apple AMLR License
## Citation
```bibtex
@article{mobileclip2,
  title={MobileCLIP2: Improving Multi-Modal Reinforced Training},
  author={Faghri, Fartash and Vasu, Pavan Kumar Anasosalu and Koc, Cem and Shankar, Vaishaal and Toshev, Alexander T and Tuzel, Oncel and Pouransari, Hadi},
  journal={Transactions on Machine Learning Research},
  year={2025}
}
```
## Support
For issues or questions:
- HuggingFace Spaces: https://huggingface.co/docs/hub/spaces
- Model: https://huggingface.co/apple/MobileCLIP-S2
- ONNX Runtime: https://onnxruntime.ai/