Committed by GitHub Actions
Commit 3e8073f · 1 parent: 0d29386

🚀 Deploy embedder from GitHub Actions - 2025-10-27 22:54:05

Files changed (3)
  1. README.md +276 -0
  2. embedder.py +61 -1
  3. requirements.txt +6 -6
README.md ADDED
@@ -0,0 +1,276 @@
---
title: MobileCLIP2 Embedder
emoji: 🖼️
colorFrom: blue
colorTo: purple
sdk: docker
app_port: 7860
---

# MobileCLIP2-S2 Embedding Service

ONNX-optimized FastAPI service for generating 512-dimensional image embeddings using Apple's MobileCLIP2-S2.

## Features

- **Fast**: ONNX Runtime CPU optimizations
- **Memory Efficient**: <2GB RAM footprint
- **Batch Processing**: Up to 10 images per request
- **RESTful API**: Simple HTTP endpoints

## API Usage

### Single Image

```bash
curl -X POST "https://YOUR_SPACE_URL/embed" \
  -F "file=@image.jpg"
```

**Response:**
```json
{
  "embedding": [0.123, -0.456, ...],  // 512 floats
  "model": "MobileCLIP-S2",
  "inference_time_ms": 123.45
}
```

### Batch Processing

```bash
curl -X POST "https://YOUR_SPACE_URL/embed/batch" \
  -F "files=@image1.jpg" \
  -F "files=@image2.jpg"
```

**Response:**
```json
{
  "embeddings": [[0.123, ...], [0.456, ...]],
  "count": 2,
  "total_time_ms": 234.56,
  "model": "MobileCLIP-S2"
}
```

### Health Check

```bash
curl "https://YOUR_SPACE_URL/"
```

**Response:**
```json
{
  "status": "healthy",
  "model": "MobileCLIP-S2",
  "device": "cpu",
  "onnx_optimized": true
}
```

### Model Info

```bash
curl "https://YOUR_SPACE_URL/info"
```

**Response:**
```json
{
  "model": "MobileCLIP-S2",
  "embedding_dim": 512,
  "onnx_optimized": true,
  "max_image_size_mb": 10,
  "max_batch_size": 10,
  "image_size": 256
}
```

## Model Details

- **Model**: MobileCLIP2-S2 (Apple)
- **Paper**: [MobileCLIP2: Improving Multi-Modal Reinforced Training](http://arxiv.org/abs/2508.20691)
- **Embedding Dimension**: 512
- **Input Size**: 256×256
- **Optimization**: ONNX Runtime CPU
- **Normalization**: L2-normalized outputs

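Because the service returns L2-normalized vectors, cosine similarity between two embeddings is just their dot product. A minimal sketch using the `/embed` endpoint above (the Space URL and file names are placeholders):

```python
import numpy as np
import requests

# Call the /embed endpoint for one image and return its 512-d vector.
# "https://YOUR_SPACE_URL" and the file names below are placeholders.
def embed(path: str) -> np.ndarray:
    with open(path, "rb") as f:
        r = requests.post("https://YOUR_SPACE_URL/embed", files={"file": f})
    r.raise_for_status()
    return np.array(r.json()["embedding"], dtype=np.float32)

a = embed("photo_a.jpg")
b = embed("photo_b.jpg")

# The service L2-normalizes its outputs, so the dot product is already
# the cosine similarity; no extra normalization step is needed.
similarity = float(np.dot(a, b))
print(f"cosine similarity: {similarity:.4f}")
```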
## Local Development

### Prerequisites

- Python 3.11+
- Docker & Docker Compose (optional)

### Setup

1. **Install dependencies for model conversion:**

   ```bash
   cd huggingface_embedder
   pip install torch open_clip_torch ml-mobileclip
   ```

2. **Convert the model to ONNX (one-time):**

   ```bash
   python model_converter.py --output models
   ```

   This will create:
   - `models/mobileclip_s2_visual.onnx` (ONNX model)
   - `models/preprocess_config.json` (preprocessing config)

3. **Install runtime dependencies:**

   ```bash
   pip install -r requirements.txt
   ```

4. **Run locally:**

   ```bash
   uvicorn embedder:app --reload --port 7860
   ```

5. **Test the API:**

   ```bash
   # Health check
   curl http://localhost:7860/

   # Generate embedding
   curl -X POST http://localhost:7860/embed \
     -F "file=@test_image.jpg"
   ```

### Docker

```bash
# Build and run
docker compose up

# Test
curl -X POST http://localhost:8001/embed \
  -F "file=@test_image.jpg"
```

## HuggingFace Spaces Deployment

### Initial Setup

1. **Create a new Space:**
   - Go to https://huggingface.co/spaces
   - Click "Create new Space"
   - Select **Docker** as the SDK
   - Set `app_port` to **7860**

2. **Add a GitHub secret:**
   - Go to your GitHub repo Settings → Secrets
   - Add `HUGGINGFACE_ACCESS_TOKEN` with your HF token

3. **Deploy:**

   ```bash
   # Just push to the main branch!
   git push origin main
   ```

**That's it!** The model is automatically downloaded from the HuggingFace Hub (`apple/MobileCLIP-S2`) and converted to ONNX during the Docker build.

The Space builds and deploys automatically (the first build takes 5-10 minutes).

### Using GitHub Actions for Sync

See [Managing Spaces with GitHub Actions](https://huggingface.co/docs/hub/spaces-github-actions) for automatic sync from your GitHub repo.

## Performance

### Metrics (CPU: 2 cores, 2GB RAM)

- **Single Inference**: ~100-200ms
- **Batch (10 images)**: ~800-1200ms
- **Memory Usage**: <1.5GB
- **Throughput**: ~6-10 images/second

### Memory Optimization

The ONNX model uses roughly 50-70% less RAM than PyTorch:

- **PyTorch**: ~2.5GB RAM
- **ONNX (FP32)**: ~800MB RAM
- **ONNX (INT8)**: ~400MB RAM (use the `--quantize` flag)

## Error Handling

| Status | Description |
|--------|-------------|
| 200 | Success |
| 400 | Invalid file type or format |
| 413 | File too large (>10MB) |
| 500 | Inference error |

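Clients can branch on these codes; for example, a small helper around `/embed` (illustrative sketch, not part of the service):

```python
import requests

# Illustrative helper that maps the documented status codes to exceptions
# ("https://YOUR_SPACE_URL" is a placeholder for the deployed Space URL).
def embed_image(path: str) -> list[float]:
    with open(path, "rb") as f:
        r = requests.post("https://YOUR_SPACE_URL/embed", files={"file": f})
    if r.status_code == 400:
        raise ValueError("invalid file type or format")
    if r.status_code == 413:
        raise ValueError("file too large (>10MB)")
    r.raise_for_status()  # surfaces 500 inference errors
    return r.json()["embedding"]
```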
## Limitations

- **Max image size**: 10MB per file
- **Max batch size**: 10 images per request
- **Supported formats**: JPEG, PNG, WebP
- **No GPU**: CPU-only inference (sufficient for most use cases)

## Integration Example

### Python

```python
import requests

# Single image
with open("photo.jpg", "rb") as f:
    response = requests.post(
        "https://YOUR_SPACE_URL/embed",
        files={"file": f}
    )

embedding = response.json()["embedding"]
print(f"Embedding shape: {len(embedding)}")  # 512
```

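The batch endpoint works the same way; repeat the `files` form field once per image. A minimal sketch (placeholder file names):

```python
import requests

# Repeat the "files" form field once per image (placeholder file names).
paths = ["photo1.jpg", "photo2.jpg"]
files = [("files", open(p, "rb")) for p in paths]
try:
    response = requests.post("https://YOUR_SPACE_URL/embed/batch", files=files)
    response.raise_for_status()
    data = response.json()
finally:
    for _, fh in files:
        fh.close()

print(f"{data['count']} embeddings in {data['total_time_ms']} ms")
```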
### JavaScript

```javascript
const formData = new FormData();
formData.append('file', imageFile);

const response = await fetch('https://YOUR_SPACE_URL/embed', {
  method: 'POST',
  body: formData
});

const data = await response.json();
console.log('Embedding:', data.embedding);
```

## License

- **Code**: MIT License
- **Model**: [Apple AMLR License](https://huggingface.co/apple/MobileCLIP-S2)

## Citation

```bibtex
@article{mobileclip2,
  title={MobileCLIP2: Improving Multi-Modal Reinforced Training},
  author={Faghri, Fartash and Vasu, Pavan Kumar Anasosalu and Koc, Cem and Shankar, Vaishaal and Toshev, Alexander T and Tuzel, Oncel and Pouransari, Hadi},
  journal={Transactions on Machine Learning Research},
  year={2025}
}
```

## Support

For issues or questions:
- HuggingFace Spaces: https://huggingface.co/docs/hub/spaces
- Model: https://huggingface.co/apple/MobileCLIP-S2
- ONNX Runtime: https://onnxruntime.ai/
embedder.py CHANGED
@@ -10,7 +10,7 @@ from fastapi import FastAPI, File, UploadFile, HTTPException, status
  from fastapi.responses import JSONResponse
  from PIL import Image
  from pydantic import BaseModel, Field
- from open_clip import create_model_and_transforms
+ from open_clip import create_model_and_transforms, get_tokenizer
  from mobileclip.modules.common.mobileone import reparameterize_model


@@ -38,6 +38,19 @@ class BatchEmbeddingResponse(BaseModel):
      model: str


+ class TextEmbeddingRequest(BaseModel):
+     """Text embedding request."""
+     text: str = Field(..., min_length=1, max_length=1000)
+
+
+ class TextEmbeddingResponse(BaseModel):
+     """Text embedding response."""
+     embedding: List[float] = Field(..., min_length=512, max_length=512)
+     model: str
+     inference_time_ms: float
+     text: str
+
+
  class HealthResponse(BaseModel):
      """Health check response."""
      status: str
@@ -304,6 +317,53 @@ async def generate_batch_embeddings(files: List[UploadFile] = File(...)):
      )


+ @app.post("/embed/text", response_model=TextEmbeddingResponse)
+ async def generate_text_embedding(request: TextEmbeddingRequest):
+     """
+     Generate embedding for text query.
+
+     Args:
+         request: Text to embed
+
+     Returns:
+         512-dimensional embedding for the text
+
+     Raises:
+         500: Inference error
+     """
+     start_time = time.time()
+
+     try:
+         # Tokenize text
+         tokenizer = get_tokenizer(MODEL_NAME)
+         text_tokens = tokenizer([request.text])
+         text_tokens = text_tokens.to(device)
+
+         # Run inference
+         with torch.no_grad():
+             text_embedding = model.encode_text(text_tokens)
+             text_embedding = normalize_embedding(text_embedding)
+
+         # Convert to numpy and then to list
+         embedding = text_embedding.cpu().numpy()[0]
+
+         # Calculate time
+         inference_time = (time.time() - start_time) * 1000
+
+         return TextEmbeddingResponse(
+             embedding=embedding.tolist(),
+             model=MODEL_NAME,
+             inference_time_ms=round(inference_time, 2),
+             text=request.text
+         )
+
+     except Exception as e:
+         raise HTTPException(
+             status_code=status.HTTP_500_INTERNAL_SERVER_ERROR,
+             detail=f"Text inference failed: {str(e)}"
+         )
+
+
  # --- Main ---
  if __name__ == "__main__":
      import uvicorn
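For reference, the `/embed/text` route added above expects a JSON body with a `text` field and returns a 512-dimensional embedding. A minimal client call could look like this (sketch only; the Space URL is a placeholder):

```python
import requests

# POST a JSON body to the new /embed/text endpoint
# ("https://YOUR_SPACE_URL" is a placeholder for the deployed Space URL).
response = requests.post(
    "https://YOUR_SPACE_URL/embed/text",
    json={"text": "a photo of a dog on a beach"},
)
response.raise_for_status()

data = response.json()
print(len(data["embedding"]), "dims,", data["inference_time_ms"], "ms")
```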
requirements.txt CHANGED
@@ -1,9 +1,9 @@
- fastapi
- uvicorn[standard]
- python-multipart
- pillow
- numpy
- pydantic
+ fastapi==0.120.1
+ uvicorn[standard]==0.38.0
+ python-multipart==0.0.20
+ pillow==12.0.0
+ numpy==2.3.4
+ pydantic==2.12.3
  torch
  open_clip_torch
  ml-mobileclip @ git+https://github.com/apple/ml-mobileclip.git