Sagar pallai (PRO)
sagar007
23 followers · 30 following
AI & ML interests
LLMs and Stable Diffusion
Recent Activity
replied to their post · 40 minutes ago (same post as the update below)
posted an update · about 7 hours ago
I built a Multimodal Vision-Language Model from scratch using Gemma-270M + CLIP! Just finished training my multimodal model on the full LLaVA-Instruct-150K dataset (157K samples) and wanted to share the results!

What I Built: A vision-language model that can understand images and answer questions about them, combining:
- Google Gemma-3-270M (language)
- OpenAI CLIP ViT-Large/14 (vision)
- LoRA fine-tuning for efficiency

Training Stats:
- 157,712 training samples (full LLaVA dataset)
- 3 epochs on A100 40GB
- ~9 hours training time
- Final loss: 1.333 training / 1.430 validation
- Only 18.6M trainable params (3.4% of 539M total)

Benchmark Results:
- VQA accuracy: 53.8%
- Works great for: animal detection, room identification, scene understanding

**Try it yourself:**
- Model: https://huggingface.co/sagar007/multigemma
- Demo: https://huggingface.co/spaces/sagar007/Multimodal-Gemma
- GitHub: https://github.com/sagar431/multimodal-gemma-270m

Built with PyTorch Lightning + MLflow for experiment tracking. Full MLOps pipeline with CI/CD! Would love to hear your feedback!

#multimodal #gemma #clip #llava #vision-language #pytorch
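The post describes the recipe without code. Under the stated design (frozen CLIP ViT-Large/14 vision tower, a projector into Gemma's embedding space, LoRA adapters on the 270M language model), a minimal PyTorch sketch could look like the following; the class name, the single-linear projector, and the LoRA target modules are illustrative assumptions, not the author's actual implementation:

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, CLIPVisionModel
from peft import LoraConfig, get_peft_model

class MultimodalGemma(nn.Module):  # hypothetical name, not the repo's class
    def __init__(self):
        super().__init__()
        # Frozen CLIP ViT-Large/14 vision tower
        self.vision = CLIPVisionModel.from_pretrained(
            "openai/clip-vit-large-patch14")
        self.vision.requires_grad_(False)

        # Gemma-3-270M with LoRA adapters; target modules are an assumption
        lm = AutoModelForCausalLM.from_pretrained("google/gemma-3-270m")
        self.lm = get_peft_model(lm, LoraConfig(
            r=16, lora_alpha=32, task_type="CAUSAL_LM",
            target_modules=["q_proj", "v_proj"]))

        # Linear projector from CLIP hidden size to Gemma hidden size
        self.proj = nn.Linear(self.vision.config.hidden_size,
                              lm.config.hidden_size)

    def forward(self, pixel_values, input_ids, attention_mask, labels=None):
        # Encode the image; CLIP ViT-L/14 yields one embedding per patch
        img = self.vision(pixel_values=pixel_values).last_hidden_state
        img_tokens = self.proj(img)

        # LLaVA-style: prepend projected image tokens to text embeddings
        txt = self.lm.get_input_embeddings()(input_ids)
        inputs_embeds = torch.cat([img_tokens, txt], dim=1)

        # Extend the attention mask over image positions, and mask them
        # out of the loss with -100 so only text tokens are supervised
        img_mask = attention_mask.new_ones(img_tokens.shape[:2])
        attention_mask = torch.cat([img_mask, attention_mask], dim=1)
        if labels is not None:
            pad = labels.new_full(img_tokens.shape[:2], -100)
            labels = torch.cat([pad, labels], dim=1)

        return self.lm(inputs_embeds=inputs_embeds,
                       attention_mask=attention_mask, labels=labels)
```

The LLaVA-style step is the concatenation: projected image patches are prepended to the text embeddings so the frozen vision tower conditions the language model, while the -100 labels keep image positions out of the loss. Training only the adapters and the projector while CLIP stays frozen is consistent with the post's figure of 18.6M trainable parameters out of 539M total.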
updated a model · about 7 hours ago
sagar007/multigemma
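The post also credits PyTorch Lightning and MLflow for training and experiment tracking. A minimal wiring sketch under that assumption, reusing the hypothetical MultimodalGemma module above (the wrapper class, learning rate, and experiment name are all assumptions):

```python
import torch
import lightning as L
from lightning.pytorch.loggers import MLFlowLogger

class LitMultimodalGemma(L.LightningModule):  # hypothetical wrapper
    def __init__(self, model, lr=2e-4):
        super().__init__()
        self.model = model
        self.lr = lr

    def training_step(self, batch, batch_idx):
        # batch holds pixel_values, input_ids, attention_mask, labels
        out = self.model(**batch)
        self.log("train_loss", out.loss, prog_bar=True)
        return out.loss

    def validation_step(self, batch, batch_idx):
        self.log("val_loss", self.model(**batch).loss)

    def configure_optimizers(self):
        # Only the LoRA adapters and the projector require grad
        trainable = [p for p in self.model.parameters() if p.requires_grad]
        return torch.optim.AdamW(trainable, lr=self.lr)

logger = MLFlowLogger(experiment_name="multimodal-gemma")  # assumed name
trainer = L.Trainer(max_epochs=3, precision="bf16-mixed", logger=logger)
# trainer.fit(LitMultimodalGemma(MultimodalGemma()), train_loader, val_loader)
```

With this wiring, Lightning routes every self.log call to the MLflow run, which matches the post's "full MLOps pipeline" framing; the 3-epoch, mixed-precision setup mirrors the training stats quoted above.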
sagar007's models (15) · sorted by recently updated
| Model | Task | Params | Updated | Likes |
|---|---|---|---|---|
| sagar007/multigemma | Image-to-Text | | about 7 hours ago | |
| sagar007/multimodal-gemma-270m-llava | | | Sep 20, 2025 | |
| sagar007/Lava_phi | Image-to-Text | 1B | Jan 2, 2025 | 2 |
| sagar007/phi-1_5-finetuned | | | Sep 23, 2024 | |
| sagar007/phi3.5_finetune | | | Sep 3, 2024 | 1 |
| sagar007/phi2_25k | | | Sep 3, 2024 | 1 |
| sagar007/phi2_finetune | | | Sep 3, 2024 | 1 |
| sagar007/nanoGPT | | | Jun 13, 2024 | |
| sagar007/new_model-odia | | | Dec 28, 2023 | |
| sagar007/mistral-finetuned-odia-knowledge | | | Dec 28, 2023 | |
| sagar007/Odia_mistral_fine_tuning | | | Dec 27, 2023 | |
| sagar007/mistral-finetuned-odiaknowledge | | | Dec 27, 2023 | |
| sagar007/mistral-finetuned-odia-60k | | | Dec 27, 2023 | |
| sagar007/mistral-finetuned-odia | | | Dec 27, 2023 | |
| sagar007/mistral-finetuned-alpaca | | | Dec 27, 2023 | |