Sagar pallai (PRO)
sagar007
23 followers · 30 following
AI & ML interests
LLMs and Stable Diffusion
Recent Activity
replied to their post · 40 minutes ago (same post as the update below)
posted an update · about 7 hours ago
I built a Multimodal Vision-Language Model from scratch using Gemma-270M + CLIP! Just finished training my multimodal model on the full LLaVA-Instruct-150K dataset (157K samples) and wanted to share the results!

What I Built: A vision-language model that can understand images and answer questions about them, combining:
- Google Gemma-3-270M (language)
- OpenAI CLIP ViT-Large/14 (vision)
- LoRA fine-tuning for efficiency

Training Stats:
- 157,712 training samples (full LLaVA dataset)
- 3 epochs on A100 40GB
- ~9 hours training time
- Final loss: 1.333 training / 1.430 validation
- Only 18.6M trainable params (3.4% of 539M total)

Benchmark Results:
- VQA accuracy: 53.8%
- Works great for: animal detection, room identification, scene understanding

**Try it yourself:**
- Model: https://huggingface.co/sagar007/multigemma
- Demo: https://huggingface.co/spaces/sagar007/Multimodal-Gemma
- GitHub: https://github.com/sagar431/multimodal-gemma-270m

Built with PyTorch Lightning + MLflow for experiment tracking. Full MLOps pipeline with CI/CD! Would love to hear your feedback!

#multimodal #gemma #clip #llava #vision-language #pytorch
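The post describes the recipe without code. Under the stated design (frozen CLIP ViT-Large/14 vision tower, a projector into Gemma's embedding space, LoRA adapters on the 270M language model), a minimal PyTorch sketch could look like the following; the class name, the single-linear projector, and the LoRA target modules are illustrative assumptions, not the author's actual implementation:

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, CLIPVisionModel
from peft import LoraConfig, get_peft_model

class MultimodalGemma(nn.Module):  # hypothetical name, not the repo's class
    def __init__(self):
        super().__init__()
        # Frozen CLIP ViT-Large/14 vision tower
        self.vision = CLIPVisionModel.from_pretrained(
            "openai/clip-vit-large-patch14")
        self.vision.requires_grad_(False)

        # Gemma-3-270M with LoRA adapters; target modules are an assumption
        lm = AutoModelForCausalLM.from_pretrained("google/gemma-3-270m")
        self.lm = get_peft_model(lm, LoraConfig(
            r=16, lora_alpha=32, task_type="CAUSAL_LM",
            target_modules=["q_proj", "v_proj"]))

        # Linear projector from CLIP hidden size to Gemma hidden size
        self.proj = nn.Linear(self.vision.config.hidden_size,
                              lm.config.hidden_size)

    def forward(self, pixel_values, input_ids, attention_mask, labels=None):
        # Encode the image; CLIP ViT-L/14 yields one embedding per patch
        img = self.vision(pixel_values=pixel_values).last_hidden_state
        img_tokens = self.proj(img)

        # LLaVA-style: prepend projected image tokens to text embeddings
        txt = self.lm.get_input_embeddings()(input_ids)
        inputs_embeds = torch.cat([img_tokens, txt], dim=1)

        # Extend the attention mask over image positions, and mask them
        # out of the loss with -100 so only text tokens are supervised
        img_mask = attention_mask.new_ones(img_tokens.shape[:2])
        attention_mask = torch.cat([img_mask, attention_mask], dim=1)
        if labels is not None:
            pad = labels.new_full(img_tokens.shape[:2], -100)
            labels = torch.cat([pad, labels], dim=1)

        return self.lm(inputs_embeds=inputs_embeds,
                       attention_mask=attention_mask, labels=labels)
```

The LLaVA-style step is the concatenation: projected image patches are prepended to the text embeddings so the frozen vision tower conditions the language model, while the -100 labels keep image positions out of the loss. Training only the adapters and the projector while CLIP stays frozen is consistent with the post's figure of 18.6M trainable parameters out of 539M total.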
updated a model · about 7 hours ago
sagar007/multigemma
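The post also credits PyTorch Lightning and MLflow for training and experiment tracking. A minimal wiring sketch under that assumption, reusing the hypothetical MultimodalGemma module above (the wrapper class, learning rate, and experiment name are all assumptions):

```python
import torch
import lightning as L
from lightning.pytorch.loggers import MLFlowLogger

class LitMultimodalGemma(L.LightningModule):  # hypothetical wrapper
    def __init__(self, model, lr=2e-4):
        super().__init__()
        self.model = model
        self.lr = lr

    def training_step(self, batch, batch_idx):
        # batch holds pixel_values, input_ids, attention_mask, labels
        out = self.model(**batch)
        self.log("train_loss", out.loss, prog_bar=True)
        return out.loss

    def validation_step(self, batch, batch_idx):
        self.log("val_loss", self.model(**batch).loss)

    def configure_optimizers(self):
        # Only the LoRA adapters and the projector require grad
        trainable = [p for p in self.model.parameters() if p.requires_grad]
        return torch.optim.AdamW(trainable, lr=self.lr)

logger = MLFlowLogger(experiment_name="multimodal-gemma")  # assumed name
trainer = L.Trainer(max_epochs=3, precision="bf16-mixed", logger=logger)
# trainer.fit(LitMultimodalGemma(MultimodalGemma()), train_loader, val_loader)
```

With this wiring, Lightning routes every self.log call to the MLflow run, which matches the post's "full MLOps pipeline" framing; the 3-epoch, mixed-precision setup mirrors the training stats quoted above.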
sagar007's models (15) · sorted by recently updated
| Model | Task | Params | Updated | Likes |
|---|---|---|---|---|
| sagar007/multigemma | Image-to-Text | | about 7 hours ago | |
| sagar007/multimodal-gemma-270m-llava | | | Sep 20, 2025 | |
| sagar007/Lava_phi | Image-to-Text | 1B | Jan 2, 2025 | 2 |
| sagar007/phi-1_5-finetuned | | | Sep 23, 2024 | |
| sagar007/phi3.5_finetune | | | Sep 3, 2024 | 1 |
| sagar007/phi2_25k | | | Sep 3, 2024 | 1 |
| sagar007/phi2_finetune | | | Sep 3, 2024 | 1 |
| sagar007/nanoGPT | | | Jun 13, 2024 | |
| sagar007/new_model-odia | | | Dec 28, 2023 | |
| sagar007/mistral-finetuned-odia-knowledge | | | Dec 28, 2023 | |
| sagar007/Odia_mistral_fine_tuning | | | Dec 27, 2023 | |
| sagar007/mistral-finetuned-odiaknowledge | | | Dec 27, 2023 | |
| sagar007/mistral-finetuned-odia-60k | | | Dec 27, 2023 | |
| sagar007/mistral-finetuned-odia | | | Dec 27, 2023 | |
| sagar007/mistral-finetuned-alpaca | | | Dec 27, 2023 | |