In a Training Loop 🔄

winter.sci.dev

enzoescipy

https://www.winter-sci-dev.com/about/

AI & ML interests

NLP, Embedding

Recent Activity

updated a dataset 7 days ago

enzoescipy/finesse-benchmark-database

reacted to kostakoff's post with 🔥 27 days ago

My home lab for AI models - llmlaba v1

After I began learning MLOps, I realized that I needed some kind of home lab: there are a lot of GPUs that I need to learn how to set up and test. So I spent some time researching which platform I could buy or build.

My requirements were:
- Limited budget
- Power supply of 1 kW or higher
- A few PCIe slots, to be able to install more than one GPU
- Zero maintenance cost: I don't want to spend a lot of time or money maintaining lab hardware, except for the GPUs

I chose the Intel Mac Pro 7.1:
- Acceptable prices on eBay
- Excellent cooling
- 1.4 kW power supply
- 7 PCIe slots
- Zero maintenance: I don't need to do anything with the Mac Pro hardware; it just works
- Classic UEFI boot loader

It requires a bit of OS preparation:
1. Install Ubuntu 24.04 (it works with the general PC ISO image)
2. Set up T2 drivers
```bash
sudo apt install -y dkms linux-headers-$(uname -r) applesmc-t2 apple-bce lm-sensors
```
3. Install t2fanrd to manually manage fans (/etc/t2fand.conf) https://wiki.t2linux.org/guides/fan/
4. Fix PCIe BAR: add pci=realloc to GRUB_CMDLINE_LINUX_DEFAULT so the Linux kernel properly initializes server GPUs without Graphics Output Protocol
5. Install NVIDIA GPU driver:
```bash
sudo apt install nvidia-driver-570
```

And it works!
I was able to run a server-grade Nvidia Tesla P100 (required a DIY air duct), and consumer Nvidia Titan X, Titan V, GTX 1080 cards on the old Mac Pro 7.1 - even three in parallel.

https://huggingface.co/llmlaba
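Step 4 of the post above (adding pci=realloc to GRUB_CMDLINE_LINUX_DEFAULT) could be scripted roughly as follows. This is a minimal sketch and not the author's own procedure: `add_kernel_param` is a hypothetical helper name, and it assumes GNU sed and a double-quoted GRUB_CMDLINE_LINUX_DEFAULT line like the one in Ubuntu's default /etc/default/grub. If that line is absent, the file is left unchanged.

```bash
# Hypothetical helper (not from the post): append a kernel parameter to
# GRUB_CMDLINE_LINUX_DEFAULT in a GRUB defaults file, only if it is missing.
# Assumes GNU sed, a double-quoted value, and a simple parameter such as
# pci=realloc (no slashes or regex metacharacters).
add_kernel_param() {
    local param="$1" file="$2"
    local key='GRUB_CMDLINE_LINUX_DEFAULT'

    # Already present as a whole word inside the quotes: nothing to do.
    if grep -Eq "^${key}=\"([^\"]* )?${param}( [^\"]*)?\"" "$file"; then
        return 0
    fi

    if grep -q "^${key}=\"\"" "$file"; then
        # Empty value: the parameter becomes the whole value.
        sed -i "s/^${key}=\"\"/${key}=\"${param}\"/" "$file"
    else
        # Non-empty value: append before the closing quote.
        sed -i -E "s/^(${key}=\"[^\"]*)\"/\1 ${param}\"/" "$file"
    fi
}
```

On a real machine the function would be run as root against /etc/default/grub (ideally after making a backup copy), followed by `sudo update-grub` and a reboot. Afterwards, `cat /proc/cmdline` shows whether the kernel actually booted with the parameter.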

reacted to sagar007's post with 🔥 about 2 months ago

🚀 I built a Multimodal Vision-Language Model using Gemma-270M + CLIP!

Just finished training my multimodal model on the full LLaVA-Instruct-150K dataset (157K samples) and wanted to share the results!

🔧 What I Built:
A vision-language model that can understand images and answer questions about them, combining:
- Google Gemma-3-270M (language)
- OpenAI CLIP ViT-Large/14 (vision)
- LoRA fine-tuning for efficiency

📊 Training Stats:
- 157,712 training samples (full LLaVA dataset)
- 3 epochs on A100 40GB
- ~9 hours training time
- Final loss: 1.333 training / 1.430 validation
- Only 18.6M trainable params (3.4% of 539M total)

📈 Benchmark Results (https://huggingface.co/sagar007/multigemma):
- VQA Accuracy: 53.8%
- Works great for: animal detection, room identification, scene understanding

🔗 **Try it yourself:**
- 🤗 Model: https://huggingface.co/sagar007/multigemma
- 🎮 Demo: https://huggingface.co/spaces/sagar007/Multimodal-Gemma
- 💻 GitHub: https://github.com/sagar431/multimodal-gemma-270m

Built with PyTorch Lightning + MLflow for experiment tracking. Full MLOps pipeline with CI/CD!

Would love to hear your feedback! 🙏

#multimodal #gemma #clip #llava #vision-language #pytorch
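The trainable-parameter share quoted in the training stats above can be re-derived from the two counts given in the post. This is only a quick arithmetic check: `trainable_pct` is a hypothetical helper name, and because 18.6M and 539M are themselves rounded figures, the result can differ in the last digit from the percentage the author reports.

```bash
# Hypothetical helper: percentage of parameters that are trainable.
# Arguments: trainable count, total count (same unit, here millions).
trainable_pct() {
    awk -v trainable="$1" -v total="$2" \
        'BEGIN { printf "%.2f\n", 100 * trainable / total }'
}

trainable_pct 18.6 539   # rounded figures from the post
```

With these inputs the helper prints 3.45, in line with the roughly 3.4% quoted in the post.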
