AI & ML interests

None defined yet.

Recent Activity

prithivMLmods 
posted an update 1 day ago
view post
Post
1047
Introducing Photo-Mate-v2, based on FLUX.1-Kontext-dev, for advanced image manipulation tasks. It supports transforming scenes into top-down/bottom-up perspectives, CAM-right/left-view and its reverse, as well as general kontext-specified object removal. Below is the list of demos and adapters.🔥🤗

➤ Spaces [Demo] : prithivMLmods/Kontext-Photo-Mate-v2

Kontext-Adapters :
✦ Kontext-Bottom-Up-View: prithivMLmods/Kontext-Bottom-Up-View
✦ Kontext-CAM-Right-View: prithivMLmods/Kontext-CAM-Right-View
✦ Kontext-Top-Down-View: prithivMLmods/Kontext-Top-Down-View
✦ Kontext-CAM-Left-View: prithivMLmods/Kontext-CAM-Left-View
✦ Kontext-CAM-Right-View: prithivMLmods/Kontext-CAM-Right-View
✦ Kontext-Unblur-Upscale: prithivMLmods/Kontext-Unblur-Upscale
✦ Kontext-0811-exp: prithivMLmods/Kontext-0811-exp

Photo-Mate Collection:
✦ Kontext CAM Angles: https://huggingface.co/collections/prithivMLmods/kontext-cam-angles
✦ i2i - Kontext (exp): https://huggingface.co/collections/prithivMLmods/i2i-kontext-exp
✦ LZO-1 (Lossless Zoom Operator): https://huggingface.co/collections/prithivMLmods/lzo-1-lossless-zoom-operator

Related-Apps:
✦ Photo-Mate [Version 1.0]: prithivMLmods/Photo-Mate-i2i
✦ Image Generation Apps [Collection]: https://huggingface.co/collections/prithivMLmods/image-generation-apps-collection

To know more about it, visit the app page or the respective model page!
@prithivMLmods
AdinaY 
posted an update 3 days ago
view post
Post
2538
Kimi K2 Thinking is now live on the hub 🔥

moonshotai/Kimi-K2-Thinking

✨ 1T MoE for deep reasoning & tool use
✨ Native INT4 quantization = 2× faster inference
✨ 256K context window
✨ Modified MIT license
AdinaY 
posted an update 4 days ago
view post
Post
292
Chinese open source AI in October wasn’t about bigger models, it was about real world impact 🔥

https://huggingface.co/collections/zh-ai-community/october-2025-china-open-source-highlights

✨ Vision-Language & OCR wave 🌊
- DeepSeek-OCR : 3B
- PaddleOCR-VL : 0.9B
- Qwen3-VL : 2B / 4B / 8B / 32B /30B-A3B
- Open-Bee: Bee-8B-RL
- http://Z.ai Glyph :10B

OCR is industrializing, the real game now is understanding the (long context) document, not just reading it.

✨ Text generation: scale or innovation?
- MiniMax-M2: 229B
- Antgroup Ling-1T & Ring-1T
- Moonshot Kimi-Linear : linear-attention challenger
- Kwaipilot KAT-Dev

Efficiency is the key.

✨ Any-to-Any & World-Model : one step forward to the real world
- BAAI Emu 3.5
- Antgroup Ming-flash-omni
- HunyuanWorld-Mirror: 3D

Aligning with the “world model” globally

✨ Audio & Speech + Video & Visual: released from entertainment labs to delivery platforms
- SoulX-Podcast TTS
- LongCat-Audio-Codec & LongCat-Video by Meituan delivery paltform
- xiabs DreamOmni 2

Looking forward to what's next 🚀
prithivMLmods 
posted an update 6 days ago
view post
Post
1198
A week ago, I shared a post about the latest transformers test implementation of DeepSeek-OCR Compatibility (https://tinyurl.com/ykc4mm66). Now, I’m dropping the most compatible version of it to support the model with the latest transformers. 🤗🔥

➠ DeepSeek-OCR-Latest-BF16.I64: prithivMLmods/DeepSeek-OCR-Latest-BF16.I64
➠ DeepSeek OCR [exp] : prithivMLmods/DeepSeek-OCR-experimental

✅Supports the latest transformers v4.57.1
✅torch: 2.6.0+cu124 (or) the latest version (i.e., torch 2.9.0)
✅cuda version: 12.4
✅users can also opt out of specific attention implementations if desired.

✨Previous version: strangervisionhf/deepseek-ocr-latest-transformers
↗️Related Blog: https://huggingface.co/blog/prithivMLmods/multimodal-ocr-vlms
✨Community Page: strangervisionhf
✨Original Model Page: deepseek-ai/DeepSeek-OCR

To know more about it, visit the app page or the respective model page!
Nymbo 
posted an update 7 days ago
view post
Post
414
I've added an 11th tool to the Nymbo/Tools MCP server, it's for your Obsidian_Vault. I'd argue it's far more context-efficient than any other Obsidian MCP I've seen, and doesn't require any plugins. Also some big improvements to the Web_Search and Web_Fetch tools.

# Obsidian_Vault Tool

It's basically a read-only version of the File_System tool, but it works so well for navigating Obsidian without unnecessary context. It supports recursive (full-text) search across the entire vault, and supports offset so the agent can "scroll" through a document without re-consuming tokens.

Run the server locally and set the OBSIDIAN_VAULT_ROOT environment variable to your vault's root path. If you don't use Obsidian, this is perfectly usable as simply a read-only filesystem.

# Web_Search Improvements

The Web_Search tool previously just used DuckDuckGo as a backend search engine, but now it also supports Bing, Brave, Yahoo, and Wikipedia. Default engine is auto which provides results from all backends in recommended order. Still doesn't require any kind of API or auth for Web_Search.

There's also a new date filter to limit results to those created in the past day, week, month, or year. Oh, and uhh, SafeSearch is now off by default :)

# Web_Fetch Improvements

As context-efficient as the Markdown mode is for web browsing, sometimes it does lose important context in the conversion from HTML to Markdown. So I've added a new HTML mode to the Web_Fetch tool that basically executes a cURL request on the URL, returning the full HTML page if necessary.

# A Note on Claude Skills

I've been having fun with the new File_System and Shell_Command tools. Using Claude Skills doesn't currently work in the public HF space because of environment restrictions, but using Skills works perfectly well running locally.

Happy building ~
AdinaY 
posted an update 9 days ago
prithivMLmods 
posted an update 10 days ago
view post
Post
2532
A small blog post titled - Hall of Multimodal OCR VLMs and Demonstrations has been published on ↗️ https://huggingface.co/blog/prithivMLmods/multimodal-ocr-vlms on behalf of strangervisionhf

It discusses the latest trends in OCR models, the multilingual support offered by modern OCR systems, their unique capabilities, OCR benchmark model comparisons, transformer-based implementations, and strategies for streamlining transformers compatibility.