AI & ML interests

None defined yet.

Recent Activity

Nymbo 
posted an update 9 days ago
view post
Post
450
I've added an 11th tool to the Nymbo/Tools MCP server, it's for your Obsidian_Vault. I'd argue it's far more context-efficient than any other Obsidian MCP I've seen, and doesn't require any plugins. Also some big improvements to the Web_Search and Web_Fetch tools.

# Obsidian_Vault Tool

It's basically a read-only version of the File_System tool, but it works so well for navigating Obsidian without unnecessary context. It supports recursive (full-text) search across the entire vault, and supports offset so the agent can "scroll" through a document without re-consuming tokens.

Run the server locally and set the OBSIDIAN_VAULT_ROOT environment variable to your vault's root path. If you don't use Obsidian, this is perfectly usable as simply a read-only filesystem.

# Web_Search Improvements

The Web_Search tool previously just used DuckDuckGo as a backend search engine, but now it also supports Bing, Brave, Yahoo, and Wikipedia. Default engine is auto which provides results from all backends in recommended order. Still doesn't require any kind of API or auth for Web_Search.

There's also a new date filter to limit results to those created in the past day, week, month, or year. Oh, and uhh, SafeSearch is now off by default :)

# Web_Fetch Improvements

As context-efficient as the Markdown mode is for web browsing, sometimes it does lose important context in the conversion from HTML to Markdown. So I've added a new HTML mode to the Web_Fetch tool that basically executes a cURL request on the URL, returning the full HTML page if necessary.

# A Note on Claude Skills

I've been having fun with the new File_System and Shell_Command tools. Using Claude Skills doesn't currently work in the public HF space because of environment restrictions, but using Skills works perfectly well running locally.

Happy building ~
mrfakename 
posted an update 15 days ago
view post
Post
3263
Trained a model for emotion-controllable TTS based on MiMo audio on LAION's dataset.

Still very early and does have an issue with hallucinating but results seem pretty good so far, given that it is very early into the training run.

Will probably kick off a new run later with some settings tweaked.

Put up a demo here: mrfakename/EmoAct-MiMo

(Turn 🔊 on to hear audio samples)
·
merve 
posted an update 22 days ago
view post
Post
5217
deepseek-ai/DeepSeek-OCR is out! 🔥 my take ⤵️
> pretty insane it can parse and re-render charts in HTML
> it uses CLIP and SAM features concatenated, so better grounding
> very efficient per vision tokens/performance ratio
> covers 100 languages
·
Nymbo 
posted an update 24 days ago
view post
Post
1741
Two new tools added to the Nymbo/Tools MCP server, File_System and Shell_Exec. You can theoretically do basically anything with these two tools, and it should enable support for many Claude Skills.

GPT-5-Codex proves that for many cases, shell commands really are all you need, and Claude Skills seem to lean into this. The thing is, nothing about the design of Claude Skills actually restricts them to proprietary models!

# File_System

There's a new directory inside the repo called Filesystem, that's the agent's "root". It can perform the following actions : list, read, write, append, mkdir, move, copy, delete, info, help. It's able to keep this all within the scope of one tool call by making the Action field required and all other fields optional. Using a filesystem shouldn't require 15 different tools.

Files created in the public HF space live in the space's running container, and gets cleared when the space is restarted. When running the server locally, files are actually stored on disk.

# Shell_Exec

What good is a filesystem if you can't execute commands in that filesystem? This tool automatically detects if the server is running on Windows or Linux, and suggests using the appropriate shell (PowerShell/Bash). Both of these new tools require that the agent uses relative paths, rather than absolute paths. I could be convinced to back pedal on this.

# Closing Thoughts

The File_System and Shell_Exec tools aren't super polished yet, I'll continue to improve the agent's instructions and UX of using the new tools. Most of my testing was done with gpt-oss-20b and if it messes up, it gets the gist after one failed tool call. It should work perfectly fine for the GPU poor.
  • 1 reply
·
m-ric 
posted an update 28 days ago
view post
Post
546
Tokenization is one of the most important processes in AI - yet many would like to kill it 💀

What's tokenization? The neural networks inside LLMs actually only process numbers, not text: tokenization is the process that makes text readable for them, by converting sentences into lists of numbers.

➡️ For instance, "This is tokenization" would be split into "This | is | token | ization", then each of the parts (tokens) are converted to IDs according to a predefined mapping: for instance "ization" could map to id 2438.
Thus "This is tokenization" can become 1335 | 135 | 2980 | 2438 => now the model can process the sentence!

Most tokenizers today use pre-specified mappings called "vocabularies", generally built about the compression algorithme Byte-Pair Encoding (BPE) that learns from a big corpuses of texts an optimized split to efficiently encode any text from the same distribution into a list token IDs.

🤨 Now, these current tokenizers have flaws.
For instance, the rigidity of their mapping creates losses ; the prime example being that a tokenizer designed for English (thus optimized for tokens like "has", "been", "clock", etc) will not have the right tokens to approach Burmese, thus being terribly inefficient at it.

Many alternative approaches have emerged as a result: for instance "tokenizer-free tokenizers". One that I really liked was "entropy-based": it monitors the stream of text, and trigger a split whenever the entropy increases too much, i.e. when something "surprising" happens.

But this great article argues that tokenizers are a lesser evil. Read and decide for yourself!
https://huggingface.co/blog/catherinearnett/in-defense-of-tokenizers
Nymbo 
posted an update 29 days ago
view post
Post
1791
I've made some improvements to my custom Deep_Research tool in the Nymbo/Tools MCP server. I've added a second LLM process and it still takes less than 1 minute to complete!

The original version of my Deep_Research tool would basically dump up to 50 fetched webpages onto the Researcher model (Qwen3-235B), with only a little bit of context shown from each page.

# New "Filterer" Process

The new process includes another LLM call before the researcher process. The Filterer (also Qwen3-235B) gets the query summary and the original 50 pages with low context, and decides which pages are most relevant to the research topic. The Filterer then outputs the URLs to the relevant pages, which are then re-fetched (with more context) and sent to the Researcher.

# Researcher Context

The Researcher now gets only the relevant webpages, then begins writing the report. When testing with 50 initial results, the researcher would often end up with 10-20 results of relevant context.

It still takes less than a minute to accomplish everything, thanks entirely to Cerebras inference. It now takes about 35-45 seconds to complete once the tool is run.

It's also worth noting that both the Filterer and Researcher now are provided the current time/date before they see the content, reducing hallucinations caused by knowledge cutoffs.
m-ric 
posted an update about 1 month ago
view post
Post
4813
STOP EVERYTHING NOW - we might finally have a radical architecture improvement over Transformers!!! 🚨

A lone scientist just proposed Tiny Recursive Model (TRM), and it is literally the most impressive model that I've seen this year.

➡️ Tiny Recursive Model is 7M parameters
➡️ On ARC-AGI, it beats flagship models like Gemini-2.5-pro

Consider how wild this is: Gemini-2.5-pro must be over 10,000x bigger
and had 1,000 as many authors 😂 (Alexia is alone on the paper)

What's this sorcery?
In short: it's a very tiny Transformers, but it loops over itself at two different frequencies, updating two latent variables: one for the proposed answer and one for the reasoning.

@AlexiaJM started from the paper Hierarchical Reasoning Model, published a few months ago, that already showed breakthrough improvement on AGI for its small size (27M)

Hierarchical Reasoning Model had introduced one main feature:
🔎 Deep supervision
In their model, one part (here one layer) would run at high frequency, and another would be lower frequency, running only every n steps.

They had used a recurrent architecture, where these layers would repeat many times ; but to make it work they had to do many approximations, including not fully backpropagating the loss through all layers.

Alexia studied what was useful and what wasn't, and cleaned the architecture as follows :
Why use a recurrent architecture, when you can just make it a loop?
➡️ She made the network recursive, looping over itself

Why use 2 latent variables ?
➡️ She provides a crystal clear explanation : the one that changes frequently is the reasoning, the one that changes at low frequency is the proposed answer.
➡️ She runs ablation studies to validate that 2 is indeed optimal.

This new setup is a much more elegant way to process reasoning than generating huge chains of tokens as all flagship models currently do.

This might be the breakthrough we've been awaiting for so long!
·
christopher 
posted an update about 1 month ago
view post
Post
447
Something very cool is cooking at Lichess
  • 1 reply
·
Molbap 
posted an update about 1 month ago
view post
Post
3062
🚀 New blog: Maintain the unmaintainable – 1M+ Python LOC, 400+ models

How do you stop a million-line library built by thousands of contributors from collapsing under its own weight?
At 🤗 Transformers, we do it with explicit software-engineering tenets, principles that make the codebase hackable at scale.

🔍 Inside the post:
– One Model, One File: readability first — you can still open a modeling file and see the full logic, top to bottom.
– Modular Transformers: visible inheritance that cuts maintenance cost by ~15× while keeping models readable.
– Config-Driven Performance: FlashAttention, tensor parallelism, and attention scheduling are config-level features, not rewrites.

Written with @lysandre ,@pcuenq and @yonigozlan , this is a deep dive into how Transformers stays fast, open, and maintainable.

Read it here → transformers-community/Transformers-tenets
NeoPy 
posted an update about 1 month ago
view post
Post
2272
@lunarflu can you make me out from Hugging Face Discord Community? Because my old email discord was gone yet and all of my email gone too 😔

My Dead Account:
@Blane187
@Ryouko65777


Also delete the account if can 🙏
  • 2 replies
·
Nymbo 
posted an update about 1 month ago
view post
Post
641
I have a few Sora-2 invites - 15509N
  • 1 reply
·
merve 
posted an update about 2 months ago
view post
Post
6633
large AI labs open-sourced a ton of models last week 🔥
here's few picks, find even more here merve/sep-16-releases-68d13ea4c547f02f95842f05 🤝
> IBM released a new Docling model with 258M params based on Granite (A2.0) 📝 ibm-granite/granite-docling-258M
> Xiaomi released 7B audio LM with base and instruct variants (MIT) XiaomiMiMo/mimo-audio-68cc7202692c27dae881cce0
> DecartAI released Lucy Edit, open Nano Banana 🍌 (NC) decart-ai/Lucy-Edit-Dev
> OpenGVLab released a family of agentic computer use models (3B/7B/32B) with the dataset 💻 OpenGVLab/scalecua-68c912cf56f7ff4c8e034003
> Meituan Longcat released thinking version of LongCat-Flash 💭 meituan-longcat/LongCat-Flash-Thinking
  • 2 replies
·
Nymbo 
posted an update about 2 months ago
view post
Post
1071
There's now a custom Deep_Research tool in my Nymbo/Tools MCP server! TL;DR: The agent using the tools writes a summary of your requests and up to five DuckDuckGo searches (up to 50 results). Each of the webpages found in the searches are then fetched and given to our researcher (Qwen3-235B-A22B-Thinking-2507). The researcher sees the summary, searched queries, and fetched links, then writes a thorough research report. The agent using the tool provides the user with a summary of the report and a link to download research_report.txt. The researcher's instructions are similar to some leaked Perplexity sys prompts.

# Deep_Research Tool

It accomplishes everything in under a minute so it doesn't hit MCP's 60 second timeout, mostly thanks to Cerebras. The only thing required to make this work is a HF_READ_TOKEN for inference.

The Deep_Research tool could certainly be improved. It still needs some sort of mechanism for sorting URLs based on importance (I've got some ideas but I don't want it to be the responsibility of the agent using the tool). I'll probably add a second researcher to filter out the bad sources before inferencing the big researcher. I'm hellbent on keeping this all within the scope of one tool call.

# More Fetch/Web Search Improvements

The Search_DuckDuckGo tool has been further enhanced. It now allows the agent to browse through all pages of results. The results also now include published date (if detected). It also now supports every DDG search types! Default DDG search is called text, but it can also now search by news, images, videos, and books.

The Fetch_Webpage tool now specifies how much of the page has been truncated, and cursor index, allowing it to pickup where it left off without re-consuming tokens. The model can now also choose to strip CSS selectors to remove excess noise, and there's a new URL Scraper mode that only returns URLs found on the full page.

More to come soon ~
merve 
posted an update about 2 months ago
view post
Post
3282
IBM just released small swiss army knife for the document models: granite-docling-258M on Hugging Face 🔥

> not only a document converter but also can do document question answering, understand multiple languages 🤯
> best part: released with Apache 2.0 license 👏 use it with your commercial projects!
> it supports transformers, vLLM and MLX from the get-go! 🤗
> built on SigLIP2 & granite-165M

model: ibm-granite/granite-docling-258M
demo: ibm-granite/granite-docling-258m-demo 💗