A set of efficient, Python-based chat interfaces and agents powered by llama.cpp that focus on running quantized models (GGUF) locally.
Qwen2.5-Coder: Family of LLMs excels in code, debugging, etc
Gemma 3: Google's multimodal, multilingual, long context LLM