Asankhaya Sharma

codelion

AI & ML interests

Creator of OptiLLM, OpenEvolve, Adaptive Classifier, and Ellora. Pioneering a new category in AI infrastructure: inference-time compute for LLMs.

Recent Activity

reacted to their post with 🤗 about 14 hours ago
MARS Achieves Strong Results on Google DeepMind's IMO-Bench

We evaluated OptiLLM's MARS (Multi-Agent Reasoning System) approach on IMO-Bench, Google DeepMind's challenging mathematical reasoning benchmark with International Mathematical Olympiad-level problems.

What is MARS?

MARS is a multi-agent reasoning technique that works with any LLM. It uses 3 parallel reasoning agents that independently solve problems, then verifies their solutions through consensus and iterative refinement. The key advantage: it's model-agnostic and can be applied to any base model through OptiLLM's inference proxy.

Results on IMO-Bench:

AnswerBench (400 short-answer problems):
- MARS: 36.0% (144/400 correct)
- Baseline: 24.5% (98/400 correct)
- Improvement: +11.5pp across all domains

Category breakdown:
- Algebra: 33% (vs 21% baseline)
- Combinatorics: 26% (vs 19% baseline)
- Geometry: 43% (vs 28% baseline)
- Number Theory: 42% (vs 30% baseline)

ProofBench (60 proof construction problems):
- MARS: 26.7% (16/60 correct)
- Baseline: 18.3% (11/60 correct)
- Improvement: +8.4pp

Category breakdown:
- Number Theory: 42.9% (vs 14.3% baseline)
- Combinatorics: 37.5% (vs 31.2% baseline)
- Algebra: 18.8% (vs 25.0% baseline)
- Geometry: 7.1% (vs 0.0% baseline)

All results achieved using google/gemini-2.5-flash-lite-preview-09-2025 as the base model. The same MARS approach can enhance reasoning for any model through OptiLLM's OpenAI-compatible API.

Datasets available at:
- AnswerBench: huggingface.co/datasets/Hwilner/imo-answerbench
- ProofBench: huggingface.co/datasets/Hwilner/imo-proofbench

Try it yourself:

python optillm.py --approach mars --model google/gemini-2.5-flash-lite-preview-09-2025

Or via API with approach prefix:

model: "mars-google/gemini-2.5-flash-lite-preview-09-2025"

Full evaluation code and results available at: github.com/algorithmicsuperintelligence/optillm
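
To make the "parallel agents, then consensus and iterative refinement" idea concrete, here is a minimal sketch in Python. It is not OptiLLM's MARS implementation: the `Agent` type, `majority_answer`, `solve_with_consensus`, and the three stub agents are hypothetical names invented for illustration, the agents are plain functions rather than LLM calls, and "refinement" is reduced to re-running the agents with an incremented round number until at least two of them agree.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Optional, Sequence

# Hypothetical agent type: takes a problem and a round index, returns an answer.
# In a real system each agent would call an LLM; here they are stubs.
Agent = Callable[[str, int], str]


def majority_answer(answers: Sequence[str], min_votes: int = 2) -> Optional[str]:
    """Return the most common answer if it has at least `min_votes` votes."""
    answer, votes = Counter(answers).most_common(1)[0]
    return answer if votes >= min_votes else None


def solve_with_consensus(
    problem: str, agents: Sequence[Agent], max_rounds: int = 3
) -> Optional[str]:
    """Run all agents in parallel each round; stop when a majority agrees."""
    for round_idx in range(max_rounds):
        with ThreadPoolExecutor(max_workers=len(agents)) as pool:
            answers = list(pool.map(lambda agent: agent(problem, round_idx), agents))
        agreed = majority_answer(answers)
        if agreed is not None:
            return agreed
    return None  # no consensus within the round budget


# Stub agents (assumptions for the demo, not real solvers).
def agent_a(problem: str, round_idx: int) -> str:
    return "42"


def agent_b(problem: str, round_idx: int) -> str:
    # Disagrees in the first round, then revises its answer.
    return "41" if round_idx == 0 else "42"


def agent_c(problem: str, round_idx: int) -> str:
    return "40"


if __name__ == "__main__":
    print(solve_with_consensus("toy problem", [agent_a, agent_b, agent_c]))
```

In this toy run the three agents disagree in the first round, so a second round is triggered; a production system would instead feed the disagreement back to the agents as context before they retry.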

Organizations

meraGPT, Lambda Security, National University of Singapore, Patched, ZeroGPU Explorers, MLX Community, Social Post Explorers, Hugging Face Discord Community, Dria, Adaptive Classifier, Reasoning datasets competition, Cerebras Hugging Face Hackathon, LeRobot Worldwide Hackathon, Hugging Face MCP Course, Agents-MCP-Hackathon, Hugging Science, MCP-1st-Birthday, Algorithmic SuperIntelligence Labs