MARS Achieves Strong Results on Google DeepMind's IMO-Bench
We evaluated OptiLLM's MARS (Multi-Agent Reasoning System) approach on IMO-Bench, Google DeepMind's challenging mathematical reasoning benchmark built from International Mathematical Olympiad-level problems.
What is MARS?
MARS is a multi-agent reasoning technique that works with any LLM. It runs three parallel reasoning agents that solve the problem independently, then verifies their solutions through consensus and iterative refinement. The key advantage: it is model-agnostic and can be applied to any base model through OptiLLM's inference proxy.
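To make the idea concrete, here is a minimal sketch of the parallel-agents-plus-consensus loop. This is illustrative only, not OptiLLM's actual implementation: the prompts, the extract_answer() heuristic, and the refinement strategy shown here are all assumptions.

```python
# Conceptual sketch of MARS-style reasoning: sample several independent
# solutions, take a majority vote, and refine when the agents disagree.
from collections import Counter
from openai import OpenAI

client = OpenAI()  # point base_url at any OpenAI-compatible server

def extract_answer(text: str) -> str:
    # Illustrative heuristic: treat the last non-empty line as the answer.
    return [l for l in text.strip().splitlines() if l.strip()][-1]

def mars_sketch(problem: str, model: str, n_agents: int = 3,
                max_rounds: int = 2) -> str:
    prompt = problem
    answer = ""
    for _ in range(max_rounds):
        # Run the agents (sequentially here for simplicity; MARS runs them
        # in parallel) with temperature > 0 for diverse solution paths.
        solutions = [
            client.chat.completions.create(
                model=model,
                messages=[{"role": "user", "content": prompt}],
                temperature=0.8,
            ).choices[0].message.content
            for _ in range(n_agents)
        ]
        answers = [extract_answer(s) for s in solutions]
        answer, votes = Counter(answers).most_common(1)[0]
        if votes > n_agents // 2:  # consensus reached
            return answer
        # No consensus: feed the disagreeing attempts back for refinement.
        joined = "\n\n---\n\n".join(solutions)
        prompt = (f"{problem}\n\nPrior attempts disagree:\n{joined}\n\n"
                  "Identify the errors and produce a corrected solution.")
    return answer  # fall back to the plurality answer
```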
Results on IMO-Bench:
AnswerBench (400 short-answer problems):
- MARS: 36.0% (144/400 correct)
- Baseline: 24.5% (98/400 correct)
- Improvement: +11.5pp, with gains in all four categories
Category breakdown:
- Algebra: 33% (vs 21% baseline)
- Combinatorics: 26% (vs 19% baseline)
- Geometry: 43% (vs 28% baseline)
- Number Theory: 42% (vs 30% baseline)
ProofBench (60 proof construction problems):
- MARS: 26.7% (16/60 correct)
- Baseline: 18.3% (11/60 correct)
- Improvement: +8.4pp
Category breakdown:
- Number Theory: 42.9% (vs 14.3% baseline)
- Combinatorics: 37.5% (vs 31.2% baseline)
- Algebra: 18.8% (vs 25.0% baseline)
- Geometry: 7.1% (vs 0.0% baseline)
All results were achieved using google/gemini-2.5-flash-lite-preview-09-2025 as the base model. The same MARS approach can enhance reasoning for any model through OptiLLM's OpenAI-compatible API.
Datasets available at:
AnswerBench: huggingface.co/datasets/Hwilner/imo-answerbench
ProofBench: huggingface.co/datasets/Hwilner/imo-proofbench
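To run your own evaluation, both benchmarks load directly with the Hugging Face datasets library. The split printed below depends on how the datasets are published; check the dataset cards for the exact split and column names.

```python
# Load the IMO-Bench datasets from the Hugging Face Hub.
from datasets import load_dataset

answerbench = load_dataset("Hwilner/imo-answerbench")
proofbench = load_dataset("Hwilner/imo-proofbench")
print(answerbench)  # inspect available splits and columns
```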
Try it yourself:
python optillm.py --approach mars --model google/gemini-2.5-flash-lite-preview-09-2025
Or via API with approach prefix:
model: "mars-google/gemini-2.5-flash-lite-preview-09-2025"
Full evaluation code and results available at: github.com/algorithmicsuperintelligence/optillm