Model Card for AI71ai/Llama-agrillm-3.3-70B
AI71ai/Llama-agrillm-3.3-70B is an agriculture-focused foundation Large Language Model (LLM) developed by AI71 with support from leading institutions in the agricultural sector. It is fine-tuned using LoRA on top of meta-llama/Llama-3.3-70B-Instruct.
This model is developed as part of the AgriLLM initiative, a multi-stakeholder collaboration involving the International Affairs Office at the UAE Presidential Court, the Gates Foundation, CGIAR, Embrapa, ECHO, FAO, IFAD, the World Bank, Digital Green, and other leading organizations in the agriculture domain.
The model is fine-tuned on agriculture-specific Q&A pairs to strengthen its ability to understand agricultural contexts, provide accurate agronomic guidance, and generate reliable, expert-aligned responses, while still preserving the broad capabilities of the underlying Llama 3.3-70B base model.
It is primarily designed for use within Retrieval-Augmented Generation (RAG) systems - where it can leverage external agricultural knowledge bases for accuracy and contextual grounding - rather than as a standalone model.
🌾 What is AgriLLM?
AgriLLM is an initiative to provide the global agriculture community with open, foundation AI building blocks that support wider AI adoption and help close the information gap faced by smallholder farmers and agricultural professionals worldwide.
As part of the initiative, four open-source public goods will be released:
- A set of fine-tuned LLMs specialized for agriculture
- The supervised training dataset of agriculture-focused Q&A pairs used for fine-tuning, enabling anyone to train their own models
- An agriculture evaluation benchmark (datasets and metrics) providing a common standard to assess and compare model performance
- A corpus of agricultural documents for building Retrieval-Augmented Generation (RAG) pipelines
The initiative's philosophy is to empower the community with practical, high-quality AI resources - allowing researchers, developers, and institutions to create their own downstream agricultural AI applications built on top of these open building blocks, including the fine-tuned AgriLLMs.
How to Get Started with the Model
Use the code below to get started with the model.
Transformers Code Example
```python
# Load the model directly with Transformers
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AI71ai/Llama-agrillm-3.3-70B")
model = AutoModelForCausalLM.from_pretrained(
    "AI71ai/Llama-agrillm-3.3-70B",
    torch_dtype=torch.bfloat16,  # 70B weights require multiple GPUs; shard with device_map
    device_map="auto",
)

messages = [
    {"role": "user", "content": "How to grow maize in Kenya?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
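Because the model is designed primarily for Retrieval-Augmented Generation, the sketch below shows one way to ground a question in retrieved context, reusing the `tokenizer` and `model` loaded above. The retrieval step itself (vector store, embedder) is out of scope here, and `retrieved_docs` and the prompt wording are illustrative assumptions rather than a prescribed API.

```python
# Illustrative RAG-style prompting: retrieved passages are placed in the user
# turn so the model can ground its answer. `retrieved_docs` is a placeholder
# for the output of your own retrieval pipeline.
retrieved_docs = [
    "Maize in Kenya is typically planted at the onset of the long rains (March-April).",
    "A common maize spacing is about 75 cm between rows and 25-30 cm within rows.",
]

context = "\n".join(f"- {doc}" for doc in retrieved_docs)
question = "How to grow maize in Kenya?"

messages = [
    {
        "role": "user",
        "content": f"Use the following context to answer the question.\n\nContext:\n{context}\n\nQuestion: {question}",
    },
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```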
Table of Contents
- Model Card for AI71ai/Llama-agrillm-3.3-70B
- Model Details
- Uses
- Bias, Risks, and Limitations
- Training Details
- Evaluation
- Environmental Impact
- Acknowledgements & Data Sources
- Citation
- Model Card Contact
- How to Get Started with the Model
Model Details
Model Description
🧠 Base Model
- Base: meta-llama/Llama-3.3-70B-Instruct
- Finetuning Method: LoRA (parameter-efficient finetuning)
- Architecture: 70B transformer (Llama 3.3 family)
🎯 Objective of Finetuning
- Strengthen agricultural reasoning: Improve understanding of agronomic concepts and the ability to reason about agricultural scenarios
- Generate expert-aligned and domain-relevant outputs: Produce advisory responses that reflect best practices, validated knowledge, and use terminology appropriate to the agricultural context
- Ensure reliability and safety: Reduce hallucinations, maintain factual accuracy, and preserve groundedness in agriculture-related queries
- Leverage retrieved context effectively: Enhance the model's ability to interpret retrieved information and identify the most relevant agricultural content in RAG applications
- Preserve general capabilities: Maintain the broad reasoning and generative abilities of the Llama 3.3-70B base model
🌾 Model Capabilities
AI71ai/Llama-agrillm-3.3-70B provides foundational AI capabilities that can be applied across the agricultural domain, including:
- Question Answering: Responds accurately to agricultural queries based on provided or retrieved information
- Summarization: Condenses technical agricultural documents, research papers, and policy briefs into concise summaries
- Advisory Generation: Produces structured guidance or recommendations based on domain knowledge
- Reasoning: Supports scenario analysis, domain-specific reasoning, and decision support
- Context Evaluation: Assesses retrieved content for relevance when generating outputs (optimized for RAG pipelines)
Uses
AI71ai/Llama-agrillm-3.3-70B is primarily intended as a foundation model building block. It is designed to better understand agricultural contexts and perform effectively when connected to internal knowledge bases or used in Retrieval-Augmented Generation (RAG) pipelines. This model is not a standalone solution with universal answers; instead, it provides specialized capabilities that downstream applications can leverage to deliver accurate, grounded, and context-aware outputs in agriculture. The model can assist multiple personas across the agricultural ecosystem. Example capabilities include:
Farmers
- Answer questions on crop production (sowing, irrigation, harvesting)
- Provide pest and disease management guidance based on symptoms
- Recommend fertilizers and nutrient applications
- Advise on livestock care
Field Extension Agents
- Generate advisory responses for farmers
- Support diagnostic workflows and on-field problem-solving
- Prepare step-by-step field instructions and protocols
- Interpret technical manuals, guidelines, and extension materials
Academics & Researchers
- Summarize agricultural literature and research papers
- Explain research methodologies and concepts
- Analyze and interpret policy briefs and technical reports
- Support domain-specific reasoning and scenario modeling
Policymakers & Project Managers
- Assist in agricultural program assessments and evaluations
- Support impact analysis and data-driven recommendations
- Generate evidence-based policy or project briefs
- Provide reasoning grounded in agricultural principles and best practices
⚠️ Out-of-Scope Use
This model is not a universal source of answers and is not intended as a standalone solution. It is designed to support agricultural workflows but cannot replace expert knowledge. All outputs should be verified, especially in high-stakes contexts.
Moreover, the model is not intended for:
- Medical or veterinary diagnosis
- Producing legally binding recommendations or official documents
- High-risk decision-making without expert supervision
- Replacing certified agronomists, extension agents, or researchers
- Providing real-time field measurements or monitoring (e.g., soil moisture, weather, or crop sensor data)
- Making financial, legal, or regulatory decisions in agricultural projects
Bias, Risks, and Limitations
Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
Training Details
Training Data
The model was fine-tuned on 200k high-quality examples (146k of which are domain-specific), combining:
- Human expert-generated Q&A pairs: Written or approved by agricultural domain specialists (agronomists, researchers, extension agents)
- Q&A pairs extracted from real-world interactions: Drawn from forums, email threads, SMS-based extension services, and other practical agricultural communications
- Synthetic Q&A pairs: Generated and curated through controlled extraction from agricultural documents, using LLMs with carefully designed prompts to minimize hallucination
- Domain-specific tasks:
- Summarization of agronomy texts
- Reading comprehension of agricultural guidelines
- Soil, crop, and livestock reasoning tasks
- Policy, research, and project-management reasoning
- General ability tasks:
- Included to prevent catastrophic forgetting
- Maintains strong general reasoning, math, and language skills
No private or personal data is included. All partner datasets were anonymized and ethically prepared.
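For illustration, a single supervised example in chat format might look like the sketch below; this is a hypothetical sample, not an actual record from the AgriLLM dataset, and the released dataset's exact schema may differ.

```python
# Hypothetical shape of one supervised Q&A training example in chat format.
# Illustration only; the actual AgriLLM dataset schema may differ.
example = {
    "messages": [
        {"role": "user", "content": "When should I plant maize in western Kenya?"},
        {
            "role": "assistant",
            "content": (
                "Plant at the onset of the long rains, typically March to April, "
                "so the crop has reliable moisture during establishment."
            ),
        },
    ]
}
```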
Training Procedure
This model was created by performing LoRA (parameter-efficient) fine-tuning on top of the instruction-tuned model meta-llama/Llama-3.3-70B-Instruct.
- Hardware: 8× NVIDIA H100 GPUs
- CPU: 150 cores
- Training time: ~7 hours
- Batch size: 8
- Gradient accumulation steps: 4
- Model checkpoint: 350 steps
- LoRA rank: 128
- LoRA alpha: 256
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head, embed_tokens
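For reference, a minimal sketch of this configuration using the PEFT library is shown below; the surrounding training setup (trainer, optimizer, data pipeline) is not part of the published details and is therefore omitted.

```python
# Minimal PEFT sketch reproducing the LoRA hyperparameters listed above.
# The full training loop (trainer, optimizer, data) is not shown.
from peft import LoraConfig

lora_config = LoraConfig(
    r=128,           # lora_rank
    lora_alpha=256,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "lm_head", "embed_tokens",
    ],
    task_type="CAUSAL_LM",
)
```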
Evaluation
An initial evaluation was performed, covering:
- Basic generation quality
- Domain consistency
- Manual SME review of sample outputs

A full benchmark suite (field Q&A accuracy, agronomy reasoning, livestock QA, safety evaluation) will be added in a future update.
| Model | Answer Correctness | Factual Correctness (Recall) | Factual Correctness (Precision) |
|---|---|---|---|
| GPT-4o | 0.405 | 0.496 | 0.240 |
| GPT-4o with RAG | 0.489 | 0.631 | 0.355 |
| Llama-3.3-70B | 0.383 | 0.462 | 0.232 |
| Llama-3.3-70B Finetuned | 0.474 | 0.358 | 0.359 |
| Llama-3.3-70B Finetuned with RAG | 0.557 | 0.487 | 0.532 |
| Qwen3-30B-A3B | 0.377 | 0.480 | 0.180 |
| Qwen3-30B-A3B Finetuned | 0.488 | 0.355 | 0.391 |
| Qwen3-30B-A3B Finetuned with RAG | 0.565 | 0.496 | 0.509 |
| Falcon3-10B Finetuned | 0.452 | 0.351 | 0.350 |
| Falcon3-10B Finetuned with RAG | 0.545 | 0.459 | 0.509 |
Testing Data, Factors & Metrics
Testing Data
We evaluated the model on agrillm-qa-eval-800, a dataset of 800 Q&A pairs covering multiple agricultural topics, crops, geographies, and tasks.
Factors
The creation of the evaluation set considered multiple dimensions:
- User personas: 4 primary personas spanning farmers, extension agents, researchers, and policymakers
- Topics and domains: Multiple crops, produce types, and subdomains of agricultural knowledge
- Answer types: Single-turn and multi-turn Q&A, as well as domain "tasks" (summarization, classification, etc.)
Metrics
The evaluation pipeline leverages RAGAS with LLM-as-a-judge using GPT-4o, where the model's responses are automatically assessed against reference answers for correctness, completeness, and language quality.
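As a rough illustration of this setup, the sketch below scores a single answer with RAGAS's answer correctness metric. The RAGAS API has changed across releases, and the exact metrics, judge configuration, and column names used in our pipeline are not reproduced here, so treat this as an assumption-laden example (it also requires an OpenAI API key for the judge model).

```python
# Hedged sketch of LLM-as-a-judge scoring with RAGAS (0.1-style API).
# Illustrative only; not the exact AgriLLM evaluation pipeline.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_correctness

data = Dataset.from_dict({
    "question": ["Does rice need standing water during early growth?"],
    "answer": ["Rice generally grows best with standing water."],
    "ground_truth": ["Rice needs standing water during early growth."],
})

# Uses an OpenAI judge model under the hood (configured via environment/LLM wrappers).
result = evaluate(data, metrics=[answer_correctness])
print(result)
```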
Example (Agriculture → Rice Irrigation):
Ground-truth facts:
- Rice needs standing water during early growth.
- Drip irrigation is rarely used for rice.
- Water requirement is highest during tillering stage.
LLM Response:
- Rice generally grows best with standing water.
- Restrictive licenses are good
When we grade the response, we find:
✅ Rice generally grows best with standing water. ❌ Restrictive licenses are good
factual_correctness_precision: Out of 2 stated facts, 1 is correct (standing water) → Precision = 1/2.
factual_correctness_recall: Out of 3 ground-truth facts, the model mentioned only 1 → Recall = 1/3.
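The arithmetic behind these two scores is simple enough to spell out; the sketch below just restates the example's counts in code.

```python
# Precision/recall arithmetic from the rice-irrigation example above.
ground_truth_facts = 3   # facts in the reference answer
stated_facts = 2         # claims made by the model
correct_facts = 1        # claims supported by the ground truth

precision = correct_facts / stated_facts       # 1/2 = 0.50
recall = correct_facts / ground_truth_facts    # 1/3 ≈ 0.33
print(f"precision={precision:.2f}, recall={recall:.2f}")
```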
Results
Check our leaderboard for more information: https://huggingface.co/spaces/AI71ai/agrillm-leaderboard
Environmental Impact
Training Hardware
- 8× NVIDIA H100 GPUs
- 150-core CPU
- 500 GB storage
- Single-node configuration in a UAE data center
- ~7 hours
Estimated Energy Consumption
- Estimated IT power (GPUs + CPU + system): ≈ 7.1 kW
- Data-center Power Usage Effectiveness (PUE): 1.4
- Estimated total facility power: ≈ 9.94 kW
- Total energy consumed: 9.94 kW × 7 h ≈ 69.6 kWh
Estimated Carbon Emissions
- UAE grid emission factor: 0.40 kg CO₂e/kWh
- Total carbon emissions: 69.6 kWh × 0.40 ≈ 28 kg CO₂e
Summary
- Total training energy: ~69.6 kWh
- Total training emissions: ~28 kg CO₂e
Assumptions and Methodology
- GPU power based on NVIDIA H100 SXM maximum TDP of ~700 W per GPU.
- CPU + platform power estimated at ~1.5 kW under load.
- IT load assumed to be fully utilized during training.
- Data-center overhead modeled using PUE = 1.4.
- UAE grid intensity assumed at 0.40 kg CO₂e/kWh.
- Estimates include only operational electricity use; hardware manufacturing and external networking emissions are excluded.
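The full estimate can be reproduced from these assumptions in a few lines; the sketch below restates the arithmetic used above.

```python
# Back-of-the-envelope training energy and emissions, per the assumptions above.
gpu_power_kw = 8 * 0.7            # 8x H100 at ~700 W TDP each
cpu_platform_kw = 1.5             # CPU + platform under load (assumed)
it_power_kw = gpu_power_kw + cpu_platform_kw   # ~7.1 kW
facility_kw = it_power_kw * 1.4                # PUE = 1.4 -> ~9.94 kW
energy_kwh = facility_kw * 7                   # 7 h of training -> ~69.6 kWh
co2e_kg = energy_kwh * 0.40                    # UAE grid factor -> ~28 kg CO2e
print(f"{energy_kwh:.1f} kWh, {co2e_kg:.1f} kg CO2e")
```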
Acknowledgements & Data Sources
We gratefully acknowledge the contributions of our partners and collaborators who made this work possible:
- The International Affairs Office of the UAE Presidential Court
- Gates Foundation
- CGIAR – Consultative Group on International Agricultural Research
- Embrapa – Empresa Brasileira de Pesquisa Agropecuária
- ECHO
- FAO – Food and Agriculture Organization of the United Nations
- IFAD – International Fund for Agricultural Development
- The World Bank
- Digital Green
- KIADPAI – Khalifa International Award for Date Palm and Agricultural Innovation
- KALRO – Kenya Agricultural and Livestock Research Organization
- Extension Foundation
Special thanks to all partners for their invaluable support, including:
- Data preparation: Curating agricultural documents and Q&A pairs, with manual verification by domain experts
- Expert guidance: Supporting the verification of synthetic Q&A pairs generated for model fine-tuning
- AI assistant design: Providing expertise on designing downstream AI applications to test the models
- Model testing: Manually evaluating model outputs to ensure quality and relevance
- Field engagement: Collaborating with end-users in agricultural settings to support adoption and collect current needs and feedback
All datasets used were anonymized and ethically prepared, and no private or personal data was included.
Citation
If you find this model useful, please cite us:
@misc{Llama-agrillm-3.3-70B,
  title={Llama-agrillm-3.3-70B},
  author={Mamoun Alaoui and Ojas Agarwal and Zafar Shadman and Derek Thomas},
  year={2025},
}
Model Card Contact
- Mamoun Alaoui mamoun.alaoui@ai71.ai
- Ojas Agarwal ojas.agarwal@ai71.ai
- Zafar Shadman zafar.shadman@ai71.ai
- Derek Thomas derek.thomas@ai71.ai