Model Card for AI71ai/Llama-agrillm-3.3-70B
AI71ai/Llama-agrillm-3.3-70B is an agriculture-focused foundation Large Language Model (LLM) developed by AI71 with support from leading institutions in the agricultural sector. It is fine-tuned using LoRA on top of meta-llama/Llama-3.3-70B-Instruct.
This model is developed as part of the AgriLLM initiative, a multi-stakeholder collaboration involving the International Affairs Office at the UAE Presidential Court, the Gates Foundation, CGIAR, Embrapa, ECHO, FAO, IFAD, the World Bank, Digital Green, and other leading organizations in the agriculture domain.
The model is fine-tuned on agriculture-specific Q&A pairs to strengthen its ability to understand agricultural contexts, provide accurate agronomic guidance, and generate reliable, expert-aligned responses, while still preserving the broad capabilities of the underlying Llama 3.3-70B base model.
It is primarily designed for use within Retrieval-Augmented Generation (RAG) systems - where it can leverage external agricultural knowledge bases for accuracy and contextual grounding - rather than as a standalone model.
🌾 What is AgriLLM?
AgriLLM is an initiative to provide the global agriculture community with open, foundation AI building blocks that support wider AI adoption and help close the information gap faced by smallholder farmers and agricultural professionals worldwide.
As part of the initiative, four open-source public goods will be released:
- A set of fine-tuned LLMs specialized for agriculture
- The supervised training dataset of agriculture-focused Q&A pairs used for fine-tuning, enabling anyone to train their own models
- An agriculture evaluation benchmark (datasets and metrics) providing a common standard to assess and compare model performance
- A corpus of agricultural documents for building Retrieval-Augmented Generation (RAG) pipelines
The initiative's philosophy is to empower the community with practical, high-quality AI resources - allowing researchers, developers, and institutions to create their own downstream agricultural AI applications built on top of these open building blocks, including the fine-tuned AgriLLMs.
How to Get Started with the Model
Use the code below to get started with the model.
Transformers Code Example
```python
# Load the model directly with Transformers
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("AI71ai/Llama-agrillm-3.3-70B")
model = AutoModelForCausalLM.from_pretrained(
    "AI71ai/Llama-agrillm-3.3-70B",
    torch_dtype=torch.bfloat16,  # 70B weights require multiple GPUs; shard with device_map
    device_map="auto",
)

messages = [
    {"role": "user", "content": "How to grow maize in Kenya?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
# Decode only the newly generated tokens
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```
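Because the model is designed primarily for Retrieval-Augmented Generation, the sketch below shows one way to ground a question in retrieved context, reusing the `tokenizer` and `model` loaded above. The retrieval step itself (vector store, embedder) is out of scope here, and `retrieved_docs` and the prompt wording are illustrative assumptions rather than a prescribed API.

```python
# Illustrative RAG-style prompting: retrieved passages are placed in the user
# turn so the model can ground its answer. `retrieved_docs` is a placeholder
# for the output of your own retrieval pipeline.
retrieved_docs = [
    "Maize in Kenya is typically planted at the onset of the long rains (March-April).",
    "A common maize spacing is about 75 cm between rows and 25-30 cm within rows.",
]

context = "\n".join(f"- {doc}" for doc in retrieved_docs)
question = "How to grow maize in Kenya?"

messages = [
    {
        "role": "user",
        "content": f"Use the following context to answer the question.\n\nContext:\n{context}\n\nQuestion: {question}",
    },
]

inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```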
Table of Contents
- Model Card for AI71ai/Llama-agrillm-3.3-70B
- Model Details
- Uses
- Bias, Risks, and Limitations
- Training Details
- Evaluation
- Environmental Impact
- Acknowledgements & Data Sources
- Citation
- Model Card Contact
- How to Get Started with the Model
Model Details
Model Description
🧠 Base Model
- Base: meta-llama/Llama-3.3-70B-Instruct
- Finetuning Method: LoRA (parameter-efficient finetuning)
- Architecture: 70B transformer (Llama 3.3 family)
🎯 Objective of Finetuning
- Strengthen agricultural reasoning: Improve understanding of agronomic concepts and the ability to reason about agricultural scenarios
- Generate expert-aligned and domain-relevant outputs: Produce advisory responses that reflect best practices, validated knowledge, and use terminology appropriate to the agricultural context
- Ensure reliability and safety: Reduce hallucinations, maintain factual accuracy, and preserve groundedness in agriculture-related queries
- Leverage retrieved context effectively: Enhance the model's ability to interpret retrieved information and identify the most relevant agricultural content in RAG applications
- Preserve general capabilities: Maintain the broad reasoning and generative abilities of the Llama 3.3-70B base model
🌾 Model Capabilities
AI71ai/Llama-agrillm-3.3-70B provides foundational AI capabilities that can be applied across the agricultural domain, including:
- Question Answering: Responds accurately to agricultural queries based on provided or retrieved information
- Summarization: Condenses technical agricultural documents, research papers, and policy briefs into concise summaries
- Advisory Generation: Produces structured guidance or recommendations based on domain knowledge
- Reasoning: Supports scenario analysis, domain-specific reasoning, and decision support
- Context Evaluation: Assesses retrieved content for relevance when generating outputs (optimized for RAG pipelines)
Uses
AI71ai/Llama-agrillm-3.3-70B is primarily intended as a foundation model building block. It is designed to better understand agricultural contexts and perform effectively when connected to internal knowledge bases or used in Retrieval-Augmented Generation (RAG) pipelines. This model is not a standalone solution with universal answers; instead, it provides specialized capabilities that downstream applications can leverage to deliver accurate, grounded, and context-aware outputs in agriculture. The model can assist multiple personas across the agricultural ecosystem. Example capabilities include:
Farmers
- Answer questions on crop production (sowing, irrigation, harvesting)
- Provide pest and disease management guidance based on symptoms
- Recommend fertilizers and nutrient applications
- Advise on livestock care
Field Extension Agents
- Generate advisory responses for farmers
- Support diagnostic workflows and on-field problem-solving
- Prepare step-by-step field instructions and protocols
- Interpret technical manuals, guidelines, and extension materials
Academics & Researchers
- Summarize agricultural literature and research papers
- Explain research methodologies and concepts
- Analyze and interpret policy briefs and technical reports
- Support domain-specific reasoning and scenario modeling
Policymakers & Project Managers
- Assist in agricultural program assessments and evaluations
- Support impact analysis and data-driven recommendations
- Generate evidence-based policy or project briefs
- Provide reasoning grounded in agricultural principles and best practices
⚠️ Out-of-Scope Use
This model is not a universal source of answers and is not intended as a standalone solution. It is designed to support agricultural workflows but cannot replace expert knowledge. All outputs should be verified, especially in high-stakes contexts.
Moreover, the model is not intended for:
- Medical or veterinary diagnosis
- Producing legally binding recommendations or official documents
- High-risk decision-making without expert supervision
- Replacing certified agronomists, extension agents, or researchers
- Providing real-time field measurements or monitoring (e.g., soil moisture, weather, or crop sensor data)
- Making financial, legal, or regulatory decisions in agricultural projects
Bias, Risks, and Limitations
Significant research has explored bias and fairness issues with language models (see, e.g., Sheng et al. (2021) and Bender et al. (2021)). Predictions generated by the model may include disturbing and harmful stereotypes across protected classes; identity characteristics; and sensitive, social, and occupational groups.
Training Details
Training Data
The model was fine-tuned on 200k high-quality examples (146k of which are domain-specific), combining:
- Human expert-generated Q&A pairs: Written or approved by agricultural domain specialists (agronomists, researchers, extension agents)
- Q&A pairs extracted from real-world interactions: Drawn from forums, email threads, SMS-based extension services, and other practical agricultural communications
- Synthetic Q&A pairs: Generated and curated through controlled extraction from agricultural documents, using LLMs with carefully designed prompts to minimize hallucination
- Domain-specific tasks:
- Summarization of agronomy texts
- Reading comprehension of agricultural guidelines
- Soil, crop, and livestock reasoning tasks
- Policy, research, and project-management reasoning
- General ability tasks:
- Included to prevent catastrophic forgetting
- Maintains strong general reasoning, math, and language skills
No private or personal data is included. All partner datasets were anonymized and ethically prepared.
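For illustration, a single supervised example in chat format might look like the sketch below; this is a hypothetical sample, not an actual record from the AgriLLM dataset, and the released dataset's exact schema may differ.

```python
# Hypothetical shape of one supervised Q&A training example in chat format.
# Illustration only; the actual AgriLLM dataset schema may differ.
example = {
    "messages": [
        {"role": "user", "content": "When should I plant maize in western Kenya?"},
        {
            "role": "assistant",
            "content": (
                "Plant at the onset of the long rains, typically March to April, "
                "so the crop has reliable moisture during establishment."
            ),
        },
    ]
}
```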
Training Procedure
This model was created by performing LoRA (parameter-efficient) fine-tuning on top of the instruction-tuned model meta-llama/Llama-3.3-70B-Instruct.
- Hardware: 8× NVIDIA H100 GPUs
- CPU: 150 cores
- Training time: ~7 hours
- Batch size: 8
- Gradient accumulation steps: 4
- Model checkpoint: 350 steps
- LoRA rank: 128
- LoRA alpha: 256
- Target modules: q_proj, k_proj, v_proj, o_proj, gate_proj, up_proj, down_proj, lm_head, embed_tokens
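For reference, a minimal sketch of this configuration using the PEFT library is shown below; the surrounding training setup (trainer, optimizer, data pipeline) is not part of the published details and is therefore omitted.

```python
# Minimal PEFT sketch reproducing the LoRA hyperparameters listed above.
# The full training loop (trainer, optimizer, data) is not shown.
from peft import LoraConfig

lora_config = LoraConfig(
    r=128,           # lora_rank
    lora_alpha=256,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
        "lm_head", "embed_tokens",
    ],
    task_type="CAUSAL_LM",
)
```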
Evaluation
An initial evaluation was performed, covering:
- Basic generation quality
- Domain consistency
- Manual SME review of sample outputs

A full benchmark suite (field Q&A accuracy, agronomy reasoning, livestock QA, safety evaluation) will be added in a future update.
| Model | Answer Correctness | Factual Correctness (Recall) | Factual Correctness (Precision) |
|---|---|---|---|
| GPT-4o | 0.405 | 0.496 | 0.240 |
| GPT-4o with RAG | 0.489 | 0.631 | 0.355 |
| Llama-3.3-70B | 0.383 | 0.462 | 0.232 |
| Llama-3.3-70B Finetuned | 0.474 | 0.358 | 0.359 |
| Llama-3.3-70B Finetuned with RAG | 0.557 | 0.487 | 0.532 |
| Qwen3-30B-A3B | 0.377 | 0.480 | 0.180 |
| Qwen3-30B-A3B Finetuned | 0.488 | 0.355 | 0.391 |
| Qwen3-30B-A3B Finetuned with RAG | 0.565 | 0.496 | 0.509 |
| Falcon3-10B Finetuned | 0.452 | 0.351 | 0.350 |
| Falcon3-10B Finetuned with RAG | 0.545 | 0.459 | 0.509 |
Testing Data, Factors & Metrics
Testing Data
We evaluated the model on agrillm-qa-eval-800, a dataset of 800 Q&A pairs covering multiple agricultural topics, crops, geographies, and tasks.
Factors
The creation of the evaluation set considered multiple dimensions:
- User personas: 4 primary personas spanning farmers, extension agents, researchers, and policymakers
- Topics and domains: Multiple crops, produce types, and subdomains of agricultural knowledge
- Answer types: Single-turn and multi-turn Q&A, as well as domain "tasks" (summarization, classification, etc.)
Metrics
The evaluation pipeline leverages RAGAS with LLM-as-a-judge using GPT-4o, where the model's responses are automatically assessed against reference answers for correctness, completeness, and language quality.
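As a rough illustration of this setup, the sketch below scores a single answer with RAGAS's answer correctness metric. The RAGAS API has changed across releases, and the exact metrics, judge configuration, and column names used in our pipeline are not reproduced here, so treat this as an assumption-laden example (it also requires an OpenAI API key for the judge model).

```python
# Hedged sketch of LLM-as-a-judge scoring with RAGAS (0.1-style API).
# Illustrative only; not the exact AgriLLM evaluation pipeline.
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import answer_correctness

data = Dataset.from_dict({
    "question": ["Does rice need standing water during early growth?"],
    "answer": ["Rice generally grows best with standing water."],
    "ground_truth": ["Rice needs standing water during early growth."],
})

# Uses an OpenAI judge model under the hood (configured via environment/LLM wrappers).
result = evaluate(data, metrics=[answer_correctness])
print(result)
```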
Example (Agriculture → Rice Irrigation):
Ground-truth facts:
- Rice needs standing water during early growth.
- Drip irrigation is rarely used for rice.
- Water requirement is highest during tillering stage.
LLM Response:
- Rice generally grows best with standing water.
- Restrictive licenses are good
When we grade the response, we find:
✅ Rice generally grows best with standing water. ❌ Restrictive licenses are good
factual_correctness_precision: Out of 2 stated facts, 1 is correct (standing water) → Precision = 1/2.
factual_correctness_recall: Out of 3 ground-truth facts, the model mentioned only 1 → Recall = 1/3.
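The arithmetic behind these two scores is simple enough to spell out; the sketch below just restates the example's counts in code.

```python
# Precision/recall arithmetic from the rice-irrigation example above.
ground_truth_facts = 3   # facts in the reference answer
stated_facts = 2         # claims made by the model
correct_facts = 1        # claims supported by the ground truth

precision = correct_facts / stated_facts       # 1/2 = 0.50
recall = correct_facts / ground_truth_facts    # 1/3 ≈ 0.33
print(f"precision={precision:.2f}, recall={recall:.2f}")
```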
Results
Check our leaderboard for more information: https://huggingface.co/spaces/AI71ai/agrillm-leaderboard
Environmental Impact
Training Hardware
- 8× NVIDIA H100 GPUs
- 150-core CPU
- 500 GB storage
- Single-node configuration in a UAE data center
- ~7 hours
Estimated Energy Consumption
- Estimated IT power (GPUs + CPU + system): ≈ 7.1 kW
- Data-center Power Usage Effectiveness (PUE): 1.4
- Estimated total facility power: ≈ 9.94 kW
- Total energy consumed: 9.94 kW × 7 h ≈ 69.6 kWh
Estimated Carbon Emissions
- UAE grid emission factor: 0.40 kg CO₂e/kWh
- Total carbon emissions: 69.6 kWh × 0.40 ≈ 28 kg CO₂e
Summary
- Total training energy: ~69.6 kWh
- Total training emissions: ~28 kg CO₂e
Assumptions and Methodology
- GPU power based on NVIDIA H100 SXM maximum TDP of ~700 W per GPU.
- CPU + platform power estimated at ~1.5 kW under load.
- IT load assumed to be fully utilized during training.
- Data-center overhead modeled using PUE = 1.4.
- UAE grid intensity assumed at 0.40 kg CO₂e/kWh.
- Estimates include only operational electricity use; hardware manufacturing and external networking emissions are excluded.
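The full estimate can be reproduced from these assumptions in a few lines; the sketch below restates the arithmetic used above.

```python
# Back-of-the-envelope training energy and emissions, per the assumptions above.
gpu_power_kw = 8 * 0.7            # 8x H100 at ~700 W TDP each
cpu_platform_kw = 1.5             # CPU + platform under load (assumed)
it_power_kw = gpu_power_kw + cpu_platform_kw   # ~7.1 kW
facility_kw = it_power_kw * 1.4                # PUE = 1.4 -> ~9.94 kW
energy_kwh = facility_kw * 7                   # 7 h of training -> ~69.6 kWh
co2e_kg = energy_kwh * 0.40                    # UAE grid factor -> ~28 kg CO2e
print(f"{energy_kwh:.1f} kWh, {co2e_kg:.1f} kg CO2e")
```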
Acknowledgements & Data Sources
We gratefully acknowledge the contributions of our partners and collaborators who made this work possible:
- The International Affairs Office of the UAE Presidential Court
- Gates Foundation
- CGIAR – Consultative Group on International Agricultural Research
- Embrapa – Empresa Brasileira de Pesquisa Agropecuária
- ECHO
- FAO – Food and Agriculture Organization of the United Nations
- IFAD – International Fund for Agricultural Development
- The World Bank
- Digital Green
- KIADPAI – Khalifa International Award for Date Palm and Agricultural Innovation
- KALRO – Kenya Agricultural and Livestock Research Organization
- Extension Foundation
Special thanks to all partners for their invaluable support, including:
- Data preparation: Curating agricultural documents and Q&A pairs, with manual verification by domain experts
- Expert guidance: Supporting the verification of synthetic Q&A pairs generated for model fine-tuning
- AI assistant design: Providing expertise on designing downstream AI applications to test the models
- Model testing: Manually evaluating model outputs to ensure quality and relevance
- Field engagement: Collaborating with end-users in agricultural settings to support adoption and collect current needs and feedback
All datasets used were anonymized and ethically prepared, and no private or personal data was included.
Citation
If you find this model useful, please cite us:
@misc{Llama-agrillm-3.3-70B,
  title={Llama-agrillm-3.3-70B},
  author={Mamoun Alaoui and Ojas Agarwal and Zafar Shadman and Derek Thomas},
  year={2025},
}
Model Card Contact
- Mamoun Alaoui mamoun.alaoui@ai71.ai
- Ojas Agarwal ojas.agarwal@ai71.ai
- Zafar Shadman zafar.shadman@ai71.ai
- Derek Thomas derek.thomas@ai71.ai