| Model           | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|-----------------|--------:|--------:|-----------:|---------:|--------:|
| Lelantos-DPO-7B |   45.47 |   75.00 |      67.05 |    46.64 |   58.54 |
| Lelantos-7B     |   46.01 |   75.00 |      64.93 |    46.21 |   58.04 |

AGIEval

| Task                           | Version | Metric   | Value | Stderr |
|--------------------------------|--------:|----------|------:|--------|
| agieval_aqua_rat               |       0 | acc      | 25.20 | ± 2.73 |
|                                |         | acc_norm | 24.02 | ± 2.69 |
| agieval_logiqa_en              |       0 | acc      | 40.71 | ± 1.93 |
|                                |         | acc_norm | 40.25 | ± 1.92 |
| agieval_lsat_ar                |       0 | acc      | 24.35 | ± 2.84 |
|                                |         | acc_norm | 23.04 | ± 2.78 |
| agieval_lsat_lr                |       0 | acc      | 55.69 | ± 2.20 |
|                                |         | acc_norm | 55.49 | ± 2.20 |
| agieval_lsat_rc                |       0 | acc      | 65.06 | ± 2.91 |
|                                |         | acc_norm | 65.43 | ± 2.91 |
| agieval_sat_en                 |       0 | acc      | 76.70 | ± 2.95 |
|                                |         | acc_norm | 76.70 | ± 2.95 |
| agieval_sat_en_without_passage |       0 | acc      | 47.09 | ± 3.49 |
|                                |         | acc_norm | 45.63 | ± 3.48 |
| agieval_sat_math               |       0 | acc      | 36.36 | ± 3.25 |
|                                |         | acc_norm | 33.18 | ± 3.18 |

Average: 45.47%
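The benchmark average appears to be the unweighted mean of each task's `acc_norm` score (values copied from the table above). A minimal sketch, assuming that averaging convention:

```python
# acc_norm scores from the AGIEval table above.
acc_norm = {
    "agieval_aqua_rat": 24.02,
    "agieval_logiqa_en": 40.25,
    "agieval_lsat_ar": 23.04,
    "agieval_lsat_lr": 55.49,
    "agieval_lsat_rc": 65.43,
    "agieval_sat_en": 76.70,
    "agieval_sat_en_without_passage": 45.63,
    "agieval_sat_math": 33.18,
}

# Unweighted mean over the eight tasks; matches the reported
# 45.47% to within rounding.
average = sum(acc_norm.values()) / len(acc_norm)
```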

GPT4All

| Task          | Version | Metric   | Value | Stderr |
|---------------|--------:|----------|------:|--------|
| arc_challenge |       0 | acc      | 62.12 | ± 1.42 |
|               |         | acc_norm | 63.23 | ± 1.41 |
| arc_easy      |       0 | acc      | 85.40 | ± 0.72 |
|               |         | acc_norm | 81.02 | ± 0.80 |
| boolq         |       1 | acc      | 87.25 | ± 0.58 |
| hellaswag     |       0 | acc      | 67.97 | ± 0.47 |
|               |         | acc_norm | 85.48 | ± 0.35 |
| openbookqa    |       0 | acc      | 36.80 | ± 2.16 |
|               |         | acc_norm | 47.20 | ± 2.23 |
| piqa          |       0 | acc      | 81.88 | ± 0.90 |
|               |         | acc_norm | 83.57 | ± 0.86 |
| winogrande    |       0 | acc      | 77.27 | ± 1.18 |

Average: 75.0%

TruthfulQA

| Task          | Version | Metric | Value | Stderr |
|---------------|--------:|--------|------:|--------|
| truthfulqa_mc |       1 | mc1    | 49.94 | ± 1.75 |
|               |         | mc2    | 67.05 | ± 1.53 |

Average: 67.05%

Bigbench

| Task                                             | Version | Metric                | Value | Stderr |
|--------------------------------------------------|--------:|-----------------------|------:|--------|
| bigbench_causal_judgement                        |       0 | multiple_choice_grade | 58.95 | ± 3.58 |
| bigbench_date_understanding                      |       0 | multiple_choice_grade | 64.23 | ± 2.50 |
| bigbench_disambiguation_qa                       |       0 | multiple_choice_grade | 36.43 | ± 3.00 |
| bigbench_geometric_shapes                        |       0 | multiple_choice_grade | 23.68 | ± 2.25 |
|                                                  |         | exact_str_match       |  3.90 | ± 1.02 |
| bigbench_logical_deduction_five_objects          |       0 | multiple_choice_grade | 33.40 | ± 2.11 |
| bigbench_logical_deduction_seven_objects         |       0 | multiple_choice_grade | 24.43 | ± 1.63 |
| bigbench_logical_deduction_three_objects         |       0 | multiple_choice_grade | 54.33 | ± 2.88 |
| bigbench_movie_recommendation                    |       0 | multiple_choice_grade | 52.20 | ± 2.24 |
| bigbench_navigate                                |       0 | multiple_choice_grade | 52.70 | ± 1.58 |
| bigbench_reasoning_about_colored_objects         |       0 | multiple_choice_grade | 69.65 | ± 1.03 |
| bigbench_ruin_names                              |       0 | multiple_choice_grade | 50.22 | ± 2.36 |
| bigbench_salient_translation_error_detection     |       0 | multiple_choice_grade | 40.98 | ± 1.56 |
| bigbench_snarks                                  |       0 | multiple_choice_grade | 72.38 | ± 3.33 |
| bigbench_sports_understanding                    |       0 | multiple_choice_grade | 73.23 | ± 1.41 |
| bigbench_temporal_sequences                      |       0 | multiple_choice_grade | 39.90 | ± 1.55 |
| bigbench_tracking_shuffled_objects_five_objects  |       0 | multiple_choice_grade | 20.88 | ± 1.15 |
| bigbench_tracking_shuffled_objects_seven_objects |       0 | multiple_choice_grade | 17.60 | ± 0.91 |
| bigbench_tracking_shuffled_objects_three_objects |       0 | multiple_choice_grade | 54.33 | ± 2.88 |

Average: 46.64%

Average score: 58.54%
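The overall score is the unweighted mean of the four per-benchmark averages. A quick check, using the reported values:

```python
# Per-benchmark averages reported above.
benchmark_averages = {
    "AGIEval": 45.47,
    "GPT4All": 75.00,
    "TruthfulQA": 67.05,
    "Bigbench": 46.64,
}

# Unweighted mean across the four benchmarks.
overall = sum(benchmark_averages.values()) / len(benchmark_averages)
print(f"{overall:.2f}")  # 58.54
```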

Model size: 7B parameters · Tensor type: F16 (Safetensors)