| Model           | AGIEval | GPT4All | TruthfulQA | Bigbench | Average |
|-----------------|--------:|--------:|-----------:|---------:|--------:|
| Lelantos-DPO-7B |   45.47 |   75.00 |      67.05 |    46.64 |   58.54 |
| Lelantos-7B     |   46.01 |   75.00 |      64.93 |    46.21 |   58.04 |

AGIEval

| Task                           | Version | Metric   | Value | Stderr |
|--------------------------------|--------:|----------|------:|--------|
| agieval_aqua_rat               |       0 | acc      | 25.20 | ± 2.73 |
|                                |         | acc_norm | 24.02 | ± 2.69 |
| agieval_logiqa_en              |       0 | acc      | 40.71 | ± 1.93 |
|                                |         | acc_norm | 40.25 | ± 1.92 |
| agieval_lsat_ar                |       0 | acc      | 24.35 | ± 2.84 |
|                                |         | acc_norm | 23.04 | ± 2.78 |
| agieval_lsat_lr                |       0 | acc      | 55.69 | ± 2.20 |
|                                |         | acc_norm | 55.49 | ± 2.20 |
| agieval_lsat_rc                |       0 | acc      | 65.06 | ± 2.91 |
|                                |         | acc_norm | 65.43 | ± 2.91 |
| agieval_sat_en                 |       0 | acc      | 76.70 | ± 2.95 |
|                                |         | acc_norm | 76.70 | ± 2.95 |
| agieval_sat_en_without_passage |       0 | acc      | 47.09 | ± 3.49 |
|                                |         | acc_norm | 45.63 | ± 3.48 |
| agieval_sat_math               |       0 | acc      | 36.36 | ± 3.25 |
|                                |         | acc_norm | 33.18 | ± 3.18 |

Average: 45.47%
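The benchmark average appears to be the unweighted mean of each task's `acc_norm` score (values copied from the table above). A minimal sketch, assuming that averaging convention:

```python
# acc_norm scores from the AGIEval table above.
acc_norm = {
    "agieval_aqua_rat": 24.02,
    "agieval_logiqa_en": 40.25,
    "agieval_lsat_ar": 23.04,
    "agieval_lsat_lr": 55.49,
    "agieval_lsat_rc": 65.43,
    "agieval_sat_en": 76.70,
    "agieval_sat_en_without_passage": 45.63,
    "agieval_sat_math": 33.18,
}

# Unweighted mean over the eight tasks; matches the reported
# 45.47% to within rounding.
average = sum(acc_norm.values()) / len(acc_norm)
```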

GPT4All

| Task          | Version | Metric   | Value | Stderr |
|---------------|--------:|----------|------:|--------|
| arc_challenge |       0 | acc      | 62.12 | ± 1.42 |
|               |         | acc_norm | 63.23 | ± 1.41 |
| arc_easy      |       0 | acc      | 85.40 | ± 0.72 |
|               |         | acc_norm | 81.02 | ± 0.80 |
| boolq         |       1 | acc      | 87.25 | ± 0.58 |
| hellaswag     |       0 | acc      | 67.97 | ± 0.47 |
|               |         | acc_norm | 85.48 | ± 0.35 |
| openbookqa    |       0 | acc      | 36.80 | ± 2.16 |
|               |         | acc_norm | 47.20 | ± 2.23 |
| piqa          |       0 | acc      | 81.88 | ± 0.90 |
|               |         | acc_norm | 83.57 | ± 0.86 |
| winogrande    |       0 | acc      | 77.27 | ± 1.18 |

Average: 75.0%

TruthfulQA

| Task          | Version | Metric | Value | Stderr |
|---------------|--------:|--------|------:|--------|
| truthfulqa_mc |       1 | mc1    | 49.94 | ± 1.75 |
|               |         | mc2    | 67.05 | ± 1.53 |

Average: 67.05%

Bigbench

| Task                                             | Version | Metric                | Value | Stderr |
|--------------------------------------------------|--------:|-----------------------|------:|--------|
| bigbench_causal_judgement                        |       0 | multiple_choice_grade | 58.95 | ± 3.58 |
| bigbench_date_understanding                      |       0 | multiple_choice_grade | 64.23 | ± 2.50 |
| bigbench_disambiguation_qa                       |       0 | multiple_choice_grade | 36.43 | ± 3.00 |
| bigbench_geometric_shapes                        |       0 | multiple_choice_grade | 23.68 | ± 2.25 |
|                                                  |         | exact_str_match       |  3.90 | ± 1.02 |
| bigbench_logical_deduction_five_objects          |       0 | multiple_choice_grade | 33.40 | ± 2.11 |
| bigbench_logical_deduction_seven_objects         |       0 | multiple_choice_grade | 24.43 | ± 1.63 |
| bigbench_logical_deduction_three_objects         |       0 | multiple_choice_grade | 54.33 | ± 2.88 |
| bigbench_movie_recommendation                    |       0 | multiple_choice_grade | 52.20 | ± 2.24 |
| bigbench_navigate                                |       0 | multiple_choice_grade | 52.70 | ± 1.58 |
| bigbench_reasoning_about_colored_objects         |       0 | multiple_choice_grade | 69.65 | ± 1.03 |
| bigbench_ruin_names                              |       0 | multiple_choice_grade | 50.22 | ± 2.36 |
| bigbench_salient_translation_error_detection     |       0 | multiple_choice_grade | 40.98 | ± 1.56 |
| bigbench_snarks                                  |       0 | multiple_choice_grade | 72.38 | ± 3.33 |
| bigbench_sports_understanding                    |       0 | multiple_choice_grade | 73.23 | ± 1.41 |
| bigbench_temporal_sequences                      |       0 | multiple_choice_grade | 39.90 | ± 1.55 |
| bigbench_tracking_shuffled_objects_five_objects  |       0 | multiple_choice_grade | 20.88 | ± 1.15 |
| bigbench_tracking_shuffled_objects_seven_objects |       0 | multiple_choice_grade | 17.60 | ± 0.91 |
| bigbench_tracking_shuffled_objects_three_objects |       0 | multiple_choice_grade | 54.33 | ± 2.88 |

Average: 46.64%

Average score: 58.54%
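The overall score is the unweighted mean of the four per-benchmark averages. A quick check, using the reported values:

```python
# Per-benchmark averages reported above.
benchmark_averages = {
    "AGIEval": 45.47,
    "GPT4All": 75.00,
    "TruthfulQA": 67.05,
    "Bigbench": 46.64,
}

# Unweighted mean across the four benchmarks.
overall = sum(benchmark_averages.values()) / len(benchmark_averages)
print(f"{overall:.2f}")  # 58.54
```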

Model size: 7B parameters · Tensor type: F16 (Safetensors)