sumink committed on
Commit fba0f95 · verified · 1 Parent(s): dd73e90

Create README.md

Files changed (1)
  1. README.md +11 -0
README.md ADDED
@@ -0,0 +1,11 @@
+ Model Overview
+ This model is a fine-tuned version of Llama-3.2-3B, trained on a curated subset of data selected with the Facility Location (FL) method. The base model, Llama-3.2-3B, is a general-purpose large language model for natural language processing tasks; fine-tuning on the selected data adapts it to improve task-specific performance.
+
+ Dataset Details
+ Original Dataset: The dataset initially consisted of 10,000 samples, combining diverse conversational pairs for instruction tuning and response generation tasks.
+ Data Selection Process:
+ The Facility Location (FL) algorithm was applied to the original dataset to select the most representative and diverse samples. FL scores a candidate subset by how well it covers the full dataset: every original sample should have a similar sample in the subset, which rewards coverage and penalizes near-duplicates.
+ As a result, the dataset was reduced from 10,000 to 7,000 samples (70% of the original), retaining the most representative data points.
+ Dataset Characteristics:
+ Chosen-Response Pairs: 7,000 question-response pairs, selected to improve learning efficiency.
+ Diversity & Balance: FL selection favors a subset that covers diverse language usage and contexts with reduced redundancy.
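
The README does not say how the Facility Location selection was implemented, which embeddings were used, or how similarity was measured. As an illustration only, the selection step described under Data Selection Process could be sketched with the standard greedy algorithm for the facility location objective, which picks a subset S maximizing the sum over all samples i of max over j in S of sim(i, j). The function names `cosine_similarity_matrix` and `facility_location_greedy`, the use of clipped cosine similarity, and the random embeddings are all assumptions made for this sketch, not details from the commit.

```python
import numpy as np


def cosine_similarity_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Pairwise cosine similarity, clipped to be non-negative (an assumption)."""
    norms = np.linalg.norm(embeddings, axis=1, keepdims=True)
    unit = embeddings / np.clip(norms, 1e-12, None)
    return np.clip(unit @ unit.T, 0.0, None)


def facility_location_greedy(similarity: np.ndarray, budget: int) -> list[int]:
    """Greedily pick `budget` indices maximizing sum_i max_{j in S} similarity[i, j].

    Assumes `similarity` is a square matrix with non-negative entries.
    """
    n = similarity.shape[0]
    if not 0 < budget <= n:
        raise ValueError("budget must be between 1 and the number of samples")

    # best_cover[i]: similarity of sample i to its closest selected sample so far
    best_cover = np.zeros(n)
    remaining = np.ones(n, dtype=bool)
    selected: list[int] = []

    for _ in range(budget):
        # gains[j]: increase in total coverage if candidate j were added
        gains = np.maximum(similarity, best_cover[:, None]).sum(axis=0) - best_cover.sum()
        gains[~remaining] = -np.inf  # never re-select an index
        j = int(np.argmax(gains))
        selected.append(j)
        remaining[j] = False
        best_cover = np.maximum(best_cover, similarity[:, j])

    return selected


# Toy check: samples 0 and 1 are duplicates, so the second pick is sample 2.
toy = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
toy_subset = facility_location_greedy(cosine_similarity_matrix(toy), budget=2)
print(toy_subset)  # → [0, 2]

# Same 70% ratio as the README (10,000 -> 7,000), scaled down to 100 samples.
# Random vectors stand in for real sample embeddings (hypothetical data).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(100, 16))
subset = facility_location_greedy(cosine_similarity_matrix(embeddings), budget=70)
print(len(subset))  # → 70
```

This sketch builds a dense n-by-n similarity matrix and rescans it on every step, which is fine for a few hundred samples but heavy at 10,000; a practical run at that size would likely need a lazy-greedy or batched variant.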