# Model Overview

This model is a fine-tuned version of Llama-3.2-3B, trained on a curated dataset selected with Facility Location (FL) techniques. The base model, Llama-3.2-3B, is a large language model designed for a broad range of natural language processing tasks; this fine-tuned version adapts it to improve task-specific performance.

## Dataset Details

- **Original Dataset:** The dataset initially consisted of 10,000 samples, combining diverse conversational pairs for instruction tuning and response generation tasks.
- **Data Selection Process:**
  - The Facility Location (FL) algorithm was applied to the original dataset to identify the most representative and diverse samples. This method maximizes dataset utility by selecting a balanced, informative subset while preserving the richness of the original data.
  - As a result, the dataset was reduced to 7,000 high-quality samples, retaining only the most relevant and representative data points.
- **Dataset Characteristics:**
  - **Chosen-Response Pairs:** 7,000 question-response pairs refined to optimize learning efficiency.
  - **Diversity & Balance:** The FL algorithm ensures the dataset captures diverse language usage and contexts without redundancy.
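
The selection step above can be sketched with a standard greedy facility-location routine. This is a minimal illustration, not the actual pipeline used for this model: it assumes each sample has already been embedded as a vector, and the function name, similarity choice (cosine), and subset size are all illustrative.

```python
import numpy as np

def facility_location_select(embeddings, k):
    """Greedily pick k samples maximizing the facility-location
    objective: sum over all samples i of max_{j in subset} sim(i, j).
    `embeddings` is an (n, d) array of per-sample vectors."""
    # Cosine similarity between every pair of samples.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T  # shape (n, n)

    n = sim.shape[0]
    selected = []
    # best_cover[i] = similarity of sample i to its closest selected sample.
    best_cover = np.zeros(n)
    for _ in range(k):
        # Marginal gain of adding candidate j: how much total coverage improves.
        gains = np.maximum(sim, best_cover).sum(axis=1) - best_cover.sum()
        gains[selected] = -np.inf  # never re-pick an already selected sample
        j = int(np.argmax(gains))
        selected.append(j)
        best_cover = np.maximum(best_cover, sim[j])
    return selected

# Example: reduce 10 toy samples to a 3-sample representative subset.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 8))
subset = facility_location_select(X, k=3)
print(subset)
```

Because the facility-location objective is monotone submodular, this greedy procedure comes with the classic (1 - 1/e) approximation guarantee, which is why it is a common choice for data-subset selection at scale.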