# Model Overview

This model is a fine-tuned version of Llama-3.2-3B, trained on a curated dataset selected with Facility Location (FL) techniques. The base model, Llama-3.2-3B, is a large language model designed for a broad range of natural language processing tasks; this fine-tuned version adapts it to improve task-specific performance.

## Dataset Details

- **Original Dataset:** The dataset initially consisted of 10,000 samples, combining diverse conversational pairs for instruction tuning and response generation tasks.
- **Data Selection Process:**
  - The Facility Location (FL) algorithm was applied to the original dataset to identify the most representative and diverse samples. This method maximizes dataset utility by selecting a balanced, informative subset while preserving the richness of the original data.
  - As a result, the dataset was reduced to 7,000 high-quality samples, retaining only the most relevant and representative data points.
- **Dataset Characteristics:**
  - **Chosen-Response Pairs:** 7,000 question-response pairs refined to optimize learning efficiency.
  - **Diversity & Balance:** The FL algorithm ensures the dataset captures diverse language usage and contexts without redundancy.
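
The selection step above can be sketched with a standard greedy facility-location routine. This is a minimal illustration, not the actual pipeline used for this model: it assumes each sample has already been embedded as a vector, and the function name, similarity choice (cosine), and subset size are all illustrative.

```python
import numpy as np

def facility_location_select(embeddings, k):
    """Greedily pick k samples maximizing the facility-location
    objective: sum over all samples i of max_{j in subset} sim(i, j).
    `embeddings` is an (n, d) array of per-sample vectors."""
    # Cosine similarity between every pair of samples.
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T  # shape (n, n)

    n = sim.shape[0]
    selected = []
    # best_cover[i] = similarity of sample i to its closest selected sample.
    best_cover = np.zeros(n)
    for _ in range(k):
        # Marginal gain of adding candidate j: how much total coverage improves.
        gains = np.maximum(sim, best_cover).sum(axis=1) - best_cover.sum()
        gains[selected] = -np.inf  # never re-pick an already selected sample
        j = int(np.argmax(gains))
        selected.append(j)
        best_cover = np.maximum(best_cover, sim[j])
    return selected

# Example: reduce 10 toy samples to a 3-sample representative subset.
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 8))
subset = facility_location_select(X, k=3)
print(subset)
```

Because the facility-location objective is monotone submodular, this greedy procedure comes with the classic (1 - 1/e) approximation guarantee, which is why it is a common choice for data-subset selection at scale.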