PEFT variations
  estnafinema0/llm-course-hw3-dora • Text Generation • 0.3B • Updated Apr 11, 2025 • 1
  estnafinema0/llm-course-hw3-lora • Text Generation • 0.3B • Updated Apr 11, 2025 • 2
  estnafinema0/llm-course-hw3-tinyllama-qlora • Updated Apr 11, 2025
  estnafinema0/llm-course-hw3-tinyllamma-qlora • Updated Apr 11, 2025
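As a minimal sketch of the idea behind the LoRA/DoRA adapters listed above: LoRA keeps the pretrained weight W frozen and learns a low-rank update (alpha/r) * B @ A on top of it. The dimensions and values below are illustrative, not taken from the actual adapters.

```python
import torch

d, r, alpha = 16, 4, 8
W = torch.randn(d, d)          # frozen pretrained weight
A = torch.randn(r, d) * 0.01   # trainable down-projection
B = torch.zeros(d, r)          # trainable up-projection, zero-initialized

x = torch.randn(d)
lora_out = W @ x + (alpha / r) * (B @ (A @ x))

# With B initialized to zero, the adapted layer starts identical to the base layer.
assert torch.allclose(lora_out, W @ x)
```

During fine-tuning only A and B receive gradients, which is why these adapter checkpoints are small relative to the 0.3B base model; DoRA additionally decomposes the weight into magnitude and direction before applying the low-rank update.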
SmolLM Variation: PPO & DPO Fine-Tuning for RLHF
This collection presents fine-tuning of the SmolLM model with two RLHF approaches: DPO and PPO.
  estnafinema0/trainer_output • Text Classification • 0.1B • Updated Mar 30, 2025 • 2
  estnafinema0/smolLM-variation-dpo • Text Generation • 0.1B • Updated Mar 30, 2025 • 2
  estnafinema0/smolLM-variation-ppo • Text Generation • 0.1B • Updated Mar 30, 2025 • 4
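A hedged sketch of the DPO objective behind smolLM-variation-dpo: the loss pushes the policy to increase the margin between chosen and rejected completions relative to a frozen reference model. The log-probabilities below are made-up scalars standing in for per-sequence sums; the actual training used real model outputs.

```python
import torch
import torch.nn.functional as F

beta = 0.1  # illustrative KL-tradeoff strength, not the value used in the collection

# Illustrative per-sequence log-probabilities (policy pi vs. frozen reference).
pi_chosen, pi_rejected = torch.tensor(-10.0), torch.tensor(-14.0)
ref_chosen, ref_rejected = torch.tensor(-11.0), torch.tensor(-13.0)

# DPO loss: -log sigmoid(beta * [(pi_c - ref_c) - (pi_r - ref_r)])
logits = beta * ((pi_chosen - ref_chosen) - (pi_rejected - ref_rejected))
loss = -F.logsigmoid(logits)
```

Unlike PPO, this requires no separate reward model or on-policy sampling loop, which is the main practical difference between the two checkpoints in this collection.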
NER Extraction: An Active Learning Approach
  estnafinema0/active-learning-nerc-models-kfold • Updated Apr 4, 2025
  estnafinema0/nerc-extraction • 0.1B • Updated Apr 4, 2025 • 3
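A minimal sketch of uncertainty-based query selection, one common active-learning strategy for NER annotation; the collection's k-fold repo name suggests an ensemble variant, but the exact selection rule is an assumption here. The probabilities are randomly generated stand-ins for model predictions.

```python
import numpy as np

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(5), size=100)  # fake per-example class probabilities

# Least-confidence strategy: queue the examples the model is least sure about.
confidence = probs.max(axis=1)
query_idx = np.argsort(confidence)[:10]      # 10 most uncertain examples to label
```

Labeling only the queried examples each round, then retraining, is what lets an active-learning loop reach a target NER score with far fewer annotations than labeling the pool uniformly.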