izzcw/cooking_sft_fail_new_mem

+---
+library_name: transformers
+license: llama3.1
+base_model: meta-llama/Llama-3.1-8B-Instruct
+tags:
+- llama-factory
+- generated_from_trainer
+model-index:
+- name: cooking_sft_fail_new_mem
+  results: []
+---
+<!-- This model card has been generated automatically according to the information the Trainer had access to. You
+should probably proofread and complete it, then remove this comment. -->
+# cooking_sft_fail_new_mem
+This model is a fine-tuned version of [meta-llama/Llama-3.1-8B-Instruct](https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct) on an unspecified dataset (the Trainer did not record a dataset name).
+It achieves the following results on the evaluation set:
+- Loss: 0.2037
+## Model description
+More information needed
+## Intended uses & limitations
+More information needed
+## Training and evaluation data
+More information needed
+## Training procedure
+### Training hyperparameters
+The following hyperparameters were used during training:
+- learning_rate: 1e-05
+- train_batch_size: 1
+- eval_batch_size: 1
+- seed: 42
+- distributed_type: multi-GPU
+- num_devices: 8
+- gradient_accumulation_steps: 16
+- total_train_batch_size: 128
+- total_eval_batch_size: 8
+- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
+- lr_scheduler_type: cosine
+- lr_scheduler_warmup_ratio: 0.1
+- num_epochs: 1.0
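The derived values in the list above can be checked with a little arithmetic, and the learning-rate curve can be sketched from the scheduler settings. The following is a minimal sketch, not the Trainer's own code. It assumes linear warmup followed by a half-cosine decay to zero, and it assumes about 3750 optimizer steps in total, a figure read off the last row of the training results table. The function names are hypothetical.

```python
import math

# Values copied from the hyperparameter list.
PER_DEVICE_TRAIN_BATCH = 1
PER_DEVICE_EVAL_BATCH = 1
NUM_DEVICES = 8
GRAD_ACCUM_STEPS = 16
PEAK_LR = 1e-5
WARMUP_RATIO = 0.1

# Assumption: roughly 3750 optimizer steps in the single epoch,
# taken from the last row of the training results table.
TOTAL_STEPS = 3750


def effective_batch_size(per_device, devices, accum_steps=1):
    """Number of examples that contribute to one optimizer step."""
    return per_device * devices * accum_steps


def lr_at(step, total_steps=TOTAL_STEPS, peak_lr=PEAK_LR,
          warmup_ratio=WARMUP_RATIO):
    """Linear warmup to peak_lr, then half-cosine decay to zero (assumed shape)."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return peak_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


if __name__ == "__main__":
    # 1 * 8 * 16 = 128, matching total_train_batch_size.
    print(effective_batch_size(PER_DEVICE_TRAIN_BATCH, NUM_DEVICES, GRAD_ACCUM_STEPS))
    # 1 * 8 = 8, matching total_eval_batch_size (no accumulation at eval time).
    print(effective_batch_size(PER_DEVICE_EVAL_BATCH, NUM_DEVICES))
    for step in (0, 375, 2000, 3750):
        print(step, lr_at(step))
```

Under these assumptions the learning rate peaks at about step 375 and is close to zero by the end of the epoch.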
+### Training results
+| Training Loss | Epoch  | Step | Validation Loss |
+|:-------------:|:------:|:----:|:---------------:|
+| 0.3821        | 0.0133 | 50   | 0.4735          |
+| 0.3020        | 0.0267 | 100  | 0.3178          |
+| 0.2988        | 0.0400 | 150  | 0.3253          |
+| 0.3054        | 0.0533 | 200  | 0.3250          |
+| 0.2967        | 0.0666 | 250  | 0.3232          |
+| 0.3137        | 0.0800 | 300  | 0.3207          |
+| 0.3221        | 0.0933 | 350  | 0.3211          |
+| 0.3188        | 0.1066 | 400  | 0.3204          |
+| 0.3080        | 0.1200 | 450  | 0.3149          |
+| 0.3123        | 0.1333 | 500  | 0.3106          |
+| 0.3138        | 0.1466 | 550  | 0.3050          |
+| 0.3032        | 0.1600 | 600  | 0.3046          |
+| 0.2827        | 0.1733 | 650  | 0.3017          |
+| 0.2953        | 0.1866 | 700  | 0.2970          |
+| 0.2854        | 0.1999 | 750  | 0.2924          |
+| 0.2872        | 0.2133 | 800  | 0.2896          |
+| 0.2866        | 0.2266 | 850  | 0.2836          |
+| 0.2925        | 0.2399 | 900  | 0.2794          |
+| 0.2843        | 0.2533 | 950  | 0.2823          |
+| 0.2920        | 0.2666 | 1000 | 0.2789          |
+| 0.2775        | 0.2799 | 1050 | 0.2763          |
+| 0.2652        | 0.2933 | 1100 | 0.2717          |
+| 0.2700        | 0.3066 | 1150 | 0.2712          |
+| 0.2770        | 0.3199 | 1200 | 0.2749          |
+| 0.2681        | 0.3332 | 1250 | 0.2709          |
+| 0.2699        | 0.3466 | 1300 | 0.2718          |
+| 0.2682        | 0.3599 | 1350 | 0.2676          |
+| 0.2668        | 0.3732 | 1400 | 0.2662          |
+| 0.2615        | 0.3866 | 1450 | 0.2689          |
+| 0.2501        | 0.3999 | 1500 | 0.2583          |
+| 0.2545        | 0.4132 | 1550 | 0.2568          |
+| 0.2618        | 0.4265 | 1600 | 0.2523          |
+| 0.2615        | 0.4399 | 1650 | 0.2550          |
+| 0.2512        | 0.4532 | 1700 | 0.2488          |
+| 0.2450        | 0.4665 | 1750 | 0.2504          |
+| 0.2503        | 0.4799 | 1800 | 0.2481          |
+| 0.2402        | 0.4932 | 1850 | 0.2450          |
+| 0.2346        | 0.5065 | 1900 | 0.2440          |
+| 0.2413        | 0.5199 | 1950 | 0.2425          |
+| 0.2400        | 0.5332 | 2000 | 0.2383          |
+| 0.2398        | 0.5465 | 2050 | 0.2408          |
+| 0.2473        | 0.5598 | 2100 | 0.2384          |
+| 0.2423        | 0.5732 | 2150 | 0.2348          |
+| 0.2294        | 0.5865 | 2200 | 0.2311          |
+| 0.2403        | 0.5998 | 2250 | 0.2323          |
+| 0.2319        | 0.6132 | 2300 | 0.2297          |
+| 0.2220        | 0.6265 | 2350 | 0.2288          |
+| 0.2193        | 0.6398 | 2400 | 0.2303          |
+| 0.2252        | 0.6531 | 2450 | 0.2247          |
+| 0.2304        | 0.6665 | 2500 | 0.2211          |
+| 0.2139        | 0.6798 | 2550 | 0.2199          |
+| 0.2186        | 0.6931 | 2600 | 0.2192          |
+| 0.2156        | 0.7065 | 2650 | 0.2183          |
+| 0.2187        | 0.7198 | 2700 | 0.2159          |
+| 0.2220        | 0.7331 | 2750 | 0.2174          |
+| 0.2162        | 0.7465 | 2800 | 0.2153          |
+| 0.2253        | 0.7598 | 2850 | 0.2132          |
+| 0.2066        | 0.7731 | 2900 | 0.2134          |
+| 0.2113        | 0.7864 | 2950 | 0.2107          |
+| 0.2107        | 0.7998 | 3000 | 0.2085          |
+| 0.2055        | 0.8131 | 3050 | 0.2097          |
+| 0.2045        | 0.8264 | 3100 | 0.2075          |
+| 0.2172        | 0.8398 | 3150 | 0.2062          |
+| 0.2138        | 0.8531 | 3200 | 0.2075          |
+| 0.1940        | 0.8664 | 3250 | 0.2051          |
+| 0.2133        | 0.8798 | 3300 | 0.2051          |
+| 0.2025        | 0.8931 | 3350 | 0.2047          |
+| 0.2088        | 0.9064 | 3400 | 0.2050          |
+| 0.2040        | 0.9197 | 3450 | 0.2044          |
+| 0.2059        | 0.9331 | 3500 | 0.2039          |
+| 0.2103        | 0.9464 | 3550 | 0.2039          |
+| 0.2102        | 0.9597 | 3600 | 0.2039          |
+| 0.2051        | 0.9731 | 3650 | 0.2038          |
+| 0.2017        | 0.9864 | 3700 | 0.2037          |
+| 0.2088        | 0.9997 | 3750 | 0.2037          |
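The validation losses in the table are easier to compare once converted to perplexity. The sketch below assumes the reported loss is a mean per-token cross-entropy in nats, which is the usual convention for causal language model training but is not stated in the card. The three rows are copied from the table, and the helper names are hypothetical.

```python
import math

# (step, validation loss) pairs copied from the training results table.
VALIDATION_LOSS = {
    50: 0.4735,
    100: 0.3178,
    3750: 0.2037,
}


def perplexity(loss):
    """Perplexity for a mean cross-entropy loss given in nats (assumed unit)."""
    return math.exp(loss)


def relative_drop(first, last):
    """Fractional reduction from the first value to the last."""
    return (first - last) / first


if __name__ == "__main__":
    for step, loss in sorted(VALIDATION_LOSS.items()):
        print(f"step {step}: loss {loss:.4f}, perplexity {perplexity(loss):.4f}")
    drop = relative_drop(VALIDATION_LOSS[50], VALIDATION_LOSS[3750])
    print(f"relative drop in validation loss: {drop:.1%}")
```

Most of the improvement shown in the table happens early: validation loss falls from 0.4735 to 0.3178 within the first 100 steps, then declines slowly to 0.2037, where it stays flat over the last few evaluations.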
+### Framework versions
+- Transformers 4.49.0
+- Pytorch 2.5.1+cu124
+- Datasets 3.2.0
+- Tokenizers 0.21.0

generation_config.json ADDED

	@@ -0,0 +1,12 @@

+{
+  "bos_token_id": 128000,
+  "do_sample": true,
+  "eos_token_id": [
+    128001,
+    128008,
+    128009
+  ],
+  "temperature": 0.6,
+  "top_p": 0.9,
+  "transformers_version": "4.49.0"
+}
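With `do_sample` set to true, `temperature` and `top_p` control how the next token is drawn. The following is a minimal pure-Python sketch of temperature scaling followed by nucleus (top-p) filtering, using the two values from the config above. It illustrates the idea only and is not the Transformers implementation, which works on logit tensors and differs in detail. All function names are hypothetical.

```python
import math
import random

# Values copied from generation_config.json.
TEMPERATURE = 0.6
TOP_P = 0.9


def softmax_with_temperature(logits, temperature):
    """Divide logits by the temperature, then apply a numerically stable softmax."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def top_p_filter(probs, top_p):
    """Keep the smallest set of most likely tokens whose cumulative
    probability reaches top_p, then renormalize over that set."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept = []
    cumulative = 0.0
    for i in order:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    mass = sum(probs[i] for i in kept)
    return {i: probs[i] / mass for i in kept}


def sample_token(logits, temperature=TEMPERATURE, top_p=TOP_P, rng=random):
    """Draw one token id from the temperature-scaled, top-p-filtered distribution."""
    probs = softmax_with_temperature(logits, temperature)
    nucleus = top_p_filter(probs, top_p)
    ids = list(nucleus)
    weights = [nucleus[i] for i in ids]
    return rng.choices(ids, weights=weights, k=1)[0]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5, -1.0]  # made-up values for illustration
    print(sample_token(toy_logits, rng=random.Random(0)))
```

A temperature below 1 sharpens the distribution before filtering, so with these settings fewer low-probability tokens survive the top-p cut than at temperature 1. Because `eos_token_id` is a list, generation stops as soon as any one of the three listed ids is produced.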