Training in progress, step 100, checkpoint

Files changed (14)

last-checkpoint/README.md +202 -0
last-checkpoint/adapter_config.json +34 -0
last-checkpoint/adapter_model.safetensors +3 -0
last-checkpoint/added_tokens.json +5 -0
last-checkpoint/merges.txt +0 -0
last-checkpoint/optimizer.pt +3 -0
last-checkpoint/rng_state.pth +3 -0
last-checkpoint/scheduler.pt +3 -0
last-checkpoint/special_tokens_map.json +20 -0
last-checkpoint/tokenizer.json +3 -0
last-checkpoint/tokenizer_config.json +43 -0
last-checkpoint/trainer_state.json +758 -0
last-checkpoint/training_args.bin +3 -0
last-checkpoint/vocab.json +0 -0

last-checkpoint/README.md ADDED

	@@ -0,0 +1,202 @@

+---
+base_model: Qwen/Qwen2-0.5B-Instruct
+library_name: peft
+---
+# Model Card for Model ID
+<!-- Provide a quick summary of what the model is/does. -->
+## Model Details
+### Model Description
+<!-- Provide a longer summary of what this model is. -->
+- **Developed by:** [More Information Needed]
+- **Funded by [optional]:** [More Information Needed]
+- **Shared by [optional]:** [More Information Needed]
+- **Model type:** [More Information Needed]
+- **Language(s) (NLP):** [More Information Needed]
+- **License:** [More Information Needed]
+- **Finetuned from model [optional]:** [More Information Needed]
+### Model Sources [optional]
+<!-- Provide the basic links for the model. -->
+- **Repository:** [More Information Needed]
+- **Paper [optional]:** [More Information Needed]
+- **Demo [optional]:** [More Information Needed]
+## Uses
+<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
+### Direct Use
+<!-- This section is for the model use without fine-tuning or plugging into a larger ecosystem/app. -->
+[More Information Needed]
+### Downstream Use [optional]
+<!-- This section is for the model use when fine-tuned for a task, or when plugged into a larger ecosystem/app -->
+[More Information Needed]
+### Out-of-Scope Use
+<!-- This section addresses misuse, malicious use, and uses that the model will not work well for. -->
+[More Information Needed]
+## Bias, Risks, and Limitations
+<!-- This section is meant to convey both technical and sociotechnical limitations. -->
+[More Information Needed]
+### Recommendations
+<!-- This section is meant to convey recommendations with respect to the bias, risk, and technical limitations. -->
+Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
+## How to Get Started with the Model
+Use the code below to get started with the model.
+[More Information Needed]
+## Training Details
+### Training Data
+<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
+[More Information Needed]
+### Training Procedure
+<!-- This relates heavily to the Technical Specifications. Content here should link to that section when it is relevant to the training procedure. -->
+#### Preprocessing [optional]
+[More Information Needed]
+#### Training Hyperparameters
+- **Training regime:** [More Information Needed] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
+#### Speeds, Sizes, Times [optional]
+<!-- This section provides information about throughput, start/end time, checkpoint size if relevant, etc. -->
+[More Information Needed]
+## Evaluation
+<!-- This section describes the evaluation protocols and provides the results. -->
+### Testing Data, Factors & Metrics
+#### Testing Data
+<!-- This should link to a Dataset Card if possible. -->
+[More Information Needed]
+#### Factors
+<!-- These are the things the evaluation is disaggregating by, e.g., subpopulations or domains. -->
+[More Information Needed]
+#### Metrics
+<!-- These are the evaluation metrics being used, ideally with a description of why. -->
+[More Information Needed]
+### Results
+[More Information Needed]
+#### Summary
+## Model Examination [optional]
+<!-- Relevant interpretability work for the model goes here -->
+[More Information Needed]
+## Environmental Impact
+<!-- Total emissions (in grams of CO2eq) and additional considerations, such as electricity usage, go here. Edit the suggested text below accordingly -->
+Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).
+- **Hardware Type:** [More Information Needed]
+- **Hours used:** [More Information Needed]
+- **Cloud Provider:** [More Information Needed]
+- **Compute Region:** [More Information Needed]
+- **Carbon Emitted:** [More Information Needed]
+## Technical Specifications [optional]
+### Model Architecture and Objective
+[More Information Needed]
+### Compute Infrastructure
+[More Information Needed]
+#### Hardware
+[More Information Needed]
+#### Software
+[More Information Needed]
+## Citation [optional]
+<!-- If there is a paper or blog post introducing the model, the APA and Bibtex information for that should go in this section. -->
+**BibTeX:**
+[More Information Needed]
+**APA:**
+[More Information Needed]
+## Glossary [optional]
+<!-- If relevant, include terms and calculations in this section that can help readers understand the model or model card. -->
+[More Information Needed]
+## More Information [optional]
+[More Information Needed]
+## Model Card Authors [optional]
+[More Information Needed]
+## Model Card Contact
+[More Information Needed]
+### Framework versions
+- PEFT 0.13.2

last-checkpoint/adapter_config.json ADDED

	@@ -0,0 +1,34 @@

+{
+  "alpha_pattern": {},
+  "auto_mapping": null,
+  "base_model_name_or_path": "Qwen/Qwen2-0.5B-Instruct",
+  "bias": "none",
+  "fan_in_fan_out": null,
+  "inference_mode": true,
+  "init_lora_weights": true,
+  "layer_replication": null,
+  "layers_pattern": null,
+  "layers_to_transform": null,
+  "loftq_config": {},
+  "lora_alpha": 32,
+  "lora_dropout": 0.05,
+  "megatron_config": null,
+  "megatron_core": "megatron.core",
+  "modules_to_save": null,
+  "peft_type": "LORA",
+  "r": 16,
+  "rank_pattern": {},
+  "revision": null,
+  "target_modules": [
+    "o_proj",
+    "gate_proj",
+    "k_proj",
+    "down_proj",
+    "v_proj",
+    "q_proj",
+    "up_proj"
+  ],
+  "task_type": "CAUSAL_LM",
+  "use_dora": false,
+  "use_rslora": false
+}
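
The adapter configuration above fixes the LoRA rank, scaling, and target modules, which is enough to estimate the adapter's size. The sketch below is a rough sanity check, not part of the commit: the layer dimensions (`HIDDEN`, `INTERMEDIATE`, `NUM_LAYERS`, `KV_DIM`) are assumptions taken from the published Qwen2-0.5B architecture and do not appear in this diff, and the helper name `lora_params` is hypothetical.

```python
# Assumed Qwen2-0.5B dimensions (from the base model's published config,
# NOT from this diff).
HIDDEN = 896
INTERMEDIATE = 4864
NUM_LAYERS = 24
KV_DIM = 128  # assumed: 2 key/value heads x 64 head dim

# Values taken from adapter_config.json.
R = 16
LORA_ALPHA = 32

# (in_features, out_features) of each linear layer named in "target_modules".
TARGET_SHAPES = {
    "q_proj": (HIDDEN, HIDDEN),
    "k_proj": (HIDDEN, KV_DIM),
    "v_proj": (HIDDEN, KV_DIM),
    "o_proj": (HIDDEN, HIDDEN),
    "gate_proj": (HIDDEN, INTERMEDIATE),
    "up_proj": (HIDDEN, INTERMEDIATE),
    "down_proj": (INTERMEDIATE, HIDDEN),
}


def lora_params(in_features, out_features, r):
    """LoRA adds two matrices per adapted layer: A (r x in) and B (out x r)."""
    return r * (in_features + out_features)


per_layer = sum(lora_params(i, o, R) for i, o in TARGET_SHAPES.values())
total_params = per_layer * NUM_LAYERS
scaling = LORA_ALPHA / R  # plain alpha / r, since "use_rslora" is false
fp32_bytes = total_params * 4

print(total_params, fp32_bytes, scaling)
```

Under these assumptions the estimate is about 8.8 million adapter parameters, or roughly 35.19 MB if stored in float32. That is within about 44 KB of the 35,237,104-byte `adapter_model.safetensors` in this commit, which would be consistent with float32 weights plus a file header, though the diff itself does not state the storage dtype.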

last-checkpoint/adapter_model.safetensors ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:39e5644753aa51ad5c1d928b5147f80c77e51c3ade2a473a936e4e0c32f7ad6a
+size 35237104
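
The three `+` lines above are a Git LFS pointer: the repository stores only a spec version, a SHA-256 object ID, and a byte size, while the binary itself lives in LFS storage. As a minimal sketch (the function name `parse_lfs_pointer` and the returned dictionary layout are illustrative choices, not an official API), such a pointer can be read like this:

```python
def parse_lfs_pointer(text):
    """Parse a Git LFS pointer into its version, hash algorithm, digest, and size.

    Leading '+' characters are stripped so that lines copied from a diff
    view can be passed in directly.
    """
    fields = {}
    for line in text.splitlines():
        line = line.lstrip("+").strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


pointer = parse_lfs_pointer(
    "+version https://git-lfs.github.com/spec/v1\n"
    "+oid sha256:39e5644753aa51ad5c1d928b5147f80c77e51c3ade2a473a936e4e0c32f7ad6a\n"
    "+size 35237104\n"
)
print(pointer["hash_algo"], pointer["size"])
```

The same parsing applies to the other pointer files in this commit (`optimizer.pt`, `rng_state.pth`, `scheduler.pt`, `tokenizer.json`, `training_args.bin`); the digest can be compared with a locally computed SHA-256 of the downloaded file.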

last-checkpoint/added_tokens.json ADDED

	@@ -0,0 +1,5 @@

+{
+  "<|endoftext|>": 151643,
+  "<|im_end|>": 151645,
+  "<|im_start|>": 151644
+}

last-checkpoint/merges.txt ADDED

The diff for this file is too large to render. See raw diff

last-checkpoint/optimizer.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8563efbd560f7b019ec849a3c1aacaede5283ba5e02f67e3b1db6278a585e97b
+size 18810036

last-checkpoint/rng_state.pth ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:8b59879a88734bde90ff0252a4aded3466b5f045cb01684a9051b89d7acc0965
+size 14244

last-checkpoint/scheduler.pt ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:e8cc263af07e1053afb6a6c6eaf35ee41204c8ebbec4e02a00155272357d7402
+size 1064

last-checkpoint/special_tokens_map.json ADDED

	@@ -0,0 +1,20 @@

+{
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>"
+  ],
+  "eos_token": {
+    "content": "<|im_end|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "<|endoftext|>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}

last-checkpoint/tokenizer.json ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:bcfe42da0a4497e8b2b172c1f9f4ec423a46dc12907f4349c55025f670422ba9
+size 11418266

last-checkpoint/tokenizer_config.json ADDED

	@@ -0,0 +1,43 @@

+{
+  "add_prefix_space": false,
+  "added_tokens_decoder": {
+    "151643": {
+      "content": "<|endoftext|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151644": {
+      "content": "<|im_start|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "151645": {
+      "content": "<|im_end|>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [
+    "<|im_start|>",
+    "<|im_end|>"
+  ],
+  "bos_token": null,
+  "chat_template": "{% if not add_generation_prompt is defined %}{% set add_generation_prompt = false %}{% endif %}{% set loop_messages = messages %}{% for message in loop_messages %}{% set content = '<|start_header_id|>' + message['role'] + '<|end_header_id|>\n\n'+ message['content'] | trim + '<|eot_id|>' %}{% if loop.index0 == 0 %}{% set content = bos_token + content %}{% endif %}{{ content }}{% endfor %}{% if add_generation_prompt %}{{ '<|start_header_id|>assistant<|end_header_id|>\n\n' }}{% endif %}",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "<|im_end|>",
+  "errors": "replace",
+  "model_max_length": 32768,
+  "pad_token": "<|endoftext|>",
+  "split_special_tokens": false,
+  "tokenizer_class": "Qwen2Tokenizer",
+  "unk_token": null
+}

last-checkpoint/trainer_state.json ADDED

	@@ -0,0 +1,758 @@

+{
+  "best_metric": 2.2323501110076904,
+  "best_model_checkpoint": "miner_id_24/checkpoint-100",
+  "epoch": 0.016095607910991287,
+  "eval_steps": 100,
+  "global_step": 100,
+  "is_hyper_param_search": false,
+  "is_local_process_zero": true,
+  "is_world_process_zero": true,
+  "log_history": [
+    {
+      "epoch": 0.00016095607910991287,
+      "grad_norm": 0.8239232301712036,
+      "learning_rate": 2e-05,
+      "loss": 3.1051,
+      "step": 1
+    },
+    {
+      "epoch": 0.00016095607910991287,
+      "eval_loss": 2.4444730281829834,
+      "eval_runtime": 109.4431,
+      "eval_samples_per_second": 45.686,
+      "eval_steps_per_second": 11.421,
+      "step": 1
+    },
+    {
+      "epoch": 0.00032191215821982574,
+      "grad_norm": 0.6444323062896729,
+      "learning_rate": 4e-05,
+      "loss": 2.2086,
+      "step": 2
+    },
+    {
+      "epoch": 0.00048286823732973866,
+      "grad_norm": 0.9152227640151978,
+      "learning_rate": 6e-05,
+      "loss": 2.3418,
+      "step": 3
+    },
+    {
+      "epoch": 0.0006438243164396515,
+      "grad_norm": 0.8915662169456482,
+      "learning_rate": 8e-05,
+      "loss": 2.9389,
+      "step": 4
+    },
+    {
+      "epoch": 0.0008047803955495644,
+      "grad_norm": 0.6139128804206848,
+      "learning_rate": 0.0001,
+      "loss": 2.2369,
+      "step": 5
+    },
+    {
+      "epoch": 0.0009657364746594773,
+      "grad_norm": 0.7405723929405212,
+      "learning_rate": 0.00012,
+      "loss": 2.5273,
+      "step": 6
+    },
+    {
+      "epoch": 0.0011266925537693903,
+      "grad_norm": 0.7483253479003906,
+      "learning_rate": 0.00014,
+      "loss": 1.8396,
+      "step": 7
+    },
+    {
+      "epoch": 0.001287648632879303,
+      "grad_norm": 0.8752170205116272,
+      "learning_rate": 0.00016,
+      "loss": 3.1448,
+      "step": 8
+    },
+    {
+      "epoch": 0.0014486047119892159,
+      "grad_norm": 0.838828444480896,
+      "learning_rate": 0.00018,
+      "loss": 1.954,
+      "step": 9
+    },
+    {
+      "epoch": 0.0016095607910991288,
+      "grad_norm": 0.6572649478912354,
+      "learning_rate": 0.0002,
+      "loss": 2.1793,
+      "step": 10
+    },
+    {
+      "epoch": 0.0017705168702090417,
+      "grad_norm": 0.7829716205596924,
+      "learning_rate": 0.00019999999161160226,
+      "loss": 2.2376,
+      "step": 11
+    },
+    {
+      "epoch": 0.0019314729493189547,
+      "grad_norm": 0.7279601693153381,
+      "learning_rate": 0.00019999996644641036,
+      "loss": 2.2631,
+      "step": 12
+    },
+    {
+      "epoch": 0.0020924290284288676,
+      "grad_norm": 0.9361767768859863,
+      "learning_rate": 0.00019999992450442857,
+      "loss": 2.2853,
+      "step": 13
+    },
+    {
+      "epoch": 0.0022533851075387805,
+      "grad_norm": 0.7743573188781738,
+      "learning_rate": 0.00019999986578566397,
+      "loss": 2.1707,
+      "step": 14
+    },
+    {
+      "epoch": 0.0024143411866486934,
+      "grad_norm": 0.7922279834747314,
+      "learning_rate": 0.0001999997902901263,
+      "loss": 2.3178,
+      "step": 15
+    },
+    {
+      "epoch": 0.002575297265758606,
+      "grad_norm": 0.8866938352584839,
+      "learning_rate": 0.0001999996980178283,
+      "loss": 2.9112,
+      "step": 16
+    },
+    {
+      "epoch": 0.002736253344868519,
+      "grad_norm": 0.5131164789199829,
+      "learning_rate": 0.00019999958896878547,
+      "loss": 2.1523,
+      "step": 17
+    },
+    {
+      "epoch": 0.0028972094239784318,
+      "grad_norm": 0.6850908994674683,
+      "learning_rate": 0.00019999946314301604,
+      "loss": 1.6821,
+      "step": 18
+    },
+    {
+      "epoch": 0.0030581655030883447,
+      "grad_norm": 0.6150762438774109,
+      "learning_rate": 0.00019999932054054115,
+      "loss": 2.0289,
+      "step": 19
+    },
+    {
+      "epoch": 0.0032191215821982576,
+      "grad_norm": 0.656989336013794,
+      "learning_rate": 0.00019999916116138474,
+      "loss": 1.9743,
+      "step": 20
+    },
+    {
+      "epoch": 0.0033800776613081705,
+      "grad_norm": 0.5388839244842529,
+      "learning_rate": 0.00019999898500557356,
+      "loss": 2.0808,
+      "step": 21
+    },
+    {
+      "epoch": 0.0035410337404180835,
+      "grad_norm": 0.5361989736557007,
+      "learning_rate": 0.00019999879207313706,
+      "loss": 2.1778,
+      "step": 22
+    },
+    {
+      "epoch": 0.0037019898195279964,
+      "grad_norm": 0.6051269769668579,
+      "learning_rate": 0.00019999858236410776,
+      "loss": 1.9169,
+      "step": 23
+    },
+    {
+      "epoch": 0.0038629458986379093,
+      "grad_norm": 0.5815481543540955,
+      "learning_rate": 0.00019999835587852075,
+      "loss": 1.989,
+      "step": 24
+    },
+    {
+      "epoch": 0.004023901977747822,
+      "grad_norm": 0.5598827600479126,
+      "learning_rate": 0.000199998112616414,
+      "loss": 1.8985,
+      "step": 25
+    },
+    {
+      "epoch": 0.004184858056857735,
+      "grad_norm": 0.6512292623519897,
+      "learning_rate": 0.00019999785257782842,
+      "loss": 2.3026,
+      "step": 26
+    },
+    {
+      "epoch": 0.004345814135967648,
+      "grad_norm": 0.625409722328186,
+      "learning_rate": 0.00019999757576280756,
+      "loss": 2.3083,
+      "step": 27
+    },
+    {
+      "epoch": 0.004506770215077561,
+      "grad_norm": 0.6033584475517273,
+      "learning_rate": 0.0001999972821713979,
+      "loss": 1.6905,
+      "step": 28
+    },
+    {
+      "epoch": 0.0046677262941874735,
+      "grad_norm": 0.6385830640792847,
+      "learning_rate": 0.00019999697180364868,
+      "loss": 1.8802,
+      "step": 29
+    },
+    {
+      "epoch": 0.004828682373297387,
+      "grad_norm": 0.6070401072502136,
+      "learning_rate": 0.00019999664465961193,
+      "loss": 2.2706,
+      "step": 30
+    },
+    {
+      "epoch": 0.004989638452407299,
+      "grad_norm": 0.5202571749687195,
+      "learning_rate": 0.0001999963007393426,
+      "loss": 2.063,
+      "step": 31
+    },
+    {
+      "epoch": 0.005150594531517212,
+      "grad_norm": 0.6326085329055786,
+      "learning_rate": 0.00019999594004289838,
+      "loss": 2.2645,
+      "step": 32
+    },
+    {
+      "epoch": 0.005311550610627125,
+      "grad_norm": 1.0121084451675415,
+      "learning_rate": 0.00019999556257033974,
+      "loss": 2.3096,
+      "step": 33
+    },
+    {
+      "epoch": 0.005472506689737038,
+      "grad_norm": 0.7226786017417908,
+      "learning_rate": 0.00019999516832173005,
+      "loss": 2.371,
+      "step": 34
+    },
+    {
+      "epoch": 0.005633462768846951,
+      "grad_norm": 0.5786224007606506,
+      "learning_rate": 0.00019999475729713542,
+      "loss": 2.0462,
+      "step": 35
+    },
+    {
+      "epoch": 0.0057944188479568635,
+      "grad_norm": 0.5754494667053223,
+      "learning_rate": 0.00019999432949662483,
+      "loss": 2.1874,
+      "step": 36
+    },
+    {
+      "epoch": 0.005955374927066777,
+      "grad_norm": 0.6970561146736145,
+      "learning_rate": 0.00019999388492027002,
+      "loss": 2.4703,
+      "step": 37
+    },
+    {
+      "epoch": 0.006116331006176689,
+      "grad_norm": 0.7259479761123657,
+      "learning_rate": 0.00019999342356814563,
+      "loss": 2.3954,
+      "step": 38
+    },
+    {
+      "epoch": 0.006277287085286603,
+      "grad_norm": 0.6019427180290222,
+      "learning_rate": 0.00019999294544032902,
+      "loss": 2.0431,
+      "step": 39
+    },
+    {
+      "epoch": 0.006438243164396515,
+      "grad_norm": 0.7482038140296936,
+      "learning_rate": 0.00019999245053690042,
+      "loss": 2.6967,
+      "step": 40
+    },
+    {
+      "epoch": 0.006599199243506429,
+      "grad_norm": 0.5295757055282593,
+      "learning_rate": 0.0001999919388579428,
+      "loss": 1.744,
+      "step": 41
+    },
+    {
+      "epoch": 0.006760155322616341,
+      "grad_norm": 0.5582593083381653,
+      "learning_rate": 0.00019999141040354213,
+      "loss": 1.8934,
+      "step": 42
+    },
+    {
+      "epoch": 0.006921111401726254,
+      "grad_norm": 0.515265166759491,
+      "learning_rate": 0.00019999086517378696,
+      "loss": 1.8594,
+      "step": 43
+    },
+    {
+      "epoch": 0.007082067480836167,
+      "grad_norm": 0.5043320655822754,
+      "learning_rate": 0.00019999030316876878,
+      "loss": 1.853,
+      "step": 44
+    },
+    {
+      "epoch": 0.007243023559946079,
+      "grad_norm": 0.591961681842804,
+      "learning_rate": 0.00019998972438858191,
+      "loss": 2.1636,
+      "step": 45
+    },
+    {
+      "epoch": 0.007403979639055993,
+      "grad_norm": 0.5802595019340515,
+      "learning_rate": 0.0001999891288333234,
+      "loss": 2.1222,
+      "step": 46
+    },
+    {
+      "epoch": 0.007564935718165905,
+      "grad_norm": 0.6706711649894714,
+      "learning_rate": 0.00019998851650309324,
+      "loss": 2.2156,
+      "step": 47
+    },
+    {
+      "epoch": 0.007725891797275819,
+      "grad_norm": 0.7220948934555054,
+      "learning_rate": 0.0001999878873979941,
+      "loss": 2.2424,
+      "step": 48
+    },
+    {
+      "epoch": 0.007886847876385732,
+      "grad_norm": 0.5382461547851562,
+      "learning_rate": 0.00019998724151813155,
+      "loss": 1.7047,
+      "step": 49
+    },
+    {
+      "epoch": 0.008047803955495644,
+      "grad_norm": 0.6849808096885681,
+      "learning_rate": 0.00019998657886361395,
+      "loss": 1.8764,
+      "step": 50
+    },
+    {
+      "epoch": 0.008208760034605557,
+      "grad_norm": 0.6215450167655945,
+      "learning_rate": 0.00019998589943455244,
+      "loss": 2.0911,
+      "step": 51
+    },
+    {
+      "epoch": 0.00836971611371547,
+      "grad_norm": 0.7424401640892029,
+      "learning_rate": 0.000199985203231061,
+      "loss": 2.2645,
+      "step": 52
+    },
+    {
+      "epoch": 0.008530672192825382,
+      "grad_norm": 0.5827590227127075,
+      "learning_rate": 0.00019998449025325653,
+      "loss": 2.5635,
+      "step": 53
+    },
+    {
+      "epoch": 0.008691628271935295,
+      "grad_norm": 0.5917181372642517,
+      "learning_rate": 0.0001999837605012585,
+      "loss": 2.306,
+      "step": 54
+    },
+    {
+      "epoch": 0.008852584351045209,
+      "grad_norm": 0.6622992753982544,
+      "learning_rate": 0.00019998301397518943,
+      "loss": 1.8905,
+      "step": 55
+    },
+    {
+      "epoch": 0.009013540430155122,
+      "grad_norm": 0.6982994079589844,
+      "learning_rate": 0.00019998225067517457,
+      "loss": 2.2533,
+      "step": 56
+    },
+    {
+      "epoch": 0.009174496509265034,
+      "grad_norm": 0.48716649413108826,
+      "learning_rate": 0.00019998147060134195,
+      "loss": 1.6194,
+      "step": 57
+    },
+    {
+      "epoch": 0.009335452588374947,
+      "grad_norm": 0.6089860796928406,
+      "learning_rate": 0.00019998067375382245,
+      "loss": 2.1082,
+      "step": 58
+    },
+    {
+      "epoch": 0.00949640866748486,
+      "grad_norm": 0.6059070825576782,
+      "learning_rate": 0.00019997986013274972,
+      "loss": 1.6228,
+      "step": 59
+    },
+    {
+      "epoch": 0.009657364746594774,
+      "grad_norm": 0.6720227599143982,
+      "learning_rate": 0.0001999790297382603,
+      "loss": 1.9188,
+      "step": 60
+    },
+    {
+      "epoch": 0.009818320825704685,
+      "grad_norm": 0.5905241370201111,
+      "learning_rate": 0.0001999781825704935,
+      "loss": 2.3402,
+      "step": 61
+    },
+    {
+      "epoch": 0.009979276904814599,
+      "grad_norm": 0.6345643401145935,
+      "learning_rate": 0.0001999773186295914,
+      "loss": 2.2921,
+      "step": 62
+    },
+    {
+      "epoch": 0.010140232983924512,
+      "grad_norm": 0.6146743297576904,
+      "learning_rate": 0.00019997643791569902,
+      "loss": 2.2883,
+      "step": 63
+    },
+    {
+      "epoch": 0.010301189063034424,
+      "grad_norm": 0.6268457770347595,
+      "learning_rate": 0.00019997554042896408,
+      "loss": 2.1154,
+      "step": 64
+    },
+    {
+      "epoch": 0.010462145142144337,
+      "grad_norm": 0.7566269040107727,
+      "learning_rate": 0.00019997462616953713,
+      "loss": 2.1394,
+      "step": 65
+    },
+    {
+      "epoch": 0.01062310122125425,
+      "grad_norm": 0.7165527939796448,
+      "learning_rate": 0.00019997369513757157,
+      "loss": 2.2735,
+      "step": 66
+    },
+    {
+      "epoch": 0.010784057300364164,
+      "grad_norm": 0.5747542381286621,
+      "learning_rate": 0.00019997274733322363,
+      "loss": 1.9906,
+      "step": 67
+    },
+    {
+      "epoch": 0.010945013379474075,
+      "grad_norm": 0.6594597101211548,
+      "learning_rate": 0.00019997178275665226,
+      "loss": 1.7522,
+      "step": 68
+    },
+    {
+      "epoch": 0.011105969458583989,
+      "grad_norm": 0.6624965667724609,
+      "learning_rate": 0.00019997080140801932,
+      "loss": 1.8204,
+      "step": 69
+    },
+    {
+      "epoch": 0.011266925537693902,
+      "grad_norm": 0.6621134281158447,
+      "learning_rate": 0.00019996980328748943,
+      "loss": 2.28,
+      "step": 70
+    },
+    {
+      "epoch": 0.011427881616803815,
+      "grad_norm": 0.948077380657196,
+      "learning_rate": 0.00019996878839523006,
+      "loss": 2.2255,
+      "step": 71
+    },
+    {
+      "epoch": 0.011588837695913727,
+      "grad_norm": 0.6168230772018433,
+      "learning_rate": 0.00019996775673141145,
+      "loss": 2.2045,
+      "step": 72
+    },
+    {
+      "epoch": 0.01174979377502364,
+      "grad_norm": 0.6702795624732971,
+      "learning_rate": 0.00019996670829620675,
+      "loss": 2.777,
+      "step": 73
+    },
+    {
+      "epoch": 0.011910749854133554,
+      "grad_norm": 0.6251347064971924,
+      "learning_rate": 0.0001999656430897918,
+      "loss": 2.3048,
+      "step": 74
+    },
+    {
+      "epoch": 0.012071705933243465,
+      "grad_norm": 0.5934998393058777,
+      "learning_rate": 0.00019996456111234527,
+      "loss": 2.1877,
+      "step": 75
+    },
+    {
+      "epoch": 0.012232662012353379,
+      "grad_norm": 0.7205228805541992,
+      "learning_rate": 0.00019996346236404874,
+      "loss": 2.4349,
+      "step": 76
+    },
+    {
+      "epoch": 0.012393618091463292,
+      "grad_norm": 0.5706772208213806,
+      "learning_rate": 0.00019996234684508654,
+      "loss": 2.2719,
+      "step": 77
+    },
+    {
+      "epoch": 0.012554574170573206,
+      "grad_norm": 0.6610347628593445,
+      "learning_rate": 0.0001999612145556458,
+      "loss": 2.4465,
+      "step": 78
+    },
+    {
+      "epoch": 0.012715530249683117,
+      "grad_norm": 0.7257715463638306,
+      "learning_rate": 0.00019996006549591647,
+      "loss": 1.9567,
+      "step": 79
+    },
+    {
+      "epoch": 0.01287648632879303,
+      "grad_norm": 0.6126548647880554,
+      "learning_rate": 0.0001999588996660914,
+      "loss": 2.3073,
+      "step": 80
+    },
+    {
+      "epoch": 0.013037442407902944,
+      "grad_norm": 0.637026309967041,
+      "learning_rate": 0.00019995771706636607,
+      "loss": 1.6051,
+      "step": 81
+    },
+    {
+      "epoch": 0.013198398487012857,
+      "grad_norm": 0.7010881900787354,
+      "learning_rate": 0.00019995651769693894,
+      "loss": 2.4159,
+      "step": 82
+    },
+    {
+      "epoch": 0.013359354566122769,
+      "grad_norm": 0.6204814314842224,
+      "learning_rate": 0.00019995530155801126,
+      "loss": 1.9385,
+      "step": 83
+    },
+    {
+      "epoch": 0.013520310645232682,
+      "grad_norm": 0.6239871382713318,
+      "learning_rate": 0.000199954068649787,
+      "loss": 2.2167,
+      "step": 84
+    },
+    {
+      "epoch": 0.013681266724342596,
+      "grad_norm": 0.6641436219215393,
+      "learning_rate": 0.000199952818972473,
+      "loss": 2.0527,
+      "step": 85
+    },
+    {
+      "epoch": 0.013842222803452507,
+      "grad_norm": 0.7974081039428711,
+      "learning_rate": 0.00019995155252627896,
+      "loss": 2.5447,
+      "step": 86
+    },
+    {
+      "epoch": 0.01400317888256242,
+      "grad_norm": 0.5787174105644226,
+      "learning_rate": 0.00019995026931141733,
+      "loss": 1.9581,
+      "step": 87
+    },
+    {
+      "epoch": 0.014164134961672334,
+      "grad_norm": 0.6373708844184875,
+      "learning_rate": 0.00019994896932810338,
+      "loss": 1.7394,
+      "step": 88
+    },
+    {
+      "epoch": 0.014325091040782247,
+      "grad_norm": 0.5949546098709106,
+      "learning_rate": 0.00019994765257655523,
+      "loss": 1.8841,
+      "step": 89
+    },
+    {
+      "epoch": 0.014486047119892159,
+      "grad_norm": 0.6260464787483215,
+      "learning_rate": 0.00019994631905699377,
+      "loss": 1.9624,
+      "step": 90
+    },
+    {
+      "epoch": 0.014647003199002072,
+      "grad_norm": 0.5567226409912109,
+      "learning_rate": 0.00019994496876964274,
+      "loss": 1.7877,
+      "step": 91
+    },
+    {
+      "epoch": 0.014807959278111986,
+      "grad_norm": 0.7285164594650269,
+      "learning_rate": 0.00019994360171472861,
+      "loss": 2.2064,
+      "step": 92
+    },
+    {
+      "epoch": 0.014968915357221899,
+      "grad_norm": 0.7293297052383423,
+      "learning_rate": 0.00019994221789248082,
+      "loss": 2.262,
+      "step": 93
+    },
+    {
+      "epoch": 0.01512987143633181,
+      "grad_norm": 0.649803876876831,
+      "learning_rate": 0.0001999408173031315,
+      "loss": 2.0183,
+      "step": 94
+    },
+    {
+      "epoch": 0.015290827515441724,
+      "grad_norm": 0.7172336578369141,
+      "learning_rate": 0.00019993939994691556,
+      "loss": 1.9329,
+      "step": 95
+    },
+    {
+      "epoch": 0.015451783594551637,
+      "grad_norm": 0.7766873240470886,
+      "learning_rate": 0.0001999379658240709,
+      "loss": 2.6441,
+      "step": 96
+    },
+    {
+      "epoch": 0.015612739673661549,
+      "grad_norm": 0.6740883588790894,
+      "learning_rate": 0.00019993651493483802,
+      "loss": 2.2064,
+      "step": 97
+    },
+    {
+      "epoch": 0.015773695752771464,
+      "grad_norm": 0.6534631848335266,
+      "learning_rate": 0.0001999350472794604,
+      "loss": 2.1439,
+      "step": 98
+    },
+    {
+      "epoch": 0.015934651831881374,
+      "grad_norm": 0.5048854351043701,
+      "learning_rate": 0.00019993356285818423,
+      "loss": 1.71,
+      "step": 99
+    },
+    {
+      "epoch": 0.016095607910991287,
+      "grad_norm": 0.7882394194602966,
+      "learning_rate": 0.00019993206167125856,
+      "loss": 2.6218,
+      "step": 100
+    },
+    {
+      "epoch": 0.016095607910991287,
+      "eval_loss": 2.2323501110076904,
+      "eval_runtime": 109.533,
+      "eval_samples_per_second": 45.648,
+      "eval_steps_per_second": 11.412,
+      "step": 100
+    }
+  ],
+  "logging_steps": 1,
+  "max_steps": 7680,
+  "num_input_tokens_seen": 0,
+  "num_train_epochs": 2,
+  "save_steps": 100,
+  "stateful_callbacks": {
+    "EarlyStoppingCallback": {
+      "args": {
+        "early_stopping_patience": 2,
+        "early_stopping_threshold": 0.0
+      },
+      "attributes": {
+        "early_stopping_patience_counter": 0
+      }
+    },
+    "TrainerControl": {
+      "args": {
+        "should_epoch_stop": false,
+        "should_evaluate": false,
+        "should_log": false,
+        "should_save": true,
+        "should_training_stop": false
+      },
+      "attributes": {}
+    }
+  },
+  "total_flos": 9705847274864640.0,
+  "train_batch_size": 4,
+  "trial_name": null,
+  "trial_params": null
+}
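
The `learning_rate` values in `log_history` above rise by 2e-05 per step up to 0.0002 at step 10 and then decay very slowly, which looks like linear warmup followed by cosine decay over `max_steps`. The sketch below reconstructs that schedule as an inference from the logged numbers only: the scheduler type and the 10-step warmup are assumptions (the actual settings live in `training_args.bin`, which is not readable from this diff), and `lr_at` is a hypothetical helper name.

```python
import math

PEAK_LR = 2e-4      # learning_rate logged at step 10
WARMUP_STEPS = 10   # assumed: inferred from the linear ramp over steps 1-10
MAX_STEPS = 7680    # "max_steps" in trainer_state.json


def lr_at(step):
    """Learning rate under linear warmup followed by cosine decay to zero."""
    if step < WARMUP_STEPS:
        return PEAK_LR * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (MAX_STEPS - WARMUP_STEPS)
    return PEAK_LR * 0.5 * (1.0 + math.cos(math.pi * progress))


# Compare the reconstruction with a few values copied from log_history.
logged = {
    1: 2e-05,
    10: 0.0002,
    11: 0.00019999999161160226,
    100: 0.00019993206167125856,
}
for step, value in logged.items():
    print(step, lr_at(step), math.isclose(lr_at(step), value, rel_tol=1e-9))
```

If the reconstruction holds, the run at step 100 is still essentially at peak learning rate, since only 90 of the 7,670 post-warmup steps have elapsed.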

last-checkpoint/training_args.bin ADDED

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:31a7519a5398cac75ed7e0cab4e3f1b466e72ff5e48ce28c7f2f8229431c117c
+size 6776

last-checkpoint/vocab.json ADDED

The diff for this file is too large to render. See raw diff