# Falcon-40b-chat-oasst1

Falcon-40b-chat-oasst1 is a chatbot-like model for dialogue generation. It was built by fine-tuning [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b) on the [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) dataset.

- The model was fine-tuned in 4-bit precision using `peft`, `transformers`, and `bitsandbytes`.
- The training relied on a method called "Low-Rank Adapters" ([LoRA](https://arxiv.org/pdf/2106.09685.pdf)), specifically the [QLoRA](https://arxiv.org/abs/2305.14314) variant: instead of fine-tuning the entire model, you fine-tune lightweight adapters and load them inside the base model at inference (see the sketch after this list).
- Training took approximately 10 hours on a workstation with a single NVIDIA A100-SXM GPU with 37 GB of available memory (via Google Colab).
- See the attached [Notebook](https://huggingface.co/dfurman/falcon-40b-chat-oasst1/blob/main/finetune_falcon40b_oasst1_with_bnb_peft.ipynb) for the code (and hyperparameters) used to train the model.
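As a rough illustration of that recipe, the sketch below shows a typical QLoRA setup with `peft` and `bitsandbytes`. The rank, alpha, dropout, and target modules here are placeholder values, not the configuration used for this model; the actual hyperparameters are in the notebook linked above.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Quantize the frozen base model to 4-bit NF4 so it fits in GPU memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Attach small trainable LoRA adapters; only the adapter weights
# receive gradients during fine-tuning, the 4-bit base stays frozen
base_model = prepare_model_for_kbit_training(base_model)
lora_config = LoraConfig(
    r=16,                                # placeholder adapter rank
    lora_alpha=32,                       # placeholder scaling factor
    lora_dropout=0.05,                   # placeholder dropout
    target_modules=["query_key_value"],  # Falcon's fused attention projection
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # prints the small trainable fraction
```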

We recommend that users of this model develop guardrails and take appropriate precautions.

### Setup

```python
# Install packages
!pip install -q -U bitsandbytes loralib einops
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
```

### GPU Inference in 4-bit

This requires a GPU with at least 27 GB of memory.

```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# load the model
peft_model_id = "dfurman/falcon-40b-chat-oasst1"
config = PeftConfig.from_pretrained(peft_model_id)
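
# Load the base model in 4-bit. The lines below are a hypothetical sketch of
# the loading step not shown in this excerpt; the exact arguments are
# assumptions, not the author's confirmed settings.
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    load_in_4bit=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)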

tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token

# load the fine-tuned LoRA adapters on top of the base model
model = PeftModel.from_pretrained(model, peft_model_id)
```

```python
# run the model
prompt = """<human>: My name is Daniel. Write a short email to my closest friends inviting them to come to my home on Friday for a dinner party, I will make the food but tell them to BYOB.
<bot>:"""
```
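
The excerpt stops at the prompt. For completeness, here is a minimal sketch of the generation step, assuming standard `transformers` sampling arguments; the decoding parameters below are placeholders rather than the settings used in the notebook.

```python
# Tokenize the prompt and generate a completion (placeholder settings)
batch = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    output_tokens = model.generate(
        **batch,
        max_new_tokens=200,  # assumed generation budget
        do_sample=True,
        temperature=0.7,     # assumed sampling temperature
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode the full sequence; the model's reply follows the "<bot>:" marker
print(tokenizer.decode(output_tokens[0], skip_special_tokens=True))
```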