# Falcon-40b-chat-oasst1

Falcon-40b-chat-oasst1 is a chatbot-like model for dialogue generation. It was built by fine-tuning [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b) on the [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) dataset.

- The model was fine-tuned in 4-bit precision using `peft`, `transformers`, and `bitsandbytes`.
- The training relied on a method called "Low-Rank Adapters" ([LoRA](https://arxiv.org/pdf/2106.09685.pdf)), specifically the [QLoRA](https://arxiv.org/abs/2305.14314) variant: instead of fine-tuning the entire model, you fine-tune lightweight adapters and load them inside the base model at inference (see the sketch after this list).
- Training took approximately 10 hours on a workstation with a single NVIDIA A100-SXM GPU with 37 GB of available memory (via Google Colab).
- See the attached [Notebook](https://huggingface.co/dfurman/falcon-40b-chat-oasst1/blob/main/finetune_falcon40b_oasst1_with_bnb_peft.ipynb) for the code (and hyperparameters) used to train the model.
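As a rough illustration of that recipe, the sketch below shows a typical QLoRA setup with `peft` and `bitsandbytes`. The rank, alpha, dropout, and target modules here are placeholder values, not the configuration used for this model; the actual hyperparameters are in the notebook linked above.

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Quantize the frozen base model to 4-bit NF4 so it fits in GPU memory
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
base_model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b",
    quantization_config=bnb_config,
    device_map="auto",
    trust_remote_code=True,
)

# Attach small trainable LoRA adapters; only the adapter weights
# receive gradients during fine-tuning, the 4-bit base stays frozen
base_model = prepare_model_for_kbit_training(base_model)
lora_config = LoraConfig(
    r=16,                                # placeholder adapter rank
    lora_alpha=32,                       # placeholder scaling factor
    lora_dropout=0.05,                   # placeholder dropout
    target_modules=["query_key_value"],  # Falcon's fused attention projection
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(base_model, lora_config)
model.print_trainable_parameters()  # prints the small trainable fraction
```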

We recommend that users of this model develop guardrails and take appropriate precautions.

### Setup

```python
# Install packages
!pip install -q -U bitsandbytes loralib einops
!pip install -q -U git+https://github.com/huggingface/transformers.git
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
```

### GPU Inference in 4-bit

This requires a GPU with at least 27 GB of memory.

```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer

# load the model
peft_model_id = "dfurman/falcon-40b-chat-oasst1"
config = PeftConfig.from_pretrained(peft_model_id)
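
# Load the base model in 4-bit. The lines below are a hypothetical sketch of
# the loading step not shown in this excerpt; the exact arguments are
# assumptions, not the author's confirmed settings.
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    load_in_4bit=True,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)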

tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token

# load the fine-tuned LoRA adapters on top of the base model
model = PeftModel.from_pretrained(model, peft_model_id)
```

```python
# run the model
prompt = """<human>: My name is Daniel. Write a short email to my closest friends inviting them to come to my home on Friday for a dinner party, I will make the food but tell them to BYOB.
<bot>:"""
```
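
The excerpt stops at the prompt. For completeness, here is a minimal sketch of the generation step, assuming standard `transformers` sampling arguments; the decoding parameters below are placeholders rather than the settings used in the notebook.

```python
# Tokenize the prompt and generate a completion (placeholder settings)
batch = tokenizer(prompt, return_tensors="pt").to("cuda")

with torch.no_grad():
    output_tokens = model.generate(
        **batch,
        max_new_tokens=200,  # assumed generation budget
        do_sample=True,
        temperature=0.7,     # assumed sampling temperature
        pad_token_id=tokenizer.eos_token_id,
    )

# Decode the full sequence; the model's reply follows the "<bot>:" marker
print(tokenizer.decode(output_tokens[0], skip_special_tokens=True))
```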