---
language: en
license: apache-2.0
model_name: t5-encoder-12.onnx
tags:
- validated
- text
- machine_comprehension
- t5
---
<!--- SPDX-License-Identifier: Apache-2.0 -->

# T5

## Use-cases
Transformer-based language model trained on multiple tasks, including summarization, sentiment analysis, question answering, and translation.
The implementation in this repo is an adaptation of the [onnxt5 repo](https://github.com/abelriboulot/onnxt5), which makes exporting and using T5 with ONNX easier.

## Description
[T5](https://arxiv.org/abs/1910.10683) is a transformer model that aims to provide great flexibility and better semantic understanding through training on multiple tasks at once.

## Model

| Model | Download | Download (with sample test data) | ONNX version | Opset version |
|-------|----------|----------------------------------|--------------|---------------|
| T5-encoder | [650.6 MB](model/t5-encoder-12.onnx) | [205.0 MB](model/t5-encoder-12.tar.gz) | 1.7 | 12 |
| T5-decoder-with-lm-head | [304.9 MB](model/t5-decoder-with-lm-head-12.onnx) | [304.9 MB](model/t5-decoder-with-lm-head-12.tar.gz) | 1.7 | 12 |

### Source
Huggingface PyTorch T5 + script changes ==> ONNX T5-encoder

Huggingface PyTorch T5 + script changes ==> ONNX T5-decoder-with-lm-head

Script changes include:
- reshaping the Huggingface models to combine the lm head with the decoder, allowing for a unified model
- reshaping the encoder to output the hidden state directly

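Combining the lm head with the decoder amounts to projecting the decoder's final hidden states through the language-modeling head to produce vocabulary logits in a single graph. A minimal NumPy sketch of that projection (toy dimensions; the `hidden ** -0.5` rescaling reflects T5's tied-embedding convention, not this repo's exact export script):

```python
import numpy as np

# Toy dimensions for illustration; t5-base uses hidden_size=768, vocab_size=32128.
batch, seq_len, hidden, vocab = 1, 4, 8, 16
rng = np.random.default_rng(0)

decoder_hidden = rng.normal(size=(batch, seq_len, hidden)).astype(np.float32)
lm_head_weight = rng.normal(size=(vocab, hidden)).astype(np.float32)  # tied to the input embedding in T5

# When the lm head is tied to the embeddings, T5 rescales the decoder output
# by hidden_size**-0.5 before the projection.
logits = (decoder_hidden * hidden ** -0.5) @ lm_head_weight.T
assert logits.shape == (batch, seq_len, vocab)
```

Exporting this fused computation as one ONNX graph avoids shuttling hidden states between two separate sessions at every decoding step.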
## Inference
The script for ONNX model conversion and ONNX Runtime inference is [here](dependencies/T5-export.py).
More complete utilities to export and use the models are maintained in the [onnxt5 repo](https://github.com/abelriboulot/onnxt5).

### Input to model
This implementation takes as input a prompt that begins with the task at hand. Examples of tasks include ```summarize: <PROMPT>```, ```translate English to French: <PROMPT>```, ```cola sentence: <PROMPT>```, etc.
For the full list of tasks, refer to Appendix D of the [original paper](https://arxiv.org/pdf/1910.10683.pdf).

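Since the task prefix is plain text prepended to the input, building a prompt is simple string formatting. A tiny illustrative helper (the function name is hypothetical, not part of this repo):

```python
def build_prompt(task: str, text: str) -> str:
    """Prepend a T5 task prefix to the input text."""
    return f"{task}: {text}"

prompt = build_prompt("translate English to French",
                      "I was a victim of a series of accidents.")
print(prompt)
# translate English to French: I was a victim of a series of accidents.
```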
### Preprocessing steps
The easiest way to use the model is through the onnxt5 utilities (installation instructions: ```pip install onnxt5```).

In that case you can use the model with the following piece of code:
```python
from onnxt5 import GenerativeT5
from onnxt5.api import get_encoder_decoder_tokenizer

decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
generative_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)
prompt = 'translate English to French: I was a victim of a series of accidents.'
output_text, output_logits = generative_t5(prompt, max_length=100, temperature=0.)
# output_text: "J'ai été victime d'une série d'accidents."
```

Or if you wish to produce the embeddings of a sentence:
```python
from onnxt5.api import get_encoder_decoder_tokenizer, run_embeddings_text

decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
prompt = 'Listen, Billy Pilgrim has come unstuck in time.'
encoder_embeddings, decoder_embeddings = run_embeddings_text(encoder_sess, decoder_sess, tokenizer, prompt)
```

Otherwise you can manually create the generative model as follows:

```python
from onnxruntime import InferenceSession
from transformers import T5Tokenizer
from .dependencies.models import GenerativeT5

tokenizer = T5Tokenizer.from_pretrained('t5-base')

# Starting from ORT 1.10, ORT requires explicitly setting the providers parameter if you want to use execution
# providers other than the default CPU provider (as opposed to the previous behavior of providers getting
# set/registered by default based on the build flags) when instantiating InferenceSession.
# For example, if an NVIDIA GPU is available and the ORT Python package is built with CUDA, call the API as follows:
# InferenceSession(path/to/model, providers=['CUDAExecutionProvider'])
decoder_sess = InferenceSession(str(path_t5_decoder))  # path_t5_decoder: path to the decoder .onnx file
encoder_sess = InferenceSession(str(path_t5_encoder))  # path_t5_encoder: path to the encoder .onnx file
generative_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)
generative_t5('translate English to French: I was a victim of a series of accidents.', 21, temperature=0.)[0]
```

### Output of model
For the T5-encoder model:

**last_hidden_state**: Sequence of hidden states at the last layer of the model. It is a float tensor of size (batch_size, sequence_length, hidden_size).

For the T5-decoder-with-lm-head model:

**logit_predictions**: Prediction scores of the language modeling head. It is a float tensor of size (batch_size, sequence_length, vocab_size).

### Postprocessing steps
For the T5-encoder model:

```python
last_hidden_states = model(input_ids)[0]
```

For the T5-decoder-with-lm-head model:

```python
# To generate the encoder's last hidden state
encoder_output = encoder_sess.run(None, {"input_ids": input_ids})[0]
# To generate the full model's embeddings
decoder_output = decoder_sess.run(None, {
    "input_ids": input_ids,
    "encoder_hidden_states": encoder_output
})[0]
```

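At temperature 0, picking the next token from `decoder_output` reduces to an argmax over the vocabulary dimension at the last position. A NumPy sketch with toy shapes (the real vocab size is much larger):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 10
# Stand-in for the decoder's logit_predictions: (batch_size, sequence_length, vocab_size)
decoder_output = rng.normal(size=(1, 3, vocab_size))

# Greedy decoding: take the highest-scoring token at the last position.
next_token = int(np.argmax(decoder_output[0, -1]))
assert 0 <= next_token < vocab_size
```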
For the generative model, to generate a translation:
```python
from onnxt5 import GenerativeT5
from onnxt5.api import get_encoder_decoder_tokenizer

decoder_sess, encoder_sess, tokenizer = get_encoder_decoder_tokenizer()
generative_t5 = GenerativeT5(encoder_sess, decoder_sess, tokenizer, onnx=True)
prompt = 'translate English to French: I was a victim of a series of accidents.'
output_text, output_logits = generative_t5(prompt, max_length=100, temperature=0.)
```
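Under the hood, generation is an autoregressive loop: score the sequence so far, take the argmax (at temperature 0), append it, and repeat until the end-of-sequence token or `max_length`. A framework-free sketch of that loop, where `fake_logits` is a hypothetical stand-in for the real decoder session (T5 starts decoding from the pad token, id 0, and uses id 1 for `</s>`):

```python
import numpy as np

EOS_ID = 1  # T5's </s> token id

def fake_logits(tokens):
    """Stand-in scorer; always favors token (len(tokens) % 3) + 1."""
    logits = np.zeros(10)
    logits[(len(tokens) % 3) + 1] = 1.0
    return logits

def greedy_generate(score_fn, max_length=5, start_id=0):
    tokens = [start_id]  # T5 decoding starts from the pad token
    while len(tokens) < max_length:
        next_token = int(np.argmax(score_fn(tokens)))  # temperature 0 == argmax
        tokens.append(next_token)
        if next_token == EOS_ID:
            break
    return tokens

print(greedy_generate(fake_logits))
# → [0, 2, 3, 1]
```

`GenerativeT5` additionally re-runs the ONNX decoder session at every step and detokenizes the result; this sketch only shows the control flow.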
<hr>

## Dataset (Train and validation)
The original model from Google Brain is pretrained on the [Colossal Clean Crawled Corpus](https://www.tensorflow.org/datasets/catalog/c4).
The pretrained model is referenced in [huggingface/transformers](https://github.com/huggingface/transformers/blob/master/transformers/modeling_t5.py), trained on the same data.
<hr>

## Validation accuracy
Benchmarking can be run with the following [script](https://github.com/abelriboulot/onnxt5/blob/master/notebooks/benchmark_performance.ipynb), with initial results in this [post](https://kta.io/posts/onnx_t5).
<hr>

## Publication/Attribution
This repo is based on the work of Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, and Peter J. Liu from Google, as well as the implementation of T5 from the huggingface team, the work of the Microsoft ONNX and onnxruntime teams, in particular Tianlei Wu, and the work of Thomas Wolf on text generation.

[Original T5 Paper](https://arxiv.org/pdf/1910.10683.pdf)
```
@article{2019t5,
  author = {Colin Raffel and Noam Shazeer and Adam Roberts and Katherine Lee and Sharan Narang and Michael Matena and Yanqi Zhou and Wei Li and Peter J. Liu},
  title = {Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer},
  journal = {arXiv e-prints},
  year = {2019},
  archivePrefix = {arXiv},
  eprint = {1910.10683},
}
```

## References
This model is converted directly from [huggingface/transformers](https://github.com/huggingface/transformers/blob/master/src/transformers/modeling_t5.py).
<hr>

## Contributors
[Abel Riboulot](https://github.com/abelriboulot)
<hr>

## License
Apache 2.0 License
<hr>