---
license: mit
datasets:
- COGNANO/VHHCorpus-2M
library_name: transformers
tags:
- biology
- protein
- antibody
- VHH
---

## VHHBERT

VHHBERT is a RoBERTa-based model pre-trained on two million VHH sequences from [VHHCorpus-2M](https://huggingface.co/datasets/COGNANO/VHHCorpus-2M).
VHHBERT uses the same architecture hyperparameters as RoBERTa<sub>BASE</sub>, except that its positional embeddings have a length of 185, enough to cover the longest sequence in VHHCorpus-2M (179 residues).
Further details on VHHBERT are described in our paper, "[A SARS-CoV-2 Interaction Dataset and VHH Sequence Corpus for Antibody Language Models](https://arxiv.org/abs/2405.18749)."
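
The headroom between 185 positional embeddings and a 179-residue maximum can be checked with simple arithmetic. The sketch below is illustrative only and not taken from the VHHBERT code. It assumes (i) one token per residue plus two special tokens and (ii) the Hugging Face RoBERTa convention of numbering non-padding positions from `pad_token_id + 1`. The padding id actually used by VHHBERT is not stated here, so two guesses are tried, and the helper names `position_ids` and `fits` are hypothetical.

```python
# Minimal sketch (hypothetical helpers): does a VHH sequence fit in VHHBERT's
# positional-embedding table?
# Assumptions, not stated in the model card:
#   * one token per residue plus two special tokens (start and end);
#   * RoBERTa-style position ids, numbered from pad_token_id + 1.

MAX_POSITION_EMBEDDINGS = 185  # from the model card
NUM_SPECIAL_TOKENS = 2         # assumed: one start token and one end token


def position_ids(num_tokens: int, pad_token_id: int) -> list[int]:
    """Position ids RoBERTa-style embeddings would assign to an unpadded input."""
    return [pad_token_id + i for i in range(1, num_tokens + 1)]


def fits(num_residues: int, pad_token_id: int = 1) -> bool:
    """True if every position id is a valid row of the embedding table."""
    ids = position_ids(num_residues + NUM_SPECIAL_TOKENS, pad_token_id)
    return max(ids) < MAX_POSITION_EMBEDDINGS


# Longest sequence in VHHCorpus-2M (179 residues), under two padding-id guesses.
print(fits(179), fits(179, pad_token_id=0))  # True True
```

Under these assumptions, a 179-residue sequence uses position ids up to 182 (or 181 with a padding id of 0), leaving a few spare rows in the table, while `fits(183)` returns `False` for either padding id.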

## Usage

The model and tokenizer can be loaded with the `transformers` library:

```python
from transformers import BertTokenizer, RobertaModel

# VHHBERT pairs a BERT-style tokenizer with a RoBERTa encoder;
# both are downloaded from the Hugging Face Hub on first use.
tokenizer = BertTokenizer.from_pretrained("COGNANO/VHHBERT")
model = RobertaModel.from_pretrained("COGNANO/VHHBERT")
```
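
VHH sequences are usually stored as contiguous strings such as `QVQLQESGG`. `BertTokenizer` splits its input on whitespace before applying WordPiece, so if the VHHBERT vocabulary holds single amino acids (an assumption; the tokenizer's vocabulary file is the place to check), an unspaced sequence may not be split into residues. The hypothetical helper `space_residues` below uppercases a sequence, rejects letters outside the 20 standard amino acids (also an assumption about valid input), and separates residues with spaces.

```python
# Hypothetical preprocessing helper; not part of the VHHBERT release.
# Assumes the tokenizer vocabulary contains one token per standard amino acid.

STANDARD_AMINO_ACIDS = frozenset("ACDEFGHIKLMNPQRSTVWY")


def space_residues(sequence: str) -> str:
    """Return the sequence in uppercase with a space between residues.

    Raises ValueError if a character is not one of the 20 standard amino acids.
    """
    residues = sequence.strip().upper()
    unknown = sorted(set(residues) - STANDARD_AMINO_ACIDS)
    if unknown:
        raise ValueError(f"non-standard residues: {''.join(unknown)}")
    return " ".join(residues)


print(space_residues("qvqlqesgg"))  # Q V Q L Q E S G G
```

Comparing `tokenizer.tokenize(...)` on a short sequence with and without spaces is a quick way to confirm which input format the vocabulary expects before encoding a full dataset.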

## Links

- Pre-training corpus: https://huggingface.co/datasets/COGNANO/VHHCorpus-2M
- Code: https://github.com/cognano/AVIDa-SARS-CoV-2
- Paper: https://arxiv.org/abs/2405.18749

## Citation

If you use VHHBERT in your research, please cite the following paper:

```bibtex
@inproceedings{tsuruta2024sars,
  title={A {SARS}-{C}o{V}-2 Interaction Dataset and {VHH} Sequence Corpus for Antibody Language Models},
  author={Hirofumi Tsuruta and Hiroyuki Yamazaki and Ryota Maeda and Ryotaro Tamura and Akihiro Imura},
  booktitle={Advances in Neural Information Processing Systems 37},
  year={2024}
}
```