	---
	license: mit
	datasets:
	- COGNANO/VHHCorpus-2M
	library_name: transformers
	tags:
	- biology
	- protein
	- antibody
	- VHH
	---

	## VHHBERT

	VHHBERT is a RoBERTa-based model pre-trained on two million VHH sequences in [VHHCorpus-2M](https://huggingface.co/datasets/COGNANO/VHHCorpus-2M).
	VHHBERT has the same model hyperparameters as RoBERTa<sub>BASE</sub>, except that it uses positional embeddings of length 185 to cover the maximum sequence length of 179 in VHHCorpus-2M.
	Further details on VHHBERT are described in our paper, "[A SARS-CoV-2 Interaction Dataset and VHH Sequence Corpus for Antibody Language Models](https://arxiv.org/abs/2405.18749)."

	## Usage

	The model and tokenizer can be loaded using the `transformers` library.

	```python
	from transformers import BertTokenizer, RobertaModel
	tokenizer = BertTokenizer.from_pretrained("COGNANO/VHHBERT")
	model = RobertaModel.from_pretrained("COGNANO/VHHBERT")
	```
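Once loaded, the model can be used to embed VHH sequences. The following is a minimal sketch, not the authors' reference pipeline. It assumes that the tokenizer expects residues separated by spaces (common for BERT-style protein tokenizers, but not stated in this card) and that mean pooling over the last hidden state is an acceptable sequence representation. The helper names `prepare_sequence` and `embed` are hypothetical, and the sequence in the usage note is an illustrative fragment rather than a real VHH.

```python
MAX_VHH_LENGTH = 179  # maximum sequence length reported for VHHCorpus-2M


def prepare_sequence(sequence: str) -> str:
    """Normalise a VHH sequence and separate residues with spaces.

    Assumption: the tokenizer treats each whitespace-separated residue
    as one token. Check this against the tokenizer's vocabulary.
    """
    # Drop any whitespace or line breaks and use upper-case residue letters.
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("empty sequence")
    if len(cleaned) > MAX_VHH_LENGTH:
        raise ValueError(
            f"sequence has {len(cleaned)} residues; "
            f"the corpus maximum is {MAX_VHH_LENGTH}"
        )
    return " ".join(cleaned)


def embed(sequences):
    """Return one mean-pooled embedding per sequence.

    Downloads the model from the Hugging Face Hub on first use.
    """
    # Imported lazily so that prepare_sequence works without these packages.
    import torch
    from transformers import BertTokenizer, RobertaModel

    tokenizer = BertTokenizer.from_pretrained("COGNANO/VHHBERT")
    model = RobertaModel.from_pretrained("COGNANO/VHHBERT")
    model.eval()

    batch = tokenizer(
        [prepare_sequence(s) for s in sequences],
        padding=True,
        return_tensors="pt",
    )
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state  # (batch, tokens, hidden)

    # Average over non-padding tokens (special tokens are included).
    mask = batch["attention_mask"].unsqueeze(-1).to(hidden.dtype)
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)
```

For example, `prepare_sequence("QVQLVESGG")` returns `"Q V Q L V E S G G"`, and `embed(["QVQLVESGG"])` returns a tensor with one row per input sequence.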

	## Links

	- Pre-training Corpus: https://huggingface.co/datasets/COGNANO/VHHCorpus-2M
	- Code: https://github.com/cognano/AVIDa-SARS-CoV-2
	- Paper: https://arxiv.org/abs/2405.18749

	## Citation

	If you use VHHBERT in your research, please cite the following paper.

	```bibtex
	@inproceedings{tsuruta2024sars,
	title={A {SARS}-{C}o{V}-2 Interaction Dataset and {VHH} Sequence Corpus for Antibody Language Models},
	author={Hirofumi Tsuruta and Hiroyuki Yamazaki and Ryota Maeda and Ryotaro Tamura and Akihiro Imura},
	booktitle={Advances in Neural Information Processing Systems 37},
	year={2024}
	}
	```