---
license: mit
---
This repository contains the weights of a BERT model trained by predicting RDKit physicochemical properties on 60% of the [GuacaMol](https://pubs.acs.org/doi/abs/10.1021/acs.jcim.8b00839) dataset.
Further information can be found in our [publication](https://arxiv.org/abs/2503.03360).

The following example embeds a list of SMILES strings with this model, using the hidden state of the first token as the embedding of each molecule:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# SMILES strings of the molecules to embed
mols = [
    "CCOc1cc2nn(CCC(C)(C)O)cc2cc1NC(=O)c1cccc(C(F)F)n1",
    "CN(c1ncc(F)cn1)[C@H]1CCCNC1",
    "CC(C)(Oc1ccc(-c2cnc(N)c(-c3ccc(Cl)cc3)c2)cc1)C(=O)O",
    "CC(C)(O)CCn1cc2cc(NC(=O)c3cccc(C(F)(F)F)n3)c(C(C)(C)O)cc2n1",
    # ...
]

tokenizer = AutoTokenizer.from_pretrained("UdS-LSV/da4mt-mtr-60")
model = AutoModel.from_pretrained("UdS-LSV/da4mt-mtr-60")

# Pad or truncate every molecule to 128 tokens and return PyTorch tensors
inputs = tokenizer(
    mols,
    add_special_tokens=True,
    truncation=True,
    max_length=128,
    padding="max_length",
    return_tensors="pt",
)

# Inference only, so no gradients are needed; keep the first token's hidden state
with torch.no_grad():
    embeddings = model(**inputs).last_hidden_state[:, 0, :]
```
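
One possible way to use such embeddings, which is not described in this repository, is to compare molecules by cosine similarity. The sketch below uses only the standard library; the helper names `cosine_similarity` and `pairwise_similarity` and the toy vectors are illustrative assumptions, not part of the model or of `transformers`.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def pairwise_similarity(vectors):
    """Matrix of cosine similarities between all pairs of vectors."""
    return [[cosine_similarity(u, v) for v in vectors] for u in vectors]


# Placeholder 3-dimensional vectors standing in for real embeddings (assumption);
# with the model above, one could pass `embeddings.tolist()` instead.
toy_embeddings = [
    [1.0, 0.0, 0.0],
    [0.8, 0.6, 0.0],
    [0.0, 0.0, 1.0],
]

similarity = pairwise_similarity(toy_embeddings)
for row in similarity:
    print([round(s, 2) for s in row])
```

Each row of `similarity` holds the similarity of one molecule to all others, so the diagonal is 1 and larger off-diagonal values indicate embeddings that point in more similar directions.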

### See also
- https://huggingface.co/UdS-LSV/da4mt-mlm-30
- https://huggingface.co/UdS-LSV/da4mt-mlm-60
- https://huggingface.co/UdS-LSV/da4mt-mtr-30