classla/xlm-roberta-base-multilingual-text-genre-classifier

@@ -238,94 +238,18 @@ At cross-dataset and cross-lingual experiments, it was shown that the X-GENRE cl
 ## Citation
-If you use the model, please cite the GitHub repository where the fine-tuning experiments are explained:
 ```
- @misc{Kuzman2022,
-  author = {Kuzman, Taja},
-  title = {{Comparison of genre datasets: CORE, GINCO and FTD}},
-  year = {2022},
-  publisher = {GitHub},
-  journal = {GitHub repository},
-  howpublished = {\url{https://github.com/TajaKuzman/Genre-Datasets-Comparison}}
 }
 ```
-and the following paper on which the original model is based:
-```
-@article{DBLP:journals/corr/abs-1911-02116,
-  author    = {Alexis Conneau and
-               Kartikay Khandelwal and
-               Naman Goyal and
-               Vishrav Chaudhary and
-               Guillaume Wenzek and
-               Francisco Guzm{\'{a}}n and
-               Edouard Grave and
-               Myle Ott and
-               Luke Zettlemoyer and
-               Veselin Stoyanov},
-  title     = {Unsupervised Cross-lingual Representation Learning at Scale},
-  journal   = {CoRR},
-  volume    = {abs/1911.02116},
-  year      = {2019},
-  url       = {http://arxiv.org/abs/1911.02116},
-  eprinttype = {arXiv},
-  eprint    = {1911.02116},
-  timestamp = {Mon, 11 Nov 2019 18:38:09 +0100},
-  biburl    = {https://dblp.org/rec/journals/corr/abs-1911-02116.bib},
-  bibsource = {dblp computer science bibliography, https://dblp.org}
-}
-```
-To cite the datasets that were used for fine-tuning:
-CORE dataset:
-```
-@article{egbert2015developing,
-  title={Developing a bottom-up, user-based method of web register classification},
-  author={Egbert, Jesse and Biber, Douglas and Davies, Mark},
-  journal={Journal of the Association for Information Science and Technology},
-  volume={66},
-  number={9},
-  pages={1817--1831},
-  year={2015},
-  publisher={Wiley Online Library}
-}
-```
-GINCO dataset:
-```
-@InProceedings{kuzman-rupnik-ljubei:2022:LREC,
-  author    = {Kuzman, Taja  and  Rupnik, Peter  and  Ljube{\v{s}}i{\'c}, Nikola},
-  title     = {{The GINCO Training Dataset for Web Genre Identification of Documents Out in the Wild}},
-  booktitle      = {Proceedings of the Language Resources and Evaluation Conference},
-  month          = {},
-  year           = {2022},
-  address        = {Marseille, France},
-  publisher      = {European Language Resources Association},
-  pages     = {1584--1594},
-  url       = {https://aclanthology.org/2022.lrec-1.170}
-}
-```
-FTD dataset:
-```
-@article{sharoff2018functional,
-  title={Functional text dimensions for the annotation of web corpora},
-  author={Sharoff, Serge},
-  journal={Corpora},
-  volume={13},
-  number={1},
-  pages={65--95},
-  year={2018},
-  publisher={Edinburgh University Press The Tun-Holyrood Road, 12 (2f) Jackson's Entry~…}
-}
-```
-The datasets are available at:
-1. http://hdl.handle.net/11356/1467 (GINCO)
-2. https://github.com/TurkuNLP/CORE-corpus (CORE)
-3. https://github.com/ssharoff/genre-keras (FTD)

 ## Citation
+If you use the model, please cite the paper that describes the creation of the X-GENRE dataset and the genre classifier:
 ```
+@article{kuzman2023automatic,
+  title={Automatic Genre Identification for Robust Enrichment of Massive Text Collections: Investigation of Classification Methods in the Era of Large Language Models},
+  author={Kuzman, Taja and Mozeti{\v{c}}, Igor and Ljube{\v{s}}i{\'c}, Nikola},
+  journal={Machine Learning and Knowledge Extraction},
+  volume={5},
+  number={3},
+  pages={1149--1175},
+  year={2023},
+  publisher={MDPI}
 }
 ```