Update README.md
README.md
CHANGED
@@ -196,9 +196,11 @@ For language-specific results, see [the AGILE benchmark](https://github.com/Taja
 An example of preparing data for genre identification and post-processing of the results can be found [here](https://github.com/TajaKuzman/Applying-GENRE-on-MaCoCu-bilingual) where we applied X-GENRE classifier to the English part of [MaCoCu](https://macocu.eu/) parallel corpora.
 
 For reliable results, genre classifier should be applied to documents of sufficient length (the rule of thumb is at least 75 words).
-
-
-
+Predictions with confidence scores below 0.8 should not be used.
+In our experience annotating large web corpora in various European languages, this occurs in approximately 4% of cases.
+We label these instances as "Mix."
+Furthermore, the label "Other" can be used as another indicator of low confidence of the predictions,
+as it often indicates that the text does not have enough features of any genre, and these predictions can be discarded as well.
 
 
 ### Use examples
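The post-processing rule added in this change (relabel predictions with confidence below 0.8 as "Mix", and optionally discard "Other" as well) could be sketched roughly as follows. This is only an illustration and not code from the repository: the function names `relabel_low_confidence` and `keep_reliable`, the dictionary keys `text_id`, `label`, and `confidence`, and the sample records are all assumptions made for the example. How the classifier actually returns its labels and scores is not shown here.

```python
# Hypothetical helpers; names and record layout are assumptions, not the classifier's API.
CONFIDENCE_THRESHOLD = 0.8  # threshold stated in the README text


def relabel_low_confidence(predictions, threshold=CONFIDENCE_THRESHOLD):
    """Return copies of the predictions, with low-confidence ones relabeled as "Mix"."""
    relabeled = []
    for pred in predictions:
        label = "Mix" if pred["confidence"] < threshold else pred["label"]
        relabeled.append({**pred, "label": label})
    return relabeled


def keep_reliable(predictions, drop_other=True):
    """Drop "Mix" predictions and, if drop_other is set, "Other" predictions too."""
    unreliable = {"Mix", "Other"} if drop_other else {"Mix"}
    return [pred for pred in predictions if pred["label"] not in unreliable]


# Made-up sample records, for illustration only.
sample = [
    {"text_id": 1, "label": "News", "confidence": 0.95},
    {"text_id": 2, "label": "Forum", "confidence": 0.55},
    {"text_id": 3, "label": "Other", "confidence": 0.91},
    {"text_id": 4, "label": "Legal", "confidence": 0.80},
]

relabeled = relabel_low_confidence(sample)
reliable = keep_reliable(relabeled)
print([pred["text_id"] for pred in reliable])
```

Keeping the relabeling and the filtering as separate steps makes it possible to count how many documents fall into "Mix" before discarding them, for instance to compare against the roughly 4% mentioned above.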