Commit f4e57b99 authored Nov 22, 2024 by hoepfl

Adding option to check language in corpus data

Unfortunately, fastText cannot natively detect sme, so the check can only test whether a line matches no, nn (the two versions of Norwegian), or en.
If it does, the line can be assumed to potentially contain a significant amount of text in that language.
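
The per-line check described above could look roughly like the following. This is a minimal sketch, not the code from this commit: the function names, the `min_confidence` threshold of 0.5, and the choice of model file are all assumptions made for illustration. It relies only on fastText's `load_model` and `predict`, which return labels of the form `__label__xx` together with their scores.

```python
from typing import Callable, Optional, Sequence, Tuple

LABEL_PREFIX = "__label__"
# Languages the commit message says can be checked (sme is not among them).
MAJORITY_LANGUAGES = frozenset({"no", "nn", "en"})

# A predictor maps one line of text to (labels, scores), best guess first.
Predict = Callable[[str], Tuple[Sequence[str], Sequence[float]]]


def detect_majority_language(
    line: str,
    predict: Predict,
    min_confidence: float = 0.5,  # assumed threshold, not taken from the commit
) -> Optional[str]:
    """Return 'no', 'nn' or 'en' if the line is confidently in one of them, else None."""
    # fastText's predict() rejects text containing newlines, so collapse whitespace.
    text = " ".join(line.split())
    if not text:
        return None
    labels, scores = predict(text)
    if not labels:
        return None
    label = labels[0]
    lang = label[len(LABEL_PREFIX):] if label.startswith(LABEL_PREFIX) else label
    if lang in MAJORITY_LANGUAGES and float(scores[0]) >= min_confidence:
        return lang
    return None


def load_fasttext_predict(model_path: str) -> Predict:
    """Wrap a fastText language-identification model as a Predict callable."""
    import fasttext  # imported lazily so the check above can be tested without it

    model = fasttext.load_model(model_path)
    return lambda text: model.predict(text, k=1)
```

With a language-identification model on disk (the path is an assumption), `detect_majority_language(line, load_fasttext_predict("lid.176.bin"))` returns a language code for lines that should be inspected and `None` for everything else, including lines the model assigns to a language outside the three above.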
