Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language
RuBERT (Russian, cased, 12-layer, 768-hidden, 12-heads, 180M parameters) was trained on the Russian part of Wikipedia and news data. We used this training data to build a vocabulary of Russian subtokens and took a multilingual version of BERT-base as an initialization for RuBERT [1].
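A minimal usage sketch for extracting contextual embeddings, assuming the checkpoint is available on the Hugging Face Hub under the ID `DeepPavlov/rubert-base-cased` (the ID is not stated in this card):

```python
# Minimal sketch: load the tokenizer and encoder, then embed a Russian sentence.
# Assumes the Hub ID "DeepPavlov/rubert-base-cased", which is not given in this card.
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("DeepPavlov/rubert-base-cased")
model = AutoModel.from_pretrained("DeepPavlov/rubert-base-cased")

# Tokenize with the Russian subtoken vocabulary described above.
inputs = tokenizer("Привет, мир!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Last hidden states have shape (batch_size, sequence_length, 768),
# matching the 768-hidden configuration.
print(outputs.last_hidden_state.shape)
```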
08.11.2021: uploaded the model with MLM and NSP heads.
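Since the checkpoint ships with both pretraining heads, it can also be loaded with `BertForPreTraining`, which exposes the MLM and NSP outputs. A hedged sketch, again assuming the Hub ID `DeepPavlov/rubert-base-cased`:

```python
# Sketch: load the checkpoint with its MLM and NSP heads.
# Assumes the Hub ID "DeepPavlov/rubert-base-cased", not stated in this card.
import torch
from transformers import AutoTokenizer, BertForPreTraining

tokenizer = AutoTokenizer.from_pretrained("DeepPavlov/rubert-base-cased")
model = BertForPreTraining.from_pretrained("DeepPavlov/rubert-base-cased")

inputs = tokenizer("Это тестовое предложение.", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# prediction_logits: MLM scores over the vocabulary for each position.
# seq_relationship_logits: NSP scores (is-next vs. not-next).
print(outputs.prediction_logits.shape)        # (1, seq_len, vocab_size)
print(outputs.seq_relationship_logits.shape)  # (1, 2)
```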
[1]: Kuratov, Y., Arkhipov, M. (2019). Adaptation of Deep Bidirectional Multilingual Transformers for Russian Language. arXiv preprint arXiv:1905.07213.