Hybrid Emoji-Based Masked Language Models for Zero-Shot Abusive Language Detection - 3IA Côte d’Azur – Interdisciplinary Institute for Artificial Intelligence
Conference paper. Year: 2020


Abstract

Recent studies have demonstrated the effectiveness of cross-lingual language model pre-training on different NLP tasks, such as natural language inference and machine translation. In our work, we test this approach on social media data, which are particularly challenging to process within this framework: the limited length of the messages and the irregularity of the language make it harder to learn meaningful encodings. More specifically, we propose a hybrid emoji-based Masked Language Model (MLM) that leverages the common information conveyed by emojis across different languages to improve the learned cross-lingual representations of short text messages, with the goal of performing zero-shot abusive language detection. We compare the results obtained with the original MLM to those obtained with our method, showing improved performance on German, Italian and Spanish.
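The abstract does not spell out how the hybrid model treats emojis during pre-training. As a rough illustration only, the Python sketch below shows standard BERT-style MLM masking (each token selected with some probability; selected tokens are replaced by [MASK] 80% of the time, by a random vocabulary token 10% of the time, and left unchanged 10% of the time) plus a hypothetical `emoji_mask_prob` parameter that lets emoji tokens be selected at a different rate, so that language-independent emojis can become more frequent prediction targets. The function names, the Unicode-range emoji heuristic, and the `emoji_mask_prob` knob are all assumptions made for this sketch; this is not the authors' hybrid method.

```python
import random
from typing import List, Optional, Sequence, Tuple

MASK = "[MASK]"


def is_emoji(token: str) -> bool:
    # Heuristic (assumption): a token is an emoji if every codepoint, ignoring
    # the U+FE0F variation selector, lies in common emoji/symbol blocks.
    cps = [ord(ch) for ch in token if ch != "\ufe0f"]
    return bool(cps) and all(
        0x1F300 <= cp <= 0x1FAFF or 0x2600 <= cp <= 0x27BF for cp in cps
    )


def mask_tokens(
    tokens: Sequence[str],
    vocab: Sequence[str],
    mask_prob: float = 0.15,
    emoji_mask_prob: Optional[float] = None,  # hypothetical emoji-specific rate
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Return (model inputs, labels); a label of None means 'not a prediction target'."""
    rng = rng or random.Random()
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        # Use the emoji-specific selection rate only when one is given.
        if emoji_mask_prob is not None and is_emoji(tok):
            p = emoji_mask_prob
        else:
            p = mask_prob
        if rng.random() >= p:
            continue  # token not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK  # 80%: replace with the mask token
        elif r < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token in the input
    return inputs, labels
```

For example, `mask_tokens(["che", "schifo", "🤮"], vocab=["che", "schifo", "🤮"], emoji_mask_prob=0.5, rng=random.Random(0))` masks ordinary tokens at the default 15% rate and the emoji at 50%. Keeping the emoji rate as a separate parameter is only one way to bias training toward emojis; other designs (for instance, dedicated emoji-prediction objectives) are equally plausible readings of "hybrid".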
Main file
Emoji_Based_Hate_Speech_EMNLP_2020.pdf (140.18 KB)
Origin: Files produced by the author(s)

Dates and versions

hal-02972203, version 1 (20-10-2020)

Identifiers

  • HAL Id: hal-02972203, version 1

Cite

Michele Corazza, Stefano Menini, Elena Cabrio, Sara Tonelli, Serena Villata. Hybrid Emoji-Based Masked Language Models for Zero-Shot Abusive Language Detection. EMNLP 2020 - Conference on Empirical Methods in Natural Language Processing, Nov 2020, Virtual, France. ⟨hal-02972203⟩
