Paper Title
Emotion Based Hate Speech Detection using Multimodal Learning
Paper Authors
Paper Abstract
In recent years, monitoring hate speech and offensive language on social media platforms has become paramount due to their widespread usage among all age groups, races, and ethnicities. Consequently, there have been substantial research efforts towards automated detection of such content using Natural Language Processing (NLP). While these efforts have succeeded in filtering textual data, no prior research has focused on detecting hateful content in multimedia data. With the increased ease of data storage and the exponential growth of social media platforms, multimedia content now pervades the internet as much as text data; nevertheless, it escapes automated filtering systems. Hate speech and offensiveness can be detected in multimedia primarily via three modalities: visual, acoustic, and verbal. Our preliminary study concluded that the most essential features for classifying hate speech are the speaker's emotional state and its influence on the spoken words, so we limit our current research to the acoustic and verbal modalities. This paper proposes the first multimodal deep learning framework that combines auditory features representing emotion with semantic features to detect hateful content. Our results demonstrate that incorporating emotional attributes leads to significant improvement over text-based models in detecting hateful multimedia content. This paper also presents a new Hate Speech Detection Video Dataset (HSDVD), collected for the purpose of multimodal learning, as no such dataset exists today.
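The abstract does not specify how the emotion and semantic features are combined. Purely as an illustration of the general idea, the sketch below shows one common way to fuse an utterance-level acoustic emotion embedding with a sentence-level text embedding via late fusion in PyTorch. All class names, dimensions, and layer sizes here are hypothetical assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class LateFusionHateClassifier(nn.Module):
    """Illustrative late-fusion model: concatenates a sentence-level text
    embedding with an utterance-level acoustic emotion embedding, then
    classifies the pair as hateful / not hateful.

    NOTE: dimensions and layers are hypothetical; the abstract does not
    describe the framework's actual architecture.
    """

    def __init__(self, text_dim=768, audio_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.fusion = nn.Sequential(
            nn.Linear(text_dim + audio_dim, hidden_dim),
            nn.ReLU(),
            nn.Dropout(0.3),
            nn.Linear(hidden_dim, num_classes),
        )

    def forward(self, text_emb, audio_emb):
        # text_emb:  (batch, text_dim), e.g. a BERT [CLS] vector
        # audio_emb: (batch, audio_dim), e.g. pooled prosodic/MFCC features
        #            or the output of a speech-emotion-recognition encoder
        fused = torch.cat([text_emb, audio_emb], dim=-1)
        return self.fusion(fused)

# Usage with random stand-in features:
model = LateFusionHateClassifier()
text_emb = torch.randn(4, 768)
audio_emb = torch.randn(4, 128)
logits = model(text_emb, audio_emb)  # shape: (4, 2)
```

Late fusion of this kind keeps the two feature extractors independent, which matches the abstract's framing of adding emotional attributes on top of a text-based model; other fusion strategies (early fusion, cross-modal attention) are equally plausible given the information available.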