Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage

Paper title

Countering Malicious Content Moderation Evasion in Online Social Networks: Simulation and Detection of Word Camouflage

Authors

Álvaro Huertas-García, Alejandro Martín, Javier Huertas Tato, David Camacho

Abstract

Content moderation is the process of screening and monitoring user-generated content online. It plays a crucial role in stopping content that results from unacceptable behaviors on online social platforms, such as hate speech, harassment, violence against specific groups, terrorism, racism, xenophobia, homophobia, or misogyny, to name a few. These platforms use a plethora of tools to detect and manage malicious information; however, malicious actors also improve their skills, developing strategies to bypass these barriers and continue spreading misleading information. Twisting and camouflaging keywords are among the most used techniques for evading platform content moderation systems. In response to this ongoing issue, this paper presents an innovative approach to address this linguistic trend in social networks through the simulation of different content evasion techniques and a multilingual Transformer model for content evasion detection. To this end, we share with the scientific community a multilingual public tool, named "pyleetspeak", that generates and simulates the phenomenon of content evasion in a customizable way through automatic word camouflage, together with a multilingual Named-Entity Recognition (NER) Transformer-based model tuned to recognize and detect it. The multilingual NER model is evaluated in different textual scenarios, detecting different types and mixtures of camouflage techniques, and achieves an overall weighted F1 score of 0.8795. This article contributes significantly to countering malicious information by developing multilingual tools to simulate and detect new methods of content moderation evasion on social networks, making the fight against information disorders more effective.
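
To make the idea of word camouflage concrete, the following is a minimal sketch of character-substitution ("leetspeak") camouflage that also records the character spans of the altered words, which is the kind of labeled data an NER-style detector could be trained on. It does not use or reproduce the pyleetspeak API: the substitution table `LEET_MAP` and the functions `camouflage` and `camouflage_text` are hypothetical names chosen for illustration, and the paper's tool may use different mappings and techniques.

```python
import random

# Hypothetical substitution table for illustration only;
# the tool described in the paper may use different mappings.
LEET_MAP = {"a": "4", "e": "3", "i": "1", "o": "0", "s": "5", "t": "7"}


def camouflage(word, probability=1.0, rng=None):
    """Swap each mappable character for a look-alike with the given probability."""
    rng = rng or random.Random()
    out = []
    for ch in word:
        sub = LEET_MAP.get(ch.lower())
        if sub is not None and rng.random() < probability:
            out.append(sub)
        else:
            out.append(ch)
    return "".join(out)


def camouflage_text(text, targets, probability=1.0, seed=None):
    """Camouflage only the target words of a space-separated text.

    Returns the new text and a list of (start, end) character spans
    marking the words that were actually altered.
    """
    rng = random.Random(seed)
    targets = {t.lower() for t in targets}
    tokens, spans, pos = [], [], 0
    for tok in text.split(" "):
        new = camouflage(tok, probability, rng) if tok.lower() in targets else tok
        if new != tok:
            # Substitution is one-to-one, so token length is unchanged.
            spans.append((pos, pos + len(new)))
        tokens.append(new)
        pos += len(new) + 1  # +1 for the separating space
    return " ".join(tokens), spans


if __name__ == "__main__":
    print(camouflage_text("the vaccine is safe", {"vaccine"}))
```

The sketch splits on single spaces and does not handle punctuation attached to words; a realistic simulator would need proper tokenization and a wider range of camouflage techniques than simple substitution.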
