Paper Title
Exposing Influence Campaigns in the Age of LLMs: A Behavioral-Based AI Approach to Detecting State-Sponsored Trolls
Paper Authors
Paper Abstract
The detection of state-sponsored trolls operating in influence campaigns on social media is a critical and unsolved challenge for the research community, with significant implications beyond the online realm. To address this challenge, we propose a new AI-based solution that identifies troll accounts solely through behavioral cues associated with their sequences of sharing activity, encompassing both their actions and the feedback they receive from others. Our approach does not incorporate any shared textual content and consists of two steps: First, we leverage an LSTM-based classifier to determine whether account sequences belong to a state-sponsored troll or an organic, legitimate user. Second, we employ the classified sequences to calculate a metric named the "Troll Score", quantifying the degree to which an account exhibits troll-like behavior. To assess the effectiveness of our method, we examine its performance in the context of the 2016 Russian interference campaign during the U.S. Presidential election. Our experiments yield compelling results, demonstrating that our approach can identify account sequences with an AUC close to 99% and accurately differentiate between Russian trolls and organic users with an AUC of 91%. Notably, our behavioral-based approach holds a significant advantage in the ever-evolving landscape, where textual and linguistic properties can be easily mimicked by Large Language Models (LLMs): In contrast to existing language-based techniques, it relies on more challenging-to-replicate behavioral cues, ensuring greater resilience in identifying influence campaigns, especially given the potential increase in the usage of LLMs for generating inauthentic content. Finally, we assess the generalizability of our solution to various entities driving different information operations and find promising results that will guide future research.
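The two-step pipeline described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's implementation: `classify_sequence` is a stand-in for the trained LSTM classifier, the action labels are hypothetical, and the Troll Score formula used here (the share of an account's activity sequences classified as troll-like) is an assumption, since the abstract does not give its exact definition.

```python
from typing import Callable, List, Sequence

# Step 1 (placeholder): a sequence classifier standing in for the paper's
# LSTM. It maps one behavioral activity sequence (actions taken plus
# feedback received) to a troll probability in [0, 1].
def classify_sequence(seq: Sequence[str]) -> float:
    # Hypothetical heuristic for illustration only; the real model is an
    # LSTM trained on labeled troll vs. organic-user sequences.
    troll_like = {"retweet", "retweet_received"}  # assumed action labels
    return sum(action in troll_like for action in seq) / max(len(seq), 1)

# Step 2 (assumed definition): the "Troll Score" of an account, taken here
# as the fraction of its sequences the classifier labels as troll-like.
def troll_score(sequences: List[Sequence[str]],
                classifier: Callable[[Sequence[str]], float],
                threshold: float = 0.5) -> float:
    labels = [classifier(seq) >= threshold for seq in sequences]
    return sum(labels) / max(len(labels), 1)

# Example account with three activity sequences: two lean troll-like,
# one does not, so the assumed Troll Score is 2/3.
account = [["retweet", "reply"],
           ["retweet", "retweet_received"],
           ["post"]]
score = troll_score(account, classify_sequence)
```

An account's score near 1 would indicate consistently troll-like behavioral sequences, while a score near 0 would indicate organic activity; the paper's thresholding and evaluation (AUC) would operate on such per-account scores.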