Attribution and Obfuscation of Neural Text Authorship: A Data Mining Perspective

Paper Title

Attribution and Obfuscation of Neural Text Authorship: A Data Mining Perspective

Authors

Adaku Uchendu, Thai Le, Dongwon Lee

Abstract

Two interlocking research questions of growing interest and importance in privacy research are Authorship Attribution (AA) and Authorship Obfuscation (AO). Given an artifact, especially a text t in question, an AA solution aims to accurately attribute t to its true author out of many candidate authors, while an AO solution aims to modify t to hide its true authorship. Traditionally, the notion of authorship and its accompanying privacy concerns have applied only to human authors. In recent years, however, explosive advances in Neural Text Generation (NTG) techniques in NLP have made it possible to synthesize human-quality open-ended texts (so-called "neural texts"), so one now has to consider authorship by humans, machines, or their combination. Because of the implications and potential threats of neural texts when used maliciously, it has become critical to understand the limitations of traditional AA/AO solutions and to develop novel AA/AO solutions for dealing with neural texts. In this survey, therefore, we comprehensively review recent literature on the attribution and obfuscation of neural text authorship from a Data Mining perspective, and share our view on their limitations and promising research directions.
