Paper Title
Artificial Intelligence versus Maya Angelou: Experimental evidence that people cannot differentiate AI-generated from human-written poetry
Paper Authors
Abstract
The release of openly available, robust natural language generation (NLG) algorithms has spurred much public attention and debate. One reason lies in the algorithms' purported ability to generate human-like text across various domains. Empirical evidence using incentivized tasks to assess whether people (a) can distinguish and (b) prefer algorithm-generated versus human-written text is lacking. We conducted two experiments assessing behavioral reactions to the state-of-the-art natural language generation algorithm GPT-2 (Ntotal = 830). Using the identical starting lines of human poems, GPT-2 produced samples of poems. From these samples, either a random poem was chosen (Human-out-of-the-loop) or the best one was selected (Human-in-the-loop) and in turn matched with a human-written poem. In a new incentivized version of the Turing Test, participants failed to reliably detect the algorithmically generated poems in the Human-in-the-loop treatment, yet succeeded in the Human-out-of-the-loop treatment. Further, people revealed a slight aversion to algorithm-generated poetry, independent of whether participants were informed about the algorithmic origin of the poem (Transparency) or not (Opacity). We discuss what these results convey about the performance of NLG algorithms in producing human-like text and propose methodologies for studying such learning algorithms in human-agent experimental settings.