Paper title
A New Generation of Perspective API: Efficient Multilingual Character-level Transformers
Paper authors
Paper abstract
On the world wide web, toxic content detectors are a crucial line of defense against potentially hateful and offensive messages. As such, building highly effective classifiers that enable a safer internet is an important research area. Moreover, the web is a highly multilingual, cross-cultural community that develops its own lingo over time. As such, it is crucial to develop models that are effective across a diverse range of languages, usages, and styles. In this paper, we present the fundamentals behind the next version of the Perspective API from Google Jigsaw. At the heart of the approach is a single multilingual token-free Charformer model that is applicable across a range of languages, domains, and tasks. We demonstrate that by forgoing static vocabularies, we gain flexibility across a variety of settings. We additionally outline the techniques employed to make such a byte-level model efficient and feasible for productionization. Through extensive experiments on multilingual toxic comment classification benchmarks derived from real API traffic and evaluation on an array of code-switching, covert toxicity, emoji-based hate, human-readable obfuscation, distribution shift, and bias evaluation settings, we show that our proposed approach outperforms strong baselines. Finally, we present our findings from deploying this system in production.
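To make the idea of a token-free, byte-level input concrete, the following is a minimal sketch of how text in any script can be mapped to integer IDs without a static vocabulary. It illustrates only the input representation, not the Charformer model or the Perspective API itself; the function names `to_byte_ids` and `from_byte_ids` and the `offset` parameter (a placeholder for reserved special-token IDs) are assumptions made for this example.

```python
from typing import List


def to_byte_ids(text: str, offset: int = 0) -> List[int]:
    """Encode text as UTF-8 and return one integer ID per byte.

    `offset` is a hypothetical shift that would leave room for special
    tokens (padding, end-of-sequence); it is not taken from the paper.
    """
    return [b + offset for b in text.encode("utf-8")]


def from_byte_ids(ids: List[int], offset: int = 0) -> str:
    """Invert to_byte_ids; malformed byte sequences are replaced, not raised."""
    return bytes(i - offset for i in ids).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Plain English, an obfuscated spelling, Chinese, and an emoji all map
    # into the same 256-value ID space, so nothing is out of vocabulary.
    for sample in ["hello", "h3ll0", "你好", "🙂"]:
        ids = to_byte_ids(sample)
        print(f"{sample!r}: {len(ids)} byte IDs -> {ids}")
```

Because every input reduces to the same 256 possible byte values, obfuscated spellings, code-switched text, and emoji need no special handling at the input layer. The trade-off is longer sequences (three bytes per Chinese character here, four for the emoji), which is why the abstract stresses techniques for making a byte-level model efficient.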