Paper Title
Modular Domain Adaptation
Paper Authors
Paper Abstract
Off-the-shelf models are widely used by computational social science researchers to measure properties of text, such as sentiment. However, without access to source data it is difficult to account for domain shift, which represents a threat to validity. Here, we treat domain adaptation as a modular process that involves separate model producers and model consumers, and show how they can independently cooperate to facilitate more accurate measurements of text. We introduce two lightweight techniques for this scenario, and demonstrate that they reliably increase out-of-domain accuracy on four multi-domain text classification datasets when used with linear and contextual embedding models. We conclude with recommendations for model producers and consumers, and release models and replication code to accompany this paper.