Paper title
Information-Theoretic Probing with Minimum Description Length
Paper authors
Paper abstract
To measure how well pretrained representations encode some linguistic property, it is common to use the accuracy of a probe, i.e., a classifier trained to predict the property from the representations. Despite the widespread adoption of probes, differences in their accuracy fail to adequately reflect differences in representations. For example, they do not substantially favour pretrained representations over randomly initialized ones. Analogously, their accuracy can be similar when probing for genuine linguistic labels and when probing for random synthetic tasks. To see reasonable differences in accuracy with respect to these random baselines, previous work had to constrain either the amount of probe training data or the size of the probing model. Instead, we propose an alternative to the standard probes: information-theoretic probing with minimum description length (MDL). With MDL probing, training a probe to predict labels is recast as teaching it to effectively transmit the data. Therefore, the measure of interest changes from probe accuracy to the description length of labels given representations. In addition to probe quality, the description length evaluates "the amount of effort" needed to achieve that quality. This amount of effort characterizes either (i) the size of a probing model, or (ii) the amount of data needed to achieve high quality. We consider two methods for estimating MDL which can be easily implemented on top of the standard probing pipelines: variational coding and online coding. We show that these methods agree in their results and are more informative and stable than the standard probes.
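The abstract names online coding without spelling out the procedure. As an illustration only, the following is a minimal sketch of online (prequential) codelength estimation as it is commonly formulated: the first block of labels is sent with a uniform code, and each later block is sent using a probe trained on all preceding blocks. Everything concrete here is an assumption and not taken from the paper: the linear softmax probe, the function names (`fit_softmax_probe`, `codelength_bits`, `online_codelength`), the block fractions, the hyperparameters, and the synthetic data.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the class axis."""
    z = logits - logits.max(axis=1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=1, keepdims=True))


def fit_softmax_probe(X, y, num_classes, steps=300, lr=0.1, l2=1e-3):
    """Train a linear softmax probe with plain gradient descent.

    Hypothetical stand-in for whatever probe a real pipeline would use.
    """
    n, d = X.shape
    W = np.zeros((d, num_classes))
    b = np.zeros(num_classes)
    Y = np.eye(num_classes)[y]  # one-hot targets
    for _ in range(steps):
        P = np.exp(log_softmax(X @ W + b))
        G = (P - Y) / n  # gradient of mean cross-entropy w.r.t. logits
        W -= lr * (X.T @ G + l2 * W)
        b -= lr * G.sum(axis=0)
    return W, b


def codelength_bits(params, X, y):
    """Bits needed to send labels y given X under the probe: -sum log2 p(y|x)."""
    W, b = params
    logp = log_softmax(X @ W + b)
    return -logp[np.arange(len(y)), y].sum() / np.log(2)


def online_codelength(X, y, num_classes, fractions=(0.1, 0.2, 0.4, 0.8, 1.0)):
    """Prequential codelength of y given X, in bits.

    Data are split at the given cumulative fractions (assumed values).
    Examples past the last cut are ignored, so the last fraction should be 1.0.
    """
    n = len(y)
    cuts = [max(1, int(round(f * n))) for f in fractions]
    # The first block is transmitted with a uniform code over the classes.
    total = cuts[0] * np.log2(num_classes)
    for start, end in zip(cuts[:-1], cuts[1:]):
        # Train on everything sent so far, then pay for the next block.
        params = fit_softmax_probe(X[:start], y[:start], num_classes)
        total += codelength_bits(params, X[start:end], y[start:end])
    return total


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, d, k = 400, 5, 2
    y = rng.integers(0, k, size=n)
    # "Informative" features: the first coordinate is shifted by the label.
    X_info = rng.normal(size=(n, d))
    X_info[:, 0] += 2.0 * (2 * y - 1)
    # "Uninformative" features: independent of the labels.
    X_rand = rng.normal(size=(n, d))
    print("uniform code (bits):      ", n * np.log2(k))
    print("online code, informative: ", online_codelength(X_info, y, k))
    print("online code, random:      ", online_codelength(X_rand, y, k))
```

In this toy setup, features that carry label information should yield a much shorter codelength than features that do not, and the ratio of the uniform codelength to the online codelength can be read as a compression measure. A real probing experiment would replace the synthetic features with model representations and the linear probe with the probe architecture under study.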