Paper Title

Inducing Character-level Structure in Subword-based Language Models with Type-level Interchange Intervention Training

Paper Authors

Jing Huang, Zhengxuan Wu, Kyle Mahowald, Christopher Potts

Paper Abstract

Language tasks involving character-level manipulations (e.g., spelling corrections, arithmetic operations, word games) are challenging for models operating on subword units. To address this, we develop a causal intervention framework to learn robust and interpretable character representations inside subword-based language models. Our method treats each character as a typed variable in a causal model and learns such causal structures by adapting the interchange intervention training method of Geiger et al. (2021). We additionally introduce a suite of character-level tasks that systematically vary in their dependence on meaning and sequence-level context. While character-level models still perform best on purely form-based tasks like string reversal, our method outperforms character-level models on more complex tasks that blend form, meaning, and context, such as spelling correction in context and word search games. Compared with standard subword-based models, our approach also significantly improves robustness on unseen token sequences and leads to human-interpretable internal representations of characters.
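
To make the training objective concrete, below is a minimal PyTorch sketch of an interchange intervention over per-character representations. Everything here is an illustrative assumption rather than the paper's implementation: the ToyEncoder, the fixed-slice alignment between hidden units and character variables, the spelling head, and all names are hypothetical stand-ins (the paper localizes typed character variables inside a pretrained subword-based LM). The core move, however, follows the method of Geiger et al. (2021) that the abstract cites: run the model on a base and a source input, swap the activation aligned with one causal variable, and train the model to produce the output the high-level causal model assigns to that counterfactual.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

HIDDEN, CHAR_DIM, MAX_CHARS, VOCAB = 64, 16, 4, 1000

class ToyEncoder(nn.Module):
    """Hypothetical stand-in for a subword LM: each token embedding is
    mapped to MAX_CHARS fixed-size slices, one per character variable."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB, HIDDEN)
        self.mlp = nn.Sequential(
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, CHAR_DIM * MAX_CHARS))
        self.head = nn.Linear(CHAR_DIM * MAX_CHARS, 26 * MAX_CHARS)

    def char_reps(self, token_ids):
        # (batch, MAX_CHARS, CHAR_DIM): slice i is aligned with character i.
        return self.mlp(self.embed(token_ids)).view(-1, MAX_CHARS, CHAR_DIM)

    def forward_from_reps(self, reps):
        # Predict the spelling: 26-way logits per character position.
        return self.head(reps.flatten(1)).view(-1, MAX_CHARS, 26)

def interchange_intervention(model, base_ids, source_ids, pos):
    """Run the model on the base input, but with the representation
    aligned to character variable `pos` swapped in from the source run.
    Gradients flow through both runs, as in interchange intervention
    training."""
    reps = model.char_reps(base_ids).clone()
    reps[:, pos] = model.char_reps(source_ids)[:, pos]
    return model.forward_from_reps(reps)

def iit_loss(model, base_ids, source_ids, pos, counterfactual_chars):
    """Counterfactual labels come from intervening on the causal model:
    the base word's spelling with the source word's character at `pos`."""
    logits = interchange_intervention(model, base_ids, source_ids, pos)
    return F.cross_entropy(logits.reshape(-1, 26),
                           counterfactual_chars.reshape(-1))

# Usage with random toy data (hypothetical labels):
model = ToyEncoder()
base = torch.randint(0, VOCAB, (8,))
source = torch.randint(0, VOCAB, (8,))
labels = torch.randint(0, 26, (8, MAX_CHARS))  # counterfactual spellings
loss = iit_loss(model, base, source, pos=2, counterfactual_chars=labels)
loss.backward()
```

In practice this counterfactual loss would be combined with an ordinary task loss, so the model both solves the character-level task and keeps its internal character representations causally aligned with the high-level variables.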
