Paper Title
Quirk or Palmer: A Comparative Study of Modal Verb Frameworks with Annotated Datasets
Paper Authors
Paper Abstract
Modal verbs, such as "can", "may", and "must", are commonly used in daily communication to convey the speaker's perspective on the likelihood and/or mode of a proposition. Their meaning can differ greatly depending on how they are used and on the context of the sentence (e.g., "They 'must' help each other out." vs. "They 'must' have helped each other out."). Despite their practical importance in natural language understanding, linguists have yet to agree on a single, prominent framework for categorizing modal verb senses. This lack of agreement stems from the high flexibility and polysemy of modal verbs, which makes it harder for researchers to incorporate insights from this family of words into their work. This work presents the Moverb dataset, which consists of 27,240 annotations of modal verb senses over 4,540 utterances, each containing one or more sentences from social conversations. Each utterance is annotated by three annotators using two different theoretical frameworks of modal verb senses (i.e., Quirk and Palmer). We observe that both frameworks have similar inter-annotator agreement, despite having different numbers of sense types (8 for Quirk and 3 for Palmer). With RoBERTa-based classifiers fine-tuned on Moverb, we achieve F1 scores of 82.2 and 78.3 on Quirk and Palmer, respectively, showing that modal verb sense disambiguation is not a trivial task. Our dataset will be publicly available with our final version.
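The annotation total quoted in the abstract can be checked against the other figures it gives. The sketch below is only a consistency check, not the authors' code: it assumes that every utterance receives exactly one label per annotator per framework, and the constant and function names are hypothetical.

```python
# Figures quoted in the abstract.
NUM_UTTERANCES = 4_540
NUM_ANNOTATORS = 3
# Framework name -> number of sense types, as stated in the abstract.
SENSE_TYPES = {"Quirk": 8, "Palmer": 3}


def total_annotations(utterances: int, annotators: int, frameworks: int) -> int:
    """Count labels, assuming one label per utterance, annotator, and framework.

    This one-label-per-combination reading is an assumption made for
    illustration; the abstract does not spell it out.
    """
    return utterances * annotators * frameworks


total = total_annotations(NUM_UTTERANCES, NUM_ANNOTATORS, len(SENSE_TYPES))
print(total)  # 27240, matching the 27,240 annotations reported
```

Under this assumption the product 4,540 × 3 × 2 reproduces the reported 27,240 annotations exactly.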