Paper Title
Polish Natural Language Inference and Factivity -- an Expert-based Dataset and Benchmarks
Authors
Abstract
Despite recent breakthroughs in Machine Learning for Natural Language Processing, Natural Language Inference (NLI) problems still constitute a challenge. To this end we contribute a new dataset that focuses exclusively on the factivity phenomenon; however, our task remains the same as in other NLI tasks, i.e. prediction of entailment, contradiction or neutral (ECN). The dataset contains entirely natural language utterances in Polish and gathers 2,432 verb-complement pairs and 309 unique verbs. The dataset is based on the National Corpus of Polish (NKJP) and is a representative sample with regard to the frequency of main verbs and other linguistic features (e.g. occurrence of internal negation). We found that transformer BERT-based models working on sentences obtained relatively good results ($\approx89\%$ F1 score). Even though better results were achieved using linguistic features ($\approx91\%$ F1 score), this model requires more human labour (humans in the loop) because the features were prepared manually by expert linguists. BERT-based models consuming only the input sentences show that they capture most of the complexity of NLI/factivity. Complex cases of the phenomenon, e.g. cases with entailment (E) and non-factive verbs, remain an open issue for further research.
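The sentence-only benchmark described above amounts to fine-tuning a BERT-style encoder as a three-way ECN classifier over a premise (the full utterance) and a hypothesis (the complement clause). The sketch below illustrates that setup; the checkpoint name (`allegro/herbert-base-cased`, a publicly available Polish BERT), the label order, and the example sentences are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch of a sentence-only ECN classifier for Polish NLI/factivity.
# Assumptions: HerBERT as the Polish encoder, 3 labels in E/C/N order,
# and a hypothetical verb-complement example pair.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "allegro/herbert-base-cased"          # assumed Polish BERT checkpoint
LABELS = ["entailment", "contradiction", "neutral"]  # ECN classes from the abstract

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(
    MODEL_NAME, num_labels=len(LABELS)
)

def predict_ecn(utterance: str, complement: str) -> str:
    """Classify an (utterance, complement) pair as entailment / contradiction / neutral."""
    inputs = tokenizer(
        utterance, complement,
        return_tensors="pt", truncation=True, max_length=256,
    )
    with torch.no_grad():
        logits = model(**inputs).logits
    return LABELS[int(logits.argmax(dim=-1))]

# Hypothetical example: a factive main verb ("wiedzieć", to know) with its complement.
print(predict_ecn("Jan wie, że Maria wyjechała.", "Maria wyjechała."))
```

Before fine-tuning on the dataset's verb-complement pairs, the classification head is randomly initialised, so predictions from this sketch are not meaningful; it only shows the input format (premise-hypothesis pair) and the three-class output the benchmark evaluates with F1.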