Paper title
Evaluation of Text Generation: A Survey
Paper authors
Paper abstract
This paper surveys the methods developed in the last few years for evaluating natural language generation (NLG) systems. We group NLG evaluation methods into three categories: (1) human-centric evaluation metrics, (2) automatic metrics that require no training, and (3) machine-learned metrics. For each category, we discuss the progress that has been made and the challenges that remain, with a focus on the evaluation of recently proposed NLG tasks and neural NLG models. We then present two examples of task-specific NLG evaluation, one for automatic text summarization and one for long text generation, and conclude by proposing future research directions.