Title
HybriDialogue: An Information-Seeking Dialogue Dataset Grounded on Tabular and Textual Data
Authors
Abstract
A pressing challenge in current dialogue systems is to successfully converse with users on topics with information distributed across different modalities. Previous work in multiturn dialogue systems has primarily focused on either text or table information. In more realistic scenarios, having a joint understanding of both is critical as knowledge is typically distributed over both unstructured and structured forms. We present a new dialogue dataset, HybriDialogue, which consists of crowdsourced natural conversations grounded on both Wikipedia text and tables. The conversations are created through the decomposition of complex multihop questions into simple, realistic multiturn dialogue interactions. We propose retrieval, system state tracking, and dialogue response generation tasks for our dataset and conduct baseline experiments for each. Our results show that there is still ample opportunity for improvement, demonstrating the importance of building stronger dialogue systems that can reason over the complex setting of information-seeking dialogue grounded on tables and text.
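The abstract lists retrieval over tables and text as one of the proposed tasks but does not describe a baseline here. The sketch below only illustrates what such a retrieval step could look like under stated assumptions. Each table row is linearized into a "column is value" string (a hypothetical format). Linearized rows are pooled with text passages and ranked against the concatenated dialogue history by TF-IDF cosine similarity, using the smoothed IDF formula that scikit-learn's `TfidfVectorizer` applies by default. The function names, stopword list, and toy data are illustrative. They are not the paper's baseline or the dataset's schema.

```python
import math
import re
from collections import Counter

# Illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"a", "an", "the", "is", "are", "in", "of", "and",
             "was", "who", "which", "did", "for"}


def tokenize(text: str) -> list[str]:
    """Lowercase alphanumeric tokens with stopwords removed."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower()) if t not in STOPWORDS]


def linearize_row(header: list[str], row: list[str]) -> str:
    """Hypothetical row linearization: 'column is value' pairs joined by ' ; '."""
    return " ; ".join(f"{h} is {v}" for h, v in zip(header, row))


def build_candidates(header: list[str], rows: list[list[str]],
                     passages: list[str]) -> list[tuple[str, str]]:
    """Pool structured (table rows) and unstructured (passages) knowledge."""
    cands = [("table", linearize_row(header, r)) for r in rows]
    cands += [("text", p) for p in passages]
    return cands


def tfidf_rank(query: str, candidates: list[tuple[str, str]],
               k: int = 3) -> list[tuple[str, str]]:
    """Return the top-k candidates by TF-IDF cosine similarity to the query."""
    docs = [Counter(tokenize(text)) for _, text in candidates]
    n = len(docs)
    # Document frequency: number of candidates containing each token.
    df = Counter(tok for d in docs for tok in d)
    # Smoothed IDF; tokens unseen in the candidates get weight 0.
    idf = {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}

    def weigh(counts: Counter) -> dict[str, float]:
        return {t: c * idf.get(t, 0.0) for t, c in counts.items()}

    def cosine(a: dict[str, float], b: dict[str, float]) -> float:
        dot = sum(w * b.get(t, 0.0) for t, w in a.items())
        na = math.sqrt(sum(w * w for w in a.values()))
        nb = math.sqrt(sum(w * w for w in b.values()))
        return dot / (na * nb) if na and nb else 0.0

    q = weigh(Counter(tokenize(query)))
    scored = [(cosine(q, weigh(d)), i) for i, d in enumerate(docs)]
    # Highest score first; ties broken by original candidate order.
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [candidates[i] for _, i in scored[:k]]


if __name__ == "__main__":
    # Toy, made-up knowledge; not drawn from HybriDialogue.
    header = ["Year", "Winner", "Club"]
    rows = [["2010", "Alice", "Red Lions"], ["2011", "Bob", "Blue Hawks"]]
    passages = ["The Red Lions are a club founded in 1950."]
    cands = build_candidates(header, rows, passages)

    # The whole dialogue history serves as the retrieval query.
    history = ["Who won the 2010 title?", "Which club did Alice play for?"]
    for source, text in tfidf_rank(" ".join(history), cands, k=2):
        print(source, "|", text)
```

With this toy data, the 2010 table row ranks first and the passage about the club ranks second. The second item is the knowledge a follow-up turn about that club would need. Using the full history as the query is one simple way to carry context across turns. A learned dense retriever would be the more typical choice in practice.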