Paper Title
AI Assistants: A Framework for Semi-Automated Data Wrangling
Paper Authors
Paper Abstract
Data wrangling tasks, such as obtaining and linking data from various sources, transforming data formats, and correcting erroneous records, can constitute up to 80% of typical data engineering work. Despite the rise of machine learning and artificial intelligence, data wrangling remains a tedious and manual task. We introduce AI assistants, a class of semi-automatic interactive tools that streamline data wrangling. An AI assistant guides the analyst through a specific data wrangling task by recommending a suitable data transformation that respects the constraints obtained through interaction with the analyst. We formally define the structure of AI assistants and describe how existing tools that treat data cleaning as an optimization problem fit the definition. By leveraging the common structure they follow, we implement AI assistants for four common data wrangling tasks and make them easily accessible to data analysts in an open-source notebook environment for data science. We evaluate our AI assistants both quantitatively and qualitatively through three example scenarios. We show that the unified and interactive design makes it easy to perform tasks that would be difficult to do manually or with a fully automatic tool.