Paper title
Infer XPath
Paper authors
Abstract
We propose reformulating the discovery of data structure within a web page as relations between sets of document nodes. We start by recasting web page analysis as the task of finding expressions in an extension of XPath. We then propose discovering these XPath expressions automatically with the InferXPath meta-language. Our goal is to automate the laborious conversion of manually created web pages that serve as software documentation, wikis, and reference documents, and to speed up their conversion into tabular data that can be fed directly into a data pipeline.