Paper Title

Full-privacy secured search engine empowered by efficient genome-mapping algorithms

Authors

Chang, Yuan-Yu; Wong, Sheng-Tang; Salawu, Emmanuel O.; Wang, Yu-Xuan; Hung, Jui-Hung; Yang, Lee-Wei

Abstract

Since the 1990s, keyword-based search engines have helped people locate relevant web content via a simple query, as have the more recent full-text search engines used mainly for plagiarism detection following an article upload. However, these "free" or paid services operate by storing users' search queries and preferences for personal profiling and targeted ad delivery, while user-uploaded articles can further profit the service providers as part of their expanding databases. In short, search engine privacy has not been an option for web exploration in the past decades. Here we demonstrate that a database or internet search, provided with the entire article as a query, can be carried out correctly without revealing users' sensitive queries, by combining an irreversible encoding scheme with an efficient FM-index search routine of the kind generally used in next-generation sequencing (NGS) of genomes. In our solution, Sapiens Aperio Veritas Engine (S.A.V.E.), every word in the query is encoded on the user's local machine into one of 12 "amino acids" (a.a.), which together form a pseudo-biological sequence (PBS). For PBS-mediated plagiarism detection, users submit the locally encoded PBS through our cloud service, which locates identical duplicates in collected web content that has been encoded in the same way as the query. PBSs longer than 12 a.a. are found to return correct results with a false positive rate <0.8%. S.A.V.E. runs at a speed similar to Bowtie and is four orders of magnitude faster than BLAST. S.A.V.E., functioning in both regular and in-private search modes, provides a new option for efficient internet search and plagiarism detection in a compressed search space, without storing or revealing users' confidential content. We expect that future privacy-aware search engines can reference the ideas proposed herein. S.A.V.E. is available at https://dyn.life.nthu.edu.tw/SAVE/
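The pipeline described in the abstract (encode each word locally into one of 12 symbols, then search the encoded corpus with an FM-index) can be illustrated with a small sketch. This is not the S.A.V.E. implementation: the abstract does not specify the encoding function or the 12 symbols, so the `ALPHABET` string, the SHA-256-based `encode_word`, and the naive `FMIndex` class below are all assumptions made for illustration. A production FM-index would also use a faster suffix-array construction and sampled occurrence tables.

```python
import hashlib
import re

# Hypothetical 12-symbol alphabet; the abstract does not list the real symbols.
ALPHABET = "ACDEFGHIKLMN"


def encode_word(word: str) -> str:
    """Map a word to one of 12 symbols (assumed hash-based, many-to-one scheme)."""
    digest = hashlib.sha256(word.lower().encode("utf-8")).digest()
    return ALPHABET[int.from_bytes(digest, "big") % len(ALPHABET)]


def encode_text(text: str) -> str:
    """Encode a text into a pseudo-biological sequence, one symbol per word."""
    return "".join(encode_word(w) for w in re.findall(r"\w+", text))


class FMIndex:
    """Naive FM-index supporting exact-match lookup by backward search."""

    def __init__(self, text: str):
        text += "$"  # sentinel, sorts before every alphabet symbol
        n = len(text)
        # Suffix array built by sorting suffixes directly (fine for small inputs).
        self.sa = sorted(range(n), key=lambda i: text[i:])
        # Burrows-Wheeler transform: the character preceding each sorted suffix.
        bwt = "".join(text[i - 1] for i in self.sa)
        symbols = sorted(set(text))
        # C[c]: number of characters in the text strictly smaller than c.
        self.C = {}
        total = 0
        for c in symbols:
            self.C[c] = total
            total += text.count(c)
        # occ[c][i]: number of occurrences of c in bwt[:i].
        self.occ = {c: [0] * (n + 1) for c in symbols}
        for i, ch in enumerate(bwt):
            for c in symbols:
                self.occ[c][i + 1] = self.occ[c][i] + (1 if ch == c else 0)

    def locate(self, pattern: str) -> list[int]:
        """Return sorted start positions of every exact match of pattern."""
        lo, hi = 0, len(self.sa)  # half-open suffix-array interval
        for c in reversed(pattern):
            if c not in self.C:
                return []
            lo = self.C[c] + self.occ[c][lo]
            hi = self.C[c] + self.occ[c][hi]
            if lo >= hi:
                return []
        return sorted(self.sa[lo:hi])


# The "server" indexes an encoded corpus; the "client" sends only the encoded query.
corpus = "the quick brown fox jumps over the lazy dog while the quick brown cat sleeps"
index = FMIndex(encode_text(corpus))
query = encode_text("the quick brown")
hits = index.locate(query)  # positions are word offsets into the corpus
print(hits)
```

The reported positions include word offsets 0 and 10, where the phrase actually occurs. With a query of only three symbols, chance collisions are possible, which is consistent with the abstract's observation that longer PBSs (more than 12 a.a.) are needed to keep the false positive rate low.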
