论文标题
在随机列重复下的马尔可夫数据库匹配
Matching of Markov Databases Under Random Column Repetitions
论文作者
论文摘要
相关的洗牌数据库的匹配条目具有从隐私到生物学的实际应用。在本文中,研究了由时间索引数据库的采样中的同步错误,研究了随机列重复和删除下随机数据库的匹配。假定对于数据库中的每个条目(行),属性(列)是相关的,该属性(列)被建模为Markov过程。提出了列直方图作为置换不变特征,以检测重复模式,其渐近唯一性是使用信息理论工具证明的。然后,重复检测之后是基于典型的行匹配方案。考虑到这一总体方案,得出了数据库增长率方面成功匹配数据库的足够条件。 Fano不平等的修改版本导致成功匹配的必要条件,并在列重复下建立了匹配能力。该容量等于擦除结合,假设重复位置是已知的A-Priori。总体而言,我们的结果提供了有关匿名时间索引数据的隐私发布的见解。
Matching entries of correlated shuffled databases have practical applications ranging from privacy to biology. In this paper, motivated by synchronization errors in the sampling of time-indexed databases, matching of random databases under random column repetitions and deletions is investigated. It is assumed that for each entry (row) in the database, the attributes (columns) are correlated, which is modeled as a Markov process. Column histograms are proposed as a permutation-invariant feature to detect the repetition pattern, whose asymptotic-uniqueness is proved using information-theoretic tools. Repetition detection is then followed by a typicality-based row matching scheme. Considering this overall scheme, sufficient conditions for successful matching of databases in terms of the database growth rate are derived. A modified version of Fano's inequality leads to a tight necessary condition for successful matching, establishing the matching capacity under column repetitions. This capacity is equal to the erasure bound, which assumes the repetition locations are known a-priori. Overall, our results provide insights on privacy-preserving publication of anonymized time-indexed data.