Paper title
A super scalable algorithm for short segment detection
Paper authors
Abstract
In many applications, such as copy number variant (CNV) detection, the goal is to identify short segments on which the observations have different means or medians from the background. These segments are usually short and hidden in a long sequence, and hence are very challenging to find. In this paper, we study a super scalable short segment (4S) detection algorithm. This nonparametric method clusters the locations where the observations exceed a threshold for segment detection. It is computationally efficient and does not rely on a Gaussian noise assumption. Moreover, we develop a framework to assign significance levels to detected segments. We demonstrate the advantages of the proposed method through theoretical, simulation, and real data studies.
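The thresholding-and-clustering idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `detect_short_segments` and the parameters `threshold`, `max_gap`, and `min_points` are assumptions introduced here for illustration, and the step that assigns significance levels to detected segments is not covered.

```python
import numpy as np


def detect_short_segments(x, threshold, max_gap=2, min_points=3):
    """Return inclusive (start, end) index pairs of candidate segments.

    Illustrative sketch only; parameter names and defaults are hypothetical.
    """
    x = np.asarray(x, dtype=float)

    # Step 1: locations where the observation exceeds the threshold.
    hits = np.flatnonzero(x > threshold)
    if hits.size == 0:
        return []

    # Step 2: start a new cluster wherever two consecutive exceedance
    # locations are more than max_gap positions apart.
    breaks = np.flatnonzero(np.diff(hits) > max_gap) + 1
    clusters = np.split(hits, breaks)

    # Step 3: discard clusters with too few exceedances (likely noise).
    return [(int(c[0]), int(c[-1])) for c in clusters if c.size >= min_points]


# Small deterministic example: one elevated segment and one isolated spike.
signal = np.zeros(50)
signal[10:15] = 5.0   # short elevated segment at indices 10..14
signal[30] = 5.0      # isolated spike, dropped by the min_points filter
print(detect_short_segments(signal, threshold=3.0))  # prints [(10, 14)]
```

Each step is a single pass over the data or over the exceedance locations, which is consistent with the computational efficiency the abstract claims. Because only exceedances of a threshold are used, the sketch makes no explicit assumption about the noise distribution.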