Paper Title

GeneFormer: Learned Gene Compression using Transformer-based Context Modeling

Paper Authors

Zhanbei Cui, Yu Liao, Tongda Xu, Yan Wang

Paper Abstract

With the development of gene sequencing technology, an explosive growth of gene data has been witnessed, and the storage of gene data has become an important issue. Traditional gene data compression methods rely on general-purpose software such as gzip, which fails to exploit the interrelations within nucleotide sequences. Recently, many researchers have begun to investigate deep-learning-based gene data compression methods. In this paper, we propose a transformer-based gene compression method named GeneFormer. Specifically, we first introduce a modified transformer structure to fully explore nucleotide sequence dependencies. Then, we propose fixed-length parallel grouping to accelerate the decoding speed of our autoregressive model. Experimental results on real-world datasets show that our method saves 29.7% bit rate compared with the state-of-the-art method, and its decoding speed is significantly faster than that of all existing learning-based gene compression methods.
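To make the abstract's recipe concrete, the following is a minimal sketch of learned autoregressive context modeling for nucleotide data: a causal transformer predicts a distribution over the next nucleotide given its context, and those probabilities would drive an arithmetic coder. This is not the paper's implementation; all names (NucleotideContextModel, d_model, etc.) are illustrative assumptions, and PyTorch's standard TransformerEncoder stands in for the paper's modified transformer structure.

```python
import torch
import torch.nn as nn

# Map the four nucleotides to token ids.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3}

class NucleotideContextModel(nn.Module):
    """Autoregressive transformer over nucleotide tokens (illustrative sketch)."""

    def __init__(self, d_model=128, n_heads=4, n_layers=4, context_len=256):
        super().__init__()
        self.embed = nn.Embedding(len(VOCAB), d_model)
        self.pos = nn.Embedding(context_len, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.head = nn.Linear(d_model, len(VOCAB))

    def forward(self, tokens):
        # tokens: (batch, seq) integer nucleotide ids.
        seq_len = tokens.size(1)
        x = self.embed(tokens) + self.pos(torch.arange(seq_len, device=tokens.device))
        # Causal mask: position i may only attend to positions <= i, so the
        # model yields a valid conditional distribution for entropy coding.
        mask = torch.triu(
            torch.full((seq_len, seq_len), float("-inf"), device=tokens.device),
            diagonal=1,
        )
        h = self.encoder(x, mask=mask)
        return self.head(h)  # per-position logits over the next symbol

# Training minimizes cross-entropy, which (up to a log-2 factor) equals the
# average code length an arithmetic coder achieves when driven by the model's
# predicted probabilities.
model = NucleotideContextModel()
seq = torch.randint(0, 4, (8, 256))  # toy batch of tokenized sequences
logits = model(seq[:, :-1])
loss = nn.functional.cross_entropy(logits.reshape(-1, 4), seq[:, 1:].reshape(-1))
```

The fixed-length parallel grouping mentioned in the abstract targets the main weakness of such a model: symbol-by-symbol autoregressive decoding is inherently serial. One plausible reading (an assumption, not the paper's stated design) is that the sequence is split into equal-length groups treated as independent batch entries, so groups can be decoded in parallel at a small cost in available context.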
