Paper title

Bits missing: Finding exotic pulsars using bfloat16 on NVIDIA GPUs

Authors

Jack White, Karel Adamek, Jayanta Roy, Sofia Dimoudi, Scott M. Ransom, Wesley Armour

Abstract

The Fourier Domain Acceleration Search (FDAS) is an effective technique for detecting faint binary pulsars in large radio astronomy datasets. This paper quantifies the sensitivity impact of reducing numerical precision in the GPU-accelerated FDAS pipeline of the AstroAccelerate software package. The prior implementation used IEEE-754 single-precision in the entire binary pulsar detection pipeline, spending a large fraction of the runtime computing GPU-accelerated FFTs. AstroAccelerate has been modified to use bfloat16 (and IEEE-754 double-precision to provide a "gold standard" comparison) within the Fourier domain convolution section of the FDAS routine. Approximately 20,000 synthetic pulsar filterbank files representing binary pulsars were generated using SIGPROC with a range of physical parameters. They have been processed using bfloat16, single-precision and double-precision convolutions. All bfloat16 peaks are within 3% of the predicted signal-to-noise ratio of their corresponding single-precision peaks. Of 14,971 "bright" single-precision fundamental peaks above a power of 44.982 (our experimentally measured highest noise value), 14,602 (97.53%) have a peak in the same acceleration and frequency bin in the bfloat16 output plane, whilst in the remaining 369 the nearest peak is located in the adjacent acceleration bin. There is no bin drift measured between the single-precision and double-precision results. The bfloat16 version of FDAS achieves a speedup of approximately 1.6x compared to single-precision. A comparison between AstroAccelerate and the PRESTO software package is presented using observations collected with the GMRT of PSR J1544+4937, a 2.16 ms black widow pulsar in a 2.8-hour compact orbit.
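
To give a feel for what "bits missing" means in practice, the sketch below emulates bfloat16 rounding in NumPy. It relies only on the standard layout of the format: bfloat16 keeps the top 16 bits of an IEEE-754 single-precision value (1 sign bit, 8 exponent bits, 7 stored mantissa bits). The function name `to_bfloat16` and the round-to-nearest-even choice are assumptions made for illustration; this is not code from AstroAccelerate, and it says nothing about where in the pipeline the authors apply the conversion.

```python
import numpy as np

def to_bfloat16(x):
    """Round finite float32 values to the nearest bfloat16 value (ties to even).

    Hypothetical helper for illustration only. The result is returned as
    float32 so it can be compared directly with the input.
    """
    # Reinterpret the float32 bit patterns as unsigned 32-bit integers.
    bits = np.atleast_1d(np.asarray(x, dtype=np.float32)).view(np.uint32)
    # Lowest bit that survives truncation; used to break ties towards even.
    lsb = (bits >> np.uint32(16)) & np.uint32(1)
    # Add just under half of the discarded range (plus the tie-breaker),
    # then clear the 16 low bits that bfloat16 does not store.
    rounded = bits + np.uint32(0x7FFF) + lsb
    return (rounded & np.uint32(0xFFFF0000)).view(np.float32)

if __name__ == "__main__":
    x = np.linspace(1.0, 1000.0, 10001, dtype=np.float32)
    rel_err = np.abs(to_bfloat16(x) - x) / x
    print("max relative rounding error:", rel_err.max())
    print("bound 2**-8               :", 2.0 ** -8)
    print("44.982 as bfloat16        :", to_bfloat16(44.982)[0])
```

With 7 stored mantissa bits, the relative rounding error of a single value is bounded by 2^-8 (about 0.4%), and representable values between 32 and 64 are spaced 0.25 apart, so a number such as 44.982 rounds to 45.0. This only illustrates the granularity of the format; the 3% signal-to-noise agreement and the bin-drift counts reported in the abstract are measured results of the full pipeline and cannot be derived from this bound alone.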
