Paper Title
ARM 4-BIT PQ: SIMD-based Acceleration for Approximate Nearest Neighbor Search on ARM
Paper Authors
Abstract
We accelerate 4-bit product quantization (PQ) on the ARM architecture. Notably, the high performance of conventional 4-bit PQ relies strongly on x64-specific SIMD registers, such as those of AVX2; hence, comparable performance has not yet been achievable on ARM. To fill this gap, we first bundle two 128-bit registers into one 256-bit component. We then apply a shuffle operation to each 128-bit register using ARM-specific NEON instructions. With this simple but critical modification, we achieve a dramatic speedup for 4-bit PQ on the ARM architecture. Experiments show that the proposed method consistently achieves a 10x speedup over naive PQ at the same accuracy.
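The bundling idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The function names `shuffle256` and `pq4_scan16`, the packing of two 4-bit codes per byte, and the use of 8-bit quantized distance tables are all assumptions made for illustration. On AArch64 the sketch pairs two 128-bit NEON registers in a `uint8x16x2_t` and applies the `vqtbl1q_u8` table lookup to each half, which mimics a per-lane 256-bit byte shuffle. On other targets it falls back to an equivalent scalar loop.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__aarch64__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: per-lane byte shuffle over a 256-bit value stored as
 * 32 bytes. Each 16-byte half is shuffled independently:
 *   out[i] = table[16 * (i / 16) + (idx[i] & 15)]                        */
void shuffle256(const uint8_t table[32], const uint8_t idx[32],
                uint8_t out[32])
{
    assert(table && idx && out);
#if defined(__aarch64__)
    /* Two 128-bit registers bundled as one 256-bit component. */
    uint8x16x2_t t, ix, r;
    const uint8x16_t mask = vdupq_n_u8(0x0F);
    t.val[0]  = vld1q_u8(table);
    t.val[1]  = vld1q_u8(table + 16);
    ix.val[0] = vandq_u8(vld1q_u8(idx), mask);
    ix.val[1] = vandq_u8(vld1q_u8(idx + 16), mask);
    /* One NEON table lookup per 128-bit half. */
    r.val[0] = vqtbl1q_u8(t.val[0], ix.val[0]);
    r.val[1] = vqtbl1q_u8(t.val[1], ix.val[1]);
    vst1q_u8(out, r.val[0]);
    vst1q_u8(out + 16, r.val[1]);
#else
    /* Scalar fallback with the same semantics. */
    for (int i = 0; i < 32; ++i)
        out[i] = table[16 * (i / 16) + (idx[i] & 0x0F)];
#endif
}

/* Hypothetical scan over 16 database vectors with M = 2 sub-quantizers.
 * Assumed layout: codes[i] holds the sub-space 0 code in its low nibble and
 * the sub-space 1 code in its high nibble; table[0..15] and table[16..31]
 * hold the 8-bit quantized distances for sub-space 0 and 1 respectively. */
void pq4_scan16(const uint8_t table[32], const uint8_t codes[16],
                uint16_t dist[16])
{
    uint8_t idx[32], looked[32];
    for (int i = 0; i < 16; ++i) {
        idx[i]      = codes[i] & 0x0F; /* indices into the first half  */
        idx[16 + i] = codes[i] >> 4;   /* indices into the second half */
    }
    shuffle256(table, idx, looked);
    for (int i = 0; i < 16; ++i)
        dist[i] = (uint16_t)looked[i] + (uint16_t)looked[16 + i];
}
```

In this sketch, one call to `shuffle256` resolves 32 table lookups at once, which is the role a single 256-bit shuffle plays on x64. A real implementation would keep the lookup tables in registers across many code blocks and would also vectorize the nibble unpacking and the accumulation.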