Paper Title

Is Smaller Always Faster? Tradeoffs in Compressing Self-Supervised Speech Transformers

Paper Authors

Tzu-Quan Lin, Tsung-Huan Yang, Chun-Yao Chang, Kuang-Ming Chen, Tzu-hsun Feng, Hung-yi Lee, Hao Tang

Paper Abstract

Transformer-based self-supervised models have achieved remarkable success in speech processing, but their large size and high inference cost present significant challenges for real-world deployment. While numerous compression techniques have been proposed, inconsistent evaluation metrics make it difficult to compare their practical effectiveness. In this work, we conduct a comprehensive study of four common compression methods, including weight pruning, head pruning, low-rank approximation, and knowledge distillation on self-supervised speech Transformers. We evaluate each method under three key metrics: parameter count, multiply-accumulate operations, and real-time factor. Results show that each method offers distinct advantages. In addition, we contextualize recent compression techniques, comparing DistilHuBERT, FitHuBERT, LightHuBERT, ARMHuBERT, and STaRHuBERT under the same framework, offering practical guidance on compression for deployment.
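
The abstract evaluates compression under three metrics: parameter count, multiply-accumulate operations (MACs), and real-time factor (RTF). The following is a minimal sketch, assuming PyTorch and a generic speech encoder `model` (a placeholder, not the paper's evaluation code), of how these three quantities are commonly measured; the MAC count here only covers linear layers and ignores attention score/value products.

```python
# Hedged sketch: parameter count, rough MACs, and RTF for a generic encoder.
# `model` and its waveform-in / features-out signature are assumptions.
import time
import torch
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    # Total number of trainable parameters.
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

def linear_macs(model: nn.Module, seq_len: int) -> int:
    # Rough estimate: each nn.Linear applied once per frame contributes
    # seq_len * in_features * out_features MACs. Attention matrix products
    # and convolutional front-ends are omitted for brevity.
    macs = 0
    for m in model.modules():
        if isinstance(m, nn.Linear):
            macs += seq_len * m.in_features * m.out_features
    return macs

def real_time_factor(model: nn.Module, waveform: torch.Tensor,
                     sample_rate: int = 16000) -> float:
    # RTF = wall-clock inference time / audio duration; values below 1.0
    # mean the model runs faster than real time.
    model.eval()
    with torch.no_grad():
        start = time.perf_counter()
        model(waveform)
        elapsed = time.perf_counter() - start
    audio_seconds = waveform.shape[-1] / sample_rate
    return elapsed / audio_seconds
```

In practice, the three metrics can disagree: pruning may cut parameters without reducing wall-clock time, while distillation into a shallower student tends to improve RTF directly, which is the tradeoff the paper examines.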
