Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering

Paper Title

Unsupervised Voice Activity Detection by Modeling Source and System Information using Zero Frequency Filtering

Authors

Sarkar, Eklavya; Prasad, RaviShankar; Magimai.-Doss, Mathew

Abstract

Voice activity detection (VAD) is an important pre-processing step for speech technology applications. The task consists of deriving the boundaries of audio segments that contain voicing information. In recent years, it has been shown that voice source and vocal tract system information can be extracted using zero-frequency filtering (ZFF) without making any explicit model assumptions about the speech signal. This paper investigates the potential of zero-frequency filtering for jointly modeling voice source and vocal tract system information, and proposes two approaches for VAD. The first approach demarcates voiced regions using a composite signal composed of different zero-frequency filtered signals. The second approach feeds the composite signal as input to the rVAD algorithm. These approaches are compared with other supervised and unsupervised VAD methods from the literature and are evaluated on the Aurora-2 database across a range of SNRs (20 to -5 dB). Our studies show that the proposed ZFF-based methods perform comparably to state-of-the-art VAD methods and are more invariant to added degradation and to different channel characteristics.
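To make the zero-frequency filtering step concrete, here is a minimal sketch written from the commonly described ZFF recipe: difference the signal, pass it through a cascade of two ideal zero-frequency resonators (each a double pole at z = 1), then remove the resulting polynomial trend by repeatedly subtracting a local mean. It is not the authors' code and does not construct the paper's composite signal or its VAD decision, whose details the abstract does not give. The helper names, the 10 ms trend-removal window and the three passes are assumptions; in practice the window is usually tied to the average pitch period (roughly one to two periods).

```python
from typing import List


def _cumsum(seq: List[float]) -> List[float]:
    """Running sum; one pass acts as one ideal integrator (pole at z = 1)."""
    out, acc = [], 0.0
    for v in seq:
        acc += v
        out.append(acc)
    return out


def _subtract_local_mean(y: List[float], half: int) -> List[float]:
    """Subtract the mean over a centred window of 2*half + 1 samples.

    Windows are truncated at the signal boundaries, so samples within a
    few windows of either end are unreliable.
    """
    prefix = [0.0]
    for v in y:
        prefix.append(prefix[-1] + v)
    n = len(y)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        out.append(y[i] - (prefix[hi] - prefix[lo]) / (hi - lo))
    return out


def zero_frequency_filter(x: List[float], fs: int,
                          window_ms: float = 10.0,  # assumed; ~1-2 pitch periods
                          passes: int = 3) -> List[float]:
    """Hypothetical sketch of zero-frequency filtering of a speech signal x."""
    if not x:
        return []
    # 1. Difference the signal to remove any DC offset.
    d = [x[0]] + [x[i] - x[i - 1] for i in range(1, len(x))]
    # 2. Two ideal zero-frequency resonators in cascade: each has a double
    #    pole at z = 1, so together they equal four successive running sums.
    y = d
    for _ in range(4):
        y = _cumsum(y)
    # 3. Remove the polynomial trend by repeated local-mean subtraction.
    half = max(1, int(round(window_ms * fs / 1000.0)) // 2)
    for _ in range(passes):
        y = _subtract_local_mean(y, half)
    return y


def positive_zero_crossings(y: List[float], start: int, stop: int) -> List[int]:
    """Indices i in [start, stop) where y goes from negative to non-negative."""
    return [i for i in range(max(start, 1), stop) if y[i - 1] < 0.0 <= y[i]]


# Usage: a synthetic 100 Hz impulse train at 8 kHz stands in for glottal
# excitation; only the interior is inspected to avoid the edge artifacts.
fs = 8000
x = [1.0 if n % 80 == 0 else 0.0 for n in range(4000)]
z = zero_frequency_filter(x, fs)
epochs = positive_zero_crossings(z, 1000, 3000)  # about one per 80-sample period
```

Positive-going zero crossings of the ZFF output are commonly used as estimates of glottal epochs, which is why the example counts them. The running-sum form is used only to stay dependency-free; the same resonator cascade can be written as a single IIR filter with denominator coefficients [1, -4, 6, -4, 1], for example with `scipy.signal.lfilter`.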
