A DNN Based Post-Filter to Enhance the Quality of Coded Speech in MDCT Domain

Paper Title

A DNN Based Post-Filter to Enhance the Quality of Coded Speech in MDCT Domain

Authors

Kishan Gupta, Srikanth Korse, Bernd Edler, Guillaume Fuchs

Abstract

Frequency-domain processing, and in particular the use of the Modified Discrete Cosine Transform (MDCT), is the most widespread approach to audio coding. However, at low bitrates, audio quality, especially for speech, degrades drastically because too few bits are available to code the transform coefficients directly. Traditionally, post-filtering has been used to mitigate artefacts in the coded speech by exploiting a priori information about the source and extra transmitted parameters. Recently, data-driven post-filters have shown better results, but at the cost of significant additional complexity and delay. In this work, we propose a mask-based post-filter that operates directly in the MDCT domain of the codec and induces no extra delay. The real-valued mask is applied to the quantized MDCT coefficients and is estimated by a relatively lightweight convolutional encoder-decoder network. Our solution is tested on the recently standardized Low Complexity Communication Codec (LC3), a low-delay, low-complexity codec, at its lowest possible bitrate of 16 kbps. Objective and subjective assessments clearly show the advantage of this approach over the conventional post-filter, with an average improvement of 10 MUSHRA points over the LC3 coded speech.
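The masking step described in the abstract can be pictured with a minimal sketch. It assumes the mask holds one real-valued gain per MDCT bin and is applied by element-wise multiplication within a single frame. The function names `apply_mask` and `stand_in_mask` are hypothetical, and `stand_in_mask` is only a placeholder for the convolutional encoder-decoder network: it computes a bounded ratio from a clean reference, which is an illustrative assumption and not the estimator or training target used in the paper.

```python
from typing import List, Sequence


def apply_mask(quantized_mdct: Sequence[float], mask: Sequence[float]) -> List[float]:
    """Scale each quantized MDCT coefficient by its real-valued mask gain."""
    if len(quantized_mdct) != len(mask):
        raise ValueError("mask and coefficient frame must have the same length")
    return [gain * coeff for gain, coeff in zip(mask, quantized_mdct)]


def stand_in_mask(clean: Sequence[float], coded: Sequence[float],
                  max_gain: float = 2.0, eps: float = 1e-9) -> List[float]:
    """Hypothetical placeholder for the DNN: a bounded clean/coded ratio per bin.

    Assumption for illustration only. Bins quantized to zero get a gain of 0,
    because a multiplicative mask cannot restore a coefficient that is zero.
    """
    mask = []
    for target, observed in zip(clean, coded):
        if abs(observed) < eps:
            mask.append(0.0)
        else:
            ratio = target / observed
            # Clip the gain to [-max_gain, max_gain] to keep it bounded.
            mask.append(max(-max_gain, min(max_gain, ratio)))
    return mask


# Toy frame of four MDCT bins (made-up values, not codec output).
coded_frame = [4.0, -2.0, 0.0, 1.0]
clean_frame = [3.0, -3.0, 0.5, 4.0]

mask = stand_in_mask(clean_frame, coded_frame)
enhanced_frame = apply_mask(coded_frame, mask)
```

Because the gains act only on coefficients the decoder already holds for the current frame, a step of this shape needs no look-ahead, which is in keeping with the abstract's statement that the post-filter induces no extra delay.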

Paper Title