Paper Title
High-Fidelity Audio Generation and Representation Learning with Guided Adversarial Autoencoder
Paper Authors
Paper Abstract
Unsupervised disentangled representation learning from unlabelled audio data and high-fidelity audio generation have become two linchpins of machine learning research. However, a representation learned in an unsupervised setting is not guaranteed to be usable for any downstream task at hand, which can be a waste of resources if the training was conducted for that particular downstream task. Conversely, if the model is highly biased towards the downstream task during representation learning, it loses its generalisation capability: this directly benefits the downstream task, but the ability to extend the representation to other related tasks is lost. To fill this gap, we propose a new autoencoder-based model named "Guided Adversarial Autoencoder (GAAE)", which leverages a small percentage of labelled samples to learn both a representation specific to the downstream task and a general representation capturing the factors of variation in the training data, which makes it suitable for future related tasks. Furthermore, our proposed model can generate audio of superior quality that is indistinguishable from real audio samples. Through extensive experimental results, we demonstrate that, by harnessing the power of high-fidelity audio generation, the proposed GAAE model can learn powerful representations from an unlabelled dataset, leveraging a small percentage of labelled data as supervision/guidance.