Singer Identification for Metaverse with Timbral and Middle-Level Perceptual Features

Paper Title

Singer Identification for Metaverse with Timbral and Middle-Level Perceptual Features

Authors

Xulong Zhang, Jianzong Wang, Ning Cheng, Jing Xiao

Abstract

The metaverse is an interactive world that combines reality and virtuality, in which participants can appear as virtual avatars. Anyone can hold a concert in a virtual concert hall, and singer identification lets users quickly identify the real singer behind a virtual idol. Most singer identification methods operate on frame-level features. However, besides the singer's timbre, a music frame also carries musical information such as melody, rhythm, and tonality, and this information acts as noise when frame-level features are used to identify singers. In this paper, instead of relying on frame-level features alone, we propose two additional features to address this problem. The middle-level feature represents the music's melodiousness, rhythmic stability, and tonal stability, and captures perceptual properties of the music. The timbre feature, which is used in speaker identification, represents the singer's voice characteristics. Furthermore, we propose a convolutional recurrent neural network (CRNN) that combines the three features for singer identification. The model first fuses the frame-level and timbre features and then combines the middle-level features with the fused features. In experiments, the proposed method achieves an average F1 score of 0.81 on the Artist20 benchmark dataset, a significant improvement over related work.
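The two-stage fusion order described in the abstract (frame-level and timbre features first, middle-level features second) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function name `two_stage_fusion`, the embedding sizes, the hidden width of 64, and the random weights standing in for trained layers are all assumptions made for illustration, and the CRNN that would produce the frame-level embedding is omitted. The three middle-level values correspond to melodiousness, rhythmic stability, and tonal stability, and the 20 output classes assume one class per artist in Artist20.

```python
import numpy as np


def two_stage_fusion(frame_emb, timbre_emb, mid_feat, rng, n_classes=20):
    """Illustrative two-stage fusion; random weights stand in for trained layers."""
    # Stage 1: concatenate frame-level and timbre embeddings, then project
    # them into a shared "mixed" representation (hypothetical dense layer).
    stage1 = np.concatenate([frame_emb, timbre_emb], axis=-1)
    w1 = rng.standard_normal((stage1.shape[-1], 64)) * 0.1
    mixed = np.tanh(stage1 @ w1)

    # Stage 2: append the middle-level perceptual features to the mixed
    # representation and map the result to per-singer logits.
    stage2 = np.concatenate([mixed, mid_feat], axis=-1)
    w2 = rng.standard_normal((stage2.shape[-1], n_classes)) * 0.1
    logits = stage2 @ w2

    # Numerically stable softmax over the singer classes.
    logits = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
batch = 4
frame_emb = rng.standard_normal((batch, 128))  # assumed CRNN output size
timbre_emb = rng.standard_normal((batch, 32))  # assumed timbre embedding size
mid_feat = rng.standard_normal((batch, 3))     # melodiousness, rhythmic and tonal stability

probs = two_stage_fusion(frame_emb, timbre_emb, mid_feat, rng)
print(probs.shape)
```

Concatenation followed by a dense projection is only one plausible way to fuse features; the abstract does not specify the fusion operator, so other choices (for example, attention or gating) would be equally consistent with the description.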
