Paper title
Building a Domain-Specific Lexicon Based on a TikTok Comment Dataset
Paper authors
Paper abstract
Predicting the sentiment tendency of a sentence is an important branch of sentiment analysis. Previous research has focused mainly on English, for example by analyzing the sentiment tendency of sentences in terms of their valence, arousal, and dominance. However, emotional tendency is expressed differently across languages; for example, differences in sentence order between Chinese and English may convey different emotions. This paper explores a method for building a domain-specific lexicon, which allows a model to classify Chinese words by their emotional tendency. Following [13], an ultra-dense embedding table is trained from word embeddings of Chinese TikTok comments together with seed words drawn from sentiment lexicon sources. The output of the model is a domain-specific lexicon that gives the emotional tendency of each word. I collected Chinese TikTok comments as training data. The model's performance on Chinese sentiment classification is evaluated by comparing its results with those of a PCA-based method, and the results show that the model performs well on Chinese. The source code has been released on GitHub: https://github.com/h2222/douyin_comment_dataset
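As a rough illustration of how seed words can turn word embeddings into a sentiment lexicon, the following minimal sketch scores each word by its projection onto a single sentiment axis. It is a simplified stand-in and not the ultra-dense training procedure of [13]: the axis is simply the normalized difference between the mean positive-seed vector and the mean negative-seed vector. The function names, the toy 8-dimensional vectors, and the example words are all hypothetical and are not taken from the released dataset.

```python
import random


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mean_vector(vectors):
    """Component-wise mean of a list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def sentiment_axis(embeddings, pos_seeds, neg_seeds):
    """Unit vector pointing from the negative-seed mean to the positive-seed mean."""
    pos_mean = mean_vector([embeddings[w] for w in pos_seeds])
    neg_mean = mean_vector([embeddings[w] for w in neg_seeds])
    diff = [p - q for p, q in zip(pos_mean, neg_mean)]
    norm = dot(diff, diff) ** 0.5
    return [x / norm for x in diff]


def build_lexicon(embeddings, axis):
    """Map every word to its projection onto the sentiment axis."""
    return {word: dot(vec, axis) for word, vec in embeddings.items()}


# Toy data (assumption): vectors with a planted sentiment direction plus small noise.
rng = random.Random(0)
DIM = 8
planted = [1.0] + [0.0] * (DIM - 1)


def toy_vector(polarity):
    """Hypothetical embedding: +/-3 along the planted direction, noise in [-0.1, 0.1]."""
    return [3.0 * polarity * p + rng.uniform(-0.1, 0.1) for p in planted]


embeddings = {
    "好看": toy_vector(+1), "喜欢": toy_vector(+1), "可爱": toy_vector(+1),
    "难看": toy_vector(-1), "讨厌": toy_vector(-1), "无聊": toy_vector(-1),
}

# Two seed words per polarity; the remaining words are scored without labels.
axis = sentiment_axis(embeddings, pos_seeds=["好看", "喜欢"], neg_seeds=["难看", "讨厌"])
lexicon = build_lexicon(embeddings, axis)

# Words ordered from most positive to most negative score.
ranked = sorted(lexicon, key=lexicon.get, reverse=True)
```

In this toy setting the unlabeled words "可爱" and "无聊" receive positive and negative scores respectively, because the planted direction dominates the noise. With real embeddings trained on the comment corpus, the separation would depend on the embedding quality and on the choice of seed words.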