Urdu Speech and Text Based Sentiment Analyzer

Paper Title

Urdu Speech and Text Based Sentiment Analyzer

Authors

Ahmad, Waqar; Edalati, Maryam

Abstract

Discovering what other people think has always been a key part of how we gather information. Thanks to the growing availability and popularity of opinion-rich resources such as online review sites and personal blogs, people can now use information technology to actively seek out and understand the opinions of others. Because it is central to understanding people's opinions, sentiment analysis (SA) is an important task. However, existing research focuses primarily on English, and only a small body of work addresses low-resource languages. This work presents a new Urdu sentiment analysis dataset built from user reviews collected from Twitter. The proposed dataset contains 10,000 reviews that human experts manually annotated into two classes: positive and negative. The primary purpose of this research is to construct a manually annotated dataset for Urdu sentiment analysis and to establish baseline results. Five lexicon- and rule-based algorithms, namely Naive Bayes, Stanza, TextBlob, VADER, and Flair, are evaluated, and the experimental results show that Flair, with an accuracy of 70%, outperforms the other tested algorithms.
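The abstract compares baselines by their accuracy on positive/negative labels. The Python sketch below illustrates one such two-class baseline and the accuracy metric. It is not the authors' implementation: the abstract does not say which Naive Bayes variant or tokenizer was used, so the multinomial model with add-one (Laplace) smoothing, the whitespace tokenizer, the class name `NaiveBayesSentiment`, and the toy Urdu sentences are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    # Assumption: plain whitespace tokenization (Urdu script separates
    # most words with spaces); the paper's preprocessing is not specified.
    return text.split()


class NaiveBayesSentiment:
    """Hypothetical multinomial Naive Bayes with add-one smoothing."""

    def fit(self, texts: list[str], labels: list[str]) -> "NaiveBayesSentiment":
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # token counts per class
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        # Total number of tokens seen in each class.
        self.total_words = {c: sum(self.word_counts[c].values())
                            for c in self.class_counts}
        return self

    def predict(self, text: str) -> str:
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for c in sorted(self.class_counts):
            # Log prior plus smoothed log likelihood of each known token.
            score = math.log(self.class_counts[c] / n_docs)
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[c][tok]
                score += math.log((count + 1) / (self.total_words[c] + v))
            if score > best_score:
                best_label, best_score = c, score
        return best_label


def accuracy(predicted: list[str], gold: list[str]) -> float:
    # Fraction of items whose predicted label equals the annotated label.
    if len(predicted) != len(gold) or not gold:
        raise ValueError("predicted and gold must be non-empty and equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)


# Toy, made-up examples (not from the paper's dataset).
train_texts = ["یہ فلم بہت اچھی ہے", "کھانا اچھا تھا", "یہ بہت برا ہے", "سروس بری تھی"]
train_labels = ["positive", "positive", "negative", "negative"]

model = NaiveBayesSentiment().fit(train_texts, train_labels)
test_texts = ["بہت اچھا", "سروس بری تھی"]
test_gold = ["positive", "negative"]
predictions = [model.predict(t) for t in test_texts]
print(predictions, accuracy(predictions, test_gold))  # → ['positive', 'negative'] 1.0
```

Accuracy here is simply the fraction of reviews whose predicted label matches the human annotation. Unlike this trained sketch, lexicon-based tools such as VADER score text with a fixed word lexicon and rules, without a training step.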