Sentiment Analysis in Drug Reviews using Supervised Machine Learning Algorithms

Title

Sentiment Analysis in Drug Reviews using Supervised Machine Learning Algorithms

Authors

Vijayaraghavan, Sairamvinay; Basu, Debraj

Abstract

Sentiment analysis is an important task in Natural Language Processing (NLP) that detects the sentiment expressed in a piece of text. In this project, we analyzed reviews of various drugs, each consisting of a free-text review and a rating on a scale from 1 to 10. We obtained the data from the UCI Machine Learning Repository, which provides it as two sets, train and test, split 75%/25%. We grouped the numeric ratings into three classes: negative (1-4), neutral (strictly between 4 and 7, i.e. 5-6) and positive (7-10). Because the data set contains multiple reviews for drugs that treat the same condition, we also investigated how the words used in reviews for different conditions affect the drugs' ratings. Our main goal was to implement supervised machine learning classification algorithms that predict the rating class from the review text. We primarily used two text representations: Term Frequency-Inverse Document Frequency (TFIDF) and Count Vectors (CV). We trained models on the most common conditions in the data set, such as "Birth Control", "Depression" and "Pain", and obtained good results when predicting the test sets.
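The abstract specifies the rating thresholds and the two text representations, but not the classifiers, tokenizer or other preprocessing. The following is a minimal, dependency-free Python sketch of such a pipeline under stated assumptions. It bins ratings with the thresholds above, selects the most frequent condition, builds count vectors or TFIDF vectors (using the smoothed IDF and L2 normalisation that scikit-learn's `TfidfVectorizer` applies by default), and trains a nearest-centroid classifier. The nearest-centroid classifier was chosen only for brevity and is not the authors' method. All names (`rating_to_class`, `BagOfWords`, `NearestCentroid`, `top_conditions`) are hypothetical, and the toy reviews are invented.

```python
import math
from collections import Counter


def rating_to_class(rating: int) -> str:
    """Bin a 1-10 rating into the three classes named in the abstract."""
    if not 1 <= rating <= 10:
        raise ValueError(f"rating must be in 1..10, got {rating}")
    if rating >= 7:
        return "positive"
    if rating <= 4:
        return "negative"
    return "neutral"  # ratings 5 and 6


def tokenize(text: str) -> list:
    """Naive tokenizer (assumption): lowercase, split on non-alphanumerics."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def top_conditions(conditions: list, k: int = 3) -> list:
    """Return the k most frequent condition names (ties keep first-seen order)."""
    return [name for name, _ in Counter(conditions).most_common(k)]


class BagOfWords:
    """Count vectors (CV) or L2-normalised TFIDF vectors over a fitted vocabulary."""

    def __init__(self, use_tfidf: bool = False):
        self.use_tfidf = use_tfidf

    def fit(self, texts: list) -> "BagOfWords":
        docs = [tokenize(t) for t in texts]
        self.vocab = sorted({w for d in docs for w in d})
        self.index = {w: i for i, w in enumerate(self.vocab)}
        n_docs = len(docs)
        doc_freq = Counter(w for d in docs for w in set(d))
        # Smoothed IDF: ln((1 + n) / (1 + df)) + 1, as in scikit-learn's default.
        self.idf = [math.log((1 + n_docs) / (1 + doc_freq[w])) + 1 for w in self.vocab]
        return self

    def transform(self, text: str) -> list:
        vec = [0.0] * len(self.vocab)
        for w in tokenize(text):
            if w in self.index:  # words not seen during fit are ignored
                vec[self.index[w]] += 1.0
        if not self.use_tfidf:
            return vec  # raw term counts
        vec = [tf * idf for tf, idf in zip(vec, self.idf)]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]


class NearestCentroid:
    """Stand-in classifier (assumption): predicts the class with the closest mean vector."""

    def fit(self, X: list, y: list) -> "NearestCentroid":
        sums, counts = {}, Counter(y)
        for x, label in zip(X, y):
            acc = sums.setdefault(label, [0.0] * len(x))
            for j, value in enumerate(x):
                acc[j] += value
        self.centroids = {lab: [v / counts[lab] for v in acc] for lab, acc in sums.items()}
        return self

    def predict(self, x: list) -> str:
        def sq_dist(c):
            return sum((a - b) ** 2 for a, b in zip(x, c))
        return min(self.centroids, key=lambda lab: sq_dist(self.centroids[lab]))


# Invented toy reviews (hypothetical, not from the UCI data set): (condition, review, rating)
reviews = [
    ("Pain", "Great relief, I feel much better", 9),
    ("Pain", "Terrible side effects, pain got worse", 2),
    ("Depression", "It was okay, some side effects", 5),
    ("Pain", "Works great, no side effects", 10),
]
condition = top_conditions([c for c, _, _ in reviews], k=1)[0]  # "Pain"
subset = [(text, r) for c, text, r in reviews if c == condition]
texts = [t for t, _ in subset]
labels = [rating_to_class(r) for _, r in subset]
for use_tfidf in (False, True):
    vectorizer = BagOfWords(use_tfidf).fit(texts)
    model = NearestCentroid().fit([vectorizer.transform(t) for t in texts], labels)
    print("TFIDF" if use_tfidf else "CV", model.predict(vectorizer.transform("great relief")))
```

The vectorizer and the classifier are kept separate so the same per-condition subset can be run through both CV and TFIDF features. With real data, the nearest-centroid stand-in would be replaced by whichever supervised classifiers are being compared.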
