Medical Dataset Classification for Kurdish Short Text over Social Media

Paper Title

Medical Dataset Classification for Kurdish Short Text over Social Media

Authors

Saeed, Ari M., Hussein, Shnya R., Ali, Chro M., Rashid, Tarik A.

Abstract

Facebook served as the source of the comments in the Medical Kurdish Dataset (MKD), which consists of 6756 comments. The samples are user comments gathered from posts on pages in five categories (Medical, News, Economy, Education, and Sport). Six preprocessing steps are applied to the raw dataset to clean the comments and remove noise by replacing characters. For text classification, each comment (a short text) is labeled as either the positive class (medical comment) or the negative class (non-medical comment). The negative class makes up 55% of the dataset and the positive class 45%.
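
The abstract does not list the six preprocessing steps, so the following is only a minimal sketch of what cleaning "by replacing characters" might look like for Kurdish comments. The replacement table, the URL-stripping step, and the function name `clean_comment` are all assumptions made for illustration and are not taken from the paper.

```python
import re

# Hypothetical replacement table (an assumption, not the paper's list):
# Arabic code points that are often typed in place of the letters
# normally used in Kurdish (Sorani) text.
CHAR_MAP = str.maketrans({
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
    "\u0649": "\u06CC",  # ARABIC LETTER ALEF MAKSURA -> ARABIC LETTER FARSI YEH
})


def clean_comment(text: str) -> str:
    """Normalize one raw comment (illustrative steps only)."""
    # Step 1 (assumed): map look-alike characters to a single form.
    text = text.translate(CHAR_MAP)
    # Step 2 (assumed): drop URLs, which carry no class information.
    text = re.sub(r"https?://\S+", " ", text)
    # Step 3 (assumed): collapse runs of whitespace.
    text = re.sub(r"\s+", " ", text)
    return text.strip()
```

Applied to every comment before labeling or training, a function of this kind would ensure that visually identical words typed with different code points are treated as the same token.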
