Paper Title

Feature Extraction of Text for Deep Learning Algorithms: Application on Fake News Detection

Paper Author

Kim, HyeonJun

Abstract

Feature extraction is an important step in machine learning and deep learning, as it makes algorithms both more efficient and more accurate. In natural language processing for deception detection, such as fake news detection, several statistical feature extraction methods have been introduced (e.g. N-grams). This research shows that deep learning algorithms given only the alphabet frequencies of a news article's original text, without any information about the order of the letters, can classify fake and trustworthy news with high accuracy (85%). Because this pre-processing method makes the data notably compact while still retaining the features the classifier needs, alphabet frequencies appear to contain useful features for understanding the complex context or meaning of the original text.
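To make the idea of an order-free alphabet-frequency feature concrete, here is a minimal Python sketch. It is not the author's implementation: the function name letter_frequencies is hypothetical, and the choices to fold case, ignore non-letter characters, and normalize counts to relative frequencies are assumptions that the abstract does not specify.

```python
from collections import Counter
import string


def letter_frequencies(text: str) -> list[float]:
    """Return a 26-dimensional vector of relative letter frequencies (a-z).

    Assumptions (not stated in the abstract): case is folded, characters
    outside a-z are ignored, and counts are normalized to sum to 1.
    """
    letters = [ch for ch in text.lower() if ch in string.ascii_lowercase]
    total = len(letters)
    if total == 0:
        # No letters at all: return an all-zero vector instead of dividing by 0.
        return [0.0] * 26
    counts = Counter(letters)
    return [counts[ch] / total for ch in string.ascii_lowercase]


# Letter order is discarded: anagrams map to the same feature vector.
print(letter_frequencies("Listen") == letter_frequencies("Silent"))  # True
```

Under these assumptions, every article is reduced to a fixed-length vector of 26 numbers regardless of its length, which is the compactness the abstract refers to. Such a vector could then be passed to a classifier as its input features.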
