Title

Beyond Social Media Analytics: Understanding Human Behaviour and Deep Emotion using Self Structuring Incremental Machine Learning

Author

Bandaragoda, Tharindu

Abstract

This thesis develops a conceptual framework that treats social data as the surface layer of a hierarchy of human social behaviours, needs and cognition, and uses this framework to transform social data into representations that preserve social behaviours and their causalities. Based on this framework, two platforms were built to capture insights from fast-paced and slow-paced social data. For fast-paced social data, a self-structuring and incremental learning technique was developed to automatically capture salient topics and their dynamics over time. An event detection technique was also developed to automatically monitor the identified topic pathways for significant fluctuations in social behaviour, using multiple indicators such as volume and sentiment. This platform is demonstrated using two large datasets with over 1 million tweets. The separated topic pathways were representative of the key topics of each entity and were coherent as assessed by topic coherence measures. The identified events were validated against contemporary events reported in the news. Second, for slow-paced social data, a suite of new machine learning and natural language processing techniques was developed to automatically capture information that individuals self-disclose, such as demographics, emotions and the timeline of personal events. This platform was trialled on a large text corpus of over 4 million posts collected from online support groups. It was further extended to transform prostate cancer related online support group discussions into a multidimensional representation, which was used to investigate the self-disclosed quality of life of patients (and partners) against time, demographic and clinical factors. The capabilities of this extended platform have been demonstrated using a text corpus collected from 10 prostate cancer online support groups, comprising 609,960 prostate cancer discussions and 22,233 patients.
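The abstract does not state how "significant fluctuations" on a topic pathway are detected. As a hedged illustration only, the sketch below flags a time step as a candidate event when either the post volume or the mean sentiment of one topic pathway deviates from its trailing window by more than a z-score threshold. The rolling z-score rule, the function names `rolling_z` and `detect_events`, and the `window` and `threshold` defaults are all assumptions chosen for illustration; this is not the thesis's actual technique.

```python
from statistics import mean, pstdev


def rolling_z(series, i, window):
    """Z-score of series[i] relative to the preceding `window` values (assumed rule)."""
    history = series[i - window:i]
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        # Flat history: no change scores 0, any change is treated as maximally surprising.
        return 0.0 if series[i] == mu else float("inf")
    return (series[i] - mu) / sigma


def detect_events(volume, sentiment, window=5, threshold=3.0):
    """Return time indices where volume or sentiment fluctuates sharply.

    Hypothetical inputs for one topic pathway:
      volume    -- number of posts per time step
      sentiment -- mean sentiment score per time step
    """
    if len(volume) != len(sentiment):
        raise ValueError("indicator series must be aligned in time")
    events = []
    for i in range(window, len(volume)):
        z_vol = rolling_z(volume, i, window)
        z_sent = rolling_z(sentiment, i, window)
        # A step is flagged if any single indicator crosses the threshold.
        if abs(z_vol) >= threshold or abs(z_sent) >= threshold:
            events.append(i)
    return events


if __name__ == "__main__":
    # Made-up series: a volume spike at step 6 and a sentiment drop at step 7.
    volume = [100, 104, 98, 101, 99, 102, 400, 103, 100, 97]
    sentiment = [0.10, 0.12, 0.11, 0.09, 0.10, 0.11, 0.10, -0.60, 0.10, 0.12]
    print(detect_events(volume, sentiment))  # [6, 7]
```

Using several indicators with an "any indicator" rule means a burst of discussion and a sudden change of mood are each caught on their own, even when they do not coincide. A real system would likely need tuned thresholds per indicator.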