Clustering individuals based on multivariate EMA time-series data

Paper title

Clustering individuals based on multivariate EMA time-series data

Authors

Mandani Ntekouli, Gerasimos Spanakis, Lourens Waldorp, Anne Roefs

Abstract

In psychopathology, methodological advances in Ecological Momentary Assessment (EMA) have opened new opportunities to collect time-intensive, repeated, intra-individual measurements. As a result, large amounts of data have become available, providing the means to further explore mental disorders. Consequently, advanced machine learning (ML) methods are needed to understand data characteristics and uncover hidden, meaningful relationships in the underlying complex psychological processes. Among other uses, ML facilitates the identification of similar patterns in the data of different individuals through clustering. This paper focuses on clustering the multivariate time-series (MTS) data of individuals into several groups. Since clustering is an unsupervised problem, it is challenging to assess whether the resulting grouping is successful. We therefore investigate different clustering methods based on different distance measures and assess the stability and quality of the clusters they produce. These clustering steps are illustrated on a real-world EMA dataset of 33 individuals and 15 variables. In the evaluation, kernel-based clustering methods appear promising for identifying meaningful groups in the data, which suggests that efficient representations of EMA data play an important role in clustering.
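
The general workflow the abstract describes (compute pairwise distances between individuals' multivariate time series, cluster on those distances, then score the grouping) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the paper's method: the data are synthetic (33 simulated individuals, 15 variables, 20 time points), the distance is a plain Euclidean distance between equally long series, the clustering is a simple k-medoids with farthest-first initialization, and quality is measured with the silhouette coefficient. The helper names `make_demo`, `mts_distance`, `distance_matrix`, `k_medoids`, and `silhouette` are hypothetical.

```python
import math
import random


def make_demo(n_individuals=33, n_time=20, n_vars=15, seed=42):
    """Synthetic stand-in for EMA data: two groups with different mean levels."""
    rng = random.Random(seed)
    data = []
    for p in range(n_individuals):
        shift = 0.0 if p < n_individuals // 2 else 3.0
        data.append([[rng.gauss(shift, 1.0) for _ in range(n_vars)]
                     for _ in range(n_time)])
    return data


def mts_distance(a, b):
    """Euclidean distance between two series of identical shape (time x variables)."""
    return math.sqrt(sum((x - y) ** 2
                         for row_a, row_b in zip(a, b)
                         for x, y in zip(row_a, row_b)))


def distance_matrix(series):
    """Symmetric matrix of pairwise distances between individuals."""
    n = len(series)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = mts_distance(series[i], series[j])
    return d


def k_medoids(d, k, n_iter=50):
    """k-medoids on a precomputed distance matrix, farthest-first initialization."""
    n = len(d)
    medoids = [0]
    while len(medoids) < k:
        # Add the point that is farthest from its nearest current medoid.
        medoids.append(max(range(n), key=lambda i: min(d[i][m] for m in medoids)))

    def assign(meds):
        return [min(range(k), key=lambda c: d[i][meds[c]]) for i in range(n)]

    for _ in range(n_iter):
        labels = assign(medoids)
        updated = []
        for c in range(k):
            members = [i for i in range(n) if labels[i] == c]
            # New medoid: the member with the smallest total distance to its cluster.
            updated.append(min(members, key=lambda m: sum(d[m][j] for j in members))
                           if members else medoids[c])
        if updated == medoids:
            break
        medoids = updated
    return assign(medoids)


def silhouette(d, labels):
    """Mean silhouette coefficient; singleton clusters contribute 0."""
    n = len(d)
    clusters = sorted(set(labels))
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for i in range(n):
        means = {}
        for c in clusters:
            members = [j for j in range(n) if labels[j] == c and j != i]
            if members:
                means[c] = sum(d[i][j] for j in members) / len(members)
        a = means.get(labels[i])
        if a is None:
            continue
        b = min(m for c, m in means.items() if c != labels[i])
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / n


data = make_demo()
d = distance_matrix(data)
labels = k_medoids(d, k=2)
score = silhouette(d, labels)
print(labels, round(score, 3))
```

Because the clustering step only needs a distance matrix, a different distance measure or a kernel-induced distance could be substituted in `distance_matrix` without changing the rest of the sketch. Real EMA series typically differ in length and contain missing values, which this illustration does not handle.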
