Paper Title

An Empirical Evaluation of Multivariate Time Series Classification with Input Transformation across Different Dimensions

Paper Authors

Leonardos Pantiskas, Kees Verstoep, Mark Hoogendoorn, Henri Bal

Paper Abstract

In current research, machine and deep learning solutions for the classification of temporal data are shifting from single-channel datasets (univariate) to problems with multiple channels of information (multivariate). The majority of these works are focused on the method novelty and architecture, and the format of the input data is often treated implicitly. Particularly, multivariate datasets are often treated as a stack of univariate time series in terms of input preprocessing, with scaling methods applied across each channel separately. In this evaluation, we aim to demonstrate that the additional channel dimension is far from trivial and different approaches to scaling can lead to significantly different results in the accuracy of a solution. To that end, we test seven different data transformation methods on four different temporal dimensions and study their effect on the classification accuracy of five recent methods. We show that, for the large majority of tested datasets, the best transformation-dimension configuration leads to an increase in the accuracy compared to the result of each model with the same hyperparameters and no scaling, ranging from 0.16 to 76.79 percentage points. We also show that if we keep the transformation method constant, there is a statistically significant difference in accuracy results when applying it across different dimensions, with accuracy differences ranging from 0.23 to 47.79 percentage points. Finally, we explore the relation of the transformation methods and dimensions to the classifiers, and we conclude that there is no prominent general trend, and the optimal configuration is dataset- and classifier-specific.
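
The abstract's central point is that the dimension along which a scaling transformation is applied is not a trivial choice. As a minimal, hedged illustration of what "applying the same transformation across different dimensions" can mean in practice (the tensor layout, variable names, and the choice of z-normalization are assumptions for this sketch, not the paper's exact preprocessing pipeline), the following shows per-channel, per-sample, and dataset-level scaling of the same multivariate time series array:

```python
# Illustrative sketch only: z-normalization of a multivariate time series tensor
# applied along different dimensions. Assumed layout: (n_samples, n_channels, n_timesteps).
import numpy as np

def z_normalize(X, axis, eps=1e-8):
    """Standardize X along the given axis (or axes), guarding against zero variance."""
    mean = X.mean(axis=axis, keepdims=True)
    std = X.std(axis=axis, keepdims=True)
    return (X - mean) / (std + eps)

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 6, 100))        # 32 samples, 6 channels, 100 time steps

# Common default: scale each channel of each sample over its own time axis.
X_per_channel = z_normalize(X, axis=2)

# Alternative: scale each sample over all channels and time steps jointly.
X_per_sample = z_normalize(X, axis=(1, 2))

# Alternative: scale each channel using statistics pooled over the whole training set.
X_dataset_channel = z_normalize(X, axis=(0, 2))
```

Each variant produces a numerically different input to the classifier, which is the kind of configuration difference the evaluation compares across transformation methods and dimensions.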
