Too Fine or Too Coarse? The Goldilocks Composition of Data Complexity for Robust Left-Right Eye-Tracking Classifiers

Paper Title

Too Fine or Too Coarse? The Goldilocks Composition of Data Complexity for Robust Left-Right Eye-Tracking Classifiers

Authors

Brian Xiang, Abdelrahman Abdelmonsef

Abstract

The differences in distributional patterns between benchmark data and real-world data have been one of the main challenges of using electroencephalogram (EEG) signals for eye-tracking (ET) classification. Therefore, increasing the robustness of machine learning models in predicting eye-tracking positions from EEG data is integral to both research and consumer use. Previously, we compared the performance of classifiers trained solely on fine-grain data to that of classifiers trained solely on coarse-grain data. Results indicated that, despite an overall improvement in robustness, the fine-grain trained models performed worse than the coarse-grain trained models when the testing and training sets contained the same distributional patterns \cite{vectorbased}. This paper aims to address this case by training models on datasets of mixed data complexity to determine the ideal distribution of fine- and coarse-grain data. We train machine learning models on a mixed dataset composed of both fine- and coarse-grain data and then compare their accuracies to those of models trained solely on fine- or coarse-grain data. For our purposes, fine-grain data refers to data collected using more complex methods, whereas coarse-grain data refers to data collected using simpler methods. We apply covariate distributional shifts to test the susceptibility of each training set. Our results indicate that the optimal training dataset for EEG-ET classification is composed not of solely fine- or coarse-grain data, but of a mix of the two, leaning towards fine-grain.
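
As a rough illustration of the mixed-composition training sets the abstract describes, the sketch below draws a fixed-size training set from a fine-grain pool and a coarse-grain pool at a chosen ratio. It is only a sketch under stated assumptions: the function name `mix_datasets`, the toy sample tuples, the pool sizes, and the grid of ratios are all hypothetical and are not taken from the paper, whose actual data pipeline and models are not given here.

```python
import random


def mix_datasets(fine, coarse, fine_fraction, n_total, seed=0):
    """Return n_total samples: a fine_fraction share drawn from `fine`,
    the remainder drawn from `coarse`, shuffled together.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    if not 0.0 <= fine_fraction <= 1.0:
        raise ValueError("fine_fraction must be in [0, 1]")
    n_fine = round(fine_fraction * n_total)
    n_coarse = n_total - n_fine
    if n_fine > len(fine) or n_coarse > len(coarse):
        raise ValueError("not enough samples in one of the pools")
    rng = random.Random(seed)
    # Sample without replacement from each pool, then interleave randomly.
    mixed = rng.sample(fine, n_fine) + rng.sample(coarse, n_coarse)
    rng.shuffle(mixed)
    return mixed


# Toy pools (assumed format): (feature_vector, left/right label, source tag).
fine_pool = [([float(i), 1.0], i % 2, "fine") for i in range(100)]
coarse_pool = [([float(i), 0.0], i % 2, "coarse") for i in range(100)]

# Sweep an assumed grid of fine-grain fractions; a real experiment would
# train a classifier on each mixture and compare test accuracies.
for frac in (0.0, 0.25, 0.5, 0.75, 1.0):
    train = mix_datasets(fine_pool, coarse_pool, frac, n_total=80)
    n_fine = sum(1 for _, _, src in train if src == "fine")
    print(frac, n_fine, len(train) - n_fine)
```

Holding the total training-set size fixed while varying only the fraction keeps the comparison between mixtures from being confounded by the amount of data, which is one reasonable design choice among several.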
