Paper Title
Tabular Transformers for Modeling Multivariate Time Series
Paper Authors
Paper Abstract
Tabular datasets are ubiquitous in data science applications. Given their importance, it seems natural to apply state-of-the-art deep learning algorithms in order to fully unlock their potential. Here we propose neural network models that represent tabular time series that can optionally leverage their hierarchical structure. This results in two architectures for tabular time series: one for learning representations that is analogous to BERT and can be pre-trained end-to-end and used in downstream tasks, and one that is akin to GPT and can be used for generation of realistic synthetic tabular sequences. We demonstrate our models on two datasets: a synthetic credit card transaction dataset, where the learned representations are used for fraud detection and synthetic data generation, and on a real pollution dataset, where the learned encodings are used to predict atmospheric pollutant concentrations. Code and data are available at https://github.com/IBM/TabFormer.
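To make the BERT-like branch of the abstract concrete, the sketch below shows one way a tabular time series can be cast as a token sequence: each row's field values are tokenized, a window of rows is flattened into one sequence, and a small Transformer encoder is trained with a masked-field reconstruction objective. This is an illustrative sketch only, not the TabFormer implementation; the vocabulary size, mask token id, window length, and model dimensions are placeholder assumptions.

```python
# Minimal sketch (assumptions, not the authors' code): rows of a table are
# flattened into a sequence of field tokens and a Transformer encoder is
# trained BERT-style to reconstruct randomly masked field values.
import torch
import torch.nn as nn

VOCAB_SIZE = 1000      # assumed size of the shared field-value vocabulary
MASK_ID = 0            # assumed id reserved for the [MASK] token
N_FIELDS = 8           # assumed number of columns per table row
N_ROWS = 10            # assumed number of rows (time steps) per window
D_MODEL = 64           # assumed hidden size

class TabularMLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, D_MODEL)
        self.pos = nn.Embedding(N_FIELDS * N_ROWS, D_MODEL)
        layer = nn.TransformerEncoderLayer(D_MODEL, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(D_MODEL, VOCAB_SIZE)  # predicts masked field values

    def forward(self, token_ids):
        # token_ids: (batch, N_ROWS * N_FIELDS) flattened window of field tokens
        positions = torch.arange(token_ids.size(1), device=token_ids.device)
        h = self.embed(token_ids) + self.pos(positions)
        h = self.encoder(h)
        return self.head(h)

# Toy training step: mask roughly 15% of the field tokens and reconstruct them.
model = TabularMLM()
tokens = torch.randint(1, VOCAB_SIZE, (4, N_ROWS * N_FIELDS))
mask = torch.rand(tokens.shape) < 0.15
inputs = tokens.masked_fill(mask, MASK_ID)
logits = model(inputs)
loss = nn.functional.cross_entropy(logits[mask], tokens[mask])
loss.backward()
print(f"masked-field reconstruction loss: {loss.item():.3f}")
```

The pooled encoder outputs from such a model could then feed a downstream classifier (e.g., fraud detection), while the GPT-like variant described in the abstract would instead predict fields autoregressively to generate synthetic rows.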