Paper Title
PaEBack: Pareto-Efficient Backsubsampling for Time Series Data
Authors
Abstract
Time series forecasting is a quintessential topic in data science, and forecasting models have traditionally relied on extensive historical data. In this paper, we address a practical question: how much recent historical data is required to attain a targeted percentage of statistical prediction efficiency relative to the full time series? We propose the Pareto-Efficient Backsubsampling (PaEBack) method to estimate the percentage of the most recent data needed to achieve a desired level of prediction accuracy. We provide a theoretical justification based on asymptotic prediction theory for autoregressive (AR) models. Through several numerical illustrations, we also show the application of PaEBack to some recently developed machine learning forecasting methods, even when the models might be misspecified. The main conclusion is that only a fraction of the most recent historical data provides near-optimal, or even better, relative predictive accuracy for a broad class of forecasting methods.
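The question posed in the abstract can be illustrated with a small experiment. The sketch below is not the authors' PaEBack estimator: it is a hold-out comparison under stated assumptions. It simulates an AR(1) series, refits the coefficient on only the most recent fraction of the training data, and reports efficiency as the ratio of the full-history prediction error to the subsample prediction error. That definition of efficiency, the function names, the candidate fractions, and the 0.95 target are all hypothetical choices made for illustration.

```python
import numpy as np


def simulate_ar1(n, phi=0.6, seed=0):
    """Simulate a zero-mean AR(1) series x[t] = phi * x[t-1] + noise[t]."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + noise[t]
    return x


def fit_ar1(x):
    """Least-squares estimate of the AR(1) coefficient (no intercept)."""
    prev, curr = x[:-1], x[1:]
    return float(prev @ curr / (prev @ prev))


def one_step_mse(phi_hat, test, last_train_value):
    """Mean squared one-step-ahead error on the hold-out segment."""
    prev = np.concatenate(([last_train_value], test[:-1]))
    return float(np.mean((test - phi_hat * prev) ** 2))


def relative_efficiency(train, test, fractions):
    """Assumed efficiency: MSE(full history) / MSE(most recent fraction)."""
    mse_full = one_step_mse(fit_ar1(train), test, train[-1])
    eff = {}
    for p in fractions:
        k = max(3, int(round(p * len(train))))  # keep at least 3 points
        recent = train[-k:]                     # most recent k observations
        eff[p] = mse_full / one_step_mse(fit_ar1(recent), test, train[-1])
    return eff


def smallest_fraction(eff, target):
    """Smallest candidate fraction whose efficiency meets the target."""
    ok = [p for p in sorted(eff) if eff[p] >= target]
    return ok[0] if ok else None


series = simulate_ar1(2000)
train, test = series[:1500], series[1500:]
fractions = [0.05, 0.1, 0.25, 0.5, 1.0]
eff = relative_efficiency(train, test, fractions)
p_star = smallest_fraction(eff, target=0.95)
print(eff, p_star)
```

An efficiency above 1 for some fraction means the subsample fit predicted the hold-out segment better than the full-history fit, which corresponds to the "or even better" case mentioned in the abstract. A single simulated split is noisy, so any such reading from this sketch is illustrative only.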