Paper title
Six textbook mistakes in data analysis
Paper authors
Paper abstract
This article discusses a number of incorrect statements that appear in textbooks on data analysis, machine learning, or computational methods. The common theme in all these cases is the relevance and application of statistics to the study of scientific or engineering data; these mistakes are also quite prevalent in the research literature. Crucially, we do not address errors made by an individual author, focusing instead on mistakes that are widespread in the introductory literature. After some background on frequentist and Bayesian linear regression, we turn to our six paradigmatic cases, providing in each instance a specific example of the textbook mistake, pointers to the specialist literature where the topic is handled properly, and a correction that summarizes the salient points. The mistakes (and corrections) are broadly relevant to any technical setting where statistical techniques are used to draw practical conclusions, ranging from topics introduced in an elementary course on experimental measurements all the way to more involved approaches to regression.