Paper Title
Changes from Classical Statistics to Modern Statistics and Data Science
Authors
Abstract
A coordinate system is a foundation of every quantitative discipline in science, engineering, and medicine. Classical physics and statistics are based on the Cartesian coordinate system, and classical probability and hypothesis testing theory can only be applied to Euclidean data. However, modern real-world data arise from natural language processing, mathematical formulas, social networks, transportation and sensor networks, computer vision, automation, and biomedical measurements. The Euclidean assumption is not appropriate for such non-Euclidean data. This perspective addresses the urgent need to overcome these fundamental limitations and encourages the extension of classical probability theory, hypothesis testing, diffusion models, and stochastic differential equations from Euclidean space to non-Euclidean space. Artificial intelligence has developed rapidly, including natural language processing, computer vision, graph neural networks, manifold regression and inference theory, manifold learning, compositional diffusion models for the automatic compositional generation of concepts, and methods for demystifying machine learning systems. Differentiable manifold theory is also a mathematical foundation of deep learning and data science. We urgently need to shift the paradigm of data analysis from classical Euclidean data analysis to the analysis of both Euclidean and non-Euclidean data, and to develop innovative methods for describing, estimating, and inferring the non-Euclidean geometries of modern real datasets. A general framework for the integrated analysis of Euclidean and non-Euclidean data, together with composite AI, decision intelligence, and edge AI, provides powerful ideas and strategies for fundamentally advancing AI. We expect to marry statistics with AI, develop a unified theory of modern statistics, and drive the next generation of AI and data science.