Title
Entropy-based Characterization of Modeling Constraints
Authors
Abstract
In most data-scientific approaches, the principle of Maximum Entropy (MaxEnt) is used to justify, a posteriori, some parametric model that has already been chosen based on experience, prior knowledge, or computational simplicity. In a formulation perpendicular to conventional model building, we start from the linear system of phenomenological constraints and asymptotically derive the distribution over all viable distributions that satisfy the provided set of constraints. The MaxEnt distribution plays a special role, as it is the most typical among all phenomenologically viable distributions, representing a good expansion point for large-N techniques. This enables us to consistently formulate hypothesis testing in a fully data-driven manner. The appropriate parametric model supported by the data can always be deduced at the end of model selection. Within the MaxEnt framework, we recover major scores and selection procedures used in multiple applications and assess their ability to capture associations in the data-generating process and to identify the most generalizable model. This data-driven counterpart of standard model selection demonstrates the unifying perspective of the deductive logic advocated by the MaxEnt principle, while potentially shedding new light on the inverse problem.
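The abstract's starting point is the classic MaxEnt construction: among all distributions satisfying a linear system of phenomenological constraints, select the one of maximum entropy. As a minimal, self-contained sketch (not the paper's own method or code), the following computes the MaxEnt distribution on a finite state space subject to a single mean constraint. The exponential-family form p_i ∝ exp(λx_i) and the bisection search on the Lagrange multiplier λ are standard; the state space and target mean are illustrative choices.

```python
import math

def maxent_distribution(states, target_mean, lo=-50.0, hi=50.0, tol=1e-12):
    """MaxEnt distribution on a finite state space under a mean constraint.

    The maximizer of entropy subject to sum_i p_i * x_i = target_mean has
    the exponential-family form p_i ∝ exp(lam * x_i); the multiplier lam
    is found by bisection, since the constrained mean is monotone in lam.
    """
    def mean(lam):
        w = [math.exp(lam * x) for x in states]
        z = sum(w)
        return sum(x * wi for x, wi in zip(states, w)) / z

    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
        if hi - lo < tol:
            break
    lam = 0.5 * (lo + hi)
    w = [math.exp(lam * x) for x in states]
    z = sum(w)
    return [wi / z for wi in w], lam

# Illustrative state space {0,...,5} with a prescribed mean of 3.5
states = list(range(6))
p, lam = maxent_distribution(states, target_mean=3.5)
print([round(pi, 4) for pi in p])
print(sum(x * pi for x, pi in zip(states, p)))  # ≈ 3.5 by construction
```

With several linear constraints one multiplier per constraint is needed and the one-dimensional bisection would be replaced by a multivariate solve (e.g. Newton iteration on the dual), but the structure of the solution, an exponential family in the constrained observables, is the same.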