Paper Title

Low probability states, data statistics, and entropy estimation

Authors

Damián G. Hernández, Ahmed Roman, Ilya Nemenman

Abstract

A fundamental problem in the analysis of complex systems is obtaining a reliable estimate of the entropy of their probability distributions over the state space. This is difficult because unsampled states can contribute substantially to the entropy, while they do not contribute to the Maximum Likelihood estimator of entropy, which replaces probabilities by the observed frequencies. Bayesian estimators overcome this obstacle by introducing a model of the low-probability tail of the probability distribution. Which statistical features of the observed data determine the model of the tail, and hence the output of such estimators, remains unclear. Here we show that well-known entropy estimators for probability distributions on discrete state spaces model the structure of the low-probability tail based largely on a few statistics of the data: the sample size, the Maximum Likelihood estimate, the number of coincidences among the samples, and the dispersion of the coincidences. We derive approximate analytical entropy estimators for undersampled distributions based on these statistics, and we use the results to propose an intuitive understanding of how Bayesian entropy estimators work.
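The two data statistics the abstract names most concretely, the Maximum Likelihood (plug-in) entropy estimate and the number of coincidences among the samples, are easy to make explicit. The following is a minimal illustrative sketch (not code from the paper) of both quantities, where a "coincidence" is counted, as is conventional, each time a sample lands in an already-observed state:

```python
from collections import Counter
import math

def ml_entropy(samples):
    """Plug-in (Maximum Likelihood) entropy estimate in nats:
    replaces the true probabilities with observed frequencies.
    Unsampled states contribute nothing, so in the undersampled
    regime this estimator is biased low."""
    counts = Counter(samples)
    n = len(samples)
    return -sum((c / n) * math.log(c / n) for c in counts.values())

def coincidences(samples):
    """Number of coincidences: samples beyond the first one
    in each observed state."""
    counts = Counter(samples)
    return len(samples) - len(counts)

samples = ["a", "b", "a", "c", "a", "b", "d"]
print(ml_entropy(samples))    # plug-in entropy of the 7 samples, in nats
print(coincidences(samples))  # 3 (two extra "a" samples, one extra "b")
```

With few coincidences, almost every sample sits in its own state, the plug-in estimate approaches log(n), and the data carry little direct information about the low-probability tail; this is the regime where the Bayesian estimators discussed in the paper must supply a tail model.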
