Paper Title

Structure of Classifier Boundaries: Case Study for a Naive Bayes Classifier

Paper Authors

Karr, Alan F.; Bowen, Zac; Porter, Adam A.

Abstract

Whether based on models, training data, or a combination, classifiers place (possibly complex) input data into one of a relatively small number of output categories. In this paper, we study the structure of the boundary (those points for which a neighbor is classified differently) in the context of an input space that is a graph, so that there is a concept of neighboring inputs. The scientific setting is a model-based naive Bayes classifier for DNA reads produced by Next Generation Sequencers. We show that the boundary is both large and complicated in structure. We create a new measure of uncertainty, called Neighbor Similarity, that compares the result for a point to the distribution of results for its neighbors. This measure not only tracks two inherent uncertainty measures for the Bayes classifier, but also can be implemented, at a computational cost, for classifiers without inherent measures of uncertainty.
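
The abstract does not give the exact formula for Neighbor Similarity, so the following is only a minimal sketch of the idea under stated assumptions, not the authors' method. It assumes that two DNA reads are neighbors when they differ at exactly one position (Hamming distance 1), and that similarity is the fraction of neighbors receiving the same label as the read itself. The names `hamming_neighbors`, `neighbor_similarity`, `on_boundary`, and `toy_classifier` are all hypothetical, and the toy classifier is a stand-in rather than a naive Bayes model.

```python
from typing import Callable, Iterator

ALPHABET = "ACGT"


def hamming_neighbors(read: str) -> Iterator[str]:
    """Yield every sequence that differs from `read` at exactly one position.

    Assumption: this is the graph structure on the input space; the paper
    may define neighbors differently.
    """
    for i, base in enumerate(read):
        for other in ALPHABET:
            if other != base:
                yield read[:i] + other + read[i + 1:]


def neighbor_similarity(read: str, classify: Callable[[str], str]) -> float:
    """Fraction of neighbors that receive the same label as `read`.

    Assumption: a simple agreement rate stands in for the paper's measure.
    Works with any classifier, at the cost of one call per neighbor.
    """
    if not read:
        raise ValueError("read must be non-empty")
    label = classify(read)
    neighbors = list(hamming_neighbors(read))
    same = sum(1 for n in neighbors if classify(n) == label)
    return same / len(neighbors)


def on_boundary(read: str, classify: Callable[[str], str]) -> bool:
    """A read is on the boundary if at least one neighbor is labeled differently."""
    return neighbor_similarity(read, classify) < 1.0


def toy_classifier(read: str) -> str:
    """Hypothetical stand-in classifier: label a read by its GC majority."""
    gc = sum(base in "GC" for base in read)
    return "GC-rich" if 2 * gc >= len(read) else "AT-rich"


if __name__ == "__main__":
    for read in ("GGCC", "GCAT"):
        print(read, neighbor_similarity(read, toy_classifier),
              on_boundary(read, toy_classifier))
```

Because the sketch only calls `classify` on a read and its neighbors, it applies to classifiers that expose no uncertainty measure of their own; the computational cost mentioned in the abstract shows up here as three extra classifier calls per base in the read.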