SURF: Improving classifiers in production by learning from busy and noisy end users

Paper title

SURF: Improving classifiers in production by learning from busy and noisy end users

Authors

Joshua Lockhart, Samuel Assefa, Ayham Alajdad, Andrew Alexander, Tucker Balch, Manuela Veloso

Abstract

Supervised learning classifiers inevitably make mistakes in production, perhaps mislabeling an email, or flagging an otherwise routine transaction as fraudulent. It is vital that the end users of such a system are provided with a means of relabeling data points that they deem to have been mislabeled. The classifier can then be retrained on the relabeled data points in the hope of performance improvement. To reduce noise in this feedback data, well-known algorithms from the crowdsourcing literature can be employed. However, the feedback setting provides a new challenge: how do we know what to do in the case of user non-response? If a user provides us with no feedback on a label then it can be dangerous to assume they implicitly agree: a user can be busy, lazy, or no longer a user of the system! We show that conventional crowdsourcing algorithms struggle in this user feedback setting, and present a new algorithm, SURF, that can cope with this non-response ambiguity.
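
The non-response ambiguity described in the abstract can be illustrated with a toy example. The sketch below is not SURF and is not taken from the paper; it is a plain majority vote, and the function name `aggregate`, the `silence_is_agreement` flag, and the sample labels are all hypothetical. It only shows how counting silent users as implicit agreement can outvote the users who actually relabeled a data point.

```python
from collections import Counter


def aggregate(classifier_label, feedback, silence_is_agreement):
    """Majority vote over user feedback for a single data point.

    feedback holds one entry per user: a label string if the user
    relabeled the point, or None if the user did not respond.
    (Hypothetical helper for illustration; not the SURF algorithm.)
    """
    votes = Counter()
    for response in feedback:
        if response is not None:
            votes[response] += 1
        elif silence_is_agreement:
            # Assumption under test: a silent user agrees with the classifier.
            votes[classifier_label] += 1
    if not votes:
        # Nobody voted, so fall back to the classifier's own label.
        return classifier_label
    return votes.most_common(1)[0][0]


# The classifier flagged a transaction as fraudulent. Two users relabeled
# it as routine; three users were busy, lazy, or gone and said nothing.
feedback = ["routine", "routine", None, None, None]

as_agreement = aggregate("fraud", feedback, silence_is_agreement=True)
as_missing = aggregate("fraud", feedback, silence_is_agreement=False)
print(as_agreement, as_missing)
```

Under the implicit-agreement assumption the three silent users outvote the two explicit corrections and the label stays as it was; when silence is treated as missing data, the corrections win. Neither rule is right in general, which is the ambiguity the abstract says SURF is designed to handle.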
