论文标题
有效的基于流的最大多元化,最小的故障率
Efficient stream-based Max-Min diversification with minimal failure rate
论文作者
论文摘要
流媒体最大多元化问题涉及从已知有限长度的数据流中选择有限且多样化的项目样本。最大化的目标是任何一对选定项目之间的最小距离。我们考虑了不可撤销的选择抽样,在处理流的项目时,必须立即做出直接和不可撤销的决定,这是文献中很少研究的设置。顺序选择的标准算法方法无视选择故障,这是默认情况下的最后一个项目以防止提供不完整的选择集的时间。对于最大最大多元化目标,这种缺陷可能是灾难性的。拟议的故障率最小化(FRM)是一种基于等级的算法,可选择一组不同的项目,此外,还大大降低了失败的可能性。我们通过模拟与现有选择策略相比的模拟表现。
The streaming max-min diversification problem concerns the selection of a limited and diverse sample of items out of a data stream of known finite length. The objective to be maximized is the minimum distance among any pair of selected items. We consider the irrevocable-choice sampling, where decisions need to be immediate and irrevocable while processing the items of the stream, which is a setting little studied in the literature. Standard algorithmic approaches for sequential selection disregard selection failures, which is when the last items of the stream are picked by default, to prevent delivering an incomplete selection set. This defect can be catastrophic for the max-min diversification objective. The proposed Failure Rate Minimization (FRM) is a rank-based algorithm that selects a set of diverse items and, in addition, reduces significantly the probability of having failures. We demonstrate with simulations FRM's performance comparing with existing selection strategies.