Paper title
Resonant Anomaly Detection with Multiple Reference Datasets
Paper authors
Paper abstract
An important class of techniques for resonant anomaly detection in high energy physics builds models that can distinguish between reference and target datasets, where only the latter has appreciable signal. Such techniques, including Classification Without Labels (CWoLa) and Simulation Assisted Likelihood-free Anomaly Detection (SALAD), rely on a single reference dataset. They cannot take advantage of the multiple datasets that are commonly available and thus cannot fully exploit the available information. In this work, we propose generalizations of CWoLa and SALAD for settings where multiple reference datasets are available, building on weak supervision techniques. We demonstrate improved performance in a number of settings with realistic and synthetic data. As an added benefit, our generalizations enable us to provide finite-sample guarantees, improving on existing asymptotic analyses.