To Remove or Not Remove Mobile Apps? A Data-Driven Predictive Model Approach

Title

To remove or not remove Mobile Apps? A data-driven predictive model approach

Authors

Fadi Mohsen, Dimka Karastoyanova, George Azzopardi

Abstract

Mobile app stores are the key distributors of mobile applications. They regularly apply vetting processes to deployed apps, yet some of these processes may be inadequate or applied late. The late removal of an application can have unpleasant consequences for developers and users alike. In this work we therefore propose a data-driven predictive approach that determines whether a given app will be removed or accepted. It also reports the relevance of each feature, which helps stakeholders interpret the predictions. In turn, our approach can support developers in improving their apps and users in downloading apps that are less likely to be removed. We focus on the Google App store and compile a new data set of 870,515 applications, 56% of which have actually been removed from the market. Our proposed approach is a bootstrap aggregation (bagging) of multiple XGBoost machine learning classifiers. We propose two models: a user-centered model using 47 features, and a developer-centered model using 37 features, restricted to those available before deployment. We achieve the following areas under the ROC curve (AUCs) on the test set: user-centered = 0.792, developer-centered = 0.762.
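
The two technical ingredients named in the abstract, bootstrap aggregation and evaluation by AUC, can be illustrated with a minimal sketch. This is not the authors' implementation: to keep the example dependency-free, a simple decision stump stands in for the XGBoost base classifier, the data are synthetic, and every function name (`auc`, `fit_stump`, `fit_bagging`, `make_data`) and parameter value is an assumption made for illustration.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve: fraction of (positive, negative) pairs
    ranked correctly, counting ties as half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def fit_stump(X, y):
    """Stand-in base learner (NOT XGBoost): the single feature/threshold
    split with the lowest squared error. Returns (feature, threshold,
    positive rate on the left, positive rate on the right)."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[j] <= t]
            right = [yi for row, yi in zip(X, y) if row[j] > t]
            if not left or not right:
                continue
            pl, pr = sum(left) / len(left), sum(right) / len(right)
            err = (sum((yi - pl) ** 2 for yi in left)
                   + sum((yi - pr) ** 2 for yi in right))
            if best is None or err < best[0]:
                best = (err, j, t, pl, pr)
    if best is None:  # no valid split: fall back to the base rate
        p = sum(y) / len(y)
        return (0, float("inf"), p, p)
    return best[1:]

def predict_stump(stump, row):
    j, t, pl, pr = stump
    return pl if row[j] <= t else pr

def fit_bagging(X, y, n_models=15, seed=0):
    """Train each base learner on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return models

def predict_bagging(models, row):
    """Aggregate by averaging the base learners' predicted removal rates."""
    return sum(predict_stump(m, row) for m in models) / len(models)

def make_data(n, seed):
    """Synthetic apps with 3 hypothetical features; label 1 means 'removed'.
    Assumed rule: removal is more likely when feature 0 is high."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n):
        row = [rng.random() for _ in range(3)]
        X.append(row)
        y.append(1 if rng.random() < 0.1 + 0.8 * row[0] else 0)
    return X, y

X_train, y_train = make_data(200, seed=1)
X_test, y_test = make_data(200, seed=2)
models = fit_bagging(X_train, y_train)
scores = [predict_bagging(models, row) for row in X_test]
print(f"test AUC: {auc(y_test, scores):.3f}")
```

Averaging over bootstrap-trained learners is what makes this bagging; in a setting closer to the paper, each stump would presumably be replaced by a gradient-boosted classifier and the synthetic rows by the app features.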
