Paper Title
Dealer: End-to-End Data Marketplace with Model-based Pricing
Paper Authors
Paper Abstract
Data-driven machine learning (ML) has seen great success across a variety of application domains. Because ML model training relies crucially on large amounts of data, there is a growing demand for high-quality data collected for training. From the data owners' perspective, however, contributing data is risky. To incentivize contribution, data owners would ideally have their data used only under their preset restrictions and be paid for what they contribute. In this paper, we take a formal data market perspective and propose the first end-to-end data marketplace with model-based pricing (Dealer), which addresses the question: how can the broker assign value to data owners based on their contribution to the models, so as to incentivize more data contribution, and determine prices for a series of models offered to various model buyers, so as to maximize revenue with an arbitrage-free guarantee? For the former, we introduce a Shapley value-based mechanism that quantifies each data owner's value with respect to all the models trained on the contributed data. For the latter, we design a pricing mechanism based on the models' privacy parameters to maximize revenue. More importantly, we study how the data owners' usage restrictions affect market design, which clearly distinguishes our approach from existing methods. Furthermore, we present a concrete realization, DP-Dealer, which provably satisfies the desired formal properties. Extensive experiments show that DP-Dealer is efficient and effective.
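To make the Shapley value-based compensation idea concrete, the following is a minimal sketch of the textbook exact Shapley computation, not the mechanism defined in the paper. The owner names, the toy utility table, and the function names (shapley_values, toy_utility) are all hypothetical, chosen only for illustration; the abstract does not specify how model utility is measured.

```python
from itertools import combinations
from math import factorial

def shapley_values(owners, utility):
    """Exact Shapley value of each owner under a coalition utility function."""
    n = len(owners)
    values = {}
    for owner in owners:
        others = [o for o in owners if o != owner]
        total = 0.0
        for k in range(n):  # k = size of the coalition the owner joins
            # Probability weight of a size-k coalition preceding the owner
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                # Marginal contribution of the owner to this coalition
                gain = utility(coalition | {owner}) - utility(coalition)
                total += weight * gain
        values[owner] = total
    return values

# Hypothetical utility table (an assumption): a stand-in for the quality of a
# model trained on the data of each possible subset of owners.
TOY_UTILITY = {
    frozenset(): 0.0,
    frozenset({"A"}): 0.5,
    frozenset({"B"}): 0.5,
    frozenset({"C"}): 0.25,
    frozenset({"A", "B"}): 0.75,
    frozenset({"A", "C"}): 0.75,
    frozenset({"B", "C"}): 0.75,
    frozenset({"A", "B", "C"}): 1.0,
}

def toy_utility(coalition):
    return TOY_UTILITY[frozenset(coalition)]

values = shapley_values(["A", "B", "C"], toy_utility)
for owner, value in values.items():
    print(owner, round(value, 4))
```

By the efficiency property of the Shapley value, the per-owner values sum to the utility of the full coalition, so a broker could in principle split a compensation budget in proportion to them. The exact computation enumerates every subset of owners and is exponential in their number, so it is only practical for small toy cases like this one.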