Paper Title

Learning Underspecified Models

Authors

In-Koo Cho, Jonathan Libgober

Abstract

This paper examines whether one can learn to play an optimal action while knowing only part of the true specification of the environment. We choose the optimal pricing problem as our laboratory, where the monopolist is endowed with an underspecified model of the market demand, but can observe market outcomes. In contrast to conventional learning models, where the model specification is complete and exogenously fixed, the monopolist has to learn both the specification and the parameters of the demand curve from the data. We formulate the learning dynamics as an algorithm that forecasts the optimal price based on the data, following the machine learning literature (Shalev-Shwartz and Ben-David (2014)). Inspired by PAC learnability, we develop a new notion of learnability by requiring that the algorithm produce an accurate forecast with a reasonable amount of data, uniformly over the class of models consistent with the known part of the true specification. In addition, we assume that the monopolist has a lexicographic preference over the payoff and the complexity cost of the algorithm, seeking an algorithm with a minimum number of parameters subject to PAC-guaranteeing the optimal solution (Rubinstein (1986)). We show that for the set of demand curves with strictly decreasing, uniformly Lipschitz continuous marginal revenue curves, the optimal algorithm recursively estimates the slope and the intercept of a linear demand curve, even if the actual demand curve is not linear. The monopolist chooses a misspecified model to save computational cost, while learning the true optimal decision uniformly over the set of underspecified demand curves.
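The idea that a linear model can locate the true optimal price even when demand is not linear can be illustrated with a small sketch. This is not the algorithm from the paper. It is a simplified, noise-free illustration under assumptions introduced here: zero marginal cost, a hypothetical true demand curve `exp(-p)`, a two-point local fit in place of a recursive estimator, and hypothetical names (`demand`, `fit_linear`, `learn_price`) and parameter values.

```python
import math


def demand(p):
    """Hypothetical true demand curve (nonlinear); unknown to the monopolist."""
    return math.exp(-p)


def fit_linear(p, h, observe):
    """Fit the linear model q = a - b*p through observations at p - h and p + h."""
    q_lo, q_hi = observe(p - h), observe(p + h)
    b = (q_lo - q_hi) / (2 * h)      # estimated slope (positive when demand falls in price)
    a = (q_lo + q_hi) / 2 + b * p    # estimated intercept
    return a, b


def learn_price(observe, p0=0.5, h=1e-3, step=0.5, rounds=200):
    """Repeatedly refit the linear model near the current price and move toward its optimum."""
    p = p0
    for _ in range(rounds):
        a, b = fit_linear(p, h, observe)
        target = a / (2 * b)         # revenue-maximizing price if demand were a - b*p (zero cost assumed)
        p += step * (target - p)     # damped update toward the linear-model optimum
    return p


price = learn_price(demand)
```

With these assumed settings, true revenue `p * exp(-p)` is maximized at `p = 1`, and the loop settles near that price. The reason is that at a fixed point the fitted line is approximately tangent to the true demand curve, so the first-order condition of the linear model coincides with that of the true revenue function. The monopolist therefore tracks only two parameters, a slope and an intercept, while the model itself remains misspecified.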
