Paper Title
Towards Designing a Self-Managed Machine Learning Inference Serving System in Public Cloud

Paper Authors

Gunasekaran, Jashwant Raj, Thinakaran, Prashanth, Mishra, Cyan Subhra, Kandemir, Mahmut Taylan, Das, Chita R.

Abstract
We are witnessing an increasing trend towards using Machine Learning (ML) based prediction systems, spanning across different application domains, including product recommendation systems, personal assistant devices, facial recognition, etc. These applications typically have diverse requirements in terms of accuracy and response latency, which have a direct impact on the cost of deploying them in a public cloud. Furthermore, the deployment cost also depends on the type of resources being procured, which by themselves are heterogeneous in terms of provisioning latencies and billing complexity. Thus, it is strenuous for an inference serving system to choose from this confounding array of resource types and model types to provide low-latency and cost-effective inferences. In this work, we quantitatively characterize the cost, accuracy, and latency implications of hosting ML inferences on different public cloud resource offerings. In addition, we comprehensively evaluate prior work which tries to achieve cost-effective prediction serving. Our evaluation shows that prior work does not solve the problem from both dimensions of model and resource heterogeneity. Hence, we argue that to address this problem, we need to holistically solve the issues that arise when trying to combine both model and resource heterogeneity towards optimizing for application constraints. Towards this, we envision developing a self-managed inference serving system, which can optimize the application requirements based on public cloud resource characteristics. In order to solve this complex optimization problem, we explore the high-level design of a reinforcement-learning based system that can efficiently adapt to the changing needs of the system at scale.