Paper Title
AIOps for a Cloud Object Storage Service
Paper Authors
Paper Abstract
With growing reliance on the ubiquitous availability of IT systems and services, these systems are becoming more global, larger in scale, and more complex to operate. To maintain business viability, IT service providers must put in place reliable and cost-efficient operations support. Artificial Intelligence for IT Operations (AIOps) is a promising technology for alleviating the operational complexity of IT systems and services. AIOps platforms utilize big data, machine learning, and other advanced analytics technologies to enhance IT operations with proactive, actionable, dynamic insight. In this paper, we share our experience applying the AIOps approach to a production cloud object storage service to gain actionable insights into the system's behavior and health. We describe a real-life, production, cloud-scale service and its operational data, present the AIOps platform we have created, and show how it has helped us resolve operational pain points.