Paper title
Towards Smart e-Infrastructures, A Community Driven Approach Based on Real Datasets
Authors
Abstract
e-Infrastructures have powered the successful penetration of e-services across domains and form the backbone of the modern computing landscape. e-Infrastructure is a broad term covering large, medium and small scale computing environments. The increasing sophistication and complexity of applications mean that even small-scale data centers now consist of thousands of interconnects. However, efficient utilization of resources in data centers remains a challenging task, mainly due to the complexity of managing physical nodes, network equipment, cooling systems, electricity, etc. As a result, the industry has a very large carbon footprint. In recent years, efforts based on machine learning approaches have shown promising results in reducing the energy consumption of data centers. Yet practical solutions that can help data center operators offer energy-efficient services are lacking. This problem is more pronounced among medium and small scale data center operators (the long tail of e-infrastructure providers). Additionally, a disconnect between solution providers (machine learning experts) and data center operators has been observed. This article presents a community-driven open source software framework that allows community members to develop a better understanding of various aspects of resource utilization. The framework leverages machine learning models for forecasting and optimizing various parameters of data center operations, enabling improved efficiency and quality of service, and lower energy consumption. In addition, the proposed framework does not require datasets to be shared, which removes the extra effort of organizing, describing and anonymizing data in an appropriate format.