Paper Title
Efficient Hierarchical Storage Management Framework Empowered by Reinforcement Learning
Paper Authors
Paper Abstract
With the rapid development of big data and cloud computing, data management has become increasingly challenging. Over the years, a number of frameworks for data management and storage with various characteristics and features have become available. Most of these are highly efficient, but ultimately create data silos. Because no single framework can efficiently fulfill the data management needs of diverse applications, it becomes difficult to move data and work with it coherently as new requirements emerge. A possible solution is to design smart and efficient hierarchical (multi-tier) storage solutions. A hierarchical storage system (HSS) is a meta-solution that consists of different storage frameworks organized into a jointly constructed large storage pool. It brings a number of benefits, including better utilization of storage, cost-efficiency, and access to the different features provided by the underlying storage frameworks. To maximize the gains of hierarchical storage solutions, it is important that they include intelligent and autonomous mechanisms for data management grounded in the features of the different underlying frameworks. These decisions should be made according to the characteristics of the dataset, tier status, and access patterns. These are highly dynamic parameters, and defining a policy based on them is a non-trivial task. This paper presents an open-source hierarchical storage framework with a dynamic migration policy based on reinforcement learning (RL). We present a mathematical model, a software architecture, and an implementation evaluated in both simulations and a live cloud-based environment. We compare the proposed RL-based strategy to a baseline of three rule-based policies, showing that the RL-based policy achieves significantly higher efficiency and an optimal data distribution across different scenarios compared to the dynamic rule-based policies.
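To make the abstract's idea of an RL-driven migration policy concrete, the following is a minimal sketch of a tabular Q-learning decision loop over tier-migration actions. The state features (access-frequency bucket, size bucket, current tier), the two-tier layout, the action set, and the reward shaping are all illustrative assumptions for this sketch, not the model or implementation described in the paper.

```python
# Illustrative sketch only: tabular Q-learning over hypothetical migration
# actions. The tiers, state discretisation, and reward are assumptions.
import random
from collections import defaultdict

TIERS = ["fast", "slow"]            # assumed two-tier pool (e.g. SSD vs. capacity tier)
ACTIONS = ["stay", "promote", "demote"]
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2

Q = defaultdict(float)              # Q[(state, action)] -> estimated value


def observe_state(access_freq, size_mb, tier):
    """Discretise the dynamic parameters the abstract mentions:
    access pattern, dataset characteristics, and current tier placement."""
    freq_bucket = "hot" if access_freq > 10 else "cold"
    size_bucket = "large" if size_mb > 1024 else "small"
    return (freq_bucket, size_bucket, tier)


def choose_action(state):
    """Epsilon-greedy selection over migration actions."""
    if random.random() < EPSILON:
        return random.choice(ACTIONS)
    return max(ACTIONS, key=lambda a: Q[(state, a)])


def reward(state, action):
    """Hypothetical reward: keep hot data on the fast tier and cold data on
    the slow tier, with a small penalty for any migration."""
    freq, _, tier = state
    target = "fast" if freq == "hot" else "slow"
    placed = {"stay": tier, "promote": "fast", "demote": "slow"}[action]
    return (1.0 if placed == target else -1.0) - (0.1 if action != "stay" else 0.0)


def update(state, action, r, next_state):
    """One-step Q-learning update."""
    best_next = max(Q[(next_state, a)] for a in ACTIONS)
    Q[(state, action)] += ALPHA * (r + GAMMA * best_next - Q[(state, action)])
```

The tabular form is only for readability; the same agent/environment split carries over to a function-approximation policy, and the reward would instead reflect the cost and latency terms of the paper's mathematical model.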