Paper Title

Low Complexity Distributed Computing via Binary Matrices with Extension to Stragglers

Authors

Shailja Agrawal, Prasad Krishnan

Abstract

We consider the distributed computing framework of MapReduce, which consists of three phases: the Map phase, the Shuffle phase, and the Reduce phase. For this framework, we propose the use of binary matrices (with $0,1$ entries), called \textit{computing matrices}, to describe the Map phase and the Shuffle phase. Similar binary matrices were recently proposed for the coded caching framework. The structure of ones and zeroes in the binary computing matrix captures the Map phase of the MapReduce framework. We present a new, simple coded data shuffling scheme for this binary matrix model, based on an \textit{identity submatrix cover} of the computing matrix. In general, this new coded shuffling scheme has a larger communication load than existing schemes, but it incurs less complexity overhead than the well-known earlier schemes in the literature in terms of the file splitting and the associated indexing and coordination required. We also show that there exists a binary-matrix-based distributed computing scheme, using our new data shuffling scheme, whose communication load is strictly less than twice that of the known optimal scheme in the literature. The structure of this new scheme also allows it to be applied in a straightforward manner to the MapReduce framework with stragglers, where it inherits the advantages and disadvantages of the no-straggler case. Finally, using binary matrices derived from combinatorial designs, we show specific classes of computing schemes with very low \textit{file complexity} (the number of subfiles in the file) and a marginally higher communication load than the optimal scheme for equivalent parameters.
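The abstract does not define the identity submatrix cover formally, so the following Python sketch rests on an assumed reading: a cover is taken to be a collection of submatrices of the computing matrix, each equal to an identity matrix under a chosen pairing of rows and columns, that together account for every 1 entry. The function names, the example matrix `C`, and the interpretation of rows and columns are hypothetical illustrations, not the paper's construction.

```python
def is_identity_submatrix(matrix, rows, cols):
    """Check that matrix[rows[i]][cols[j]] is 1 when i == j and 0 otherwise."""
    if len(rows) != len(cols):
        return False
    return all(
        matrix[r][c] == (1 if i == j else 0)
        for i, r in enumerate(rows)
        for j, c in enumerate(cols)
    )


def covers_all_ones(matrix, cover):
    """Check that every block in `cover` is an identity submatrix and that
    the diagonal entries of the blocks are exactly the 1 entries of `matrix`."""
    covered = set()
    for rows, cols in cover:
        if not is_identity_submatrix(matrix, rows, cols):
            return False
        covered.update(zip(rows, cols))
    ones = {
        (r, c)
        for r, row in enumerate(matrix)
        for c, value in enumerate(row)
        if value == 1
    }
    return covered == ones


# Hypothetical 3x3 binary matrix (assumed example, not taken from the paper).
C = [
    [1, 0, 0],
    [0, 1, 1],
    [0, 0, 1],
]

# One possible cover: a 2x2 identity block plus two 1x1 blocks.
cover = [
    ((0, 1), (0, 1)),
    ((2,), (2,)),
    ((1,), (2,)),
]
```

Under this reading, a cover with fewer and larger identity blocks would presumably correspond to fewer coded transmissions, but the exact link to the communication load is established in the paper itself and is not reproduced here.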
