Efficient implementation of immersed boundary-lattice Boltzmann method for massive particle-laden flows Part I: Serial computing

Paper title

Efficient implementation of immersed boundary-lattice Boltzmann method for massive particle-laden flows Part I: Serial computing

Authors

Jiang, Maoqiang; Li, Jing; Liu, Zhaohui

Abstract

The immersed boundary-lattice Boltzmann method (IB-LBM) has recently been widely used to simulate particle-laden flows. However, it has been limited to small-scale simulations with no more than O(10^3) particles. Here, we extend IB-LBM to massive particle-laden flows with more than O(10^4) particles in two sequential works: Part I, serial computing on a single CPU core, followed by Part II, parallel computing on many CPU cores. In this Part I paper, a highly efficient and localized implementation of IB-LBM is proposed for serial computing. We optimize three main aspects: a swap algorithm for the incompressible LBM, a local grid-to-point algorithm for the IBM, and an improved grid search algorithm for short-range particle-pair interactions. In addition, a symmetry algorithm is proposed for the half-calculation of the LB collision and external force terms. The computational performance on a single CPU core is analyzed. Two-dimensional (2D) and three-dimensional (3D) cases of particles settling in closed cavities, at different scales, are used for testing. The solid volume fraction is varied from 0 to 0.40. Simulation results demonstrate that the cost of every part of the calculation is dramatically reduced by the improved algorithms. For particle-free flows, the Mega Lattice Site Updates per Second (MLUPS) reach up to 36 (2D) and 12 (3D) with the improved algorithms. For particle-laden flows, MLUPS are no lower than 15 (2D) and 7 (3D) in the simulations of dense flows. Finally, we discuss the potential of the new algorithms for high-performance computation of large-scale particle-laden flow systems with the MPI parallel technique.
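
The abstract names a grid search for short-range particle-pair interactions but does not give its details. As a rough illustration of the general idea only, the following is a minimal sketch of a generic cell-list search, not the authors' improved algorithm. The function name find_close_pairs and its parameters are hypothetical. Particles are binned into cells whose edge equals the interaction cutoff, so each particle is only compared against particles in its own and adjacent cells rather than against every other particle.

```python
from collections import defaultdict
from itertools import product
import math


def find_close_pairs(positions, cutoff):
    """Return sorted index pairs (i, j), i < j, closer than `cutoff`.

    Hypothetical helper: a plain cell-list search, assumed here for
    illustration and not taken from the paper.
    """
    dim = len(positions[0])

    # Bin each particle into a cell with edge length equal to the cutoff.
    cells = defaultdict(list)
    for idx, pos in enumerate(positions):
        key = tuple(math.floor(c / cutoff) for c in pos)
        cells[key].append(idx)

    # Offsets to the cell itself and all adjacent cells (9 in 2D, 27 in 3D).
    offsets = list(product((-1, 0, 1), repeat=dim))

    pairs = set()
    for cell, members in cells.items():
        for off in offsets:
            neighbor = tuple(c + o for c, o in zip(cell, off))
            # .get() avoids creating empty cells while iterating.
            for i in members:
                for j in cells.get(neighbor, ()):
                    if i < j and math.dist(positions[i], positions[j]) < cutoff:
                        pairs.add((i, j))
    return sorted(pairs)
```

For a roughly uniform particle distribution, the number of distance checks in such a search grows about linearly with the particle count, whereas an all-pairs check grows quadratically, which is why searches of this kind matter once the particle count exceeds O(10^4).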
