Paper Title

Efficient implementation of immersed boundary-lattice Boltzmann method for massive particle-laden flows Part I: Serial computing

Authors

Jiang, Maoqiang; Li, Jing; Liu, Zhaohui

Abstract

The immersed boundary-lattice Boltzmann method (IB-LBM) has recently been widely used to simulate particle-laden flows. However, it has been limited to small-scale simulations with no more than O(10^3) particles. Here, we extend IB-LBM to massive particle-laden flows with more than O(10^4) particles in two sequential works: Part I, serial computing on a single CPU core, followed by Part II, parallel computing on many CPU cores. In this Part I paper, a highly efficient and localized implementation of IB-LBM is proposed for serial computing. We optimize three main aspects: a swap algorithm for the incompressible LBM, a local grid-to-point algorithm for the IBM, and an improved grid search algorithm for short-range particle-pair interactions. In addition, a symmetry algorithm is proposed for the half-calculation of the LB collision and external force terms. The computational performance on a single CPU core is analyzed. Two-dimensional (2D) and three-dimensional (3D) cases of particles settling in closed cavities, at different scales, are used for testing. The solid volume fraction is varied from 0 to 0.40. Simulation results demonstrate that the cost of every part of the calculation is dramatically reduced by the improved algorithms. For particle-free flows, the improved algorithms reach up to 36 (2D) and 12 (3D) mega lattice site updates per second (MLUPS). For particle-laden flows, MLUPS is no lower than 15 (2D) and 7 (3D) in simulations of dense flows. Finally, we discuss the potential of the new algorithms for high-performance computation of large-scale particle-laden flow systems using MPI parallelization.
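
The abstract names a grid search for short-range particle-pair interactions but does not describe it. As a rough illustration of the general idea only, the following is a minimal sketch of a textbook cell-list search, not the improved algorithm proposed in the paper. The function name `find_close_pairs`, the choice of a cell edge equal to the interaction cutoff, and the half-neighborhood traversal are all assumptions made for this example.

```python
from collections import defaultdict
from itertools import product
import math


def find_close_pairs(positions, cutoff):
    """Return sorted index pairs (i, j), i < j, with center distance < cutoff.

    Hypothetical helper for illustration: a generic cell-list search,
    not the paper's improved grid search algorithm.
    """
    dim = len(positions[0])

    # Bin every particle into a cell whose edge length equals the cutoff,
    # so interacting partners can only sit in the same or an adjacent cell.
    cells = defaultdict(list)
    for idx, pos in enumerate(positions):
        key = tuple(math.floor(c / cutoff) for c in pos)
        cells[key].append(idx)

    # Keep only the lexicographically positive neighbor offsets so that
    # each pair of adjacent cells is visited exactly once.
    zero = (0,) * dim
    offsets = [o for o in product((-1, 0, 1), repeat=dim) if o > zero]

    pairs = []
    for key, members in cells.items():
        # Pairs inside the same cell.
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                i, j = members[a], members[b]
                if math.dist(positions[i], positions[j]) < cutoff:
                    pairs.append((min(i, j), max(i, j)))
        # Pairs between this cell and half of its neighboring cells.
        for off in offsets:
            other = tuple(k + o for k, o in zip(key, off))
            for i in members:
                for j in cells.get(other, ()):
                    if math.dist(positions[i], positions[j]) < cutoff:
                        pairs.append((min(i, j), max(i, j)))
    return sorted(pairs)


# Usage: five 2D particle centers, interaction cutoff of 1.0 lattice units.
demo_positions = [(0.5, 0.5), (1.2, 0.5), (5.0, 5.0), (5.3, 5.4), (9.0, 0.1)]
demo_pairs = find_close_pairs(demo_positions, 1.0)
```

For a roughly uniform particle distribution, binning keeps the number of distance checks proportional to the number of particles rather than to its square, which is the usual motivation for grid-based searches in dense particle simulations.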
