Paper title

A data management system for machine learning research of tokamak

Authors

Wan, Chenguang; Yu, Zhi; Liu, Xiaojuan; Wen, Xinghao; Deng, Xi; Li, Jiangang

Abstract

In recent years, machine learning (ML) research methods have received increasing attention in the tokamak community. The conventional database for tokamak experimental data (i.e., MDSplus) was designed for consumption by small groups and is mainly aimed at simultaneously visualizing a small amount of data. ML data access patterns differ fundamentally from these traditional patterns, and the typical MDSplus database is increasingly showing its limitations. We developed a new data management system for tokamak machine learning research based on Experimental Advanced Superconducting Tokamak (EAST) data. The system is built on MongoDB and Hierarchical Data Format version 5 (HDF5) and currently manages more than 3000 data channels. It provides highly reliable concurrent access and includes error correction, conversion of MDSplus raw data, and high-performance sequence data output. In addition, several functions are implemented to accelerate the training of fusion ML models, such as a bucketing generator, a concatenating buffer, and distributed sequence generation. This data management system is better suited than MDSplus to fusion ML model R&D, but it cannot replace the MDSplus database, which remains the backend for EAST tokamak data acquisition and storage.
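The abstract names a bucketing generator among the training-acceleration features but does not describe how it works. As a hedged illustration only, the sketch below shows the generic bucketing technique often used for variable-length sequences: sequences (for example, per-shot signal sequences) are grouped by length so that each batch needs little padding. The names `bucketing_generator` and `pad_batch` and the parameters `bucket_width` and `batch_size` are hypothetical. This is not the authors' implementation, and it reads from in-memory lists rather than from MongoDB or HDF5.

```python
import random
from typing import Dict, Iterator, List, Sequence


# Hypothetical helper: a generic bucketing generator, not the EAST system's code.
def bucketing_generator(
    sequences: Sequence[Sequence[float]],
    bucket_width: int,
    batch_size: int,
    seed: int = 0,
) -> Iterator[List[Sequence[float]]]:
    """Yield batches whose members all fall in the same length bucket.

    A sequence of length L is assigned to bucket L // bucket_width, so every
    batch holds sequences of similar length and needs little padding.
    """
    buckets: Dict[int, List[Sequence[float]]] = {}
    for seq in sequences:
        buckets.setdefault(len(seq) // bucket_width, []).append(seq)

    rng = random.Random(seed)
    batches: List[List[Sequence[float]]] = []
    for members in buckets.values():
        rng.shuffle(members)  # randomize order within a bucket
        for start in range(0, len(members), batch_size):
            batches.append(members[start:start + batch_size])

    # Shuffle the batches so training does not see one length range at a time.
    rng.shuffle(batches)
    yield from batches


# Hypothetical helper: right-pad every sequence in a batch to the batch maximum.
def pad_batch(batch: Sequence[Sequence[float]], pad_value: float = 0.0) -> List[List[float]]:
    max_len = max(len(s) for s in batch)
    return [list(s) + [pad_value] * (max_len - len(s)) for s in batch]


if __name__ == "__main__":
    # Toy "shots" of different lengths, standing in for real channel data.
    shots = [[0.0] * n for n in [5, 7, 12, 15, 31, 33, 8, 14]]
    for batch in bucketing_generator(shots, bucket_width=10, batch_size=2):
        print([len(s) for s in batch], "->", len(pad_batch(batch)[0]))
```

Shuffling the finished batches keeps each batch length-homogeneous while still mixing length ranges across an epoch. A stricter variant could sort by exact length instead of using fixed-width buckets, which trades less padding for less randomness.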