Paper title
Scipp: Scientific data handling with labeled multi-dimensional arrays for C++ and Python
Paper authors
Paper abstract
Scipp is heavily inspired by the Python library xarray. It enriches raw NumPy-like multi-dimensional arrays of data by adding named dimensions and associated coordinates. Multiple arrays are combined into datasets. On top of this, scipp introduces (i) implicit handling of physical units, (ii) implicit propagation of uncertainties, (iii) support for histograms, i.e., bin-edge coordinate axes, which exceed the data's dimension extent by one, and (iv) support for event data. In conjunction, these features enable a more natural and more concise user experience. The combination of named dimensions, coordinates, and units helps to drastically reduce the risk of programming errors. The core of scipp is written in C++ to open opportunities for performance improvements that a Python-based solution would not allow for. On top of the C++ core, scipp's Python components provide functionality for plotting and content representations, e.g., for use in Jupyter Notebooks. While none of scipp's concepts in isolation is novel per se, we are not aware of any project combining all of these aspects in a single coherent software package.
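The ideas listed in the abstract (named dimensions, unit checks, propagation of uncertainties, and bin-edge coordinates) can be illustrated with a small pure-Python sketch. This is not scipp's API: the `Variable` class and the `is_histogram` helper below are hypothetical names made up for illustration. Units are compared as plain strings, and uncertainties are assumed to be uncorrelated variances that simply add under addition.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Variable:
    """Hypothetical 1-D labeled array: dimension name, values, unit, variances."""
    dim: str
    values: List[float]
    unit: str
    variances: Optional[List[float]] = None

    def __add__(self, other: "Variable") -> "Variable":
        # Named dimensions: operands must refer to the same dimension.
        if self.dim != other.dim:
            raise ValueError(f"dimension mismatch: {self.dim} vs {other.dim}")
        # Units: adding quantities with different units is rejected.
        if self.unit != other.unit:
            raise ValueError(f"unit mismatch: {self.unit} vs {other.unit}")
        if len(self.values) != len(other.values):
            raise ValueError("length mismatch")
        values = [a + b for a, b in zip(self.values, other.values)]
        # Uncertainties: for uncorrelated inputs, variances add under addition.
        variances = None
        if self.variances is not None or other.variances is not None:
            n = len(values)
            va = self.variances if self.variances is not None else [0.0] * n
            vb = other.variances if other.variances is not None else [0.0] * n
            variances = [a + b for a, b in zip(va, vb)]
        return Variable(self.dim, values, self.unit, variances)


def is_histogram(data: Variable, coord: Variable) -> bool:
    """A bin-edge coordinate has one more entry than the data along its dimension."""
    return coord.dim == data.dim and len(coord.values) == len(data.values) + 1


counts_a = Variable("x", [10.0, 20.0, 30.0], "counts", [10.0, 20.0, 30.0])
counts_b = Variable("x", [1.0, 2.0, 3.0], "counts", [1.0, 2.0, 3.0])
edges = Variable("x", [0.0, 1.0, 2.0, 3.0], "m")  # 4 edges for 3 bins

total = counts_a + counts_b
print(total.values, total.variances)
print(is_histogram(total, edges))

try:
    counts_a + edges  # counts + metres: rejected by the unit check
except ValueError as err:
    print("rejected:", err)
```

In this toy model the unit and dimension checks turn a silent numerical mistake into an immediate error, which is the kind of risk reduction the abstract describes. A real implementation would additionally handle multiple dimensions, unit conversion, and event data.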