Title

Positional Paper: Schema-First Application Telemetry

Authors

Yuri Shkuro, Benjamin Renard, Atul Singh

Abstract

Application telemetry refers to measurements taken from software systems to assess their performance, availability, correctness, efficiency, and other aspects useful to operators, as well as to troubleshoot them when they behave abnormally. Many modern observability platforms support dimensional models of telemetry signals, where the measurements are accompanied by additional dimensions that identify either the resources described by the telemetry or business-specific attributes of the activities (e.g., a customer identifier). However, most of these platforms lack any semantic understanding of the data because they do not capture metadata about the telemetry, ranging from simple aspects such as units of measure or data types (all dimensions are treated as strings) to more complex concepts such as purpose policies. This limits the ability of the platforms to provide a rich user experience, especially when dealing with different telemetry assets: for example, linking an anomaly in a time series with the corresponding subset of logs or traces requires a semantic understanding of the dimensions in the respective data sets. In this paper, we describe a schema-first approach to application telemetry that is being implemented at Meta. It allows observability platforms to capture metadata about telemetry from the start and enables a wide range of functionality, including compile-time input validation, multi-signal correlation and cross-filtering, and even privacy rules enforcement. We present a collection of design goals and demonstrate how a schema-first approach provides better trade-offs than many existing solutions in the industry.
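The abstract does not show what a telemetry schema looks like. As a minimal sketch, assuming Python dataclasses as a stand-in schema language (the format actually used at Meta is not described here), the block below declares a metric and a log signal whose dimensions carry types and semantic metadata, validates records against the declared types, and cross-filters logs that match a metric anomaly on their shared semantic dimensions. The class names, field names, and metadata keys (`semantic`, `unit`, `purpose`) are all hypothetical.

```python
from dataclasses import dataclass, field, fields

# Hypothetical schemas: every dimension is typed and carries semantic metadata.
# Metadata keys ("semantic", "unit", "purpose") are illustrative assumptions.

@dataclass(frozen=True)
class RequestLatency:  # a metric data point
    customer_id: int = field(metadata={"semantic": "customer_id", "purpose": "billing"})
    region: str = field(metadata={"semantic": "region"})
    latency: float = field(metadata={"unit": "ms"})

@dataclass(frozen=True)
class RequestLog:  # a log record
    customer_id: int = field(metadata={"semantic": "customer_id", "purpose": "billing"})
    region: str = field(metadata={"semantic": "region"})
    message: str = field(metadata={})

def validate(record) -> None:
    """Reject values whose runtime type does not match the declared schema type."""
    for f in fields(record):
        value = getattr(record, f.name)
        if not isinstance(value, f.type):
            raise TypeError(
                f"{f.name}: expected {f.type.__name__}, got {type(value).__name__}"
            )

def semantic_fields(schema: type) -> dict[str, str]:
    """Map each semantic tag to the field name that carries it in one schema."""
    return {f.metadata["semantic"]: f.name for f in fields(schema) if "semantic" in f.metadata}

def matching_logs(anomaly: RequestLatency, logs: list[RequestLog]) -> list[RequestLog]:
    """Keep logs that agree with the anomaly on every shared semantic dimension."""
    metric_map = semantic_fields(RequestLatency)
    log_map = semantic_fields(RequestLog)
    shared = metric_map.keys() & log_map.keys()
    return [
        log for log in logs
        if all(getattr(log, log_map[t]) == getattr(anomaly, metric_map[t]) for t in shared)
    ]
```

Here the type check happens at runtime; a static checker such as mypy would reject a call like `RequestLatency(customer_id="42", ...)` before the program runs, which is closer to the compile-time validation the abstract mentions. Correlation works because both schemas declare the same semantic tags, so no string matching on ad hoc dimension names is needed.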