Paper Title
TensorFlow Lite Micro: Embedded Machine Learning on TinyML Systems
Paper Authors
Abstract
Deep learning inference on embedded devices is a burgeoning field with myriad applications because tiny embedded devices are omnipresent. But we must overcome major challenges before we can benefit from this opportunity. Embedded processors are severely resource constrained. Their nearest mobile counterparts exhibit at least a 100--1,000x difference in compute capability, memory availability, and power consumption. As a result, machine-learning (ML) models and associated ML inference frameworks must not only execute efficiently but also operate in a few kilobytes of memory. Also, the embedded-device ecosystem is heavily fragmented. To maximize efficiency, system vendors often omit many features that commonly appear in mainstream systems and that allow for cross-platform interoperability, including dynamic memory allocation and virtual memory. The hardware comes in many flavors (e.g., instruction-set architecture and FPU support, or lack thereof). We introduce TensorFlow Lite Micro (TF Micro), an open-source ML inference framework for running deep-learning models on embedded systems. TF Micro tackles the efficiency requirements imposed by embedded-system resource constraints and the fragmentation challenges that make cross-platform interoperability nearly impossible. The framework adopts a unique interpreter-based approach that provides flexibility while overcoming these challenges. This paper explains the design decisions behind TF Micro and describes its implementation details. Also, we present an evaluation to demonstrate its low resource requirements and minimal run-time performance overhead.