Paper Title
Online Learning for Orchestration of Inference in Multi-User End-Edge-Cloud Networks
Paper Authors
Paper Abstract
Deep-learning-based intelligent services have become prevalent in cyber-physical applications, including smart cities and healthcare. Deploying deep-learning-based intelligence near the end user enhances privacy protection, responsiveness, and reliability. Resource-constrained end devices must be carefully managed to meet the latency and energy requirements of computationally intensive deep learning services. Collaborative end-edge-cloud computing for deep learning offers a range of performance and efficiency trade-offs that can address application requirements through computation offloading. The decision to offload computation is a communication-computation co-optimization problem that varies with both system parameters (e.g., network conditions) and workload characteristics (e.g., inputs). At the same time, deep learning model optimization introduces a further trade-off between latency and model accuracy. An end-to-end decision-making solution that accounts for this computation-communication problem is required to synergistically find the optimal offloading policy and model for deep learning services. To this end, we propose a reinforcement-learning-based computation offloading solution that learns the optimal offloading policy, taking deep learning model selection techniques into account, to minimize response time while providing sufficient accuracy. We demonstrate the effectiveness of our solution for edge devices in an end-edge-cloud system and evaluate it with a real-setup implementation using multiple AWS and ARM core configurations. Our solution provides a 35% speedup in average response time compared to the state of the art, with less than 0.9% accuracy reduction, demonstrating the promise of our online learning framework for orchestrating DL inference in end-edge-cloud systems.
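The sketch below is a minimal illustration (not the paper's actual implementation) of the kind of online decision loop the abstract describes: an epsilon-greedy agent observes a coarse network state, jointly picks an execution tier and a deep learning model variant, and updates its estimates from the measured response time. The action set, state encoding, accuracy numbers, and reward shaping are all assumptions made purely for illustration.

```python
import random

# Illustrative sketch only. All names, tiers, model variants, and numbers below
# are hypothetical placeholders, not values from the paper.
TARGETS = ["end", "edge", "cloud"]          # assumed execution tiers
MODELS = {"full": 0.760, "pruned": 0.752}   # assumed model variants -> accuracy
ACC_FLOOR = 0.75                            # only actions meeting this accuracy are eligible

class OffloadAgent:
    """Epsilon-greedy learner over joint (execution target, model variant) actions."""

    def __init__(self, epsilon=0.1, lr=0.2):
        self.epsilon, self.lr = epsilon, lr
        # q[(network_state, target, model)] = running estimate of negative latency
        self.q = {}

    def actions(self):
        # Restrict the action set to model variants that satisfy the accuracy floor.
        return [(t, m) for t in TARGETS for m in MODELS if MODELS[m] >= ACC_FLOOR]

    def act(self, net_state):
        # Explore occasionally; otherwise exploit the current best estimate.
        if random.random() < self.epsilon:
            return random.choice(self.actions())
        return max(self.actions(), key=lambda a: self.q.get((net_state, *a), 0.0))

    def update(self, net_state, action, response_time_ms):
        # Reward is negative response time, so lower latency is preferred.
        key = (net_state, *action)
        old = self.q.get(key, 0.0)
        self.q[key] = old + self.lr * (-response_time_ms - old)

# Hypothetical usage: observe the network, decide where/what to run, then feed
# back the measured end-to-end response time.
agent = OffloadAgent()
for step in range(1000):
    net_state = random.choice(["good", "congested"])   # stand-in for a real measurement
    target, model = agent.act(net_state)
    latency = ({"end": 120, "edge": 60, "cloud": 90}[target]
               + (200 if net_state == "congested" and target != "end" else 0)
               + (20 if model == "full" else 0)
               + random.gauss(0, 5))                    # toy latency model
    agent.update(net_state, (target, model), latency)
```

Using negative latency as the reward, together with an accuracy floor on the eligible model variants, mirrors the abstract's stated goal of minimizing response time while keeping the accuracy reduction bounded.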