Client Error Clustering Approaches in Content Delivery Networks (CDN)

Paper Title

Client Error Clustering Approaches in Content Delivery Networks (CDN)

Authors

Birihanu, Ermiyas; Mahmud, Jiyan; Kiss, Péter; Kamuzora, Adolf; Skaf, Wadie; Horváth, Tomáš; Jursonovics, Tamás; Pogrzeba, Peter; Lendák, Imre

Abstract

Content delivery networks (CDNs) are the backbone of the Internet and are key to delivering high-quality video on demand (VoD), web content and file services to billions of users. CDNs usually consist of hierarchically organized content servers positioned as close to the customers as possible. CDN operators face a significant challenge when analyzing the billions of web server and proxy log lines generated by their systems. The main objective of this study was to analyze the applicability of various clustering methods in CDN error log analysis. We worked with real-life CDN proxy logs, identified key features included in the logs (e.g., content type, HTTP status code, time-of-day, host) and clustered the log lines corresponding to different host types offering live TV, video on demand, file caching and web content. Our experiments were run on a dataset consisting of proxy logs collected over a 7-day period from a single, physical CDN server running multiple types of services (VoD, live TV, file). The dataset consisted of 2.2 billion log lines. Our analysis showed that CDN error clustering is a viable approach towards identifying recurring errors and improving overall quality of service.
