Towards Hybrid-Optimization Video Coding

Paper Title

Towards Hybrid-Optimization Video Coding

Authors

Shuai Huo, Dong Liu, Li Li, Siwei Ma, Feng Wu, Wen Gao

Abstract

Video coding is, in essence, a mathematical optimization problem over rate and distortion. Two popular frameworks have been developed to solve this complex problem: block-based hybrid video coding and end-to-end learned video coding. If we rethink video coding from the perspective of optimization, we find that these two frameworks represent two different solution strategies. Block-based hybrid coding is a discrete optimization solution, because its unrelated coding modes are mathematically discrete. It searches for the best among multiple starting points (i.e., modes), but the search is not efficient enough. End-to-end learned coding, on the other hand, is a continuous optimization solution, because gradient descent operates on a continuous function. It optimizes a set of model parameters efficiently with a numerical algorithm, but because it has only one starting point, it easily falls into a local optimum. To solve the optimization problem better, we propose to regard video coding as a hybrid of discrete and continuous optimization and to solve it with both search and numerical algorithms. Our idea is to provide multiple discrete starting points in the global space and to use a numerical algorithm to find the local optimum around each point efficiently. Finally, we search for the global optimum among those local optima. Guided by this hybrid optimization idea, we design a hybrid-optimization video coding framework that is built entirely on continuous deep networks and also contains some discrete modes. We conduct a comprehensive set of experiments. Compared with the continuous optimization framework, our method outperforms purely learned video coding methods. Compared with the discrete optimization framework, our method achieves performance comparable to the HEVC reference software HM16.10 in terms of PSNR.
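The hybrid idea described in the abstract (several discrete starting points, a numerical local optimization from each, then a search over the resulting local optima) can be illustrated with a toy one-dimensional problem. The sketch below is not the paper's framework: the objective `cost`, the starting points, and the helper names `descend` and `hybrid_optimize` are all hypothetical, chosen only to show how a single starting point can get stuck in a local optimum while multiple starts plus a final search find a better one.

```python
import math

def cost(x):
    # Hypothetical non-convex objective standing in for a rate-distortion cost.
    return 0.1 * x * x + math.sin(3.0 * x)

def grad(x):
    # Analytic derivative of cost(x).
    return 0.2 * x + 3.0 * math.cos(3.0 * x)

def descend(x0, lr=0.01, steps=2000):
    # Continuous optimization: plain gradient descent from one starting point.
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

def hybrid_optimize(starts):
    # Discrete part: each start plays the role of one "mode".
    # Refine every start locally, then search for the best local optimum.
    candidates = [descend(x0) for x0 in starts]
    return min(candidates, key=cost)

# A single start converges to the local optimum of its own basin.
single = descend(4.0)

# Multiple discrete starts followed by a search over the refined candidates.
best = hybrid_optimize([-4.0, -2.0, 0.0, 2.0, 4.0])

print(f"single start: x={single:.3f}, cost={cost(single):.3f}")
print(f"hybrid:       x={best:.3f}, cost={cost(best):.3f}")
```

In this toy setting the candidate list always includes the result of the single start, so the hybrid result can never be worse than it. The price is one local optimization per starting point.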
