Paper Title

Learn Continuously, Act Discretely: Hybrid Action-Space Reinforcement Learning For Optimal Execution

Paper Authors

Feiyang Pan, Tongzhe Zhang, Ling Luo, Jia He, Shuoling Liu

Paper Abstract

Optimal execution is a sequential decision-making problem for cost-saving in algorithmic trading. Studies have found that reinforcement learning (RL) can help decide the order-splitting sizes. However, a problem remains unsolved: how to place limit orders at appropriate limit prices? The key challenge lies in the "continuous-discrete duality" of the action space. On the one hand, the continuous action space using percentage changes in prices is preferred for generalization. On the other hand, the trader eventually needs to choose limit prices discretely due to the existence of the tick size, which requires specialization for every single stock with different characteristics (e.g., the liquidity and the price range). So we need continuous control for generalization and discrete control for specialization. To this end, we propose a hybrid RL method to combine the advantages of both of them. We first use a continuous control agent to scope an action subset, then deploy a fine-grained agent to choose a specific limit price. Extensive experiments show that our method has higher sample efficiency and better training stability than existing RL algorithms and significantly outperforms previous learning-based methods for order execution.
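
To make the two-level action structure concrete, below is a minimal Python sketch of the idea described in the abstract: a continuous "coarse" action expressed as a percentage offset from a reference price scopes a small set of tick-aligned candidate limit prices, and a discrete "fine-grained" action then selects one of them. All names and parameters here (scope_limit_prices, TICK_SIZE, NUM_FINE_ACTIONS) are assumptions for illustration; the paper's actual agents and implementation may differ.

import numpy as np

TICK_SIZE = 0.01          # assumed tick size
NUM_FINE_ACTIONS = 5      # assumed number of discrete candidates per step


def scope_limit_prices(ref_price: float, pct_offset: float) -> np.ndarray:
    """Map a continuous action (percentage price offset) to a small set of
    tick-aligned candidate limit prices centered on the scoped price."""
    center = ref_price * (1.0 + pct_offset)
    center_tick = round(center / TICK_SIZE)
    half = NUM_FINE_ACTIONS // 2
    ticks = np.arange(center_tick - half, center_tick + half + 1)
    return ticks * TICK_SIZE


def choose_limit_price(candidates: np.ndarray, fine_action: int) -> float:
    """The fine-grained (discrete) agent picks one candidate limit price."""
    return float(candidates[fine_action])


# Example: the coarse agent proposes a -0.2% offset from a 10.00 reference
# price, and the fine agent picks the middle candidate (9.98).
candidates = scope_limit_prices(ref_price=10.00, pct_offset=-0.002)
price = choose_limit_price(candidates, fine_action=NUM_FINE_ACTIONS // 2)
print(candidates, price)

In this sketch the continuous offset generalizes across stocks with different price levels, while the discrete selection respects the tick grid of the specific stock, which is the "continuous-discrete duality" the abstract refers to.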
