MetaTrader: A Reinforcement Learning Approach Integrating Diverse Policies for Portfolio Optimization

Paper Title

MetaTrader: An Reinforcement Learning Approach Integrating Diverse Policies for Portfolio Optimization

Authors

Niu, Hui; Li, Siyuan; Li, Jian

Abstract

Portfolio management is a fundamental problem in finance. It involves periodic reallocations of assets to maximize the expected returns within an appropriate level of risk exposure. Deep reinforcement learning (RL) has been considered a promising approach to solving this problem owing to its strong capability in sequential decision making. However, due to the non-stationary nature of financial markets, applying RL techniques to portfolio optimization remains a challenging problem. Extracting trading knowledge from various expert strategies could be helpful for agents to accommodate the changing markets. In this paper, we propose MetaTrader, a novel two-stage RL-based approach for portfolio management, which learns to integrate diverse trading policies to adapt to various market conditions. In the first stage, MetaTrader incorporates an imitation learning objective into the reinforcement learning framework. Through imitating different expert demonstrations, MetaTrader acquires a set of trading policies with great diversity. In the second stage, MetaTrader learns a meta-policy to recognize the market conditions and decide on the most proper learned policy to follow. We evaluate the proposed approach on three real-world index datasets and compare it to state-of-the-art baselines. The empirical results demonstrate that MetaTrader significantly outperforms those baselines in balancing profits and risks. Furthermore, thorough ablation studies validate the effectiveness of the components in the proposed approach.
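The two stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no formulas, so the squared-distance imitation term, the log-return RL term, the `imitation_coef` weight, the greedy selection rule, and every function name below are assumptions chosen only to make the idea concrete.

```python
import math


def softmax(scores):
    """Map raw scores to long-only portfolio weights that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def portfolio_return(weights, asset_returns):
    """One-period portfolio return for given weights and per-asset returns."""
    return sum(w * r for w, r in zip(weights, asset_returns))


def stage_one_loss(weights, expert_weights, asset_returns, imitation_coef=0.5):
    """Assumed stage-1 objective: an RL term plus a weighted imitation term."""
    # RL term (assumption): negative log of the one-period gross return.
    rl_term = -math.log(1.0 + portfolio_return(weights, asset_returns))
    # Imitation term (assumption): squared distance to the expert allocation.
    imitation_term = sum((w - e) ** 2 for w, e in zip(weights, expert_weights))
    return rl_term + imitation_coef * imitation_term


def meta_select(meta_scores):
    """Assumed stage-2 rule: greedily follow the highest-scored sub-policy."""
    return max(range(len(meta_scores)), key=lambda i: meta_scores[i])


# Two hypothetical sub-policies, each emitting raw scores over three assets.
sub_policy_scores = [[2.0, 0.0, 0.0], [0.0, 0.0, 2.0]]
sub_policy_weights = [softmax(s) for s in sub_policy_scores]

# Hypothetical meta-policy scores for the current market state.
chosen = meta_select([0.2, 0.9])
weights = sub_policy_weights[chosen]
```

In this sketch, training each sub-policy against a different expert allocation is what would make the learned policies differ from one another, and the meta-policy only has to choose among them rather than output an allocation itself. A real meta-policy would learn its scores from market features instead of taking them as fixed inputs.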
