Paper Title
Beyond CAGE: Investigating Generalization of Learned Autonomous Network Defense Policies
Paper Authors
Paper Abstract
Advancements in reinforcement learning (RL) have inspired new directions in intelligent automation of network defense. However, many of these advancements have either outpaced their application to network security or have not considered the challenges associated with implementing them in the real world. To understand these problems, this work evaluates several RL approaches implemented in the second edition of the CAGE Challenge, a public competition to build an autonomous network defender agent in a high-fidelity network simulator. Our approaches all build on the Proximal Policy Optimization (PPO) family of algorithms and include hierarchical RL, action masking, custom training, and ensemble RL. We find that the ensemble RL technique performs strongest, outperforming our other models and taking second place in the competition. To understand applicability to real environments, we evaluate each method's ability to generalize to unseen networks and against an unknown attack strategy. In unseen environments, all of our approaches perform worse, with degradation varying based on the type of environmental change. Against an unknown attacker strategy, we find that our models have reduced overall performance even though the new strategy is less efficient than the ones our models trained on. Together, these results highlight promising research directions for autonomous network defense in the real world.