Is Power-Seeking AI an Existential Risk?

Paper Title

Is Power-Seeking AI an Existential Risk?

Author

Carlsmith, Joseph

Abstract

This report examines what I see as the core argument for concern about existential risk from misaligned artificial intelligence. I proceed in two stages. First, I lay out a backdrop picture that informs such concern. On this picture, intelligent agency is an extremely powerful force, and creating agents much more intelligent than us is playing with fire -- especially given that if their objectives are problematic, such agents would plausibly have instrumental incentives to seek power over humans. Second, I formulate and evaluate a more specific six-premise argument that creating agents of this kind will lead to existential catastrophe by 2070. On this argument, by 2070: (1) it will become possible and financially feasible to build relevantly powerful and agentic AI systems; (2) there will be strong incentives to do so; (3) it will be much harder to build aligned (and relevantly powerful/agentic) AI systems than to build misaligned (and relevantly powerful/agentic) AI systems that are still superficially attractive to deploy; (4) some such misaligned systems will seek power over humans in high-impact ways; (5) this problem will scale to the full disempowerment of humanity; and (6) such disempowerment will constitute an existential catastrophe. I assign rough subjective credences to the premises in this argument, and I end up with an overall estimate of ~5% that an existential catastrophe of this kind will occur by 2070. (May 2022 update: since making this report public in April 2021, my estimate here has gone up, and is now at >10%.)
