Title
Making SMART decisions in prophylaxis and treatment studies
Authors
Abstract
The optimal prophylaxis, and treatment if the prophylaxis fails, for a disease may be best evaluated using a sequential multiple assignment randomised trial (SMART). A SMART is a multi-stage study that randomises a participant to an initial treatment, observes some response to that treatment and then, depending on their observed response, randomises the same participant to an alternative treatment. Response adaptive randomisation may, in some settings, improve the trial participants' outcomes and expedite trial conclusions, compared to fixed randomisation. But 'myopic' response adaptive randomisation strategies, blind to multistage dynamics, may also result in suboptimal treatment assignments. We propose a 'dynamic' response adaptive randomisation strategy based on Q-learning, an approximate dynamic programming algorithm. Q-learning uses stage-wise statistical models and backward induction to incorporate late-stage 'payoffs' (i.e. clinical outcomes) into early-stage 'actions' (i.e. treatments). Our real-world example consists of a COVID-19 prophylaxis and treatment SMART with qualitatively different binary endpoints at each stage. Standard Q-learning does not work with such data because it cannot be used for sequences of binary endpoints. Sequences of qualitatively distinct endpoints may also require different weightings to ensure that the design guides participants to regimens with the highest utility. We describe how a simple decision-theoretic extension to Q-learning can be used to handle sequential binary endpoints with distinct utilities. Using simulation we show that, under a set of binary utilities, the 'dynamic' approach increases expected participant utility compared to the fixed approach, sometimes markedly, for all model parameters, whereas the 'myopic' approach can actually decrease utility.
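To make the backward-induction step concrete, the following Python sketch implements decision-theoretic Q-learning for a hypothetical two-stage SMART with binary actions and binary endpoints, using saturated (per-arm mean) stage models. The endpoint probabilities, variable names, and utility values are invented for illustration and are not taken from the paper.

```python
import numpy as np

# Minimal sketch: decision-theoretic Q-learning for a two-stage SMART with
# sequential binary endpoints. All numbers below are illustrative assumptions.
rng = np.random.default_rng(0)
n = 4000

# Stage 1: randomise prophylaxis a1, observe infection y1 (1 = infected).
a1 = rng.integers(0, 2, n)
y1 = rng.binomial(1, np.where(a1 == 1, 0.3, 0.5))

# Stage 2 (infected participants only): randomise treatment a2, observe
# recovery y2 (1 = recovered).
a2 = rng.integers(0, 2, n)
y2 = rng.binomial(1, np.where(a2 == 1, 0.7, 0.5))

# Utilities for each endpoint sequence: staying uninfected is best,
# infection followed by non-recovery is worst.
u_uninfected, u_recovered, u_not_recovered = 1.0, 0.6, 0.0

# Backward induction, stage 2: Q2(a2) = expected utility among the infected,
# estimated here by the per-arm mean (a saturated stage-2 model).
infected = y1 == 1
q2 = np.array([
    np.mean(np.where(y2[infected & (a2 == a)] == 1,
                     u_recovered, u_not_recovered))
    for a in (0, 1)
])
v2 = q2.max()  # value of acting optimally at stage 2

# Stage 1 pseudo-outcome: realised utility if uninfected, optimal
# continuation value if infected. Q1(a1) is its mean per arm, so the
# stage-2 'payoff' is folded into the stage-1 'action'.
pseudo = np.where(y1 == 0, u_uninfected, v2)
q1 = np.array([pseudo[a1 == a].mean() for a in (0, 1)])

print("Q2:", q2, "-> best treatment a2 =", q2.argmax())
print("Q1:", q1, "-> best prophylaxis a1 =", q1.argmax())
```

A 'dynamic' response adaptive design in the sense of the abstract would tilt the stage-1 randomisation probabilities toward the arm with the higher Q1, which accounts for the optimal continuation value, rather than toward the arm with the better stage-1 endpoint alone, as a 'myopic' rule would.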