Paper Title

Analogy Generation by Prompting Large Language Models: A Case Study of InstructGPT

Paper Authors

Bhavya Bhavya, Jinjun Xiong, Chengxiang Zhai

Paper Abstract

We propose a novel application of prompting Pre-trained Language Models (PLMs) to generate analogies and study how to design effective prompts for two task settings: generating a source concept analogous to a given target concept (aka Analogous Concept Generation or ACG), and generating an explanation of the similarity between a given pair of target concept and source concept (aka Analogous Explanation Generation or AEG). We found that it is feasible to prompt InstructGPT to generate meaningful analogies and the best prompts tend to be precise imperative statements especially with a low temperature setting. We also systematically analyzed the sensitivity of the InstructGPT model to prompt design, temperature, and injected spelling errors, and found that the model is particularly sensitive to certain variations (e.g., questions vs. imperative statements). Further, we conducted human evaluation on 1.4k of the generated analogies and found that the quality of generations varies substantially by model size. The largest InstructGPT model can achieve human-level performance at generating meaningful analogies for a given target while there is still room for improvement on the AEG task.
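
As a rough illustration of the prompting setup the abstract describes, the sketch below shows how one might ask an InstructGPT-series model for an analogy using a precise imperative prompt decoded at a low temperature. The prompt wording, model name, and helper function are illustrative assumptions rather than the paper's exact configuration, and the snippet assumes the legacy pre-1.0 openai Python SDK.

import openai

openai.api_key = "YOUR_API_KEY"  # placeholder credential

def generate_analogy(target_concept, model="text-davinci-002"):
    # ACG-style prompt: a precise imperative statement, which the paper reports
    # tends to work better than question-style prompts, decoded at low temperature.
    prompt = f"Explain {target_concept} using an analogy."
    response = openai.Completion.create(
        model=model,
        prompt=prompt,
        temperature=0.0,   # low temperature setting
        max_tokens=150,
    )
    return response["choices"][0]["text"].strip()

print(generate_analogy("how a neural network learns"))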
