Paper title
Drivers of the decrease of patent similarities from 1976 to 2021
Paper authors
Paper abstract
The citation network of patents citing prior art arises from the legal obligation of patent applicants to properly disclose their invention. One way to study the relationship between current patents and their antecedents is to analyze the similarity between the textual elements of patents. Many patent similarity indicators have shown a steady decrease since the mid-1970s. Although several explanations have been proposed, comprehensive analyses of this phenomenon have been rare. In this paper, we use a computationally efficient measure of patent similarity that leverages state-of-the-art Natural Language Processing tools to investigate potential drivers of this apparent decrease in similarity. We do so by modeling patent similarity scores with generalized additive models. We find that non-linear model specifications can distinguish between distinct, temporally varying drivers of patent similarity levels and explain more variation in the data ($R^2 \sim 18\%$) than previous methods. Moreover, the model reveals an underlying trend in similarity scores that is fundamentally different from the one presented previously.