Title

Multimodal Generative Models for Bankruptcy Prediction Using Textual Data

Authors

Mancisidor, Rogelio A.; Aas, Kjersti

Abstract

Textual data from financial filings, e.g., the Management's Discussion & Analysis (MDA) section in Form 10-K, has been used to improve the prediction accuracy of bankruptcy models. In practice, however, we cannot obtain the MDA section for all public companies, which limits the use of MDA data in traditional bankruptcy models, as they need complete data to make predictions. The two main reasons for the lack of MDA are: (i) not all companies are obliged to submit the MDA, and (ii) technical problems arise when crawling and scraping the MDA section. To address this limitation, this research introduces the Conditional Multimodal Discriminative (CMMD) model, which learns multimodal representations that embed information from the accounting, market, and textual data modalities. The CMMD model needs a sample with all data modalities for model training. At test time, the CMMD model only needs access to the accounting and market modalities to generate multimodal representations, which are then used to make bankruptcy predictions and to generate words from the missing MDA modality. With this novel methodology, it is realistic to use textual data in bankruptcy prediction models, since accounting and market data are available for all companies, unlike textual data. The empirical results of this research show that if financial regulators or investors were to use traditional models with MDA data, they would only be able to make predictions for 60% of the companies. Furthermore, taking into account all the companies in our sample, the classification performance of our proposed methodology is superior to that of a large number of traditional classifier models.
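The abstract's central mechanism is that a model trained on all three modalities needs only accounting and market data at prediction time. The sketch below illustrates that data flow. It is not the authors' CMMD implementation: the linear maps, dimensions, vocabulary, and names (`predict`, `W_prior`, `w_clf`, `W_text`) are hypothetical. The weights are random rather than trained. We also assume that the test-time representation is the mean of a conditional prior over a latent variable given the accounting and market inputs. The code only shows how the two always-available modalities feed both a bankruptcy probability and a distribution over MDA words.

```python
import math
import random

# Hypothetical sizes and vocabulary, chosen only for illustration.
N_ACCOUNTING = 4   # e.g. financial ratios
N_MARKET = 2       # e.g. market-based features
LATENT_DIM = 3
VOCAB = ["loss", "debt", "growth", "liquidity", "going", "concern"]

rng = random.Random(0)


def random_matrix(rows, cols):
    """Placeholder weights drawn from N(0, 0.5^2); a real model would learn them."""
    return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product on plain Python lists."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(z):
    top = max(z)  # subtract the max for numerical stability
    exps = [math.exp(v - top) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


# Assumed components (untrained). Training would use firms with all three
# modalities, including MDA text.
W_prior = random_matrix(LATENT_DIM, N_ACCOUNTING + N_MARKET)  # mean of assumed p(z | accounting, market)
w_clf = random_matrix(1, LATENT_DIM)[0]                       # bankruptcy head p(y | z)
W_text = random_matrix(len(VOCAB), LATENT_DIM)                # word decoder p(word | z)


def predict(accounting, market):
    """Test-time path: no MDA text is required as input."""
    x = list(accounting) + list(market)
    z = matvec(W_prior, x)  # representation = conditional prior mean
    p_bankrupt = sigmoid(sum(w * zi for w, zi in zip(w_clf, z)))
    word_probs = softmax(matvec(W_text, z))
    return p_bankrupt, dict(zip(VOCAB, word_probs))


p, words = predict([0.1, -0.3, 0.5, 0.2], [-0.05, 0.8])
print(f"bankruptcy probability: {p:.3f}")
print("most likely MDA word:", max(words, key=words.get))
```

In a trained model, the word distribution would indicate which MDA terms the representation associates with a firm even when its MDA section is missing. With random weights, the outputs here show only the shapes involved, not meaningful predictions.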
