Paper Title
TweetDrought: A Deep-Learning Drought Impacts Recognizer based on Twitter Data
Authors
Abstract
Understanding drought impacts is becoming increasingly vital under a warming climate. Traditional drought indices mainly describe biophysical variables rather than impacts on social, economic, and environmental systems. We used natural language processing and transfer learning based on Bidirectional Encoder Representations from Transformers (BERT): the model was fine-tuned on data from the news-based Drought Impact Report (DIR) and then applied to recognize seven types of drought impacts in filtered Twitter data from the United States. Our model achieved a macro-F1 score of 0.89 on the DIR test set. The model was then applied to California tweets and validated against keyword-based labels, where the macro-F1 score was 0.58. Because keyword matching is limited, we also spot-checked tweets with controversial labels; there, 83.5% of the BERT labels were correct when compared with the keyword labels. Overall, the fine-tuned BERT-based recognizer provided appropriate predictions and valuable information on drought impacts. The interpretation and analysis of the model were consistent with experiential domain expertise.
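The macro-F1 scores quoted in the abstract average the per-class F1 over all impact types, so each type counts equally regardless of how often it occurs. The following is a minimal sketch of that calculation, not the authors' evaluation code. It assumes a multi-label setting (each text may carry several impact types, which the abstract does not state), and the three label columns and the toy data are hypothetical.

```python
from typing import List, Sequence


def f1_for_class(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 for one class, given 0/1 ground-truth and predicted indicators."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    # Convention chosen here: a class with no positives at all scores 0.0.
    return 2 * tp / denom if denom else 0.0


def macro_f1(y_true: List[Sequence[int]], y_pred: List[Sequence[int]]) -> float:
    """Unweighted mean of per-class F1 over the label columns."""
    n_classes = len(y_true[0])
    scores = [
        f1_for_class([row[c] for row in y_true], [row[c] for row in y_pred])
        for c in range(n_classes)
    ]
    return sum(scores) / n_classes


# Hypothetical toy data: rows are texts, columns are three made-up impact types.
y_true = [[1, 0, 0], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 0, 0], [0, 0, 1]]

score = macro_f1(y_true, y_pred)
print(f"macro-F1 = {score:.4f}")
```

In the toy data the per-class F1 values are 1, 2/3, and 2/3, so the macro average is 7/9. Because every class has equal weight, a rarely occurring impact type that is predicted poorly lowers the score as much as a frequent one would.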