Imbalanced credit risk prediction based on data fillers and modified balanced random forest improved by Bayesian optimization

Hongyu Zhang; Zhenjun Ye

doi:10.61935/aedmr.2.1.2024.P115

Abstract

Because of the distributional characteristics of financial big data, credit risk prediction models often face problems such as imbalanced class distributions and a difficult data preprocessing process, and high-precision models often come at the cost of low computational efficiency. This paper therefore constructs a complete imbalanced credit risk prediction model, BO-PBRF, and improves the algorithm to address these common problems in financial data. In the data preprocessing stage, two missing-value fillers are generated from the original data so that new data can later be processed in the same way. In the modeling stage, we improve the balanced random forest algorithm so that the model not only handles imbalanced data sets but also suits the context of rapidly growing financial big data, and runs faster. In addition, we incorporate Bayesian optimization into model construction to further improve accuracy, especially in predicting default loans. To verify the effectiveness of the proposed model, the empirical study uses real-world credit data and compares the proposed model with previous models. The experimental results show that the proposed model achieves the best prediction performance on default data.
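For readers unfamiliar with balanced random forests, the step that lets them cope with imbalanced data is per-tree balanced bootstrap sampling: each tree is trained on a sample in which every class contributes the same number of draws (the minority-class size), sampled with replacement. The sketch below shows only this generic sampling step under that assumption. It is not the authors' modified PBRF, it omits tree training, and the helper names `balanced_bootstrap` and `balanced_samples` are hypothetical.

```python
import random
from collections import defaultdict


def balanced_bootstrap(labels, rng):
    """Return row indices for one class-balanced bootstrap sample.

    Generic balanced-random-forest sampling (hypothetical helper, not the
    paper's PBRF): every class contributes as many draws as the smallest
    class has rows, drawn with replacement.
    """
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    n_min = min(len(idx) for idx in by_class.values())
    sample = []
    for idx in by_class.values():
        # Sampling with replacement keeps the bootstrap character of bagging.
        sample.extend(rng.choices(idx, k=n_min))
    return sample


def balanced_samples(labels, n_trees, seed=0):
    """Draw one balanced bootstrap per tree (hypothetical helper).

    Each index list would be passed to an independent tree learner; because
    the samples are independent, the trees could also be trained in parallel.
    """
    rng = random.Random(seed)
    return [balanced_bootstrap(labels, rng) for _ in range(n_trees)]


if __name__ == "__main__":
    # 95 non-default loans (0) and 5 default loans (1): a 19:1 imbalance.
    labels = [0] * 95 + [1] * 5
    for s in balanced_samples(labels, n_trees=3, seed=1):
        print(len(s), sum(labels[i] for i in s))  # 10 rows, 5 of them defaults
```

With a 19:1 imbalance, each tree in this sketch sees 5 default and 5 non-default rows, so no single tree is dominated by the majority class. The price is that each tree discards most majority-class rows, which the ensemble offsets by drawing a fresh sample for every tree.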