Abstract
We propose an early-warning model for fraud risk among Chinese listed companies that draws on heterogeneous data. Because corporate fraud is complex and multidimensional, the model integrates multiple data sources, including financial reports, market behavior, and the sentiment of annual report texts. Data processing techniques are used to handle and combine these heterogeneous sources, with the aim of supporting robust and accurate predictions. Using machine learning algorithms, the model both detects potential fraud signals and quantifies the level of risk, giving stakeholders a dynamic predictive tool. The research offers a data-driven approach to corporate fraud detection and underscores the value of assessing risk from multiple data streams. It is a step toward proactive fraud management in an increasingly complex financial environment.