Development Trends and Regulatory Applications of Quantitative Models for Detecting Financial Fraud

田莉; 吴思思; 邸超伦

Abstract

Using quantitative models to detect financial fraud in listed companies is a classic research problem. Early traditional empirical models relied mainly on logistic regression, with the indicators used gradually expanding from financial indicators to corporate governance, external auditing, and financing needs. These models emphasized causal reasoning and the identification of fraud characteristics, and they performed well on early financial fraud samples. As fraud techniques have grown more sophisticated in recent years, simple logistic regression models no longer discriminate well enough to meet practical needs, and shallow and deep machine learning models have gradually been developed. Shallow machine learning models are built mainly on decision trees, support vector machines, and ensemble learning algorithms. Their inputs include financial, corporate governance, and internal control indicators as well as raw financial data and textual information. Among them, ensemble learning models based on decision trees are relatively interpretable, with recall of roughly 70% to 80% under laboratory conditions. Deep machine learning models are built mainly on convolutional neural networks, recurrent neural networks, and long short-term memory networks, and they are more capable of handling unstructured data. Beyond structured and textual data, their inputs also include unstructured information such as audio and images. Their detection capability is often better than that of shallow machine learning models, with recall generally above 80%. However, deep learning models are less interpretable, and their success depends largely on large volumes of training data; because fraud samples in China are few, these models are prone to overfitting. Overall, quantitative models, especially deep learning models, are effective to a certain degree in detecting financial fraud and can already be used for coarse screening. Nevertheless, problems remain: few dedicated models for specific industries or fraud types, insufficient use of the large amount of valuable non-financial and unstructured data, and weak model interpretability. Efforts could be directed at sample accumulation, data governance, human-machine interaction, and the integration of large language models to bring about a qualitative improvement in model performance.
