Machine learning algorithms for fraud detection in automobile insurance

Elena Badal Valero; Andrés Sanjuán Díaz; Jorge Segura Gisbert

doi:10.26360/2020_2

Authors

Elena Badal Valero, Universidad de Valencia (Spain)
Andrés Sanjuán Díaz, Universidad de Valencia (Spain)
Jorge Segura Gisbert, Universidad de Valencia (Spain)

Keywords:

fraud, machine learning, automobile insurance, risk

Abstract

Automobile insurance fraud has increased considerably in recent years, undoubtedly driven by the economic crisis. This significant rise in the number of fraudulent claims, together with the new requirements associated with Solvency II, is leading insurers to tighten controls and allocate more resources to fighting fraud. For these reasons, the use of advanced prediction techniques to detect suspicious accidents is more than justified. This article presents several statistically based methodologies and machine learning algorithms that enable the analysis and detection of such claims.
