Predicting the Choice of Financing for Start-ups Using Machine Learning Algorithms and Behavioral Biases

Document Type: Original Article

Authors

Department of Industrial Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran

10.22091/jemsc.2024.11203.1200

Abstract

This paper aims to predict the financing methods chosen by startups, in order to support decision-making by founders and their investors. First, the structural, demographic, and behavioral factors that influence the choice of financing method were identified. These factors were then measured with a 32-item questionnaire distributed online to startup founders. Using the 70 responses received, the financing methods chosen by startups were predicted with binary relevance, classifier chains, label powerset, K-nearest neighbors, extreme gradient boosting (XGBoost), categorical boosting (CatBoost), and random forest. A comparison of the results shows that the boosting ensemble algorithm, with an F1 score of 89% and a precision of 85%, predicts the selected financing methods on the test dataset better than the other algorithms. The data also indicate that startups lean towards personal funding, which is consistent with the prevalence of loss aversion bias among entrepreneurs. After loss aversion, the most frequent biases among entrepreneurs were overconfidence, anchoring, and illusion of control.
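
Because a startup can choose several financing methods at once, the prediction task is a multi-label one. The minimal sketch below illustrates, on toy data, how two of the problem transformation methods named in the abstract reshape the targets, and how precision and F1 can be computed for multi-label predictions. The label names, the toy matrices, and the choice of micro-averaging are assumptions made for illustration; the abstract does not state the actual label set or the averaging scheme used.

```python
# Hypothetical financing labels; the paper's actual label set is not given here.
LABELS = ["personal", "loan", "angel", "venture_capital"]


def binary_relevance_targets(Y):
    """Binary relevance: split the label matrix into one 0/1 target per label."""
    return {label: [row[j] for row in Y] for j, label in enumerate(LABELS)}


def label_powerset_targets(Y):
    """Label powerset: map each distinct label combination to one class id."""
    class_ids = {}
    targets = []
    for row in Y:
        key = tuple(row)
        if key not in class_ids:
            class_ids[key] = len(class_ids)  # ids assigned in order of first appearance
        targets.append(class_ids[key])
    return targets, class_ids


def micro_precision_f1(Y_true, Y_pred):
    """Micro-averaged precision and F1: pool TP/FP/FN over all labels and samples."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(Y_true, Y_pred):
        for t, p in zip(true_row, pred_row):
            if p == 1 and t == 1:
                tp += 1
            elif p == 1 and t == 0:
                fp += 1
            elif p == 0 and t == 1:
                fn += 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, f1


# Toy data (illustrative only): rows are startups, columns follow LABELS.
Y_true = [[1, 0, 0, 0], [1, 1, 0, 0], [0, 0, 1, 1], [1, 0, 0, 0]]
Y_pred = [[1, 0, 0, 0], [1, 0, 0, 0], [0, 0, 1, 1], [1, 1, 0, 0]]

per_label = binary_relevance_targets(Y_true)
powerset, class_ids = label_powerset_targets(Y_true)
precision, f1 = micro_precision_f1(Y_true, Y_pred)

print(per_label)
print(powerset, class_ids)
print(f"precision={precision:.3f}, F1={f1:.3f}")
```

Binary relevance trains one independent binary classifier per label and so ignores co-occurrence between financing methods, whereas label powerset treats each observed combination as its own class, which can leave very few examples per class on a sample as small as 70 responses.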


