Predicting the Choice of Financing for Start-ups Using Machine Learning Algorithms and Behavioral Biases

Document Type: Original Article

Authors

Department of Industrial Engineering, Faculty of Engineering, Ferdowsi University of Mashhad, Mashhad, Iran

Abstract

The aim of this paper is to predict the financing methods chosen by startups in order to support decision-making by startup founders and their investors. First, the factors influencing the choice of financing method, including structural, demographic, and behavioral factors, were identified. These factors were then measured with a 32-item questionnaire distributed online to startup founders. Based on the 70 responses received, the financing methods chosen by startups were predicted using binary relevance, classifier chains, label powerset, k-nearest neighbors, extreme gradient boosting (XGBoost), CatBoost, and random forest. A comparison of the algorithms shows that the boosting ensemble algorithm, with an F1 score of 89% and a precision of 85%, predicts the selected financing methods on the test dataset better than the other algorithms. The data analysis also indicates that startups lean towards personal funding methods, which is consistent with the prevalence of loss aversion bias among entrepreneurs. After loss aversion, the most frequent biases among entrepreneurs were overconfidence, anchoring, and illusion of control.
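Because a startup can use several financing methods at once, each respondent carries a vector of 0/1 labels, which makes this a multi-label problem. The sketch below illustrates the three standard problem-transformation methods, binary relevance, classifier chains, and label powerset, which appear to be the transformation methods the abstract refers to, together with micro-averaged precision and F1. Everything here is an assumption for illustration: the label names, the two Likert-scale features, and the toy data are hypothetical and not taken from the study; a plain k-nearest-neighbors vote (the hypothetical helper `knn_vote`) stands in as the base learner so the example runs on the Python standard library alone, whereas the study also used XGBoost, CatBoost, and random forest; and the abstract does not say how its F1 score was averaged, so micro-averaging is assumed.

```python
import math
from collections import Counter

# Hypothetical financing-method labels (illustrative only, not the study's categories).
LABELS = ["personal", "bank_loan", "venture_capital"]


def knn_vote(train_X, train_y, x, k=3):
    """Return the majority target among the k training rows nearest to x (Euclidean)."""
    nearest = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))[:k]
    # Counter.most_common keeps first-encountered order on ties,
    # so a tied vote goes to the target that appears first in distance order.
    return Counter(target for _, target in nearest).most_common(1)[0][0]


def binary_relevance(train_X, train_Y, x, k=3):
    """One independent classifier per label; correlations between labels are ignored."""
    return [knn_vote(train_X, [y[j] for y in train_Y], x, k) for j in range(len(train_Y[0]))]


def classifier_chain(train_X, train_Y, x, k=3):
    """Classifier j also sees labels 0..j-1: true labels in training, earlier predictions at test time."""
    preds = []
    for j in range(len(train_Y[0])):
        chained_X = [list(row) + list(y[:j]) for row, y in zip(train_X, train_Y)]
        preds.append(knn_vote(chained_X, [y[j] for y in train_Y], list(x) + preds, k))
    return preds


def label_powerset(train_X, train_Y, x, k=3):
    """Each distinct label combination becomes one class of a single multi-class problem."""
    return list(knn_vote(train_X, [tuple(y) for y in train_Y], x, k))


def micro_scores(Y_true, Y_pred):
    """Micro-averaged precision, recall and F1: pool TP/FP/FN over every (sample, label) cell."""
    cells = [(t, p) for yt, yp in zip(Y_true, Y_pred) for t, p in zip(yt, yp)]
    tp = sum(1 for t, p in cells if t == 1 and p == 1)
    fp = sum(1 for t, p in cells if t == 0 and p == 1)
    fn = sum(1 for t, p in cells if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Synthetic toy data: features are hypothetical (loss_aversion, overconfidence) Likert scores.
train_X = [(5, 2), (4, 1), (5, 3), (2, 5), (1, 4), (2, 4)]
train_Y = [(1, 0, 0), (1, 0, 0), (1, 1, 0), (0, 0, 1), (0, 1, 1), (0, 0, 1)]
test_X = [(4, 2), (1, 5)]
test_Y = [(1, 0, 0), (0, 1, 1)]

if __name__ == "__main__":
    for method in (binary_relevance, classifier_chain, label_powerset):
        Y_pred = [method(train_X, train_Y, x) for x in test_X]
        p, r, f1 = micro_scores(test_Y, Y_pred)
        print(f"{method.__name__}: predicted={Y_pred} precision={p:.3f} recall={r:.3f} F1={f1:.3f}")
```

On this toy data the three transformations return the same predictions; they differ in how they treat label dependence. Binary relevance ignores it, classifier chains model it through the chain order, and label powerset can only predict label combinations that occur in the training set, which matters with a sample as small as 70 responses.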
