Breast Cancer Detection Using Ensemble Classifiers for Accuracy Improvement

Authors

1 Assistant Professor, Faculty of Electrical and Computer Engineering, Qom University of Technology, Qom, Iran. Email: shamsi@qut.ac.ir

2 M.Sc. in Computer Engineering, Faculty of Electrical and Computer Engineering, Shahab Danesh University, Qom, Iran. Email: m.karimian90@gmail.com

3 M.Sc. in Computer Engineering, Faculty of Electrical and Computer Engineering, Shahab Danesh University, Qom, Iran. Email: m.karimian64@gmail.com

Abstract

Early diagnosis of breast cancer plays a crucial role in patient treatment. Data mining algorithms can now provide intelligent methods for the healthcare system that detect breast cancer accurately. The purpose of this study is to detect breast cancer using an ensemble classifier on the prepared WBC and WDBC databases. On the WBC database, the proposed model (feature reduction with CFS, sample optimization with Resample, and an ensemble classifier combining the K*, random forest, Naïve Bayes, and Bayes network algorithms) achieves the best detection accuracy (100%), with a reported execution time of 0 seconds and no errors. On the WDBC database, the proposed model (feature reduction with CFS, sample optimization with Resample, and an ensemble classifier combining the IBK, Naïve Bayes, Bayes network, and K* algorithms) achieves an accuracy of 99.29%, with a reported execution time of 0 seconds and a mean absolute error of 0.007. These results suggest that ensemble classifiers built from data mining algorithms on the prepared databases can serve as the basis for new systems that support physicians and facilitate treatment.
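
To make the ensemble step concrete, the following is a minimal sketch of combining several base classifiers by majority vote and scoring the result with accuracy and mean absolute error. The abstract does not state which combination rule the study uses, so majority voting is an assumption here. The labels are hypothetical toy values (0 = benign, 1 = malignant), not WBC or WDBC data, and the names majority_vote, accuracy, and mean_absolute_error are illustrative helpers. The error is computed on hard 0/1 labels, which is also an assumption about how the reported error is defined.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine one label list per model into a single list by plurality vote."""
    combined = []
    # zip(*...) regroups the predictions so each tuple holds all votes for one sample
    for votes in zip(*predictions_per_model):
        combined.append(Counter(votes).most_common(1)[0][0])
    return combined


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_absolute_error(y_true, y_pred):
    """Mean absolute difference between true and predicted numeric labels."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels, NOT taken from the WBC or WDBC databases.
y_true = [0, 1, 1, 0, 1, 0]
model_a = [0, 1, 0, 0, 1, 0]  # one mistake
model_b = [0, 1, 1, 1, 1, 0]  # one mistake
model_c = [1, 1, 1, 0, 0, 0]  # two mistakes

ensemble = majority_vote([model_a, model_b, model_c])

for name, pred in [("a", model_a), ("b", model_b), ("c", model_c), ("vote", ensemble)]:
    print(name, round(accuracy(y_true, pred), 3), round(mean_absolute_error(y_true, pred), 3))
```

In this toy case the three models err on different samples, so the vote corrects each individual mistake. That is the usual motivation for an ensemble, although it does not by itself reproduce the figures reported above.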

Keywords

