Predicting Student Dropout in Higher Education: An Ensemble Learning Approach with Feature Importance Analysis
DOI: https://doi.org/10.70619/vol5iss4pp31-40
Keywords: Student dropout, Grade Point Average, Logistic Regression, Random Forest, Hard voting, Soft voting
Abstract
Student dropout in higher education remains a global challenge, particularly in developing regions, where early intervention is hindered by reliance on traditional indicators such as GPA or attendance. This study addresses the problem by proposing a predictive model built with ensemble machine learning techniques that integrate Logistic Regression, Random Forest, and AdaBoost. These models were combined using hard and soft voting classifiers to improve prediction accuracy and reliability. The dataset comprises 4,424 student records with demographic, academic, and socio-economic features. The soft voting ensemble achieved the highest accuracy (80.56%) and AUC (0.91), outperforming the individual classifiers. Feature importance analysis identified academic performance, tuition status, and parental background as key predictors of dropout. Beyond identifying at-risk students with high precision, the model offers actionable insights for early intervention, giving higher education institutions data-driven strategies to improve retention and student success.
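The abstract contrasts hard and soft voting. A minimal sketch, assuming three base models that each output a probability of dropout, shows how the two rules combine the same predictions and where they can disagree. The probabilities are invented for illustration and are not results from the study. The 0.5 decision threshold and unweighted averaging are also assumptions, since the abstract states neither, nor which software the authors used.

```python
from typing import Dict, List, Sequence

# Hypothetical P(dropout) from three base models, in the order
# [Logistic Regression, Random Forest, AdaBoost].
# These values are invented for illustration; they are NOT from the study.
PROBS: Dict[str, List[float]] = {
    "student_A": [0.90, 0.55, 0.40],
    "student_B": [0.45, 0.48, 0.95],
    "student_C": [0.20, 0.10, 0.30],
    "student_D": [0.60, 0.52, 0.05],
}

THRESHOLD = 0.5  # assumed cut-off above which a probability means "dropout" (label 1)


def hard_vote(probs: Sequence[float]) -> int:
    """Each model casts one vote for its own predicted label; the majority wins."""
    votes_for_dropout = sum(1 for p in probs if p > THRESHOLD)
    return 1 if votes_for_dropout > len(probs) / 2 else 0


def soft_vote_score(probs: Sequence[float]) -> float:
    """Unweighted mean of the models' predicted dropout probabilities."""
    return sum(probs) / len(probs)


def soft_vote(probs: Sequence[float]) -> int:
    """Threshold the averaged probability to obtain a label."""
    return 1 if soft_vote_score(probs) > THRESHOLD else 0


if __name__ == "__main__":
    for student, probs in PROBS.items():
        print(
            f"{student}: hard={hard_vote(probs)} "
            f"soft={soft_vote(probs)} (mean p={soft_vote_score(probs):.3f})"
        )
```

Students B and D are where the two rules diverge. For B, one confident model (0.95) outweighs two marginal "no" votes once probabilities are averaged; for D, two marginal "yes" votes win the hard majority, but AdaBoost's very low probability pulls the average below 0.5. Soft voting also produces a continuous score, which is what an ROC/AUC evaluation ranks, whereas hard voting produces only a label. In scikit-learn, the same choice is exposed as `VotingClassifier(estimators=..., voting="hard")` or `voting="soft"`.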
License
Copyright (c) 2025 Uwimana Olive, Musabe Jean Bosco, Nyesheja Muhire Enan

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.