Predicting Student Dropout in Higher Education: An Ensemble Learning Approach with Feature Importance Analysis

Authors

  • Uwimana Olive, University of Lay Adventists of Kigali (UNILAK)
  • Musabe Jean Bosco, Kigali Independent University
  • Nyesheja Muhire Enan, University of Lay Adventists of Kigali (UNILAK)

DOI:

https://doi.org/10.70619/vol5iss4pp31-40

Keywords:

Student dropout, Grade Point Average, Logistic Regression, Random Forest, Hard voting, Soft voting

Abstract

Student dropout in higher education remains a global challenge, particularly in developing regions where early interventions are hindered by reliance on traditional indicators like GPA or attendance. This study addresses the issue by proposing a predictive model using ensemble machine learning techniques, integrating Logistic Regression, Random Forest, and AdaBoost. These models were combined using soft and hard voting classifiers to enhance prediction accuracy and reliability. The dataset, comprising 4,424 student records, includes demographic, academic, and socio-economic features. Results showed that the soft voting ensemble achieved the highest accuracy (80.56%) and AUC (91%), outperforming individual classifiers. Feature importance analysis revealed academic performance, tuition status, and parental background as key predictors of dropout. The model not only identifies at-risk students with high precision but also offers actionable insights for early intervention. This approach equips higher learning institutions with data-driven strategies to improve retention and student success outcomes.
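
The voting scheme and feature importance analysis described in the abstract can be illustrated with a minimal sketch in scikit-learn. This is not the authors' implementation. The data is synthetic (generated with make_classification), the label is assumed to be binary, and the hyperparameters, the scaling step, and the choice of the Random Forest's impurity-based importances are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    AdaBoostClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the student records (assumption: binary dropout label).
X, y = make_classification(
    n_samples=1000, n_features=12, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)


def base_models():
    """Return fresh, unfitted copies of the three base classifiers."""
    return [
        # Scaling is added here only to help Logistic Regression converge.
        ("lr", make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))),
        ("rf", RandomForestClassifier(n_estimators=100, random_state=0)),
        ("ada", AdaBoostClassifier(n_estimators=50, random_state=0)),
    ]


# Hard voting: each model casts one vote and the majority class wins.
hard_vote = VotingClassifier(estimators=base_models(), voting="hard")
hard_vote.fit(X_train, y_train)

# Soft voting: predicted class probabilities are averaged across models.
soft_vote = VotingClassifier(estimators=base_models(), voting="soft")
soft_vote.fit(X_train, y_train)

hard_acc = accuracy_score(y_test, hard_vote.predict(X_test))
soft_acc = accuracy_score(y_test, soft_vote.predict(X_test))
# AUC needs probabilities, which only the soft voting ensemble exposes.
soft_auc = roc_auc_score(y_test, soft_vote.predict_proba(X_test)[:, 1])

# Impurity-based importances from the Random Forest fitted inside the ensemble.
importances = soft_vote.named_estimators_["rf"].feature_importances_
ranking = importances.argsort()[::-1]  # feature indices, most important first

print(f"hard voting accuracy: {hard_acc:.3f}")
print(f"soft voting accuracy: {soft_acc:.3f}")
print(f"soft voting AUC:      {soft_auc:.3f}")
print("features ranked by importance:", ranking.tolist())
```

The scores printed by this sketch come from synthetic data and are not comparable to the figures reported in the abstract. Only soft voting yields class probabilities, which is why the AUC is computed for that ensemble alone.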

Author Biographies

Uwimana Olive, University of Lay Adventists of Kigali (UNILAK)

Faculty of Computing and Information Sciences

Musabe Jean Bosco, Kigali Independent University

School of Science & Technology

Published

2025-06-07

How to Cite

Olive, U., Bosco, M. J., & Enan, N. M. (2025). Predicting Student Dropout in Higher Education: An Ensemble Learning Approach with Feature Importance Analysis. Journal of Information and Technology, 5(4), 31–40. https://doi.org/10.70619/vol5iss4pp31-40

Section

Articles