Bird, J. B., Olvet, D. M., Willey, J. M. & Brenner, J. M. A generalizable approach to predicting performance on USMLE Step 2 CK. Adv. Med. Educ. Pract. 13, 939–944 (2022).
Holmboe, E. S., Sherbino, J., Long, D. M., Swing, S. R. & Frank, J. R. The role of assessment in competency-based medical education. Med. Teach. 32(8), 676–682 (2010).
Tabish, S. A. Assessment methods in medical education. Int. J. Health Sci. (Qassim) 2(2), 3–7 (2008).
Maholtz, D. E., Erickson, M. J. & Cymet, T. Comprehensive osteopathic medical licensing examination-USA level 1 and level 2-cognitive evaluation preparation and outcomes. J. Am. Osteopath. Assoc. 115(4), 232–235 (2015).
Kortz, M. W., Kongs, B. M., Bisesi, D. R., Roffler, M. & Sheehy, R. M. A retrospective and correlative analysis of academic and nonacademic predictors of COMLEX Level 1 performance. J. Am. Osteopath. Assoc. 122(4), 187–194 (2022).
DeMuth, R. H., Gold, J. G., Mavis, B. E. & Wagner, D. P. Progress on a new kind of progress test: assessing medical students’ clinical skills. Acad. Med. 93(5), 724–728 (2018).
Colbert-Getz, J. M. et al. Measuring assessment quality with an assessment utility rubric for medical education. MedEdPORTAL. 13, 10588 (2017).
Kamalov, F., Santandreu Calonge, D. & Gurrib, I. New era of artificial intelligence in education: Towards a sustainable multifaceted revolution. Sustainability. 15(16), 12451 (2023).
Eskinat, A. & Teker, S. Rising value of data in contemporary higher education. PressAcademia Procedia. 20(1), 41–46 (2024).
Tapalova, O. & Zhiyenbayeva, N. Artificial intelligence in education: AIEd for personalised learning pathways. Electron. J. e-Learn. 20(5), 639–653 (2022).
Asif, R., Merceron, A., Ali, S. A. & Haider, N. G. Analyzing undergraduate students’ performance using educational data mining. Comput. Educ. 113, 177–194 (2017).
Ahuja, R., Jha, A., Maurya, R. & Srivastava, R. Analysis of educational data mining. In Harmony Search and Nature Inspired Optimization Algorithms: Theory and Applications, ICHSA 2018 (Springer, 2019).
Salloum, S. A., Alshurideh, M., Elnagar, A. & Shaalan, K. Mining in educational data: Review and future directions. In Proceedings of the International Conference on Artificial Intelligence and Computer Vision (AICV2020) (Springer, 2020).
Mastour, H. et al. Prediction of medical sciences students’ performance on high-stakes examinations using machine learning models: a protocol for a systematic review. BMJ Open 13(5), e064956 (2023).
Psyridou, M. et al. Machine learning predicts upper secondary education dropout as early as the end of primary school. Sci. Rep. 14(1), 12956 (2024).
Rebelo Marcolino, M. et al. Student dropout prediction through machine learning optimization: Insights from Moodle log data. Sci. Rep. 15(1), 9840 (2025).
Maalouf, M. Logistic regression in data analysis: An overview. Int. J. Data Anal. Tech. Strategies 3(3), 281–299 (2011).
Kramer, O. Dimensionality reduction with unsupervised nearest neighbors (Springer, 2013).
Lingras, P. & Butz, C. Rough set based 1-v-1 and 1-v-r approaches to support vector machine multi-classification. Inf. Sci. 177(18), 3782–3798 (2007).
Park, Y.-S. & Lek, S. Artificial neural networks: multilayer perceptron for ecological modeling. In Developments in environmental modelling (ed. Ra, K.) 123–140 (Elsevier, 2016).
Myles, A. J., Feudale, R. N., Liu, Y., Woody, N. A. & Brown, S. D. An introduction to decision tree modeling. J. Chemom. 18(6), 275–285 (2004).
Webb, G. I., Keogh, E. & Miikkulainen, R. Naïve Bayes. In Encyclopedia of Machine Learning 713–714 (Springer, 2010).
Sheppard, C. Tree-Based Machine Learning Algorithms: Decision Trees, Random Forests, and Boosting (CreateSpace Independent Publishing Platform, 2017).
Qu, W., Sui, H., Yang, B. & Qian, W. Improving protein secondary structure prediction using a multi-modal BP method. Comput. Biol. Med. 41(10), 946–959 (2011).
Chen, T. et al. Xgboost: Extreme gradient boosting. R package version 0.4-2 (2015).
Pavlyshenko, B. Using stacking approaches for machine learning models. In 2018 IEEE Second International Conference on Data Stream Mining & Processing (DSMP) (IEEE, 2018).
Guleria, P. & Sood, M. Explainable AI and machine learning: performance evaluation and explainability of classifiers on educational data mining inspired career counseling. Educ. Inf. Technol. 28(1), 1081–1116 (2023).
Jang, Y., Choi, S., Jung, H. & Kim, H. Practical early prediction of students’ performance using machine learning and eXplainable AI. Educ. Inf. Technol. 27(9), 12855–12889 (2022).
Kar, S. P., Das, A. K., Chatterjee, R. & Mandal, J. K. Assessment of learning parameters for students’ adaptability in online education using machine learning and explainable AI. Educ. Inf. Technol. 1–16 (2023).
Lundberg, S. M. & Lee, S.-I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 30 (2017).
Merrick, L. & Taly, A. The explanation game: Explaining machine learning models using Shapley values. In Machine Learning and Knowledge Extraction: 4th IFIP TC 5, TC 12, WG 8.4, WG 8.9, WG 12.9 International Cross-Domain Conference, CD-MAKE 2020, Dublin, Ireland, August 25–28, 2020, Proceedings (Springer, 2020).
Nagy, M. & Molontay, R. Interpretable dropout prediction: Towards XAI-based personalized intervention. Int. J. Artif. Intell. Educ. 1–27 (2023).
Mastour, H., Dehghani, T., Moradi, E. & Eslami, S. Early prediction of medical students’ performance in high-stakes examinations using machine learning approaches. Heliyon (2023).
Tarik, A., Aissa, H. & Yousef, F. Artificial intelligence and machine learning to predict student performance during the COVID-19. Procedia Comput. Sci. 184, 835–840 (2021).
Qahmash, A., Ahmad, N. & Algarni, A. Investigating students’ pre-university admission requirements and their correlation with academic performance for medical students: An educational data mining approach. Brain Sci. 13(3), 456 (2023).
Tomasevic, N., Gvozdenovic, N. & Vranes, S. An overview and comparison of supervised data mining techniques for student exam performance prediction. Comput. Educ. 143, 103676 (2020).
Niyogisubizo, J., Liao, L., Nziyumva, E., Murwanashyaka, E. & Nshimyumukiza, P. C. Predicting student’s dropout in university classes using two-layer ensemble machine learning approach: A novel stacked generalization. Comput. Educ. Artif. Intell. 3, 100066 (2022).
Alhazmi, E. & Sheneamer, A. Early predicting of students performance in higher education. IEEE Access. 11, 27579–27589 (2023).
Chen, Y. & Zhai, L. A comparative study on student performance prediction using machine learning. Educ. Inf. Technol. 1–19 (2023).
Collins, G. S., Reitsma, J. B., Altman, D. G. & Moons, K. G. M. Transparent reporting of a multivariable prediction model for individual prognosis or diagnosis (TRIPOD): The TRIPOD statement. BMC Med. 13(1), 1 (2015).
Akoglu, H. User’s guide to correlation coefficients. Turk. J. Emerg. Med. 18(3), 91–93 (2018).
Van Hulse, J., Khoshgoftaar, T. M. & Napolitano, A. An empirical comparison of repetitive undersampling techniques. In 2009 IEEE International Conference on Information Reuse & Integration (IEEE, 2009).
He, H. & Ma, Y. Imbalanced Learning: Foundations, Algorithms, and Applications (Wiley-IEEE Press, 2013).
Kaur, P. & Gosain, A. Comparing the Behavior of Oversampling and Undersampling Approach of Class Imbalance Learning by Combining Class Imbalance Problem with Noise. In ICT Based Innovations (Springer, Singapore, 2018).
Ferri, F. J., Albert, J. V. & Vidal, E. Considerations about sample-size sensitivity of a family of edited nearest-neighbor rules. IEEE Trans. Syst. Man Cybern. B Cybern. 29(5), 667–672 (1999).
Alejo, R., Sotoca, J. M., Valdovinos, R. M. & Toribio, P. Edited nearest neighbor rule for improving neural networks classifications. In Advances in Neural Networks – ISNN 2010: 7th International Symposium on Neural Networks, ISNN 2010, Shanghai, China, June 6–9, 2010, Proceedings, Part I (Springer, 2010).
Yang, F. et al. A hybrid sampling algorithm combining synthetic minority over-sampling technique and edited nearest neighbor for missed abortion diagnosis. BMC Med. Inform. Decis. Mak. 22(1), 1–14 (2022).
Fernández, A., García, S., Herrera, F. & Chawla, N. V. SMOTE for learning from imbalanced data: Progress and challenges, marking the 15-year anniversary. J. Artific. Intell. Res. 61, 863–905 (2018).
Wongvorachan, T., He, S. & Bulut, O. A comparison of undersampling, oversampling, and SMOTE methods for dealing with imbalanced classification in educational data mining. Information 14(1), 54 (2023).
Zeng, M., Zou, B., Wei, F., Liu, X. & Wang, L. Effective prediction of three common diseases by combining SMOTE with Tomek links technique for imbalanced medical data. In Proceedings of the 2016 IEEE International Conference of Online Analysis and Computing Science (ICOACS 2016) 225–228 (IEEE, 2016).
Ying, C., Qi-Guang, M., Jia-Chen, L. & Lin, G. Advance and prospects of AdaBoost algorithm. Acta Autom. Sin. 39(6), 745–758 (2013).
Jiao, Y. & Du, P. Performance measures in evaluating machine learning based bioinformatics predictors for classifications. Quantitat. Biol. 4(4), 320–330 (2016).
Vuk, M. & Curk, T. ROC curve, lift chart and calibration plot. Metodol. Zvezki. 3(1), 89–108 (2006).
Boyd, K., Eng, K. H. & Page, C. D. Area under the precision-recall curve: Point estimates and confidence intervals. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases 451–466 (Springer, 2013).
Davis, J. & Goadrich, M. The relationship between precision-recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning 233–240 (ACM, 2006).
Saito, T. & Rehmsmeier, M. The precision-recall plot is more informative than the ROC plot when evaluating binary classifiers on imbalanced datasets. PLoS ONE 10(3), e0118432 (2015).
Pannakkong, W., Thiwa-Anont, K., Singthong, K., Parthanadee, P. & Buddhakulsomsiri, J. Hyperparameter tuning of machine learning algorithms using response surface methodology: A case study of ANN, SVM, and DBN. Math. Probl. Eng. 2022, 8513719 (2022).
Štrumbelj, E. & Kononenko, I. Explaining prediction models and individual predictions with feature contributions. Knowl. Inf. Syst. 41(3), 647–665 (2014).
Schmidt, A. E. An approximation of a hierarchical logistic regression model used to establish the predictive validity of scores on a nursing licensure exam. Educ. Psychol. Measur. 60(3), 463–478 (2000).
Zhong, Q. et al. Early prediction of the risk of scoring lower than 500 on the COMLEX 1. BMC Med. Educ. 21(1), 70 (2021).
Himelfarb, I., Shotts, B. L. & Gow, A. R. Examining the validity of chiropractic grade point averages for predicting national board of chiropractic examiners Part I exam scores. J. Chiropr. Educ. 36(1), 1–12 (2022).