Research Article

Eğitim Sektöründe Öğrenci Terkinin Tahmini: Makine Öğrenmesi Tabanlı Karşılaştırmalı Bir Analiz

Year 2026, Volume: 18, Issue: 1, pp. 91-104, 26.03.2026
https://doi.org/10.52791/aksarayiibd.1872176
https://izlik.org/JA78RH53YC

Abstract

In higher education, students' academic failure and dropout are global problems that affect not only individual educational trajectories but also the academic effectiveness and economic sustainability of institutions. This study compares the performance of different machine learning algorithms used to predict students' academic achievement status and dropout tendencies. The research uses an open-source dataset obtained from the Kaggle platform, comprising 4,424 observations and 36 variables. Classification models were developed using logistic regression, decision tree, random forest, and XGBoost algorithms. Model performance was evaluated on the basis of accuracy, precision, recall, and F1-score. The findings show that the logistic regression model offers a strong baseline with 91% accuracy and a 90% F1-score. The decision tree model, while interpretable, achieved comparatively more limited performance with 90.63% accuracy and an 89.37% F1-score. By contrast, the ensemble-based random forest and XGBoost models produced more balanced results, reaching 92% accuracy and a weighted F1-score of 92%. In particular, the high recall of these models makes it possible to identify students at academic risk at an early stage. The results indicate that comparative machine learning approaches can contribute to the development of early warning systems and data-driven decision support mechanisms in higher education.
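The evaluation metrics named in the abstract can be illustrated with a small, dependency-free sketch. It computes accuracy, per-class precision and recall, and a support-weighted F1-score from two label lists. The function name `classification_metrics`, the class labels, and the toy predictions are illustrative assumptions; they are not taken from the study's data or code.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, per-class metrics, support-weighted F1)."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    n = len(pairs)
    accuracy = sum(t == p for t, p in pairs) / n

    per_class = {}
    weighted_f1 = 0.0
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        support = tp + fn  # number of true instances of this class
        per_class[label] = {
            "precision": precision,
            "recall": recall,
            "f1": f1,
            "support": support,
        }
        # Weighted F1 averages per-class F1, weighted by class support.
        weighted_f1 += f1 * support / n
    return accuracy, per_class, weighted_f1


# Toy labels for illustration only (hypothetical, not the study's data).
y_true = ["dropout"] * 3 + ["graduate"] * 5
y_pred = ["dropout", "dropout", "graduate",
          "graduate", "graduate", "graduate", "graduate", "dropout"]

accuracy, per_class, weighted_f1 = classification_metrics(y_true, y_pred)
print(f"accuracy={accuracy:.2f}, weighted F1={weighted_f1:.2f}")
print("dropout recall:", round(per_class["dropout"]["recall"], 3))
```

Recall on the dropout class is the quantity most relevant to early warning, since it measures the share of students who actually dropped out that the model flagged.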

Ethical Statement

This study was conducted using publicly available, anonymized secondary data. The dataset used in the research was obtained as open access from the Kaggle platform and contains no personal or identifying information about individuals. The study therefore involves no direct data collection from human participants and does not require ethics committee approval. All stages of the research were carried out in accordance with the principles of scientific research and publication ethics.

References

  • Aina, C., Baici, E., Casalone, G., & Pastore, F. (2022). The determinants of university dropout: A review of the socio-economic literature. Socio-Economic Planning Sciences, 79, 101102.
  • Albreiki, B., Zaki, N., & Alashwal, H. (2021). A systematic literature review of student’ performance prediction using machine learning techniques. Education Sciences, 11(9), 552.
  • Andrade-Girón, D., Sandivar-Rosas, J., Marín-Rodriguez, W., Susanibar-Ramirez, E., Toro-Dextre, E., Ausejo-Sanchez, J., ... & Angeles-Morales, J. (2023). Predicting student dropout based on machine learning and deep learning: A systematic review. EAI Endorsed Transactions on Scalable Information Systems, 10(5).
  • Bargmann, C., Thiele, L., & Kauffeld, S. (2022). Motivation matters: predicting students’ career decidedness and intention to drop out after the first year in higher education. Higher Education, 83(4), 845-861.
  • Bean, J. P., & Metzner, B. S. (1985). A conceptual model of nontraditional undergraduate student attrition. Review of Educational Research, 55(4), 485-540.
  • Braxton, J. M., Doyle, W. R., Hartley III, H. V., Hirschy, A. S., Jones, W. A., & McLendon, M. K. (2013). Rethinking college student retention. John Wiley & Sons.
  • Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.
  • Chen, T., & Guestrin, C. (2016). XGBoost: A scalable tree boosting system. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (pp. 785–794). ACM. https://doi.org/10.1145/2939672.2939785
  • Friedman, J. H. (2001). Greedy function approximation: A gradient boosting machine. Annals of Statistics, 29(5), 1189-1232.
  • Hosmer, D. W., Lemeshow, S., & Sturdivant, R. X. (2013). Applied logistic regression. John Wiley & Sons.
  • Huo, H., Cui, J., Hein, S., Padgett, Z., Ossolinski, M., Raim, R., & Zhang, J. (2023). Predicting dropout for nontraditional undergraduate students: A machine learning approach. Journal of College Student Retention: Research, Theory & Practice, 24(4), 1054-1077.
  • Kaggle. (2025). Predict students’ dropout and academic success. https://www.kaggle.com/datasets/thedevastator/higher-education-predictors-of-student-retention/data
  • Khan, A., & Ghosh, S. K. (2021). Student performance analysis and prediction in classroom learning: A review of educational data mining studies. Education and Information Technologies, 26(1), 205-240.
  • López-Zambrano, J., Torralbo, J. A. L., & Romero, C. (2021). Early prediction of student learning performance through data mining: A systematic review. Psicothema, 33(3), 456.
  • Namoun, A., & Alshanqiti, A. (2020). Predicting student performance using data mining and learning analytics techniques: A systematic literature review. Applied Sciences, 11(1), 237.
  • OECD. (2022). Education at a glance 2022: OECD indicators. OECD Publishing. https://doi.org/10.1787/3197152b-en
  • Pek, R. Z., Özyer, S. T., Elhage, T., Özyer, T., & Alhajj, R. (2022). The role of machine learning in identifying students at-risk and minimizing failure. IEEE Access, 11, 1224-1243.
  • Rastrollo-Guerrero, J. L., Gómez-Pulido, J. A., & Durán-Domínguez, A. (2020). Analyzing and predicting students’ performance by means of machine learning: A review. Applied Sciences, 10(3), 1042.
  • Realinho, V., Machado, J., Baptista, L., & Martins, M. V. (2021). Predict students' dropout and academic success (Version 1.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.5777340
  • Ridwan, A., Priyatno, A. M., & Ningsih, L. (2024). Predict students’ dropout and academic success with XGBoost. Journal of Education and Computer Applications, 1(2), 1-8.
  • Romero, C., & Ventura, S. (2020). Educational data mining and learning analytics: An updated survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 10(3), e1355.
  • Salloum, S. A., Basiouni, A., Alfaisal, R., Salloum, A., & Shaalan, K. (2024). Predicting student retention in higher education using machine learning. In International Conference on Breaking Barriers with Generative Intelligence (pp. 197-206). Cham: Springer Nature Switzerland.
  • Quinlan, J. R. (2014). C4.5: Programs for machine learning. Elsevier.
  • Tinto, V. (1975). Dropout from higher education: A theoretical synthesis of recent research. Review of Educational Research, 45(1), 89-125.
  • Tinto, V. (2012). Leaving college: Rethinking the causes and cures of student attrition. University of Chicago Press.

Predicting Student Dropout in the Education Sector: A Comparative Machine Learning Analysis


Abstract

In higher education, student academic failure and dropout are global challenges that affect individual educational trajectories as well as the academic effectiveness and economic sustainability of institutions. This study compares the performance of different machine learning algorithms used to predict students’ academic achievement and dropout tendencies. An open-access dataset obtained from the Kaggle platform was used, consisting of 4,424 observations and 36 variables. Classification models were developed using logistic regression, decision tree, random forest, and XGBoost algorithms. Model performance was evaluated using accuracy, precision, recall, and F1-score. The results indicate that the logistic regression model provides a strong baseline with 91% accuracy and a 90% F1-score. The decision tree model achieved 90.63% accuracy and an 89.37% F1-score, offering interpretability but relatively limited performance. In contrast, the ensemble-based random forest and XGBoost models produced stronger and more balanced results, each achieving 92% accuracy and a weighted F1-score of 92%. Notably, the high recall of these models provides a clear advantage in identifying students at academic risk early. Overall, the findings suggest that comparative machine learning approaches can contribute to the development of early warning systems and data-driven decision support mechanisms in higher education.
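The comparison described above can be sketched with scikit-learn as follows. This is a minimal illustration under stated assumptions, not the study's pipeline: the data are synthetic (generated with `make_classification`, binary target, 36 features to mirror the reported variable count), the 80/20 stratified split and all hyperparameters are arbitrary choices, and scikit-learn's `GradientBoostingClassifier` stands in for XGBoost to avoid an extra dependency. The helper name `compare_models` is hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, precision_recall_fscore_support
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier


def compare_models(X, y, seed=0):
    """Fit four classifiers on one split and return weighted metrics."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=seed
    )
    models = {
        # Scaling helps the logistic regression solver converge.
        "logistic_regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "decision_tree": DecisionTreeClassifier(max_depth=5, random_state=seed),
        "random_forest": RandomForestClassifier(
            n_estimators=100, random_state=seed
        ),
        # Stand-in for XGBoost (assumption): another gradient-boosted trees model.
        "gradient_boosting": GradientBoostingClassifier(random_state=seed),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        y_pred = model.predict(X_test)
        precision, recall, f1, _ = precision_recall_fscore_support(
            y_test, y_pred, average="weighted", zero_division=0
        )
        results[name] = {
            "accuracy": accuracy_score(y_test, y_pred),
            "precision": precision,
            "recall": recall,
            "f1": f1,
        }
    return results


# Synthetic stand-in data; the real study uses the Kaggle dataset instead.
X, y = make_classification(
    n_samples=600, n_features=36, n_informative=10, random_state=0
)
results = compare_models(X, y)
for name, scores in results.items():
    print(name, {k: round(float(v), 3) for k, v in scores.items()})
```

Scores on synthetic data say nothing about the figures reported in the abstract; the sketch only shows how a like-for-like comparison on a shared train/test split might be organized.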

Ethical Statement

This study was conducted using publicly available and anonymized secondary data obtained from the Kaggle platform. Since the dataset does not contain any personal or identifiable information and does not involve direct data collection from human participants, ethics committee approval was not required. All stages of the research were carried out in accordance with the principles of scientific research and publication ethics.


Details

Primary Language: Turkish
Subjects: Information Systems (Other)
Journal Section: Research Article
Authors: Muhammed Akif Yenikaya (ORCID 0000-0002-3624-722X)
Submission Date: January 26, 2026
Acceptance Date: March 1, 2026
Publication Date: March 26, 2026
DOI: https://doi.org/10.52791/aksarayiibd.1872176
IZ: https://izlik.org/JA78RH53YC
Published in Issue: Year 2026, Volume: 18, Issue: 1

Cite

APA Yenikaya, M. A. (2026). Eğitim Sektöründe Öğrenci Terkinin Tahmini: Makine Öğrenmesi Tabanlı Karşılaştırmalı Bir Analiz. Aksaray Üniversitesi İktisadi Ve İdari Bilimler Fakültesi Dergisi, 18(1), 91-104. https://doi.org/10.52791/aksarayiibd.1872176