Arko Chakrabartiroy | Polygence

Symposium of Rising Scholars, Fall 2025

Arko will be presenting at the Symposium of Rising Scholars on Saturday, September 27th.

Polygence Scholar 2025

Arko Chakrabartiroy

Class of 2026 | New York, New York

Project Portfolio

Developing Optimized Machine Learning Models For Timely Prediction and Prevention of College Dropout

Started Mar. 6, 2025


Abstract or project description

College graduates earn substantially more and are more likely to be employed than non-graduates. It is therefore critically important to understand the predictors of college dropout so that students and administrators can improve graduation outcomes. Previous studies have been limited in the scope of their evaluation of machine learning models for dropout prediction. This paper uses a dataset of 4,424 students, covering graduation outcomes along with demographic, socioeconomic, course, and macroeconomic data, to identify the best-performing machine learning model for predicting college dropout, framed as a classification problem. We (a) perform extensive exploratory data analysis, (b) perform feature optimization, (c) identify the best-performing model among the seven evaluated, (d) study different testing-to-training ratios, (e) perform a comprehensive model evaluation, and (f) compare a multi-class classification approach to a binary one. The models were fine-tuned using grid search and validated with k-fold cross-validation. After hyperparameter optimization, the grid-search-tuned random forest model performed best at predicting college dropout, with 0.85 accuracy, 0.72 sensitivity, 0.92 specificity, 0.82 precision, and 0.89 AUC-ROC. The optimized random forest model also identified the key predictors of dropout, in order of importance, as: the number of curricular units in the second semester, the number of curricular units in the first semester, and whether tuition and fees are up to date. The findings underscore the value of machine learning for timely dropout risk prediction, enabling targeted resource allocation to mitigate risk and support successful graduation outcomes.
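For readers less familiar with the evaluation metrics named in the abstract, here is a minimal sketch of how accuracy, sensitivity, specificity, and precision follow from a binary confusion matrix. It assumes dropout is treated as the positive class. The `ConfusionCounts` type, the `binary_metrics` helper, and the example counts are hypothetical and for illustration only; they are not taken from the project's code or results.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with dropout assumed to be the positive class."""
    tp: int  # dropouts correctly flagged as dropouts
    fn: int  # dropouts the model missed
    tn: int  # non-dropouts correctly predicted as non-dropouts
    fp: int  # non-dropouts wrongly flagged as dropouts


def binary_metrics(c: ConfusionCounts) -> dict[str, float]:
    """Compute four standard metrics; assumes every denominator is non-zero."""
    total = c.tp + c.fn + c.tn + c.fp
    return {
        "accuracy": (c.tp + c.tn) / total,        # share of all predictions that are correct
        "sensitivity": c.tp / (c.tp + c.fn),      # share of actual dropouts that were caught
        "specificity": c.tn / (c.tn + c.fp),      # share of actual non-dropouts that were cleared
        "precision": c.tp / (c.tp + c.fp),        # share of flagged students who did drop out
    }


# Made-up counts for illustration only; NOT the study's confusion matrix.
example = ConfusionCounts(tp=40, fn=10, tn=45, fp=5)
for name, value in binary_metrics(example).items():
    print(f"{name}: {value:.2f}")
```

AUC-ROC is left out of this sketch because it depends on the model's predicted scores across all decision thresholds, not on a single confusion matrix.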