Developing Optimized Machine Learning Models for Timely Prediction and Prevention of College Dropout
Project by Polygence alum Arko

Project's result
Research paper: published on Preprints.org; revision invited at the National High School Journal of Science (NHSJS); scheduled to present at the Symposium of Rising Scholars
Summary
College graduates earn substantially more than non-graduates and are more likely to be employed. Consequently, it is critically important to understand the predictors of college dropout so that students and administrators can improve graduation outcomes. Previous studies have been limited in the range of machine learning models they evaluate for dropout prediction. Leveraging a dataset of 4,424 students that includes graduation outcomes along with demographic, socioeconomic, course, and macroeconomic data, this paper frames dropout prediction as a classification problem and aims to identify the optimal machine learning model for it. We (a) perform extensive exploratory data analysis, (b) perform feature optimization, (c) identify the best-performing machine learning model among the seven models evaluated, (d) study different testing-to-training ratios, (e) perform a comprehensive model evaluation, and (f) compare a multi-class classification approach with a binary one. The models were fine-tuned with a grid search optimization algorithm and validated with k-fold cross-validation. After hyperparameter optimization, the grid-search-optimized random forest model performed best at predicting college dropout, with 0.85 accuracy, 0.72 sensitivity, 0.92 specificity, 0.82 precision, and 0.89 AUC-ROC. Furthermore, the optimized random forest model suggested that the key predictors of dropout, in order of importance, are the number of curricular units in the second semester, the number of curricular units in the first semester, and whether tuition and fees are up to date. The findings underscore the value of machine learning for timely dropout risk prediction, which enables targeted resource allocation to mitigate risk and support successful graduation outcomes.
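The grid search, k-fold cross-validation, and evaluation steps described in the summary can be sketched in Python with scikit-learn. This is a minimal illustration, not the project's actual code: the data are synthetic (generated with `make_classification` at the same size as the 4,424-student dataset), and the class balance, parameter grid, number of folds, and the function name `evaluate_dropout_model` are hypothetical choices. Dropout is encoded as class 1 for a binary setup, and sensitivity and specificity are computed directly from the confusion matrix.

```python
# A minimal sketch, NOT the project's code: synthetic data stand in for the
# real student dataset, and the parameter grid and class weights are hypothetical.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split


def evaluate_dropout_model(test_size=0.2, random_state=0):
    # Synthetic, imbalanced binary target: 1 = dropout, 0 = no dropout (assumption).
    X, y = make_classification(
        n_samples=4424, n_features=20, n_informative=8,
        weights=[0.68, 0.32], random_state=random_state,
    )
    # Hold out a test set; test_size controls the testing-to-training ratio.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, stratify=y, random_state=random_state
    )

    # Hypothetical grid, searched exhaustively with stratified 5-fold CV
    # on the training data only.
    param_grid = {"n_estimators": [50, 100], "max_depth": [None, 10]}
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=random_state)
    search = GridSearchCV(
        RandomForestClassifier(random_state=random_state),
        param_grid, cv=cv, scoring="roc_auc",
    )
    search.fit(X_train, y_train)
    model = search.best_estimator_  # refit on the full training set by default

    # Evaluate the tuned model on the held-out test set.
    y_pred = model.predict(X_test)
    y_score = model.predict_proba(X_test)[:, 1]  # predicted probability of dropout
    tn, fp, fn, tp = confusion_matrix(y_test, y_pred, labels=[0, 1]).ravel()
    return {
        "best_params": search.best_params_,
        "accuracy": accuracy_score(y_test, y_pred),
        "sensitivity": tp / (tp + fn),  # recall on the dropout class
        "specificity": tn / (tn + fp),  # recall on the non-dropout class
        "precision": precision_score(y_test, y_pred),
        "auc_roc": roc_auc_score(y_test, y_score),
        # Indices of the three largest impurity-based feature importances.
        "top_features": np.argsort(model.feature_importances_)[::-1][:3],
    }


if __name__ == "__main__":
    results = evaluate_dropout_model(test_size=0.2)
    print("best params:", results["best_params"])
    for name in ["accuracy", "sensitivity", "specificity", "precision", "auc_roc"]:
        print(f"{name}: {results[name]:.2f}")
    print("top feature indices:", results["top_features"])
```

Calling the function with different `test_size` values (for example 0.1, 0.2, 0.3) is one way to explore testing-to-training ratios, and `top_features` plays the role of the predictor ranking a random forest can provide. Because the data here are synthetic, the printed numbers will not match the results reported in the summary.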

Morteza
Polygence mentor
PhD
Subjects
Biology, Engineering, Computer Science
Expertise
Healthcare, Biotech and bioengineering, writing papers (any type), Engineering (especially Mechanical & Biomedical), Medical Device, Physics, Data Science, Programming, Code writing, Machine Learning, Image Processing, Mathematics, App Development

Arko
Student
Graduation Year
2026
Project review
“Met/exceeded expectations. I learnt a tremendous amount and was able to write a substantive research paper in an area that I care about a lot. I was able to apply my quantitative and computational skills (some that I had and many that I learnt during my Polygence journey) to address real-world questions.”
About my mentor
“My mentor is extremely knowledgeable, encouraging, and patient. I learned a tremendous amount from him, not just in terms of the underlying literature, but also about many new machine learning models, ways to evaluate them, and optimization strategies. The skills that I have learnt from him are invaluable, and I know that I will use them in many other projects, in high school, college, and beyond!”