Developing Optimized Machine Learning Models For Timely Prediction and Prevention of College Dropout | Polygence

Project by Polygence alum Arko

Project's result

Research paper published on preprints.org; revision invited by the National High School Journal of Science (NHSJS); scheduled for presentation at the Symposium of Rising Scholars

Summary

College graduates earn substantially more and are more likely to be employed than people without a degree, so understanding the predictors of college dropout is critical if students and administrators are to improve graduation outcomes. Previous studies have evaluated only a limited range of machine learning models for dropout prediction. Using a dataset of 4,424 students that combines graduation outcomes with demographic, socioeconomic, course, and macroeconomic data, this paper aims to identify the best machine learning model for predicting college dropout, framed as a classification problem. We (a) perform extensive exploratory data analysis, (b) perform feature optimization, (c) identify the best-performing model among the seven evaluated, (d) study different testing-to-training ratios, (e) perform a comprehensive model evaluation, and (f) compare a multi-class classification approach with a binary one. The models were tuned with grid search and validated with k-fold cross-validation. After hyperparameter optimization, the random forest model performed best at predicting college dropout, with 0.85 accuracy, 0.72 sensitivity, 0.92 specificity, 0.82 precision, and 0.89 AUC-ROC. The optimized random forest model also identified the key predictors of dropout, in order of importance: the number of curricular units in the second semester, the number of curricular units in the first semester, and whether tuition and fees are up to date. These findings underscore the value of machine learning for timely dropout risk prediction, which enables targeted resource allocation to mitigate risk and support successful graduation.
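
The workflow summarized above (grid search validated with k-fold cross-validation, a held-out test split, confusion-matrix metrics, and feature importances from a random forest) can be sketched roughly as follows. This is a minimal illustration and not the paper's code: it assumes scikit-learn, uses a synthetic dataset from `make_classification` in place of the 4,424-student data, and the parameter grid, the 5 folds, the 80/20 split, and the helper names `binary_metrics` and `tune_random_forest` are all hypothetical choices made for the example.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import confusion_matrix, roc_auc_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, train_test_split


def binary_metrics(y_true, y_pred):
    """Accuracy, sensitivity, specificity, and precision for binary labels.

    Label 1 is treated as the positive class (here standing in for "dropout").
    """
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return {
        "accuracy": float((tp + tn) / (tp + tn + fp + fn)),
        "sensitivity": float(tp / (tp + fn)) if (tp + fn) else 0.0,
        "specificity": float(tn / (tn + fp)) if (tn + fp) else 0.0,
        "precision": float(tp / (tp + fp)) if (tp + fp) else 0.0,
    }


def tune_random_forest(X, y, seed=0):
    """Grid-search a random forest using stratified 5-fold cross-validation."""
    # Assumption: an illustrative grid, not the one used in the paper.
    param_grid = {"n_estimators": [50, 100], "max_depth": [5, None]}
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    search = GridSearchCV(
        RandomForestClassifier(random_state=seed),
        param_grid,
        scoring="roc_auc",
        cv=cv,
    )
    search.fit(X, y)
    return search


# Assumption: synthetic stand-in data with an arbitrary class imbalance.
X, y = make_classification(
    n_samples=600, n_features=12, n_informative=5,
    weights=[0.7, 0.3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

search = tune_random_forest(X_train, y_train)
best_model = search.best_estimator_

# Evaluate the tuned model on the held-out test split.
metrics = binary_metrics(y_test, best_model.predict(X_test))
metrics["auc_roc"] = float(
    roc_auc_score(y_test, best_model.predict_proba(X_test)[:, 1])
)

# Rank feature indices by impurity-based importance, most important first.
ranked = sorted(
    enumerate(best_model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)

print("best parameters:", search.best_params_)
print("test metrics:", metrics)
print("top 3 feature indices:", [index for index, _ in ranked[:3]])
```

Sensitivity here is the share of actual positives (dropouts) the model catches, and specificity is the share of actual negatives it correctly clears, so a result such as the reported 0.72 sensitivity alongside 0.92 specificity would indicate a model that misses more dropouts than it raises false alarms. Numbers from this synthetic run are not comparable to the paper's results.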

Morteza

Polygence mentor

PhD (Doctor of Philosophy)

Subjects

Biology, Engineering, Computer Science

Expertise

Healthcare, Biotech and bioengineering, writing papers (any type), Engineering (especially Mechanical & Biomedical), Medical Device, Physics, Data Science, Programming, Code writing, Machine Learning, Image Processing, Mathematics, App Development

Arko

Student

Graduation Year

2026

Project review

“Met/exceeded expectations. I learnt a tremendous amount and was able to write a substantive research paper in an area that I care about a lot. I was able to apply my quantitative and computational skills (some that I had and many that I learnt during my Polygence journey) to address real-world questions.”

About my mentor

“My mentor is extremely knowledgeable, encouraging, and patient. I learned a tremendous amount from him, not just in terms of the underlying literature, but also about many new machine learning models, ways to evaluate them, and optimization strategies. The skills that I have learnt from him are invaluable, and I know that I will use them in many other projects, in high school, college, and beyond!”