- Research Program Mentor
PhD candidate
Machine Learning, Data Science, Quantitative Modeling, Statistics, Mathematics
Introduction to Machine Learning
In this project, we will walk through how to set up machine learning experiments and discuss foundational models used for regression and classification tasks. Topics include, but are not limited to, cross-validation and testing, overfitting/underfitting, feature selection and dimensionality reduction, linear regression, logistic regression, and neural networks. The project culminates in applying these techniques to a prediction problem of the student's choosing. Topics will be tailored and scoped to the interests and background of the student.
Natural Language Processing (NLP)
In 2018, Google released BERT, a neural language model that helped NLP practitioners surpass previous state-of-the-art results on language tasks (e.g., question answering, sentiment analysis, machine translation) across the board. In this project, we will learn how deep learning researchers approach problems in language quantitatively and develop an understanding of "contextual word embeddings", the motivation for BERT, from the ground up. Then we will learn how to apply BERT to a language task of your choosing. One example is quantifying political bias in news articles.
Exploring Genomics Data
In this project, the student will explore the 1000 Genomes Project dataset. The student will learn how to form their own hypotheses about the data and validate them quantitatively. The student will also learn how to construct features and find signals in the dataset. The project will involve both statistical inference and prediction.
If you have a particular dataset in mind, I can help you set up an end-to-end project, starting as early as data scraping and dataset construction.