
Anav S

- Research Program Mentor

PhD candidate

Expertise

Machine Learning, Data Science, Quantitative Modeling, Statistics, Mathematics

Project ideas

Project ideas are meant to inspire students as they think about their own projects. Students are in the driver's seat of their research and are free to use any or none of the ideas shared by their mentors.

Introduction to Machine Learning

In this project, we will walk through how to set up machine learning experiments and discuss the foundational models used for regression and classification tasks. Topics include, but are not limited to, cross-validation and testing, overfitting and underfitting, feature selection and dimensionality reduction, linear regression, logistic regression, and neural networks. The project culminates in applying these techniques to a prediction problem of the student's choosing. Topics will be tailored and scoped to the student's interests and background.
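To give a flavor of what "cross-validation" means in practice, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the data are synthetic, and the helper names `fit_line` and `k_fold_mse` are made up for this example rather than taken from any library. In a real project we would more likely use an established package.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b with a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b

def k_fold_mse(xs, ys, k=5, seed=0):
    """Average mean squared error on held-out folds."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    # Split the shuffled indices into k roughly equal folds.
    folds = [idx[i::k] for i in range(k)]
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        # Fit on the training folds only, then score on the held-out fold.
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        mse = sum((ys[i] - (a * xs[i] + b)) ** 2 for i in held_out) / len(held_out)
        scores.append(mse)
    return sum(scores) / k

# Synthetic data (an assumption for illustration): y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
xs = [i / 10 for i in range(100)]
ys = [2.0 * x + 1.0 + rng.gauss(0, 0.5) for x in xs]

cv_mse = k_fold_mse(xs, ys)
print(f"5-fold CV mean squared error: {cv_mse:.3f}")
```

Because each fold is scored by a model that never saw it during fitting, the averaged error estimates how the model would do on new data, which is the quantity that overfitting inflates.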

Natural Language Processing (NLP)

In 2018, Google released BERT, a neural language model that helped NLP practitioners surpass previous state-of-the-art results on benchmarks across a wide range of language tasks (e.g., question answering and sentiment analysis). In this project, we will learn how deep learning researchers approach problems in language quantitatively and develop an understanding of "contextual word embeddings", the motivation for BERT, from the ground up. Then we will learn how to apply BERT to a language task of your choosing. One example is quantifying political bias in news articles.

Exploring Genomics Data

In this project, the student will explore the 1000 Genomes Project dataset. The student will learn how to form their own hypotheses about the data and test them quantitatively, and how to construct features and find signals in the dataset. The project will involve both statistical inference and prediction.
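As one example of testing a hypothesis quantitatively, here is a minimal sketch of a permutation test for a difference in means between two groups. It rests on assumptions made for illustration: the numbers are synthetic and are not drawn from the 1000 Genomes Project, and `permutation_test` is a hypothetical helper written for this example.

```python
import random

def mean(values):
    return sum(values) / len(values)

def permutation_test(group_a, group_b, n_permutations=2000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Returns an estimated p-value: the share of random relabelings whose
    mean difference is at least as large as the observed one.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    count = 0
    for _ in range(n_permutations):
        # Shuffling the pooled values simulates "group labels do not matter".
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            count += 1
    # Add-one correction so the estimate is never exactly zero.
    return (count + 1) / (n_permutations + 1)

# Synthetic measurements for two groups of samples (illustrative only).
rng = random.Random(1)
group_a = [rng.gauss(0.0, 1.0) for _ in range(50)]
group_b = [rng.gauss(1.5, 1.0) for _ in range(50)]

p_value = permutation_test(group_a, group_b)
print(f"Estimated p-value: {p_value:.4f}")
```

A small p-value suggests the observed difference would be unusual if group membership were irrelevant. The test makes few distributional assumptions, which is useful when exploring an unfamiliar dataset.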

Final Notes

If you have a particular dataset in mind, I can help you set up an end-to-end project starting from stages as early as scraping data/dataset construction.

Coding skills

Python, C++, C
