Anav S

Research Program Mentor

PhD candidate at Stanford University

Expertise

Machine Learning, Data Science, Quantitative Modeling, Statistics, Mathematics

Bio

Hi! I am currently a PhD student in Stanford's statistics department. Prior to joining the PhD program, I completed a B.S. in mathematics and an M.S. in statistics, both at Stanford. My interests center on using quantitative models to analyze, interpret, and make use of trends in data. Depending on who you ask, this subject goes by many names (e.g. machine learning, data science, statistical learning, deep learning). I approach these problems through a statistical lens, which lends itself to two main kinds of data-driven tasks: prediction and inference. I am eager to work with students who want to learn how to better work with, model, and understand data sets!

Project ideas

Project ideas are meant to help inspire students' thinking about their own projects. Students are in the driver's seat of their research and are free to use any or none of the ideas shared by their mentors.

Introduction to Machine Learning

In this project we will walk through how to set up machine learning experiments and discuss foundational models used for regression and classification tasks. Topics include, but are not limited to, cross-validation and testing, overfitting and underfitting, feature selection and dimensionality reduction, linear regression, logistic regression, and neural networks. The project culminates in applying these techniques to a prediction problem of the student's choosing. Topics will be tailored and scoped to the interests and background of the student.

Natural Language Processing (NLP)

In 2018, Google released BERT, a neural language model that helped NLP practitioners surpass previous state-of-the-art results across a wide range of language tasks (e.g. question answering, sentiment analysis, machine translation). In this project we will learn how deep learning researchers approach problems in language quantitatively and develop an understanding of "contextual word embeddings", the motivation for BERT, from the ground up. Then we will learn how to apply BERT to a language task of your choosing. One example is quantifying political bias in news articles.

Exploring Genomics Data

In this project the student will explore the 1000 Genomes Project dataset. The student will learn how to form their own hypotheses about the data and validate them quantitatively, and how to construct features and find signals in the dataset. The project will involve both statistical inference and prediction.

Final Notes

If you have a particular dataset in mind, I can help you set up an end-to-end project starting from stages as early as scraping data/dataset construction.

Coding skills

Python, C++, C

Languages I know

Hindi

Teaching experience

I worked for Stanford's Continuing Studies program as a TA for an introductory machine learning and data science course. I am currently a developer and course assistant for the Stanford Center for Professional Development's course on deep learning in natural language processing.

Credentials

Work experience

Stanford Center for Professional Development (2019 - Current)
Developer/Course Assistant for XCS224N (Deep Learning for NLP)
Citadel (2019 - 2019)
Quantitative Research Intern
Cruise Automation (2018 - 2018)
Software Engineering Intern
Stanford University (2017 - 2017)
Undergraduate Math Researcher at SURIM

Education

Stanford University
BS Bachelor of Science (2020)
Mathematics
Stanford University
MS Master of Science (2020)
Statistics
Stanford University
PhD Doctor of Philosophy candidate
Statistics
