
Avery R

Research Program Mentor

MPH candidate at the University of California, Berkeley (UC Berkeley)


Public health, epidemiology, geospatial analysis, natural language processing, data acquisition.


I am an MPH student in the School of Public Health at UC Berkeley. I have a background in literature and behavioral health, and my current research focuses on interdisciplinary approaches to health data. My general research interests include public health surveillance, geospatial health analysis, prediction, fairness in machine learning, natural language processing, and computational statistics. In my spare time I write electronic music, ride gravel bikes, and paint fine-scale miniatures, a craft I have become expert in over my adult years.

Project ideas

Project ideas are meant to inspire students' thinking about their own projects. Students are in the driver's seat of their research and are free to use any or none of the ideas shared by their mentors.

Predicting the Prevalence of Vaccinated Persons

In this project, you will learn how to combine existing vaccination prevalence data with environmental covariate data to predict the prevalence of vaccinated persons at a fine spatial resolution across a large area. You will document and report your findings with reproducible code in PDF or HTML format. These are a few of the skills used while working on this project:

• Collecting and joining multiple data sources.
• Extracting point value data from large geospatial files.
• Exploring statistical relationships and predictive attributes between environmental factors and the prevalence of vaccinated persons.
• Implementing a spatial regression analysis.
• Generating a map with predicted prevalence rasterized over a large spatial domain.

This project requires a basic familiarity with regression analysis and the R programming language.

General Methods in Natural Language Processing

Natural Language Processing (NLP) is a set of specialized data science methods for working with large bodies of "text as data" to answer research questions. NLP is used in a variety of practical domains, from linguistics and the humanities to legal studies, business intelligence, and marketing. In this project, you will select a body of text (a book or books by a certain author, articles, documents, or social media content) and conduct an in-depth analysis that includes sentiment analysis, document and word frequency, n-grams, word correlation, and topic modeling. You will document your findings and results in a reproducible workflow and repository, as well as a formal journal report and website. These are a few of the skills used while working on this project:

• Cleaning and organizing complex datasets.
• Creating print-quality visualizations, charts, and plots.
• Exploring quantitative and qualitative interpretation of statistical results.
• Implementing and evaluating outputs of machine learning workflows.
• Researching interdisciplinary approaches to the humanities, language, and data science.

This project requires a basic familiarity with the R programming language and language arts.
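To give a feel for two of the techniques named above, word frequency and n-grams, here is a minimal sketch. The project itself calls for R; this sketch uses Python, which is also among the coding skills listed on this profile, purely for illustration. The sample sentence and the helper names `tokenize` and `ngrams` are made up for this example and are not part of the project materials.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and pull out runs of letters and apostrophes.

    This is a deliberately simple tokenizer; real projects usually need
    more careful handling of punctuation, numbers, and stop words.
    """
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample text standing in for a book or article.
sample = "the cat sat on the mat and the cat slept"

tokens = tokenize(sample)
word_counts = Counter(tokens)               # word frequency
bigram_counts = Counter(ngrams(tokens, 2))  # frequency of word pairs

print(word_counts.most_common(2))    # the two most frequent words
print(bigram_counts.most_common(1))  # the most frequent word pair
```

The same counting idea extends to trigrams by passing `n=3`, and the resulting counts are the usual starting point for frequency plots and word-correlation analysis.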

Coding skills

R (RStudio), Python, SQL, Stata

Teaching experience

I teach several coding- and data-science-intensive workshops each semester at UC Berkeley's D-Lab.


Work experience

San Francisco Department of Public Health (2020 - Current)
Data Analyst Intern
D-Lab, UC Berkeley (2019 - Current)
Consultant, Data Science Fellow
Lifelong Medical Care (2018 - 2020)
Data Management Coordinator


Education

California State University, San Francisco (CSU San Francisco)
Bachelor of Arts (BA), 2014
University of California, Berkeley (UC Berkeley)
Master of Public Health (MPH) candidate
Public Health, Epidemiology & Biostatistics
