
Avery R

Research Program Mentor

Master of Public Health (MPH) candidate

Expertise

Public health, Epidemiology, Geospatial Analysis, Natural Language Processing, Data Acquisition.

Project ideas

Project ideas are meant to inspire students' thinking about their own projects. Students are in the driver's seat of their research and are free to use any or none of the ideas shared by their mentors.

Predicting the Prevalence of Vaccinated Persons

In this project, you will learn how to combine existing vaccination prevalence data with environmental covariate data to predict the prevalence of vaccinated persons at fine spatial resolution across a large area. You will document and report your findings with reproducible code in PDF or HTML format.

These are a few of the skills used while working on this project:

• Collecting and joining multiple data sources.
• Extracting point values from large geospatial files.
• Exploring statistical relationships between environmental factors and the prevalence of vaccinated persons, and identifying predictive attributes.
• Implementing a spatial regression analysis.
• Generating a map of predicted prevalence, rasterized over a large spatial domain.

This project requires a basic familiarity with regression analysis and the R programming language.

General Methods in Natural Language Processing

Natural Language Processing (NLP) is a set of specialized data science methods that treat large bodies of text as data to help answer research questions. NLP is used in a variety of practical domains, from linguistics and the humanities to legal studies, business intelligence, and marketing. In this project, you will select a body of text, such as a book (or several books by one author), articles, documents, or social media content, and conduct an in-depth analysis that includes sentiment analysis, document and word frequency, n-grams, word correlation, and topic modeling. You will document your findings and results in a reproducible workflow and repository, as well as in a formal journal report and website.

These are a few of the skills used while working on this project:

• Cleaning and organizing complex datasets.
• Creating print-quality visualizations, charts, and plots.
• Exploring quantitative and qualitative interpretation of statistical results.
• Implementing machine learning workflows and evaluating their outputs.
• Researching interdisciplinary approaches to the humanities, language, and data science.

This project requires a basic familiarity with the R programming language and language arts.
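As a rough illustration of two of the analyses named in the NLP project, word frequency and n-grams, here is a minimal sketch. It is written in Python (one of the coding skills listed on this page) using only the standard library, although the project itself asks for familiarity with R. The function names `tokenize` and `ngrams`, the simple regular-expression tokenizer, and the sample sentence are assumptions made for illustration, not part of the project description.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) over consecutive tokens."""
    return list(zip(*(tokens[i:] for i in range(n))))


# Hypothetical sample text; a real project would load a book or article corpus.
sample = "the cat sat on the mat and the cat slept"

tokens = tokenize(sample)
word_counts = Counter(tokens)               # word frequency
bigram_counts = Counter(ngrams(tokens, 2))  # frequency of adjacent word pairs

print(word_counts.most_common(2))
print(bigram_counts.most_common(1))
```

A real analysis would usually also remove stop words and handle punctuation more carefully. This sketch only shows the counting step that word-frequency and n-gram analyses build on.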

Coding skills

R (RStudio), Python, SQL, Stata
