Research Program Mentor
Master of Public Health (MPH) candidate
Public health, Epidemiology, Geospatial Analysis, Natural Language Processing, Data Acquisition.
Predicting the Prevalence of Vaccinated Persons
In this project, you will learn how to combine existing vaccination prevalence data with environmental covariate data to predict the prevalence of vaccinated persons at fine spatial resolution across a large area. You will document and report your findings with reproducible code in PDF or HTML format.
These are a few of the skills used while working on this project:
• Collecting and joining multiple data sources.
• Extracting point values from large geospatial files.
• Exploring statistical relationships between environmental factors and vaccination prevalence, and assessing which factors are useful predictors.
• Implementing a spatial regression analysis.
• Generating a map of predicted prevalence, rasterized over a large spatial domain.
This project requires a basic familiarity with regression analysis and the R programming language.
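As a rough illustration of the first few skills above, the sketch below joins two small tables and fits a regression in base R. Everything in it is an assumption made for illustration: the site IDs, the prevalence values, and the `travel_time_min` covariate are invented toy data, and the model is an ordinary linear regression rather than the spatial regression the project builds toward.

```r
# Hypothetical survey data: vaccination prevalence observed at five sites.
surveys <- data.frame(
  site_id = 1:5,
  prevalence = c(0.62, 0.55, 0.71, 0.48, 0.66)
)

# Hypothetical covariate values, as if extracted from a raster at six sites.
covariates <- data.frame(
  site_id = 1:6,
  travel_time_min = c(30, 55, 12, 80, 25, 40)
)

# Inner join: keeps only the sites that appear in both tables.
joined <- merge(surveys, covariates, by = "site_id")

# Ordinary (non-spatial) linear model relating prevalence to the covariate.
fit <- lm(prevalence ~ travel_time_min, data = joined)

# Predict prevalence at sites that have covariate data but no survey.
new_sites <- covariates[!covariates$site_id %in% surveys$site_id, ]
new_sites$predicted <- predict(fit, newdata = new_sites)

print(coef(fit))
print(new_sites)
```

In the full project, the same pattern would extend to covariates extracted from geospatial files and to a model that accounts for spatial structure.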
General Methods in Natural Language Processing
Natural Language Processing (NLP) is a set of specialized data science methods that treat large bodies of text as data in order to answer research questions. NLP is used in a variety of practical domains, from linguistics and the humanities to legal studies, business intelligence, and marketing. In this project, you will select a body of text (a book or books by a certain author, articles, documents, or social media content) and conduct an in-depth analysis that includes sentiment analysis, document and word frequency, n-grams, word correlation, and topic modeling. You will document your findings and results in a reproducible workflow and repository, as well as in a formal journal report and website.
These are a few of the skills used while working on this project:
• Cleaning and organizing complex datasets.
• Creating print-quality visualizations, charts, and plots.
• Interpreting statistical results both quantitatively and qualitatively.
• Implementing machine learning workflows and evaluating their outputs.
• Researching interdisciplinary approaches to the humanities, language, and data science.
This project requires a basic familiarity with the R programming language and language arts.
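To give a sense of two of the analyses named above, word frequency and n-grams, here is a minimal sketch in base R. The sample sentence is invented for illustration, and the simple regular-expression tokenizer is an assumption; a real project would likely use a dedicated text-mining package and a much larger corpus.

```r
# Toy text standing in for a real corpus (hypothetical example data).
text <- "the cat sat on the mat and the cat slept"

# Tokenize: lowercase, then split on anything that is not a letter or apostrophe.
tokens <- strsplit(tolower(text), "[^a-z']+")[[1]]
tokens <- tokens[tokens != ""]

# Word frequency, most common first.
word_freq <- sort(table(tokens), decreasing = TRUE)

# Bigrams (2-grams): pair each token with the token that follows it.
bigrams <- paste(head(tokens, -1), tail(tokens, -1))
bigram_freq <- sort(table(bigrams), decreasing = TRUE)

print(head(word_freq, 3))
print(head(bigram_freq, 3))
```

The same pairing idea generalizes to longer n-grams by combining more shifted copies of the token vector.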