
15 Data Science Passion Project Ideas For High School Students



We talk a lot about passion project ideas for high school students at Polygence, and in this post we focus specifically on data science projects.

Although you can learn through a data science summer research opportunity or internship, there are also many data science projects you can plan and carry out on your own. In the process, you can build your technical skills and learn about one of the most important fields in the world today. Data science helps drive advances in healthcare, urban planning, and disaster response, to name a few.

What Makes a Good Data Science Project?

A good data science project idea, as with any passion project, should be centered on a topic that genuinely interests you. When you're passionate about the topic, the work becomes more enjoyable and rewarding, and you give yourself the best chance of seeing the project through to completion.

A good project also uses relevant, reliable data that aligns with its goal. Without quality data to analyze, there isn't much you can do. Similarly, a great project shows meticulous attention to cleaning and preparing the data for analysis.

After preparing the data, an ideal data science project presents a thorough analysis, supported by relevant visualizations and statistical measures. The project's value comes from the insights and takeaways you derive from that analysis.

Finally, keep in mind that a passion project doesn't have to be perfect from the start. You’ll most likely make some small tweaks as you go along, but as long as the topic is exciting to you, you’ll find a way to make it work.

Data Science Portfolio Project Ideas

1. Investigating the Relationship between Air Pollution and Health Outcomes in Rural and Metropolitan Areas

This project would involve obtaining publicly available data on air pollution levels and health outcomes (e.g., hospital admissions for respiratory illnesses, mortality rates, lung cancer prevalence/incidence). You could then analyze the data to determine if there is a correlation between air pollution levels and negative health outcomes. You could also explore the potential impact of factors such as socioeconomic status, age, or sex/gender on the relationship between air pollution and health outcomes.

Possible data sets: air pollution levels, hospital admissions data

2. Predictive Stock Market Analysis

This project aims to predict stock prices and identify market trends by analyzing historical financial data along with sentiment from news and social media. Accurate forecasts of stock movements can help investors make more informed decisions. Some data science methods for tackling this problem are time series analysis, Long Short-Term Memory (LSTM) networks for sequence prediction, and sentiment analysis using natural language processing (NLP) and machine learning (ML).

Possible data sets: historical stock prices (e.g., Yahoo Finance API), financial news articles, social media data (e.g., Twitter API)
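Before training a sequence model such as an LSTM, a price series is usually cut into sliding windows: each input is a run of consecutive past prices, and the target is the price that comes next. The minimal sketch below shows only that data-preparation step, not the LSTM itself; the function names and the 80/20 split are illustrative assumptions. The split is chronological rather than shuffled, so the model is never trained on prices from after the period it is tested on.

```python
def make_windows(prices, window):
    """Split a price series into (input sequence, next value) pairs.

    Each input holds `window` consecutive prices; the target is the price
    immediately after them.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    inputs, targets = [], []
    for start in range(len(prices) - window):
        inputs.append(prices[start:start + window])
        targets.append(prices[start + window])
    return inputs, targets

def chronological_split(inputs, targets, train_fraction=0.8):
    """Split without shuffling so earlier data trains and later data tests."""
    cut = int(len(inputs) * train_fraction)
    return (inputs[:cut], targets[:cut]), (inputs[cut:], targets[cut:])

# Hypothetical daily closing prices
closes = [101.2, 102.0, 101.5, 103.1, 104.0, 103.6, 105.2, 106.0, 105.5, 107.3]
X, y = make_windows(closes, window=3)
(train_X, train_y), (test_X, test_y) = chronological_split(X, y)
```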

3. Social Justice Engagement Project

For example, if your community passes a new crime law, you could use a dataset released by your community to assess whether the law has had a positive effect. You would present your findings with clear, interesting data visualizations and use fundamental statistical tests to validate your results. If the findings are interesting, you could write about them in a blog post and/or share them with an elected official in your community.

Possible data sets: municipality records, studies and reports (e.g., United States Census Bureau surveys)
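One fundamental test for a before/after comparison is a permutation test: if the law made no difference, the "before" and "after" labels are interchangeable, so you check how often a random relabeling produces a difference in means at least as large as the one you observed. The sketch below is an exact version that tries every relabeling, which only works for small samples (such as a few months on each side). The monthly counts are hypothetical, and `permutation_test` is an illustrative helper, not a library function; for larger samples you would sample random relabelings instead, and SciPy offers `scipy.stats.permutation_test` for this purpose.

```python
from itertools import combinations
from statistics import mean

def permutation_test(before, after):
    """Exact two-sided permutation test for a difference in means.

    Returns the fraction of all relabelings of the pooled data whose absolute
    difference in means is at least as large as the observed one (a p-value).
    """
    pooled = list(before) + list(after)
    observed = abs(mean(before) - mean(after))
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(before)):
        chosen = set(idx)
        group_a = [pooled[i] for i in idx]
        group_b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        # Small tolerance so floating-point noise doesn't hide ties
        if abs(mean(group_a) - mean(group_b)) >= observed - 1e-12:
            extreme += 1
        total += 1
    return extreme / total

# Hypothetical monthly incident counts, six months before and after a new law
before = [42, 39, 45, 41, 44, 40]
after = [35, 38, 33, 36, 37, 34]
p_value = permutation_test(before, after)
```

A small p-value suggests the change is unlikely to be chance alone, though it cannot rule out other events that happened at the same time as the law.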

4. Recommendation System for Movies, Music, or Books

In this project, you build a recommendation engine that suggests personalized content based on user preferences. It is highly relevant today: good recommendations help users discover new content they enjoy, which increases user satisfaction and retention for streaming platforms and online retailers. Here's a written resource to get you started.

Possible data sets: movie ratings (e.g., MovieLens dataset), music listening history, book ratings (e.g., Goodreads dataset)
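A simple way to start is item-to-item similarity: two titles are "similar" if the same users rated them in similar ways, which lets you suggest "because you liked X" items. The sketch below uses cosine similarity over hypothetical user ratings stored as nested dictionaries; the data layout and helper names are assumptions for illustration, and a real dataset like MovieLens would first need to be loaded into this shape.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in set(a) & set(b))
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def item_vectors(ratings):
    """Turn user -> {item: rating} into item -> {user: rating}."""
    items = {}
    for user, user_ratings in ratings.items():
        for item, score in user_ratings.items():
            items.setdefault(item, {})[user] = score
    return items

def similar_items(ratings, item, k=3):
    """Return up to k items whose rating patterns most resemble `item`'s."""
    items = item_vectors(ratings)
    target = items[item]
    scored = [(cosine(target, vec), other) for other, vec in items.items() if other != item]
    scored.sort(reverse=True)
    return [other for _, other in scored[:k]]

# Hypothetical ratings on a 1-5 scale
ratings = {
    "ana": {"Arrival": 5, "Interstellar": 5, "Paddington": 2},
    "ben": {"Arrival": 4, "Interstellar": 5, "Paddington": 1},
    "cy": {"Arrival": 1, "Interstellar": 2, "Paddington": 5},
}
print(similar_items(ratings, "Arrival", k=1))
```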

5. COVID-19 Data Analysis

Analyzing COVID-19 data allows us to gain insights into the pandemic's progression, track the effectiveness of public health measures, and identify regions that require additional support. This data-driven approach is crucial for policymakers and healthcare professionals making informed decisions about managing this pandemic and preparing for future ones. Methods for this project can include data visualization, time series analysis, geographical mapping, and epidemiological modeling. Here's a resource from the CDC that goes more in-depth into epidemiological modeling and why it matters.

Possible data sets: COVID-19 case data (e.g., Johns Hopkins University dataset), vaccination data, mobility data (e.g., Google Mobility Reports)
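Many public case datasets report cumulative totals, so a common first step in time series analysis is converting them to daily new cases and smoothing out day-of-week reporting swings with a 7-day moving average. The sketch below assumes you have a plain list of cumulative counts for one region; the function names are illustrative. Daily values can come out negative when a source revises earlier counts downward, so it's worth checking for that before plotting.

```python
def daily_new_cases(cumulative):
    """Convert cumulative counts into daily new counts (one fewer value than input)."""
    return [today - yesterday for yesterday, today in zip(cumulative, cumulative[1:])]

def rolling_mean(values, window=7):
    """Trailing moving average; returns one value per complete window."""
    if window < 1 or window > len(values):
        return []
    total = sum(values[:window])
    averages = [total / window]
    for i in range(window, len(values)):
        # Slide the window: add the newest value, drop the oldest
        total += values[i] - values[i - window]
        averages.append(total / window)
    return averages

# Hypothetical cumulative case counts for one county
cumulative = [0, 12, 30, 41, 60, 88, 97, 130, 151, 170, 204]
smoothed = rolling_mean(daily_new_cases(cumulative), window=7)
```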

Complete a research project in just 6 weeks!

Dive into highly concentrated content on a specific topic with the guidance of expert mentors in artificial intelligence, computer science, finance and creative writing, and build your own unique project in just 6 weeks!


6. Customer Churn Prediction

Predicting customer churn (customers canceling or leaving a service) is essential for businesses that want to retain valuable customers. By identifying the factors that lead to churn, companies can proactively address issues, enhance customer satisfaction, and improve their services, ultimately increasing customer loyalty and profitability. Techniques worth learning for this project include logistic regression, decision trees, random forests, and gradient boosting. Several of these are more advanced methods, so consider this project if you already have some experience with data science projects.

Possible data sets: customer usage data for specific companies
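To see what logistic regression does under the hood, here is a from-scratch sketch that learns a churn probability from two features by gradient descent on the log loss. The features (monthly hours of use and support tickets filed), the sample customers, and all function names are hypothetical; in practice you would more likely use a library implementation such as scikit-learn's `LogisticRegression` and evaluate it on held-out data.

```python
import math
from statistics import mean, pstdev

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def standardize(X):
    """Scale each column to mean 0 and standard deviation 1; also return the stats."""
    stats = [(mean(col), pstdev(col) or 1.0) for col in zip(*X)]
    scaled = [[(v - m) / s for v, (m, s) in zip(row, stats)] for row in X]
    return scaled, stats

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias with batch gradient descent on the log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for row, label in zip(X, y):
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - label
            for j, xj in enumerate(row):
                grad_w[j] += error * xj
            grad_b += error
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(row, w, b, stats):
    """Churn probability for one raw (unscaled) customer row."""
    scaled = [(v - m) / s for v, (m, s) in zip(row, stats)]
    return sigmoid(sum(wj * xj for wj, xj in zip(w, scaled)) + b)

# Hypothetical customers: [hours used per month, support tickets]; 1 = churned
X = [[2, 5], [1, 4], [3, 6], [4, 3], [20, 1], [15, 0], [25, 2], [18, 1]]
y = [1, 1, 1, 1, 0, 0, 0, 0]
scaled, stats = standardize(X)
w, b = train_logistic(scaled, y)
print(predict_proba([3, 5], w, b, stats))
```

The sign of each learned weight hints at which factors push customers toward churning, which is the kind of insight the project is after.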

7. Climate Change Data Analysis

Analyzing climate data helps us understand the impact of climate change, identify patterns, and assess potential risks. This knowledge is vital for policymakers, scientists, and communities to work towards a more sustainable future. You can conduct time series analyses and data visualizations to see how temperatures or sea levels have changed over time and identify patterns.

Possible data sets: climate data from government agencies (e.g., National Oceanic and Atmospheric Administration (NOAA), NASA Center for Climate Simulation), temperature records, and sea level data

8. Predicting Air Quality

Predicting air quality is essential for public health and environmental protection. By forecasting air quality, authorities can implement measures to reduce pollution and minimize health risks. For this project, you can perform regressions and time series forecasting to analyze how air quality has changed over time, and perhaps even compare specific regions or cities in the US.

Possible data sets: air quality data from environmental agencies (e.g., Environmental Protection Agency (EPA)), weather data, pollutant concentration records

9. Healthcare Fraud Detection

Healthcare fraud imposes significant financial burdens on healthcare systems and compromises patient care. Detecting fraudulent activities using data science methods helps save costs, preserve resources, and maintain the integrity of healthcare services.

Possible data sets: healthcare insurance claims data with fraud labels (e.g., Kaggle Healthcare Fraud dataset)
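The project description doesn't name a specific method, so as one simple starting point (swapped in here, not taken from the original) you could flag providers whose average claim amount sits far above their peers, measured in standard deviations (a z-score), and then check the flags against the dataset's fraud labels with precision and recall. The amounts and labels below are hypothetical, and the 2.0 threshold is an arbitrary assumption you would tune. Fraud is usually rare, so recall (how many real fraud cases you catch) matters as much as accuracy.

```python
from statistics import mean, pstdev

def flag_outliers(amounts, threshold=2.0):
    """Flag values more than `threshold` standard deviations above the mean."""
    mu, sigma = mean(amounts), pstdev(amounts)
    if sigma == 0:
        return [False] * len(amounts)
    return [(a - mu) / sigma > threshold for a in amounts]

def precision_recall(flags, labels):
    """Compare boolean flags against 0/1 fraud labels."""
    tp = sum(1 for f, l in zip(flags, labels) if f and l)
    fp = sum(1 for f, l in zip(flags, labels) if f and not l)
    fn = sum(1 for f, l in zip(flags, labels) if not f and l)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Hypothetical average claim amount per provider, with fraud labels (1 = fraud)
avg_claims = [310, 295, 330, 4100, 305, 288, 315, 2950, 301, 322]
fraud = [0, 0, 0, 1, 0, 0, 0, 1, 0, 0]
print(precision_recall(flag_outliers(avg_claims), fraud))
```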

10. Social Network Analysis

Social network analysis helps us understand the structure and dynamics of relationships in social media apps. This knowledge is valuable for marketers, policymakers, and sociologists to identify influencers, target audiences, and study the spread of information.

Possible data sets: social network data (e.g., Meta Graph API, Twitter network data)
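A classic first measure for spotting influencers is degree centrality: the fraction of other users each person is directly connected to. The sketch below computes it from an undirected list of (user, user) connections; the usernames and helper names are hypothetical. The `networkx` library provides the same measure as `networkx.degree_centrality` along with many others, such as betweenness, which are useful for studying how information spreads.

```python
from collections import defaultdict

def degree_centrality(edges):
    """Normalized degree centrality for an undirected graph given as (a, b) pairs."""
    neighbors = defaultdict(set)
    for a, b in edges:
        if a != b:  # ignore self-loops
            neighbors[a].add(b)
            neighbors[b].add(a)
    n = len(neighbors)
    if n < 2:
        return {node: 0.0 for node in neighbors}
    return {node: len(adj) / (n - 1) for node, adj in neighbors.items()}

def top_influencers(edges, k=3):
    """The k users with the highest degree centrality."""
    centrality = degree_centrality(edges)
    return sorted(centrality, key=centrality.get, reverse=True)[:k]

# Hypothetical follower connections
edges = [("maya", "leo"), ("maya", "sam"), ("maya", "ivy"), ("leo", "sam"), ("ivy", "jon")]
print(top_influencers(edges, k=2))
```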


The Core Polygence Program

The Polygence Core Program pairs students with an expert mentor to explore their unique research interest over a structured 10-session project, building a student-driven research outcome they can proudly showcase.

The Polygence Core Program consists of 10 one-on-one sessions between the student and their mentor. The program is structured around 3 milestones, in addition to regular assignments, to ensure students make steady progress on their project. Given the diversity of projects, these milestones are designed to be as flexible as possible to accommodate all types of projects.

  • 10 one-on-one sessions in total

  • Opportunity to apply to present at Symposium

  • Opportunity to ask mentor for Letter of Recommendation

11. Web Scraping Projects

Knowing how to scrape data from the web is a very useful skill to have. Building a web scraper allows you to automatically retrieve large amounts of data from specific websites so that you don’t have to do it all manually. You can build a scraper for a ton of use cases, like analyzing real estate data, job market trends, and movie reviews. Be sure to check a website’s terms of service before you scrape.

Watch this Build a Web Scraper YouTube video to learn more!

Possible data sets: product information, customer reviews
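Once you've downloaded a page (for example with `urllib.request`, after checking the site's terms of service), the core scraping step is pulling out the elements you care about. The sketch below uses Python's built-in `html.parser` to collect the text of every element with a given CSS class. The class name `review` and the sample markup are hypothetical, since every site structures its HTML differently, and the parser assumes reasonably well-formed markup with matching closing tags.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link", "wbr", "source", "area", "col"}

class ReviewParser(HTMLParser):
    """Collect the text inside every element whose class list contains target_class."""

    def __init__(self, target_class="review"):
        super().__init__()
        self.target_class = target_class
        self.reviews = []
        self._depth = 0  # > 0 while inside a matching element
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            if self._depth:
                self._buffer.append(" ")  # treat <br> etc. as whitespace
            return
        if self._depth:
            self._depth += 1  # nested tag inside a review, e.g. <b>
            return
        classes = (dict(attrs).get("class") or "").split()
        if self.target_class in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:
            # Collapse runs of whitespace into single spaces
            self.reviews.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)

html = '<div><p class="review">Loved it!</p><p class="review">Too <i>slow</i>.</p></div>'
parser = ReviewParser()
parser.feed(html)
parser.close()
print(parser.reviews)
```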

12. Housing Predictions

Predicting house prices is crucial for homebuyers, sellers, and real estate investors. By understanding price trends and factors influencing housing costs in their area, buyers and sellers can both make well-informed decisions in the real estate market. This project will likely require regression techniques.

Possible data sets: housing price data, real estate listings

13. Transportation Traffic Congestion Analysis

Analyzing traffic congestion patterns helps to optimize urban transportation and reduce commuting time. For this project you have the option of analyzing either your hometown or any town/city that’s of interest to you. You should be able to find local traffic databases for the specific town or region that you’ve chosen. For example, here are traffic data and statistics for the state of Texas.

Possible data sets: traffic count studies, traffic congestion trackers, Bureau of Transportation Statistics
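A good first question for congestion data is which hours of the day are busiest. The sketch below averages vehicle counts by hour of day and returns the peak hours; it assumes your traffic counts have already been reduced to (hour, count) pairs, and the sample numbers are hypothetical. Grouping by weekday versus weekend, or by road segment, works the same way with a different grouping key.

```python
from collections import defaultdict
from statistics import mean

def average_by_hour(observations):
    """Average vehicle count for each hour of the day from (hour, count) pairs."""
    by_hour = defaultdict(list)
    for hour, count in observations:
        by_hour[hour].append(count)
    return {hour: mean(counts) for hour, counts in by_hour.items()}

def peak_hours(observations, k=2):
    """The k hours with the highest average traffic."""
    averages = average_by_hour(observations)
    return sorted(averages, key=averages.get, reverse=True)[:k]

# Hypothetical hourly counts from one intersection over several days
counts = [(7, 820), (8, 1150), (12, 640), (17, 1230), (18, 990),
          (7, 790), (8, 1090), (12, 610), (17, 1310), (18, 1020)]
print(peak_hours(counts, k=2))
```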

14. Food Recommendation System

A food recommendation system helps people discover new recipes or restaurants that match their preferences and dietary needs. A helpful data science skill for this project, and for recommendation systems in general, is collaborative filtering: a technique that predicts which items a user might like based on the ratings of similar users.

Possible data sets: recipe databases, restaurant reviews
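Here is a minimal user-based collaborative filtering sketch: to predict how much someone will like a dish they haven't tried, it averages other users' ratings of that dish, weighting each user by how similar their past ratings are. Similarity is cosine similarity over the dishes both users rated. The users, dishes, and helper names are hypothetical. Because ratings are all positive, raw cosine similarity is rarely negative; centering each user's ratings on their own average (a Pearson-style similarity) is a common refinement.

```python
from math import sqrt

def cosine_on_common(a, b):
    """Cosine similarity computed only over items both users rated."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    norm_a = sqrt(sum(a[i] ** 2 for i in common))
    norm_b = sqrt(sum(b[i] ** 2 for i in common))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def predict_rating(ratings, user, item):
    """Similarity-weighted average of other users' ratings for `item`, or None."""
    numerator = denominator = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine_on_common(ratings[user], their)
        if sim > 0:
            numerator += sim * their[item]
            denominator += sim
    return numerator / denominator if denominator else None

# Hypothetical ratings on a 1-5 scale
ratings = {
    "you": {"pho": 5, "tacos": 2, "salad": 1},
    "alex": {"pho": 5, "tacos": 1, "salad": 2, "ramen": 5},
    "jo": {"pho": 1, "tacos": 5, "salad": 4, "ramen": 2},
}
print(predict_rating(ratings, "you", "ramen"))
```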

15. Energy Consumption Forecasting

This could be an interesting project for you if you’re interested in climate change and sustainability. Forecasting energy consumption enables better energy resource planning and allows better optimization of energy production, leading to cost savings and environmental benefits. Again, this kind of project will use techniques like time series forecasting and regression.

Possible data sets: historical energy consumption data, weather data
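Before building a regression or time series model, it helps to have a simple baseline to beat. A common one for energy data, swapped in here as an illustration rather than taken from the original, is the seasonal naive forecast: predict each future period with the value from the same point in the previous season (for example, the same hour one day earlier). The hourly readings and 24-hour season length below are hypothetical assumptions.

```python
def seasonal_naive(history, season_length, horizon):
    """Forecast each future step with the value observed one season earlier."""
    if len(history) < season_length:
        raise ValueError("need at least one full season of history")
    last_season = history[-season_length:]
    return [last_season[i % season_length] for i in range(horizon)]

# Hypothetical hourly electricity use (kWh) for two days
usage = [0.4, 0.3, 0.3, 0.3, 0.4, 0.6, 1.1, 1.5, 1.2, 0.9, 0.8, 0.8,
         0.9, 0.8, 0.8, 0.9, 1.2, 1.8, 2.1, 2.0, 1.7, 1.3, 0.8, 0.5] * 2
next_day = seasonal_naive(usage, season_length=24, horizon=24)
```

If a model that uses weather data can't beat this baseline's error, that's a sign the model needs more work.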

How to Choose the Right Data Science Project For You

The right data science passion project for you is one that you find exciting and meaningful! The ideas above are just a few of the many projects that span different industries and fields. Don't feel pressured to settle for a project that excites you less just because it seems more impressive or complex.

Some projects also use more advanced data science techniques than others. For example, if you're new to data science, a project that needs only a simple regression analysis may be a better fit than one that uses neural networks. However, if you're up for learning an advanced data science technique, you should definitely go for it!

If you’re interested in a data science project, Polygence is a great option. Our research program mentors have worked with middle and high school students on exciting data science project ideas such as predicting loan defaults using logistic regression and using a statistical model to identify unclear signs in Indus script.

Do Your Own Research Through Polygence

Your passion can be your college admissions edge! Polygence provides high schoolers a personalized, flexible research experience proven to boost your admission odds. Get matched to a mentor now!
