Polygence Scholar 2022

Kunal Sachdev

R N Podar International School, Powai, Mumbai | Class of 2022 | Waterloo, Ontario

About

Hello! My name is Kunal Sachdev and my Polygence project is on “NBA Free Throw Prediction”. I chose to work on this project because I am really interested in understanding data science concepts, and I am an avid fan of basketball. This project ties both of my interests together and gives me a fuller picture of how a data scientist actually works. At the end of this project, I would like to write a research paper summarizing the methods I used, the models I created, and the results I obtained.

Projects

"NBA Free Throw Prediction" with mentor Ian (Sept. 6, 2022)

Project Portfolio

NBA Free Throw Prediction

Started May 2, 2022

Abstract or project description

Tech Stack: Python, Jupyter Notebook, Pandas, NumPy, Seaborn, Matplotlib, Scipy, Statsmodels, Scikit-learn

The overall goal of the project is to predict the free throw shooting accuracy of NBA players.

First, we obtained statistics for NBA players in the 2021-22 season (the latest non-COVID NBA season) and the 2018-19 season from nba.com.

We then used Python tools to carry out exploratory data analysis (EDA) on the 2021-22 data set.

We removed the columns that were not needed, along with those whose distributions did not fit the assumptions of a linear regression model. By setting a minimum number of free throws attempted over the season, we eliminated low-information points from the data set. We also evaluated the correlation of each feature with our target variable, free throw percentage, and constructed regression plots to visualize these relationships.
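The filtering and correlation steps described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the column names (`FTA`, `FG3_PCT`, `REB`, `FT_PCT`), the toy values, and the cutoff of 50 attempts are all assumptions made for the example, and ranking by absolute correlation is one possible reading of "descending list".

```python
import numpy as np

# Hypothetical toy data: one entry per player-season (values are made up).
players = {
    "FTA":     np.array([5, 120, 300, 80, 210, 2, 150]),
    "FG3_PCT": np.array([0.10, 0.36, 0.40, 0.30, 0.38, 0.50, 0.33]),
    "REB":     np.array([2.0, 4.1, 3.5, 9.8, 5.0, 1.0, 11.2]),
    "FT_PCT":  np.array([0.40, 0.82, 0.90, 0.65, 0.85, 1.00, 0.60]),
}

MIN_FTA = 50  # assumed cutoff; the source does not state the limit used


def filter_and_rank(data, target="FT_PCT", volume="FTA", min_volume=MIN_FTA):
    """Drop low-volume shooters, then rank features by |Pearson r| with the target."""
    keep = data[volume] >= min_volume
    filtered = {name: col[keep] for name, col in data.items()}

    corr = {}
    for name, col in filtered.items():
        if name == target:
            continue
        corr[name] = float(np.corrcoef(col, filtered[target])[0, 1])

    # Most strongly correlated feature first (sign is ignored for the ordering).
    ranked = sorted(corr, key=lambda name: abs(corr[name]), reverse=True)
    return filtered, corr, ranked


filtered, corr, ranked = filter_and_rank(players)
for name in ranked:
    print(f"{name}: r = {corr[name]:+.2f}")
```

A player who made 2 of 2 free throws has a 100% free throw percentage, which says very little about shooting skill; the volume cutoff is what keeps such rows from distorting the correlations.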

Once our data set was clean and ready for further analysis, we constructed two competing linear regression models. For the first model, we ranked the features in descending order of their correlation with free throw percentage, then iteratively fit a model and removed a variable until we obtained the model with the lowest Akaike Information Criterion (AIC). For the second model, using the same ranked list, we iteratively removed one variable at a time until every variable in the model had a Variance Inflation Factor (VIF) below 10.
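A rough sketch of the two selection procedures, under stated assumptions, might look like the following. It uses plain NumPy on synthetic data rather than the project's Statsmodels code. The AIC here is the Gaussian form n·ln(RSS/n) + 2k, which matches the usual definition only up to an additive constant, and the rule of dropping the least-correlated remaining feature at each step is one reading of the description, not a confirmed detail.

```python
import numpy as np


def _design(X, cols):
    """Design matrix: an intercept column followed by the chosen feature columns."""
    return np.column_stack([np.ones(X.shape[0])] + [X[:, c] for c in cols])


def _rss(A, y):
    """Residual sum of squares of the least-squares fit of y on A."""
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)


def aic(X, y, cols):
    """Gaussian AIC up to an additive constant: n*ln(RSS/n) + 2k."""
    A = _design(X, cols)
    n = len(y)
    return n * np.log(_rss(A, y) / n) + 2 * A.shape[1]


def vif(X, cols):
    """VIF_j = 1 / (1 - R^2_j), with R^2_j from regressing feature j on the others."""
    out = {}
    for c in cols:
        others = [o for o in cols if o != c]
        target = X[:, c]
        tss = float(((target - target.mean()) ** 2).sum())
        r2 = 1.0 - _rss(_design(X, others), target) / tss
        out[c] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return out


def lowest_aic_subset(X, y, ranked):
    """Drop the least-correlated feature one at a time; keep the lowest-AIC subset."""
    cols, best_cols, best_aic = list(ranked), None, np.inf
    while cols:
        score = aic(X, y, cols)
        if score < best_aic:
            best_cols, best_aic = list(cols), score
        cols.pop()  # assumed rule: remove the last (least correlated) feature
    return best_cols, best_aic


def low_vif_subset(X, ranked, limit=10.0):
    """Drop the least-correlated feature until every remaining VIF is below limit."""
    cols = list(ranked)
    while len(cols) > 1 and max(vif(X, cols).values()) >= limit:
        cols.pop()
    return cols


# Synthetic demo: feature 1 is almost a copy of feature 0, feature 2 is noise.
rng = np.random.default_rng(0)
n = 200
x0 = rng.normal(size=n)
x1 = x0 + 0.05 * rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([x0, x1, x2])
y = 2.0 * x0 + 0.5 * rng.normal(size=n)

ranked = sorted(range(X.shape[1]),
                key=lambda j: abs(np.corrcoef(X[:, j], y)[0, 1]),
                reverse=True)
aic_cols, aic_score = lowest_aic_subset(X, y, ranked)
vif_cols = low_vif_subset(X, ranked)
print("lowest-AIC subset:", aic_cols)
print("low-VIF subset:", vif_cols)
```

In the demo, the near-duplicate pair of features produces VIF values in the hundreds, so the VIF procedure keeps only one of them. The AIC procedure has no such constraint, because AIC rewards fit and penalizes the number of parameters but does not look at how the features relate to each other.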

Furthermore, we interpreted the coefficients of the features in the two models and offered insights into possible reasons for the observed coefficients.

Finally, we compared how accurately the two models predicted the free throw percentage of NBA players in the 2018-19 data set. Although the lowest-AIC model predicted the data better than the lowest-VIF model, we concluded that the lowest-VIF model should be preferred, because many variables in the lowest-AIC model are directly derived from one another, causing multicollinearity.
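The comparison on a held-out season can be illustrated with a small sketch. Everything here is assumed for the example: the data are synthetic, the two feature subsets are hypothetical stand-ins for the lowest-AIC and lowest-VIF models, and root mean squared error (RMSE) is used as the accuracy measure because the description does not name the metric.

```python
import numpy as np


def fit_ols(X, y):
    """Least-squares coefficients, with the intercept in position 0."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict(beta, X):
    return beta[0] + X @ beta[1:]


def rmse(y_true, y_pred):
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


rng = np.random.default_rng(1)


def make_season(n):
    """Synthetic 'season': feature 1 is nearly a copy of feature 0."""
    x0 = rng.normal(size=n)
    x1 = x0 + 0.05 * rng.normal(size=n)
    x2 = rng.normal(size=n)
    y = 0.75 + 0.05 * x0 + 0.02 * x2 + 0.02 * rng.normal(size=n)
    return np.column_stack([x0, x1, x2]), y


X_train, y_train = make_season(200)  # stands in for the training season
X_test, y_test = make_season(100)    # stands in for the held-out season

# Hypothetical feature subsets for the two competing models.
candidates = {"lowest_aic": [0, 1, 2], "low_vif": [0, 2]}

scores = {}
for name, cols in candidates.items():
    beta = fit_ols(X_train[:, cols], y_train)
    scores[name] = rmse(y_test, predict(beta, X_test[:, cols]))

# Reference point: always predict the training-season mean.
baseline = rmse(y_test, np.full_like(y_test, y_train.mean()))

for name, score in scores.items():
    print(f"{name}: out-of-sample RMSE = {score:.4f}")
print(f"mean-only baseline: {baseline:.4f}")
```

Out-of-sample error alone does not reveal multicollinearity: a model with redundant features can still predict well while its individual coefficients remain unstable and hard to interpret, which is the trade-off behind preferring the lowest-VIF model.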