Class of 2025
Aurora, Illinois
About
Hello! I am Sahasra Chatakondu, and I plan to pursue Data Science in college. I am interested in Machine Learning, Artificial Intelligence, and various coding languages. I first became interested in this project after my favorite teacher disclosed her battle with breast cancer; ever since, I have been passionate about finding effective strategies to prevent its harmful spread. I am very happy with the progress of my project and would love to continue researching how data science has changed medical processes. A few more things about me: I love to draw, code, and play tennis in my free time!
- "An Analysis of the k-Nearest Neighbor Classifier to Predict Benign and Malignant Breast Cancer Tumors" with mentor Kevin (Aug. 22, 2023)
An Analysis of the k-Nearest Neighbor Classifier to Predict Benign and Malignant Breast Cancer Tumors
Started June 1, 2023
Abstract or project description
Breast cancer has a high mortality rate and is a leading cause of death among women worldwide, so machine learning (ML) algorithms that can effectively distinguish benign from malignant tumors at an early stage have received growing attention. ML classifiers allow mammographic results to be evaluated more efficiently than is possible when radiologists manually classify extensive patient data. This study evaluates the effectiveness of the k-Nearest Neighbor (kNN) classifier in classifying tumors as benign or malignant based on concavity, texture, area, perimeter, and smoothness. We use scatterplots to differentiate between the benign and malignant classes in the Wisconsin Breast Cancer Dataset (WBCD) from the University of California at Irvine Machine Learning Repository. Using the k-Fold Cross Validation (k-FCV) technique, we determine the optimal value of k for assigning previously unseen samples to their respective categories. The analysis finds that the most favorable value for the hyperparameter k is 12, which produced a highly effective diagnostic outcome across four distinct tests. Because the k parameter has no predefined value, guessing it could lead to accuracy errors and misdiagnosis; k-FCV therefore provides a more principled way to choose the k used to classify tumors with unknown labels. In addition, we carefully preprocess the dataset and measure how different data splits affect accuracy in order to organize the data effectively and achieve reliable results. Because early detection is essential in preventing breast cancer-related deaths, ML techniques like kNN could help reduce the mortality associated with the disease.
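The abstract's core procedure, choosing k for a kNN classifier by k-fold cross-validation, can be sketched roughly as follows. This is a minimal illustration and not the project's actual code. It uses synthetic two-feature data rather than the WBCD, so it is not expected to reproduce k = 12. The function names (`knn_predict`, `cross_val_accuracy`, `make_toy_data`), the five folds, and the candidate range 1 to 20 are all assumptions made for the example.

```python
import math
import random
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Label x by majority vote among its k nearest training points (Euclidean)."""
    order = sorted(range(len(train_X)), key=lambda i: math.dist(train_X[i], x))
    votes = Counter(train_y[i] for i in order[:k])
    # With an even k a tie is possible; Counter then returns the label seen first.
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds=5, seed=0):
    """Mean held-out accuracy of kNN over n_folds cross-validation folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]  # disjoint, near-equal folds
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        train_X = [X[i] for i in train]
        train_y = [y[i] for i in train]
        correct = sum(knn_predict(train_X, train_y, X[i], k) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def make_toy_data(n_per_class=60, seed=1):
    """Synthetic stand-in for tumor features: two overlapping Gaussian clusters."""
    rng = random.Random(seed)
    X, y = [], []
    for label, center in (("benign", 0.0), ("malignant", 2.0)):
        for _ in range(n_per_class):
            X.append((rng.gauss(center, 1.0), rng.gauss(center, 1.0)))
            y.append(label)
    return X, y


X, y = make_toy_data()
candidate_ks = range(1, 21)  # assumed search range, not taken from the study
cv_scores = {k: cross_val_accuracy(X, y, k) for k in candidate_ks}
best_k = max(cv_scores, key=cv_scores.get)
print(f"best k = {best_k}, mean CV accuracy = {cv_scores[best_k]:.3f}")
```

On real measurements such as area and smoothness, which sit on very different scales, the features would normally be standardized before computing distances, since kNN is sensitive to feature scale.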