Polygence Scholar 2022

Ankhita Sathanur

Eastlake High School, Class of 2023

About

Projects

How can a random forest classifier be used to identify pulsar candidates? with mentor Kristen (Aug. 24, 2022)

Project Portfolio

How can a random forest classifier be used to identify pulsar candidates?

Started May 16, 2022

Abstract or project description

Pulsars are a type of neutron star that emits beams of radio emission, which can often be detected from Earth as pulses. As a pulsar rapidly spins, its beams sweep across Earth, which allows us to detect their periodic, repetitive pulses. Pulsars are valuable for studying extreme states of matter and exoplanets, and they serve as tools for measuring cosmic distances and searching for gravitational waves. Traditionally, pulsar candidates have been identified through manual signal processing. As data volumes increase, automated methods such as artificial neural networks and other machine learning tools have recently been proposed. In our project, we used another machine learning tool, the random forest classifier (an algorithm that takes the majority vote of multiple decision trees), to separate real pulsar candidates from radio frequency interference (RFI) and other noise. Once identified, these candidates can be studied further and possibly allotted telescope time to confirm them as pulsars. To develop our tool, we used the HTRU2 dataset from the UCI Machine Learning Repository, which contains 1,639 real pulsar examples and 16,259 samples of RFI/noise. The features we used included the mean, standard deviation, excess kurtosis, and skewness of both the integrated pulse profile and the DM-SNR curve. Our model achieved 98% accuracy in identifying pulsars. Our results indicated that the excess kurtosis, skewness, and mean of the integrated profile, in that order, were the most important features for differentiating real pulsars from interference. This tool could be used to process data from future pulsar surveys, such as those conducted by the Square Kilometre Array (SKA), narrowing down the candidates that need to be examined directly by humans.
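
To illustrate the majority-vote idea described in the abstract, here is a minimal sketch in plain Python. It is not the project's code, and it makes several simplifying assumptions: each "tree" is a one-split decision stump on one randomly chosen feature, the data are synthetic two-feature Gaussian clusters (not HTRU2), and all function names (`train_stump`, `train_forest`, `predict_forest`, `make_data`) are hypothetical. A full random forest grows deeper trees and samples feature subsets at every split.

```python
import random
from collections import Counter


def train_stump(rows, labels, rng):
    """Fit a one-split 'tree' on a single randomly chosen feature."""
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({row[feature] for row in rows}):
        for above_label in (0, 1):
            # Predict above_label when the feature exceeds the threshold,
            # and the opposite label otherwise; count training hits.
            correct = sum(
                (above_label if row[feature] > threshold else 1 - above_label) == y
                for row, y in zip(rows, labels)
            )
            if best is None or correct > best[0]:
                best = (correct, threshold, above_label)
    return feature, best[1], best[2]


def predict_stump(stump, row):
    feature, threshold, above_label = stump
    return above_label if row[feature] > threshold else 1 - above_label


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train each stump on its own bootstrap sample of the training set."""
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw len(rows) indices with replacement.
        idx = [rng.randrange(len(rows)) for _ in rows]
        forest.append(
            train_stump([rows[i] for i in idx], [labels[i] for i in idx], rng)
        )
    return forest


def predict_forest(forest, row):
    """Return the label that the majority of stumps vote for."""
    votes = Counter(predict_stump(stump, row) for stump in forest)
    return votes.most_common(1)[0][0]


def make_data(n_noise, n_pulsar, rng):
    """Synthetic stand-in data: label 0 = noise, label 1 = pulsar (assumed)."""
    rows = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n_noise)]
    rows += [(rng.gauss(6, 1), rng.gauss(6, 1)) for _ in range(n_pulsar)]
    labels = [0] * n_noise + [1] * n_pulsar
    return rows, labels


rng = random.Random(42)
train_rows, train_labels = make_data(200, 20, rng)
test_rows, test_labels = make_data(200, 20, rng)

forest = train_forest(train_rows, train_labels)
hits = sum(predict_forest(forest, r) == y for r, y in zip(test_rows, test_labels))
accuracy = hits / len(test_rows)
print(f"toy accuracy: {accuracy:.3f}")

# Share of RFI/noise samples in HTRU2, using the counts quoted in the abstract.
baseline = 16259 / (1639 + 16259)
print(f"HTRU2 majority-class baseline: {baseline:.3f}")
```

The toy data mirror the roughly 10:1 class imbalance of HTRU2. With the counts quoted above, a classifier that labelled every sample as noise would already score about 90.8% accuracy, which is a useful reference point when reading the reported 98%.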