Polygence Scholar 2024

Sophie Wang

Class of 2026

Projects

"Detection of cult language using automated machine learning algorithm" with mentor Akhila (Sept. 8, 2024)

Project Portfolio

Detection of cult language using automated machine learning algorithm

Started May 3, 2024

Abstract or project description

Although the identification of harmful or persuasive language has become a popular research area, especially in hate-speech detection, another kind of language has yet to be widely studied: language categorized as 'cultish.' This paper provides background on cultish language and examines its characteristics and history. We also explore the detection of cultish language using several automated machine-learning approaches. Specifically, we use feature extraction methods such as Bag of Words (BoW), Word2Vec, and N-grams, as well as the BERT model, to distinguish cult language from other speech. The cult-language dataset was compiled from transcripts of various cult meetings and lectures. For contrast, a 'non-cult' language dataset was also assembled from a range of other datasets and passages. Our results indicate that while all models performed reasonably well, BERT achieved the best performance, with an F1 score of 91.74%. This suggests that deep learning language models like BERT are better at identifying patterns in cultish language.
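To make the simpler feature-based approaches and the F1 metric concrete, here is a minimal, standard-library-only sketch. It is not the project's actual pipeline: the abstract does not say which classifier was trained on the BoW/N-gram features, so a multinomial Naive Bayes classifier is swapped in as an assumption, and the training sentences are hypothetical toy examples, not drawn from the project's dataset. Word2Vec and BERT are not shown. The helper names (`ngrams`, `bag_of_ngrams`, `NaiveBayes`, `f1_score`) are illustrative.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=2):
    """Return all 1-grams up to n_max-grams from lowercase word tokens."""
    words = re.findall(r"[a-z']+", text.lower())
    feats = []
    for n in range(1, n_max + 1):
        for i in range(len(words) - n + 1):
            feats.append(" ".join(words[i:i + n]))
    return feats


def bag_of_ngrams(text, n_max=2):
    """Bag-of-words representation: counts of each n-gram, order discarded."""
    return Counter(ngrams(text, n_max))


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing.

    Assumption: stands in for whatever classifier the project used.
    """

    def fit(self, texts, labels):
        self.classes = sorted(set(labels))
        self.counts = {c: Counter() for c in self.classes}
        self.priors = {}
        for c in self.classes:
            docs = [t for t, y in zip(texts, labels) if y == c]
            self.priors[c] = math.log(len(docs) / len(texts))
            for t in docs:
                self.counts[c].update(bag_of_ngrams(t))
        self.vocab = set()
        for c in self.classes:
            self.vocab.update(self.counts[c])
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, text):
        bag = bag_of_ngrams(text)
        v = len(self.vocab)

        def log_score(c):
            s = self.priors[c]
            for feat, k in bag.items():
                if feat in self.vocab:  # skip n-grams never seen in training
                    s += k * math.log((self.counts[c][feat] + 1) / (self.totals[c] + v))
            return s

        return max(self.classes, key=log_score)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data: 1 = cult-like language, 0 = non-cult language.
train_texts = [
    "only the chosen family will be saved, outsiders cannot understand",
    "leave your old life behind and surrender to the master",
    "the awakened ones must obey and never question the teachings",
    "the weather forecast says rain tomorrow afternoon",
    "the library opens at nine and closes at five",
    "our team won the soccer match last weekend",
]
train_labels = [1, 1, 1, 0, 0, 0]

model = NaiveBayes().fit(train_texts, train_labels)
test_texts = ["surrender to the master", "the library opens at nine"]
preds = [model.predict(t) for t in test_texts]
print(preds)                     # [1, 0]
print(f1_score([1, 0], preds))   # 1.0
```

Because unigrams and bigrams share one bag, a phrase such as "surrender to" counts as a feature in its own right, which is the main thing N-gram features add over plain BoW. On a real dataset the F1 score would be computed on a held-out test split rather than on two sentences.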