Of Rising Scholars, Fall 2022

Varun will be presenting at The Symposium of Rising Scholars on Saturday, September 24th! Register to attend the event and see Varun's presentation.
Polygence Scholar 2022

Varun Venkatesh

Dublin High School, Class of 2024, Dublin, CA



  • "Statistical Model for Identifying Unclear and Doubtfully Restored Signs of the Indus Script" with mentor Ali (Working project)

Project Portfolio

Statistical Model for Identifying Unclear and Doubtfully Restored Signs of the Indus Script

Started Apr. 8, 2022

Abstract or project description

The Indus script is a writing system that developed between 2500 and 1800 BCE in the Indus Valley civilization and remains undeciphered. The Indus script texts found so far in archeological digs are limited in number, and many come from damaged artifacts with unclear or missing signs. Identifying the missing and unclear signs and extending the text corpus will benefit further research. Here, we attempt to predict the missing and unclear signs with n-gram Markov chain models built on the ICIT text corpus. First, we analyzed the patterns and concordance of the signs, pairs, triplets, and other n-grams, and how the signs behave with respect to their positions in the texts. With that understanding, we built Markov chain language models based on n-grams, augmented with positional probability. Since signs could be missing at any location in a text, we devised and implemented effective sign fill-in models on top of these Markov chain models. Using the language models and the sign fill-in models, we then identified missing single signs in the test dataset and tuned our parameters, reaching an accuracy of about 60% for a match among the five signs output by our model. We then filled in the real unclear texts with our predicted signs. We hope that the statistical models developed here and the results of this work add to the text corpus, aid in understanding the Indus script, and contribute to the decipherment effort.
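To make the fill-in idea concrete, here is a minimal Python sketch of how a missing sign might be ranked using a Markov chain combined with positional probability. It is not the project's actual implementation. It assumes a bigram model only, add-alpha smoothing, and a simple exponent (`pos_weight`) for blending in the positional term. The function names, parameters, and the single-letter sign labels are all hypothetical placeholders, not ICIT sign numbers.

```python
from collections import Counter, defaultdict


def train(texts):
    """Count bigrams and per-position sign frequencies from complete texts."""
    following = defaultdict(Counter)  # following[a][b]: times b directly follows a
    position = defaultdict(Counter)   # position[i][s]: times s occurs at index i
    vocab = Counter()                 # overall sign frequencies
    for text in texts:
        for i, sign in enumerate(text):
            vocab[sign] += 1
            position[i][sign] += 1
        for a, b in zip(text, text[1:]):
            following[a][b] += 1
    return following, position, vocab


def fill_in(text, gap, model, k=5, alpha=0.1, pos_weight=0.5):
    """Return the k most likely signs for the missing slot text[gap]."""
    following, position, vocab = model
    v = len(vocab)

    def smoothed(counter, sign):
        # Add-alpha smoothing so unseen events keep a small probability.
        return (counter[sign] + alpha) / (sum(counter.values()) + alpha * v)

    scores = {}
    for s in vocab:
        # Positional term, down-weighted by an assumed exponent.
        score = smoothed(position[gap], s) ** pos_weight
        if gap > 0:                    # transition from the left neighbor
            score *= smoothed(following[text[gap - 1]], s)
        if gap < len(text) - 1:        # transition into the right neighbor
            score *= smoothed(following[s], text[gap + 1])
        scores[s] = score
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Toy corpus with placeholder sign labels (hypothetical data).
corpus = [
    ["A", "B", "C"],
    ["A", "B", "D"],
    ["E", "B", "C"],
    ["A", "F", "C"],
]
model = train(corpus)
top = fill_in(["A", None, "C"], gap=1, model=model)
print(top)
```

On this toy corpus the sketch ranks `B` first and `F` second for the gap, since both appear between the observed neighbors and `B` is more frequent in that position. Returning five candidates mirrors the abstract's "match among the five signs" criterion. A fuller model would presumably use longer n-grams and tuned weights, as the abstract describes.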