Polygence Scholar 2022

Nikhil Nayak

Sunset High School, Class of 2024, Portland, Oregon


Projects

"Compressing Music using an AI-based Autoencoder" with mentor Sejal (Feb. 10, 2022)

Project Portfolio

Compressing Music using an AI-based Autoencoder

Started June 24, 2021

Abstract or project description

Audio streaming, particularly online music streaming, has become increasingly popular over the past two decades, and services such as SoundCloud and Spotify have made streaming music straightforward. In regions with severely limited internet bandwidth, however, streaming audio remains difficult. Recorded audio is most commonly stored in the WAVE format, which represents the sampled waveform losslessly. Because WAVE audio is typically uncompressed, this perfect quality comes at the cost of very large files. Popular codecs such as FLAC and MP3 (MPEG-1 Audio Layer III) compress audio, but none reaches the compression ratios needed to stream audio over ultra-low-bandwidth connections. AI-based codecs do exist, but they are all general-purpose: a single codec handles every type of audio waveform, so the model lacks the domain knowledge that could be used to compress audio more faithfully.

For this project, the chosen domain was classical music, specifically piano audio waveforms. The objective was to apply AI and subpixel interpolation to compress and decompress domain-specific audio waveforms quickly and accurately. To compress the audio, sinc interpolation was used to downsample the original waveform. Sets of convolution and subpixel layers were then used to reconstruct the original waveform. To evaluate the differences between the original and reconstructed waveforms, a variation of mean squared error (MSE) that gives more weight to outliers was used. This project showed that subpixel-based AI compression and decompression is not only possible but can be used effectively to compress domain-specific audio waveforms.
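The compression pipeline described in the abstract can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the project's code: the Hann-windowed sinc filter and its `half_width`, the helper names (`sinc_downsample`, `conv1d`, `subpixel_1d`, `decoder_block`, `outlier_weighted_mse`), the single untrained decoder block, and the `p = 4` outlier-weighted loss are all hypothetical choices. The bitrate helper shows why WAVE files are large: CD-quality PCM (44.1 kHz, 16-bit, stereo) runs at 1411.2 kbps.

```python
import numpy as np


def pcm_bitrate_kbps(sample_rate: int, bit_depth: int, channels: int) -> float:
    """Bitrate of uncompressed PCM audio, as typically stored in a WAVE file."""
    return sample_rate * bit_depth * channels / 1000


def sinc_downsample(x: np.ndarray, factor: int, half_width: int = 32) -> np.ndarray:
    """Encoder sketch: windowed-sinc low-pass filter, then keep every `factor`-th sample."""
    n = np.arange(-half_width * factor, half_width * factor + 1)
    taps = np.sinc(n / factor) / factor  # ideal low-pass, cutoff at the new Nyquist frequency
    taps *= np.hanning(len(taps))        # assumed Hann taper for the truncated sinc
    taps /= taps.sum()                   # unity gain at DC
    return np.convolve(x, taps, mode="same")[::factor]


def conv1d(x: np.ndarray, w: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Stride-1, 'same'-padded convolution. x: (C_in, L); w: (C_out, C_in, K), K odd."""
    k = w.shape[2]
    xp = np.pad(x, ((0, 0), (k // 2, k // 2)))
    windows = np.lib.stride_tricks.sliding_window_view(xp, k, axis=1)  # (C_in, L, K)
    return np.einsum("clk,ock->ol", windows, w) + b[:, None]


def subpixel_1d(x: np.ndarray, r: int) -> np.ndarray:
    """1-D subpixel shuffle: (C * r, L) -> (C, L * r), interleaving channels into time."""
    c_times_r, length = x.shape
    c = c_times_r // r
    return x.reshape(c, r, length).transpose(0, 2, 1).reshape(c, length * r)


def decoder_block(x: np.ndarray, w: np.ndarray, b: np.ndarray, r: int) -> np.ndarray:
    """One convolution followed by a subpixel layer, which multiplies the length by r."""
    return subpixel_1d(conv1d(x, w, b), r)


def outlier_weighted_mse(y: np.ndarray, y_hat: np.ndarray, p: float = 4) -> float:
    """Hypothetical MSE variant: mean |error|^p; p > 2 weights large errors more (p = 2 is MSE)."""
    return float(np.mean(np.abs(y - y_hat) ** p))


rng = np.random.default_rng(0)
sr, factor = 44_100, 4
t = np.arange(sr) / sr
x = np.sin(2 * np.pi * 440 * t)                    # 1 s, 440 Hz tone standing in for piano audio
code = sinc_downsample(x, factor)                  # compressed representation
w = rng.normal(scale=0.1, size=(factor, 1, 9))     # untrained weights: shapes only
y_hat = decoder_block(code[None, :], w, np.zeros(factor), factor)[0]
print(len(x), len(code), len(y_hat))               # 44100 11025 44100
print(pcm_bitrate_kbps(44_100, 16, 2))             # 1411.2
print(outlier_weighted_mse(x, y_hat))
```

Downsampling by 4 alone cuts the PCM bitrate by the same factor (for 16-bit stereo, from 1411.2 to 352.8 kbps). In this sketch the decoder's job is to restore the detail above the new Nyquist frequency that the low-pass filter removed; a real model would stack several trained convolution and subpixel blocks rather than one block with random weights.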
In future work, the model architecture could include an AI-based discriminator that learns continuously alongside the encoder and decoder and judges how well they are performing, similar to the discriminator in a generative adversarial network (GAN).
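A discriminator trained alongside the encoder and decoder would make training adversarial, in the style of a GAN. The sketch below shows only the two training objectives, assuming the discriminator outputs the probability that a clip is an original recording; the function names, the binary cross-entropy formulation, and the `adv_weight` value are assumptions for illustration, not part of the project.

```python
import numpy as np


def bce(p: np.ndarray, target: float) -> float:
    """Binary cross-entropy between predicted probabilities p and a constant target label."""
    p = np.clip(p, 1e-12, 1 - 1e-12)
    return float(-np.mean(target * np.log(p) + (1 - target) * np.log(1 - p)))


def discriminator_loss(d_real: np.ndarray, d_fake: np.ndarray) -> float:
    """D is trained to output 1 for original clips and 0 for decoder reconstructions."""
    return bce(d_real, 1.0) + bce(d_fake, 0.0)


def decoder_loss(recon_loss: float, d_fake: np.ndarray, adv_weight: float = 0.01) -> float:
    """Decoder keeps its reconstruction loss and is also penalized when D spots its output."""
    return recon_loss + adv_weight * bce(d_fake, 1.0)


# D's probabilities that each clip is an original; the values are made up for illustration.
d_real = np.array([0.9, 0.8])
d_fake = np.array([0.3, 0.2])
print(discriminator_loss(d_real, d_fake))
print(decoder_loss(recon_loss=0.05, d_fake=d_fake))
```

In training, the two losses would be minimized in alternation, so the discriminator keeps improving as the encoder and decoder do; `adv_weight` controls how much the decoder trades reconstruction accuracy for realism.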