Files in this item



application/pdf: YAN-THESIS-2019.pdf (10MB)


Title: Audio compression via nonlinear transform coding and stochastic binary activation
Author(s): Yan, Yuanheng
Advisor(s): Smaragdis, Paris
Department / Program: Electrical & Computer Eng
Discipline: Electrical & Computer Engr
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Audio compression
Neural network
Convolutional neural network (CNN)
Stochastic binary activation
Abstract: Engineers have pushed the boundaries of audio compression and designed numerous lossy audio compression codecs, such as AAC, WMA, and others, that have surpassed the longstanding MP3 coding format. However, most of these methods are laboriously engineered using psychoacoustic modeling, and some of them are proprietary and see only limited use. This thesis, inspired by recent major breakthroughs in lossy image compression via machine learning methods, explores the possibilities of a neural network trained for lossy audio compression. Currently there are few if any audio compression methods that utilize machine learning. This thesis presents a brief introduction to lossy transform compression and compares it to similar machine learning concepts, then systematically presents a convolutional autoencoder network with a stochastic binary activation for a sparse representation of the code space to achieve compression. A similar network is employed for encoding the residual of the main network. Our network achieves average compression rates of roughly 5 to 2 and introduces few if any audible artifacts, presenting a promising opening to audio compression using machine learning.
Issue Date: 2019-07-18
Rights Information: Copyright 2019 Yuanheng Yan
Date Available in IDEALS: 2019-11-26
Date Deposited: 2019-08
