Files in this item

File:XIA-THESIS-2015.pdf (4MB), Restricted Access
File Description:(no description provided)
Format:PDF (application/pdf)

Description

Title:FPGA implementation of a Restricted Boltzmann Machine for handwriting recognition
Author(s):Xia, Tian
Department / Program:Electrical & Computer Eng
Discipline:Electrical & Computer Engr
Degree Granting Institution:University of Illinois at Urbana-Champaign
Degree:M.S.
Genre:Thesis
Subject(s):Field-Programmable Gate Array (FPGA)
Restricted Boltzmann Machine (RBM)
Abstract:Despite the recent success of neural networks in the research field, the number of resulting applications in non-academic settings is very limited. One obstacle to their popularity is that neural networks are typically implemented as software running on a general-purpose processor. The time complexity of the software implementation is usually O(n^2). As a result, neural networks are inadequate to meet the scalability and performance requirements of commercial or industrial uses. Several research works have dealt with accelerating neural networks on Field-Programmable Gate Arrays (FPGAs), particularly Restricted Boltzmann Machines (RBMs), a very popular and hardware-friendly neural network model. However, when these implementations are used for handwriting recognition, there are two major setbacks. First, the implementations assume that the sizes of the neural networks are symmetric, while the size of the RBM model for handwriting recognition is in fact highly asymmetric. Second, these implementations cannot fit a model with a visible layer larger than 512 nodes on a single FPGA. Thus, they are highly inefficient when applied to handwriting recognition. In this thesis, a new framework is proposed for an RBM with asymmetric weights, optimized for handwriting recognition. The framework is tested on an Altera Stratix IV GX (EP4SGX230KF40C2) FPGA running at 100 MHz. The resources support a complete RBM model of 784 by 10 nodes. The experimental results show a computational speed of 4 billion connection updates per second, a 134-fold speed-up with I/O time, and a 161-fold speed-up without I/O time, compared with an optimized MATLAB implementation running on a 2.50 GHz Intel processor. Compared with previous works, our implementation achieves a much higher speed-up while using comparable resources.
Issue Date:2015-04-30
Type:Thesis
URI:http://hdl.handle.net/2142/78800
Rights Information:Copyright 2015 Tian Xia
Date Available in IDEALS:2015-07-22
Date Deposited:May 2015

