Files in this item



ECE499-Fa2011-kejriwal.pdf (application/pdf, 1MB). Restricted to U of Illinois.


Title: Machine Learning Techniques in Offline Handwriting Transcription
Author(s): Kejriwal, Mayank
Contributor(s): McHenry, Kenton; Carney, P. Scott
Subject(s): automatic handwriting transcription; handwriting recognition; machine learning
Abstract: In this thesis, we investigate techniques for the automatic transcription of handwritten text in digitally scanned United States Census forms from the 1930s. We show experimentally that Word Spotting techniques such as Dynamic Time Warping (DTW), Corner Feature Correspondence (CFC), and, to a certain extent, Euclidean Distance Matching outperform recognition approaches using Linear Algorithms and Decision Trees on our task. Using DTW, we achieve accuracy rates of over 80% on a 2500-image binary classification task in competitive time. Using both CFC and DTW, we achieve above 75% on the more complicated 36-class States column under a top-5 evaluation. We discuss preprocessing algorithms and heuristics, and show visual results of these techniques. In addition, we briefly discuss feature generation and the relevance of certain features in that context. Specifically, we show that even a small set of relevant features can be sufficient for good DTW performance, and that good preprocessing can greatly increase accuracy rates for almost all algorithms. We conclude with the overall significance of this work, future prospects of the research, and early results of some experimental work.
Issue Date: 2011-12
Publication Status: unpublished
Peer Reviewed: not peer reviewed
Date Available in IDEALS: 2014-01-13
