Files in this item



9990052.pdf (application/pdf, 14MB). Restricted to U of Illinois.


Title: Sequentialized Language Models
Author(s): Lake, John Michael
Doctoral Committee Chair(s): DeJong, Gerald F.
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Computer Science
Abstract: We then turn to construction of a sequentialized grammatical model of linguistic objects in text compression. We develop the Prediction by Grammatical Match technique, a new compression framework employing a static context-free grammar and an adaptive finite-context statistical model. These compressors are adaptive, general-purpose compressors that operate in linear time and bounded space. We show that these compressors can deliver substantial reductions in both bits-per-character rates and space usage, and suffer almost no penalty when the grammar does not apply. The new technique rests on three primary technical innovations: an algorithm for designing an optimal, strictly bottom-up parseable metalanguage for a compression scheme comprising multiple grammars; a principled approach to ambiguity and agrammatical text; and an incremental analysis selection algorithm. The metalanguage construction emphasizes lexical left-corner analysis descriptions, with each symbol in a description representing a maximal bundle of bottom-up and top-down information by naming the production introducing the next lexical left-corner item. These three innovations combine into a very powerful compression system that solves an important, long-standing problem: efficient and effective use of context-free grammars in general data compression.
Issue Date: 2000
Description: 196 p.
Thesis (Ph.D.)--University of Illinois at Urbana-Champaign, 2000.
Other Identifier(s): (MiAaPQ)AAI9990052
Date Available in IDEALS: 2015-09-25
Date Deposited: 2000
