|Title:||Change of representation in machine learning, and an application to protein structure prediction|
|Author(s):||Ioerger, Thomas Richard|
|Doctoral Committee Chair(s):||Rendell, Larry A.|
|Department / Program:||Computer Science|
|Degree Granting Institution:||University of Illinois at Urbana-Champaign|
|Abstract:||While many excellent induction algorithms are known for making predictions from databases in well-studied domains, learning systems still perform poorly in many difficult real-world domains, such as weather prediction or financial risk analysis. Two characteristics of real-world domains are inadequately addressed by current machine learning research. First, the difficulty in these domains is often caused by a low-level representation, which necessitates shifting to a higher-level representation. But the space of possible representations is very large, so we need intelligent methods for finding higher-level representations. Second, background knowledge is almost always available in real-world domains, which we would like to take advantage of to increase predictive accuracy. However, known roles for domain knowledge in machine learning are often inflexible, requiring the use of a specific induction algorithm or being sensitive to incorrectness or incompleteness in the knowledge.
We propose a general framework for change-of-representation based on searching for alternative representations to improve the accuracy of an underlying induction algorithm. Representations are selected as candidates by querying a strategy component, which relies on domain knowledge to suggest which alternatives to search. An evaluation component then compares these representations by applying each representation to a set of examples and running the induction algorithm on the transformed examples to empirically determine the effect of the change on accuracy. This approach provides solutions to the two characteristic problems of learning in real-world domains. First, domain knowledge is used as a heuristic to guide the search for alternative representations, enabling more intelligent decisions during change-of-representation. Second, the framework provides a flexible role for knowledge that can be used with any learning algorithm and is tolerant of uncertainty. An implementation of this framework could be used as an interface between a human expert and a learning program in which: (1) the human uses background knowledge to generate and prioritize alternative representations, and (2) the system empirically evaluates these to discover the best change for improving accuracy.
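The search-and-evaluate loop described in this paragraph can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the dissertation: the `train_stump` learner stands in for the underlying induction algorithm, the hand-written `candidates` list stands in for the knowledge-driven strategy component, and the XOR data is a toy problem in which the low-level representation hides the target concept.

```python
from collections import Counter, defaultdict

def train_stump(examples):
    """Toy induction algorithm (assumed): predict from the single best feature."""
    default = Counter(y for _, y in examples).most_common(1)[0][0]
    best = None
    for i in range(len(examples[0][0])):
        votes = defaultdict(Counter)
        for x, y in examples:
            votes[x[i]][y] += 1
        # Map each observed feature value to its majority label.
        table = {v: c.most_common(1)[0][0] for v, c in votes.items()}
        correct = sum(table[x[i]] == y for x, y in examples)
        if best is None or correct > best[0]:
            best = (correct, i, table)
    _, i, table = best
    return lambda x: table.get(x[i], default)

def accuracy(model, examples):
    return sum(model(x) == y for x, y in examples) / len(examples)

def evaluate(rep, train, test, learner=train_stump):
    """Evaluation component: transform the examples, run the learner, measure accuracy."""
    transformed_train = [(rep(x), y) for x, y in train]
    transformed_test = [(rep(x), y) for x, y in test]
    return accuracy(learner(transformed_train), transformed_test)

def search(candidates, train, test):
    """Try each candidate representation and keep the most accurate one."""
    best_name, best_score = None, -1.0
    for name, rep in candidates:
        score = evaluate(rep, train, test)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score

# Hypothetical candidate representations; in the framework these would be
# proposed by the strategy component from domain knowledge.
def identity(x):
    return tuple(x)

def add_xor(x):
    return tuple(x) + (x[0] ^ x[1],)

def add_and(x):
    return tuple(x) + (x[0] & x[1],)

candidates = [("identity", identity), ("add_and", add_and), ("add_xor", add_xor)]

# Toy data: the label is the XOR of two binary features.
data = [((a, b), a ^ b) for a in (0, 1) for b in (0, 1)]
train, test = data * 2, data

best_name, best_score = search(candidates, train, test)
print(best_name, best_score)
```

Because the learner is passed to `evaluate` as a parameter, any other induction algorithm could be substituted without changing the search, which mirrors the flexibility the abstract claims for the framework.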
We apply our framework for change-of-representation to the difficult, real-world domain of protein tertiary (3D) structure prediction. The best computational method to date for determining the structure of a protein from its amino acid sequence is homology modeling, which is based on sequence alignments with a protein database. Homology modeling can fail in cases where the sequence similarity is low between proteins with similar structures. However, the physical and chemical properties of amino acids are believed to be relevant to protein structure. Using an instantiation of our framework, we incorporate this domain knowledge to suggest ways to change the representation of amino acid sequences. Efficient search procedures derived from this knowledge lead to the discovery of representations that improve the ability to predict protein structures by homology modeling.
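As a hedged illustration of what changing the representation of an amino acid sequence could look like, the sketch below recodes residues into coarse physico-chemical classes and compares two pre-aligned sequences before and after recoding. The four-class grouping in `GROUPS`, the function names, and the two short sequences are assumptions chosen for this example; they are not the representations found in the dissertation.

```python
# Illustrative coarse grouping of the 20 standard amino acids (an assumption;
# published groupings differ, especially for G, P, C, H, and Y).
GROUPS = {
    "h": "AVLIMFWC",  # hydrophobic
    "p": "STNQYGP",   # polar or small
    "+": "KRH",       # positively charged
    "-": "DE",        # negatively charged
}
CLASS_OF = {aa: label for label, members in GROUPS.items() for aa in members}

def recode(sequence):
    """Rewrite a one-letter amino acid sequence in the reduced alphabet."""
    return "".join(CLASS_OF[aa] for aa in sequence)

def identity(a, b):
    """Fraction of matching positions in two pre-aligned, equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x == y for x, y in zip(a, b)) / len(a)

# Hypothetical toy sequences: no residue matches exactly, but every aligned
# pair of residues falls in the same property class.
seq_a = "LKVDS"
seq_b = "IRLET"

raw_similarity = identity(seq_a, seq_b)
recoded_similarity = identity(recode(seq_a), recode(seq_b))
print(raw_similarity, recoded_similarity)
```

In this toy case the raw sequences share no identical residues, while the recoded sequences agree at every position, which is the kind of low-similarity situation the abstract describes for homology modeling.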
|Rights Information:||Copyright 1996 Ioerger, Thomas Richard|
|Date Available in IDEALS:||2011-05-07|
|Identifier in Online Catalog:||AAI9712321|