Title: Efficient algorithm for selecting protein residue-residue contacts
Author(s): Ye, Qing
Advisor(s): Peng, Jian
Department / Program: Computer Science
Discipline: Computer Science
Degree Granting Institution: University of Illinois at Urbana-Champaign
Subject(s): Protein Contacts; Protein Structure; Contact Selection; Protein Folding; Integer Programming
Abstract: The functions of proteins are largely determined by their structures. Determining a protein's three-dimensional structure is challenging both experimentally and computationally. Because amino acid residues that are spatially close often co-evolve, this correlation allows us to predict contacts from multiple sequence alignments. The predicted contacts can then serve as spatial constraints that guide protein structure prediction: they are given as input to a structure prediction algorithm, which produces "decoy" models as tentative 3D structures for the protein. However, the computational power required for structure prediction grows exponentially with the number of contacts selected. Selecting a small yet informative set of contacts is therefore essential for producing high-quality models quickly. Existing contact prediction methods aim to improve precision and recall, but not all contacts carry the same amount of structural information. The strategy of selecting the highest-confidence contacts may therefore not be ideal for structure prediction. Here we present an efficient algorithm, ContactSel, that selects contacts to assist contact-guided ab initio folding. Our key idea is that contacts between residues far apart in the sequence (long-range contacts), and collections of contacts that are highly diverse, contain more information than contacts that are shorter-range and close to one another. We formulate contact selection as an integer programming problem that selects structurally diverse contacts. For evaluation, we generated decoy models using L/2 contacts selected by ContactSel and by a naive selection baseline. We show that ContactSel achieves a significant improvement on the CASP12 domain set.
Issue Date: 2019-04-26
Rights Information: Copyright 2019 Qing Ye
Date Available in IDEALS: 2019-08-23
Date Deposited: 2019-05
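
Illustrative sketch: the abstract describes selecting contacts that are long-range and mutually diverse rather than simply the most confident ones. The Python snippet below is a minimal toy version of that idea, not the ContactSel formulation from the thesis. The objective function, the weights `alpha` and `beta`, the `spread` diversity measure, and the example contacts are all assumptions made for illustration, and the 0-1 selection problem is solved by exhaustive search instead of an integer programming solver, so it is only practical for a handful of candidates.

```python
from itertools import combinations

# A contact is (i, j, p): residue indices i < j and a predicted confidence p.
# All scoring choices below are hypothetical, not taken from the thesis.


def separation(contact):
    """Sequence separation |i - j|; larger values mean longer-range contacts."""
    i, j, _ = contact
    return abs(i - j)


def spread(a, b):
    """Assumed diversity measure: Manhattan distance in the (i, j) contact map."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def score(subset, alpha=1.0, beta=1.0):
    """Reward confident long-range contacts plus pairwise diversity."""
    long_range = sum(c[2] * separation(c) for c in subset)
    diversity = sum(spread(a, b) for a, b in combinations(subset, 2))
    return alpha * long_range + beta * diversity


def select_contacts(contacts, k, alpha=1.0, beta=1.0):
    """Pick exactly k contacts maximizing score, by trying every subset."""
    return max(combinations(contacts, k), key=lambda s: score(s, alpha, beta))


def top_k_by_confidence(contacts, k):
    """Naive baseline: keep the k most confident contacts."""
    return sorted(contacts, key=lambda c: c[2], reverse=True)[:k]


# Made-up candidates: three confident short-range contacts that sit close
# together, and three less confident long-range contacts.
contacts = [
    (1, 5, 0.95),
    (2, 6, 0.94),
    (3, 7, 0.93),
    (1, 30, 0.80),
    (10, 40, 0.75),
    (20, 45, 0.70),
]

selected = select_contacts(contacts, 3)
naive = top_k_by_confidence(contacts, 3)
print("selected:", selected)
print("naive:   ", naive)
```

On this toy input the naive baseline keeps the three clustered short-range contacts, while the scored selection trades some confidence for range and spread. For realistic protein lengths the exhaustive search would need to be replaced by a proper integer programming solver.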
