
Inference of degree of significance of single amino acids from the literature


Bookmark or cite this item: http://hdl.handle.net/2142/29776

Files in this item

File (size): format (no descriptions provided)
Becker_Anthony.pdf (46MB): PDF
fishaal.zip (14MB): ZIP
barplot.py (1KB): unknown
completequerylist.txt (49KB): text file
kcna2vertalignment.xls (3MB): Microsoft Excel
pdbdistBE112.xls (974KB): Microsoft Excel
pdbdistBF85.xls (1MB): Microsoft Excel
Perlprogram.pl (78KB): unknown
SFFinderv2.0 (3KB): unknown
supplementreferences.doc (344KB): Microsoft Word
Other Available Formats
pdbdistBF85.xls.csv (31KB): CSV file, automatically converted using OpenOffice.org
pdbdistBE112.xls.csv (15KB): CSV file, automatically converted using OpenOffice.org
kcna2vertalignment.xls.csv (133KB): CSV file, automatically converted using OpenOffice.org

Title: Inference of degree of significance of single amino acids from the literature
Author(s): Becker, Anthony
Advisor(s): Jakobsson, Eric
Contributor(s): Nelson, Mark E.; Chung, Hee Jung; Anastasio, Thomas J.; Grosman, Claudio F.
Department / Program: Molecular & Integrative Physiology
Discipline: Molecular & Integrative Physiology
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: Ph.D.
Genre: Doctoral
Subject(s): homologous amino acid residue and mutated residue search in journal article literature publications; amino acid; homologous amino acid residue; mutated residue search; homologous amino acid residue search; mutated residue; journal article literature; journal literature
Abstract: Several subfamilies of potassium channels are highly conserved along the vast majority of the protein sequence across a wide array of very distantly related animals. We call this characteristic “hyperconservation”. In this work we create a quantitative definition of hyperconservation and use it to explore and characterize the degree of hyperconservation in each of the well-known potassium channel subfamilies. In general, potassium channels appear to show a large degree of hyperconservation within subfamilies but a wide diversity between subfamilies, to the point of confounding alignment. Here we examine the literature on one potassium channel subfamily (KCNA2) to determine whether all of the completely conserved residues have been noted and considered for functional inference. Out of several thousand papers, we find four residues that are completely conserved but unmentioned in any article: F85, E112, P156 and S159. F85 and E112 are in fact completely conserved both within and across K channel subfamilies. The challenges encountered during this search, together with the fact that some completely conserved residues have been overlooked, show that a more automated method is needed for extracting sequence-related information from literature articles.

The work in this thesis emerged from considering how to study a protein family intensively on the basis of the family's sequences. In the first part of the thesis, we consider the issue of studying a family of potassium channels residue by residue. This involves accounting for a history in which residue numbering systems and protein nomenclature vary throughout the literature on this family. Finding information in the literature about single residues in any protein family can be daunting, because homologous residues have a different number in each sequence. One must also account for numbering changes between isoforms, and for cases in which an author renumbers residues relative to a portion of the sequence.
This problem is greatly compounded when one wishes to consider orthologs, and the paralogs of those orthologs (together, homologs), in all species. This work resulted in a program called FiSHAAL (Finding Single Homologous Amino Acids). It is offered as a prototype tool for locating amino acid mentions in the literature: it partially automates the identification of homologous residues and links the residues in each alignment column to the PubMed IDs of the articles that mention them. Ultimate Hypothesis: Can accurate information on mentions of homologous amino acid residues be linked effectively to all PubMed articles in a semi-automated fashion?
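The residue bookkeeping described in this abstract can be illustrated with a minimal Python sketch. It is not the FiSHAAL implementation and does not reproduce the thesis's quantitative definition of hyperconservation. It assumes (1) a gapped multiple sequence alignment stored as equal-length strings with `-` for gaps, (2) residues numbered from 1 in each ungapped sequence, (3) each article tagged with the sequence whose numbering it follows, and (4) mentions written as a one- or three-letter code plus a number (F85, Phe85Ala). The function names, the toy sequences `seqA`, `seqB`, `seqC` and the `example-N` article IDs (stand-ins for PubMed IDs) are all hypothetical. The sketch maps each mentioned residue number to an alignment column, rejects mentions whose wild-type letter does not match (a cheap check for renumbering), and lists completely conserved columns that no article mentions.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}
_RESIDUE = "|".join(THREE_TO_ONE) + "|[ACDEFGHIKLMNPQRSTVWY]"
# Matches mentions such as "F85", "F85A", "Phe85" or "Phe85Ala" (assumed formats).
MENTION_RE = re.compile(rf"\b(?P<wt>{_RESIDUE})(?P<pos>\d{{1,4}})(?P<mut>{_RESIDUE})?\b")


def number_to_column(aligned: str) -> Dict[int, int]:
    """Map 1-based residue numbers of one gapped sequence to 0-based alignment columns."""
    mapping: Dict[int, int] = {}
    number = 0
    for col, ch in enumerate(aligned):
        if ch != "-":
            number += 1
            mapping[number] = col
    return mapping


def conserved_columns(alignment: Dict[str, str]) -> List[int]:
    """Columns where every sequence has the same residue and none has a gap."""
    seqs = list(alignment.values())
    return [col for col, ch in enumerate(seqs[0])
            if ch != "-" and all(s[col] == ch for s in seqs)]


def link_mentions(articles: Dict[str, Tuple[str, str]],
                  alignment: Dict[str, str]) -> Dict[int, Set[str]]:
    """Link alignment columns to the IDs of articles mentioning a residue in them.

    `articles` maps an article ID to (name of the sequence whose numbering the
    article uses, article text). Both are hypothetical inputs for this sketch.
    """
    links: Dict[int, Set[str]] = defaultdict(set)
    for article_id, (seq_name, text) in articles.items():
        aligned = alignment[seq_name]
        lookup = number_to_column(aligned)
        for m in MENTION_RE.finditer(text):
            col = lookup.get(int(m.group("pos")))
            wt = THREE_TO_ONE.get(m.group("wt"), m.group("wt"))
            # Skip numbers past the sequence end or whose wild-type letter does
            # not match: both suggest the article uses a different numbering.
            if col is None or aligned[col] != wt:
                continue
            links[col].add(article_id)
    return dict(links)


# Hypothetical toy data: sequences, article IDs and texts are made up.
example_alignment = {"seqA": "MKF-EL", "seqB": "M-FAEL", "seqC": "MRFGEL"}
example_articles = {
    "example-1": ("seqA", "Mutation F3A abolished current."),
    "example-2": ("seqB", "The Phe2 side chain and E4 are buried."),
    "example-3": ("seqC", "We mutated K3 to alanine."),
}

links = link_mentions(example_articles, example_alignment)
# Completely conserved columns that no article mentions.
unmentioned = [c for c in conserved_columns(example_alignment) if c not in links]
```

In the toy data the same conserved phenylalanine column is residue 3 in `seqA` but residue 2 in `seqB`, yet both mentions land in column 2; `example-3`'s K3 is rejected because residue 3 of `seqC` is F, and columns 0 and 5 come out as completely conserved but unmentioned. A pattern this simple also over-matches: in voltage-gated potassium channel papers, transmembrane segment labels such as S4 would be read as Ser4, so a realistic pipeline needs further filtering.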
Issue Date: 2012-02-06
Genre: thesis
URI: http://hdl.handle.net/2142/29776
Rights Information: Copyright 2011 Anthony Becker
Date Available in IDEALS: 2012-02-06
Date Deposited: 2011-12