Inference of degree of significance of single amino acids from the literature
- Becker, Anthony
- Issue Date
- Director of Research (if dissertation) or Advisor (if thesis)
- Jakobsson, Eric
- Committee Member(s)
- Nelson, Mark E.
- Chung, Hee Jung
- Anastasio, Thomas J.
- Grosman, Claudio F.
- Department of Study
- Molecular & Integrative Physl
- Molecular & Integrative Physi
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Degree Level
- homologous amino acid residue and mutated residue search in journal article literature publications
- amino acid
- homologous amino acid residue
- mutated residue search
- homologous amino acid residue search
- mutated residue
- journal article literature
- journal literature
- Several subfamilies of potassium channels are highly conserved along the vast majority of the protein sequence across a wide array of very distantly related animals. We call this characteristic “hyperconservation”. In this work we create a quantitative definition of hyperconservation and explore its degree and characteristics in each of the well-known potassium channel subfamilies. In general, the potassium channels exhibit a large degree of hyperconservation within subfamilies but wide diversity (to the point of confounding alignment) between subfamilies. We then examine the literature on one potassium channel subfamily (KCNA2) to determine whether all of the completely conserved residues have been noted and considered for functional inference. Out of several thousand papers, we find four residues that are completely conserved but not mentioned in any article: F85, E112, P156, and S159. F85 and E112 are in fact completely conserved both within and across K channel subfamilies. The challenges encountered during this search, plus the fact that some completely conserved residues have been overlooked, make it clear that a more automated method is needed for extracting sequence-related information from journal articles.
The work in this thesis emerged from considering the problem of how to intensively study a protein family based on the sequences of its members. In the first part of the thesis, we consider the issue of studying a family of potassium channels residue by residue. This involves accounting for a history in which residue numbering systems and protein nomenclature vary throughout the literature on this family. Discovering information in the literature about single residues in any protein family can be daunting, because a given residue has a different position number in each sequence. One must then also account for the change in numbering for each isoform, or when an author renumbers residues relative to a section of the sequence.
This problem is greatly compounded when one wishes to consider orthologs, and the paralogs of those orthologs (homologs), in all species. The same variability in residue numbering systems and protein nomenclature runs through the literature on any family. Addressing these problems has resulted in the creation of a program called FiSHAAL (Finding Single Homologous Amino Acids). It is offered as a prototype program for locating amino acid mentions in the literature, partially automating the identification of homologous residues and linking the corresponding residues in an alignment column to the PubMed IDs of the articles that mention them. Ultimate hypothesis: can accurate information about mentions of homologous amino acid residues be linked effectively to all PubMed articles in a semi-automated fashion?
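The numbering problem described in the abstract can be illustrated with a small sketch. This is not FiSHAAL's implementation, which the abstract does not describe. It only shows, under simple assumptions, how a residue number in one sequence maps to an alignment column and from there to the differently numbered residue in each homolog, and how residue mentions such as "F85" might be pulled from text. The function names, the regular expression, and the toy sequences (`seqA`, `seqB`, `seqC`) are hypothetical and chosen for illustration.

```python
import re

# Toy gapped alignment (made-up sequences, not real channel data).
toy_alignment = {
    "seqA": "MK-FEL",
    "seqB": "MKAFEL",
    "seqC": "-KAF-L",
}


def residue_to_column(aligned_seq, residue_number):
    """Return the 0-based alignment column of a 1-based residue number."""
    count = 0
    for col, ch in enumerate(aligned_seq):
        if ch != "-":  # gaps do not consume a residue number
            count += 1
            if count == residue_number:
                return col
    raise ValueError("residue number exceeds sequence length")


def column_to_residue(aligned_seq, col):
    """Return (letter, 1-based number) at a column, or None for a gap."""
    if aligned_seq[col] == "-":
        return None
    number = sum(1 for ch in aligned_seq[: col + 1] if ch != "-")
    return aligned_seq[col], number


def homologous_residues(alignment, ref_name, residue_number):
    """Map a residue of the reference sequence to every aligned sequence."""
    col = residue_to_column(alignment[ref_name], residue_number)
    return {name: column_to_residue(seq, col) for name, seq in alignment.items()}


# Assumed mention format: one-letter amino acid code followed by digits.
# This naive pattern also matches non-residue tokens such as segment
# names (e.g. "S6") and skips mutation notation such as "F85A".
MENTION = re.compile(r"\b([ACDEFGHIKLMNPQRSTVWY])(\d+)\b")


def find_mentions(text):
    """Return (letter, number) pairs for residue-like mentions in text."""
    return [(m.group(1), int(m.group(2))) for m in MENTION.finditer(text)]
```

In the toy alignment, residue 3 of `seqA` (F) falls in the same column as residue 4 of `seqB` and residue 3 of `seqC`, so a search for one homologous position needs a different number for each sequence. Linking the resulting mentions to PubMed IDs would additionally require knowing which protein and numbering scheme each article uses, which this sketch does not attempt.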
- Graduation Semester
- Copyright and License Information
- Copyright 2011 Anthony Becker