Files in this item

File: LONDON-THESIS-2021.pdf (7MB), Restricted Access
Description: (no description provided)
Format: PDF (application/pdf)

Description

Title: Improving the genome assembly and annotation of the white-tailed deer (Odocoileus virginianus borealis)
Author(s): London, Evan W.
Advisor(s): Mateus-Pinilla, Nohra E
Contributor(s): Novakofski, Jan E; Roca, Alfred L; Catchen, Julian M
Department / Program: Animal Sciences
Discipline: Bioinformatics
Degree Granting Institution: University of Illinois at Urbana-Champaign
Degree: M.S.
Genre: Thesis
Subject(s): Genomic resource; PacBio sequencing; Wildlife disease
Abstract: Widely distributed in North America, the white-tailed deer (Odocoileus virginianus) has recreational and commercial value and is a food source for many communities. The impacts of deer on agriculture, conservation, and public health are rising. They are responsible for deer-vehicle collisions and damage to crops and natural areas. The species is affected by infectious diseases such as chronic wasting disease (CWD), epizootic hemorrhagic disease (EHD), and bovine tuberculosis (bTB). Genomic resources facilitate the study of pathogens, host-pathogen interactions, host genetic variation, and behavior. Repetitive elements are ubiquitous within mammalian genomes, and long single-molecule reads produced by third-generation sequencing can span these regions. I present a genome produced with DNA from a single white-tailed deer sequenced on the PacBio Sequel II platform and assembled using Redbean (WTDBG2) long-read assembly software. Post-assembly, long and short reads from the same animal were used for error-correcting and polishing the assembly. Gene models were predicted with the BRAKER annotation pipeline using RNA and protein sequences as extrinsic evidence. The final assembly was highly contiguous, with 90% of the total length represented by 134 contigs. The largest contig was 108 million base pairs. Functional annotation was performed using reciprocal best hits with cattle protein sequences. Protein function was assigned to 16,125 coding sequences. The locations of genes related to CWD, EHD, and bTB were also identified. The sequentially Markovian coalescent was used to infer the population diversity of white-tailed deer over the past 2 million years. This accurate and more complete assembly will support future genomic studies on white-tailed deer and permit the use of chromatin-contact information to construct a chromosome-level assembly of the genome.
Issue Date: 2021-04-27
Type: Thesis
URI: http://hdl.handle.net/2142/110854
Rights Information: © 2021 Evan W. London
Date Available in IDEALS: 2021-09-17
Date Deposited: 2021-05

