Withdraw
Loading…
Efficient and scalable algorithms for phylogenomics and overlapping communities
Liu, Baqiao
Loading…
Permalink
https://hdl.handle.net/2142/125530
Description
- Title
- Efficient and scalable algorithms for phylogenomics and overlapping communities
- Author(s)
- Liu, Baqiao
- Issue Date
- 2024-06-24
- Director of Research (if dissertation) or Advisor (if thesis)
- Warnow, Tandy
- Doctoral Committee Chair(s)
- Warnow, Tandy
- Committee Member(s)
- Rauchwerger, Lawrence
- El-Kebir, Mohammed
- Tong, Hanghang
- Bader, David A
- Department of Study
- Computer Science
- Discipline
- Computer Science
- Degree Granting Institution
- University of Illinois at Urbana-Champaign
- Degree Name
- Ph.D.
- Degree Level
- Dissertation
- Keyword(s)
- phylogenomics
- species trees
- phylogenetics
- incomplete lineage sorting
- multispecies coalescent
- multiple sequence alignment
- chapel
- computational biology
- sequence length heterogeneity
- Abstract
- In computational biology, the reconstruction of evolutionary histories is a fundamental problem. Phylogenomics, the reconstruction of evolutionary history using genomics data, like many other sub-areas of computational biology, has benefited tremendously from computational techniques and has a rich tradition of both computational theory and practice. This thesis contributes to this field by developing scalable and accurate computational algorithms in two sub-areas of phylogenomics -- species tree estimation, and multiple sequence alignment. Moreover, we generalize our discrete algorithm research into overlapping community detection in large-scale networks, with a slightly different focus on high performance computing. Our work in phylogenomics centers first on species tree estimation, a challenging task due to evolutionary complexities like incomplete lineage sorting and gene tree reconstruction error. We introduce Weighted ASTRID, an extension of the ASTRID method; we notably enhance species tree estimations by incorporating gene tree uncertainty inspired by recent works. This approach significantly boosts both accuracy and robustness in the face of gene tree reconstruction error, and we designed a more efficient implementation of ASTRID. Additionally, we present NJst-J and FASTRAL-J, methods leveraging prior species tree knowledge for refined inference. We venture briefly into multiple sequence alignment with WITCH-NG, an optimized version of the prior method WITCH both algorithmically and implementation-wise for adding new sequences into existing alignments. These contributions collectively offer not only theoretical advancements but also efficiently implemented practical tools for the phylogenomics community. In parallel, the dissertation extends into overlapping community detection in large-scale networks, acting as a generalization of our study of discrete algorithms in trees to real world networks. Using the Chapel programming language, we propose novel methodologies within the Arachne graph framework, designing a distributed version of the HDBSCAN clustering method specialized for graph clustering. This body of work emphasizes a balance between theoretical accuracy and practical efficiency. We not only contribute to the theoretical efficiency of the algorithms, but all our algorithms are also efficiently implemented, leveraging modern programming languages, hardware capabilities, and software engineering practices.
- Graduation Semester
- 2024-08
- Type of Resource
- Thesis
- Handle URL
- https://hdl.handle.net/2142/125530
- Copyright and License Information
- Copyright 2024 Baqiao Liu
Owning Collections
Graduate Dissertations and Theses at Illinois PRIMARY
Graduate Theses and Dissertations at IllinoisManage Files
Loading…
Edit Collection Membership
Loading…
Edit Metadata
Loading…
Edit Properties
Loading…
Embargoes
Loading…