Statistical binning improves species tree estimation in the presence of gene tree incongruence
Gene tree incongruence arising from incomplete lineage sorting (ILS) can reduce the accuracy of concatenation-based species tree estimation. Although coalescent-based species tree estimation methods can have good accuracy in the presence of ILS, they are sensitive to gene tree estimation error. We propose a pipeline that uses bootstrapping to evaluate whether two genes are likely to have the same tree, groups genes into sets using a graph-theoretic optimization, estimates a tree on each set using concatenation, and finally produces an estimated species tree from these trees using the preferred coalescent-based method. Statistical binning improves the accuracy of MP-EST, a popular coalescent-based method, and we use it to produce the first genome-scale coalescent-based avian tree of life.
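The grouping step of the pipeline can be illustrated with a minimal sketch. It assumes the bootstrap comparison has already produced a list of gene pairs judged unlikely to share a tree, and it uses a simple greedy graph coloring as a stand-in for the graph-theoretic optimization, which the abstract does not specify. The gene names, the `conflicting_pairs` data, and both function names are hypothetical. Concatenating each bin and running the coalescent-based method are not shown.

```python
from itertools import combinations

# Hypothetical toy input: pairs of genes whose bootstrap-supported trees
# were judged to conflict (the bootstrap comparison itself is not shown).
genes = ["g1", "g2", "g3", "g4", "g5"]
conflicting_pairs = {("g1", "g2"), ("g1", "g3"), ("g4", "g5")}


def build_incompatibility_graph(genes, conflicting_pairs):
    """Return an adjacency map with an edge between every conflicting pair."""
    graph = {g: set() for g in genes}
    for a, b in combinations(genes, 2):
        if (a, b) in conflicting_pairs or (b, a) in conflicting_pairs:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def greedy_bins(graph):
    """Assign genes to bins so that no bin contains a conflicting pair.

    Assumption: a plain greedy coloring (most-conflicted genes first,
    smallest compatible bin preferred) stands in for the optimization
    used in the published pipeline.
    """
    bins = []
    for gene in sorted(graph, key=lambda g: (-len(graph[g]), g)):
        # Bins that contain none of this gene's conflicting neighbours.
        compatible = [b for b in bins if not (graph[gene] & b)]
        if compatible:
            # Prefer the smallest bin to keep bin sizes roughly even.
            min(compatible, key=len).add(gene)
        else:
            bins.append({gene})
    return bins


graph = build_incompatibility_graph(genes, conflicting_pairs)
bins = greedy_bins(graph)
for i, members in enumerate(bins, start=1):
    print(f"bin {i}: {sorted(members)}")
```

Each resulting bin would then be concatenated into a single alignment, a tree estimated on it, and the set of bin trees passed to the chosen coalescent-based method.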
Siavash Mirarab, Md. Shamsuzzoha Bayzid, Bastien Boussau, and Tandy Warnow. Science, 12 December 2014: 346 (6215), 1250463. [DOI:10.1126/science.1250463]
Thirteen datasets are available for download and use. Use the browse menu at right to access all of the datasets. Please refer to this README file for a description of the files included in this collection.
To reference the entire dataset collection, please cite:
Mirarab, Siavash; Bayzid, Md Shamsuzzoha; Boussau, Bastien; Warnow, Tandy (2014) Statistical binning improves species tree estimation in the presence of gene tree incongruence; IDEALS. http://dx.doi.org/10.13012/C5MW2F2P.
DOIs for specific datasets are provided on the individual dataset pages.