Files in this item

Files                          Description                 Format
tagdigger2016.pdf (449kB)      Main article                PDF (application/pdf)
tagdigger-1.0.tar.gz (30kB)    Source code for TagDigger   ZIP (application/zip)

Description

Title: TagDigger: user-friendly extraction of read counts from GBS and RAD-seq data
Author(s): Clark, Lindsay V.; Sacks, Erik J.
Subject(s): Genotyping-by-sequencing; Meta-analysis; Read depth; Restriction site-associated DNA sequencing; Single nucleotide polymorphism (SNP); Tag counts
Abstract:
Background: In genotyping-by-sequencing (GBS) and restriction site-associated DNA sequencing (RAD-seq), read depth is important for assessing the quality of genotype calls and estimating allele dosage in polyploids. However, existing pipelines for GBS and RAD-seq do not provide read counts in formats that are both accurate and easy to access. Additionally, although existing pipelines allow previously mined SNPs to be genotyped on new samples, they do not allow the user to manually specify a subset of loci to examine. Pipelines that do not use a reference genome assign arbitrary names to SNPs, making meta-analysis across projects difficult.

Results: We created the software TagDigger, which includes three programs for analyzing GBS and RAD-seq data. The first script, tagdigger_interactive.py, rapidly extracts read counts and genotypes from FASTQ files using user-supplied sets of barcodes and tags. Input and output are in CSV format so that they can be opened by spreadsheet software. Tag sequences can also be imported from the Stacks, TASSEL-GBSv2, TASSEL-UNEAK, or pyRAD pipelines, and a separate file can be imported listing the names of markers to retain. A second script, tag_manager.py, consolidates marker names and sequences across multiple projects. A third script, barcode_splitter.py, assists with preparing FASTQ data for deposit in a public archive by splitting FASTQ files by barcode and generating MD5 checksums for the resulting files.

Conclusions: TagDigger is open-source and freely available software written in Python 3. It uses a scalable, rapid search algorithm that can process over 100 million FASTQ reads per hour. TagDigger will run on a laptop with any operating system, does not consume hard drive space with intermediate files, and does not require programming skill to use.
Issue Date: 2016-07-11
Publisher: BioMed Central
Citation Info: Lindsay V. Clark and Erik J. Sacks (2016) "TagDigger: User-friendly extraction of read counts from GBS and RAD-seq data" Source Code for Biology and Medicine 11:11. doi: 10.1186/s13029-016-0057-7
Genre: Article
Type: Text
Language: English
URI: http://hdl.handle.net/2142/95132
DOI: https://doi.org/10.1186/s13029-016-0057-7
Sponsor: DOE Office of Science, Office of Biological and Environmental Research (grant number DE-SC0012379)
Date Available in IDEALS: 2017-02-07
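
Illustration: the abstract describes extracting per-sample read counts from FASTQ files using user-supplied barcodes and tags, with output in CSV format. The Python sketch below shows that general idea only. It is not TagDigger's code, and it does not reproduce TagDigger's search algorithm, which the abstract does not describe. The function names (count_tags, write_counts_csv), the naive prefix matching, and the example barcode and tag sequences are all assumptions made for illustration.

```python
import csv
import io
from collections import Counter


def count_tags(fastq_lines, barcodes, tags):
    """Count reads per (sample, tag) by simple prefix matching.

    fastq_lines: iterable of lines from an uncompressed FASTQ file.
    barcodes: dict mapping barcode sequence -> sample name (hypothetical input).
    tags: dict mapping tag sequence -> marker/allele name (hypothetical input).
    Assumes each read starts with a barcode immediately followed by the tag,
    and that no barcode is a prefix of another barcode.
    """
    counts = {sample: Counter() for sample in barcodes.values()}
    for i, line in enumerate(fastq_lines):
        if i % 4 != 1:  # the sequence is the 2nd line of each 4-line record
            continue
        read = line.strip().upper()
        for barcode, sample in barcodes.items():
            if not read.startswith(barcode):
                continue
            remainder = read[len(barcode):]
            for tag_seq, tag_name in tags.items():
                if remainder.startswith(tag_seq):
                    counts[sample][tag_name] += 1
                    break
            break  # a read is assigned to at most one barcode
    return counts


def write_counts_csv(counts, tag_names, out):
    """Write one row per sample and one column per tag to a file-like object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["sample"] + list(tag_names))
    for sample in sorted(counts):
        writer.writerow([sample] + [counts[sample][t] for t in tag_names])


if __name__ == "__main__":
    # Made-up example data; not taken from the TagDigger article.
    example_fastq = [
        "@read1", "ACGTTGCAGGG", "+", "IIIIIIIIIII",
        "@read2", "TTAATGCATTT", "+", "IIIIIIIIIII",
    ]
    example_barcodes = {"ACGT": "sample1", "TTAA": "sample2"}
    example_tags = {"TGCAGGG": "marker1_A", "TGCATTT": "marker1_B"}
    result = count_tags(example_fastq, example_barcodes, example_tags)
    buffer = io.StringIO()
    write_counts_csv(result, list(example_tags.values()), buffer)
    print(buffer.getvalue())
```

This linear scan over every barcode and tag is chosen for readability; it would be far too slow for the throughput quoted in the abstract, where an indexed lookup of some kind would be needed.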

