Title: ChromatoGate: A Tool for Detecting Base Mis-Calls in Multiple Sequence Alignments by Semi-Automatic Chromatogram Inspection
Authors: Alachiotis, Nikolaos; Heidelberg Institute for Theoretical Studies (HITS gGmbH)
Vogiatzi, Emmanouella; Institute of Marine Biology and Genetics, HCMR
Pavlidis, Pavlos; Heidelberg Institute for Theoretical Studies (HITS gGmbH)
Stamatakis, Alexandros; Heidelberg Institute for Theoretical Studies (HITS gGmbH)
Date: 2013-05-08
Publisher: Computational and Structural Biotechnology Journal
Source:
Type: info:eu-repo/semantics/article
info:eu-repo/semantics/publishedVersion

Subject: Not applicable
Description: Automated DNA sequencers generate chromatograms that contain raw sequencing data. They also generate data that translates the chromatograms into molecular sequences of A, C, G, T, or N (undetermined) characters. Since chromatogram translation programs frequently introduce errors, a manual inspection of the generated sequence data is required. As sequence numbers and lengths increase, visual inspection and manual correction of chromatograms and corresponding sequences on a per-peak and per-nucleotide basis become error-prone, time-consuming, and tedious. Here, we introduce ChromatoGate (CG), an open-source tool that accelerates and partially automates the inspection of chromatograms and the detection of sequencing errors for bidirectional sequencing runs. To give users full control over the error correction process, a fully automated error correction algorithm has not been implemented. Initially, the program scans a given multiple sequence alignment (MSA) for potential sequencing errors, assuming that each polymorphic site in the alignment may be attributed to a sequencing error with a certain probability. The guided MSA assembly procedure in ChromatoGate detects the chromatogram peaks of all characters in an alignment that lead to polymorphic sites, given a user-defined threshold. The threshold value represents the sensitivity of the sequencing error detection mechanism. After this pre-filtering, the user only needs to inspect a small number of peaks in every chromatogram to correct sequencing errors. Finally, we show that correcting sequencing errors is important, because population genetic and phylogenetic inferences can be misled by MSAs with uncorrected mis-calls. Our experiments indicate that estimates of population mutation rates can be affected two- to three-fold by uncorrected errors.
Language: English
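
Illustrative sketch: the pre-filtering step described in the abstract (scan an MSA for polymorphic sites and mark the characters behind them for chromatogram inspection) might look roughly like the Python below. The abstract does not say how ChromatoGate interprets its threshold, so the rule used here, flagging any base whose frequency within a polymorphic column is at or below the threshold, is an assumption made for illustration. The function name `flag_suspect_sites` and the toy alignment are likewise hypothetical and are not taken from the tool.

```python
from collections import Counter


def flag_suspect_sites(alignment, threshold):
    """Return (row, column, base) tuples for characters worth inspecting.

    A column counts as polymorphic when it holds more than one distinct
    base among A, C, G, T. Within such a column, a base is flagged when
    its relative frequency is at or below `threshold`.
    NOTE: this frequency rule is an assumed criterion, not ChromatoGate's.
    """
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    flagged = []
    for col in range(length):
        column = [seq[col].upper() for seq in alignment]
        # Ignore N (undetermined) and gap characters when counting bases.
        counts = Counter(base for base in column if base in "ACGT")
        if len(counts) < 2:
            continue  # monomorphic column: nothing to inspect
        total = sum(counts.values())
        for row, base in enumerate(column):
            if base in counts and counts[base] / total <= threshold:
                flagged.append((row, col, base))
    return flagged


# Toy alignment (hypothetical data): one rare base and one N.
msa = [
    "ACGTACGT",
    "ACGTACGT",
    "ACGTACGT",
    "ACGAACGT",
    "ACGTACNT",
]
suspects = flag_suspect_sites(msa, threshold=0.25)
print(suspects)
```

Under this assumed rule, a higher threshold flags more characters, which mirrors the abstract's description of the threshold as the sensitivity of the detection mechanism. Each flagged (row, column) pair would then point the user to a single peak to check in the corresponding chromatogram.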

Similar articles:

Systems biology and metabolic engineering of Arthrospira cell factories by Klanchui, Amornpan,Vorapreeda, Tayvich,Vongsangnak, Wanwipa,Kannapho, Chiraphan,Cheevadhanarak, Supapon,Meechai, Asawin
The Role of INDY in Metabolic Regulation by Willmes, Diana M; Charité University School of Medicine Berlin,Birkenfeld, Andreas L; Charité University School of Medicine Berlin
Structure-based Methods for Computational Protein Functional Site Prediction by KC, Dukka B; North Carolina A&T State University
The Biochemistry of Vitreoscilla hemoglobin by Stark, Benjamin C.; Illinois Institute of Technology,Dikshit, Kanak L.; Institute of Microbial Technology,Pagilla, Krishna R.; Illinois Institute of Technology
Computer-Aided Protein Directed Evolution: a Review of Web Servers, Databases and other Computational Tools for Protein Engineering by Verma, Rajni; Jacobs University Bremen,Schwaneberg, Ulrich; RWTH Aachen University,Roccatano, Danilo; Jacobs University Bremen
A method to predict edge strands in beta-sheets from protein sequences by Guilloux, Antonin,Caudron, Bernard,Jestin, Jean-Luc
MD simulation studies to investigate iso-energetic conformational behaviour of modified nucleosides m2G and m22G present in tRNA by Bavi, Rohit S,Sambhare, Susmit B,Sonawane, Kailas D; Structural Bioinformatics Unit, Department of Biochemistry, Shivaji University, Kolhapur 416 004, Maharashtra (M.S.), India.
Metabolomics in the identification of biomarkers of dietary intake by O’Gorman, Aoife,Gibbons, Helena,Brennan, Lorraine