changeset 0:6682236a1432 default tip
Migrated tool version 0.2 from old tool shed archive to new tool shed repository
author | vipints |
---|---|
date | Tue, 07 Jun 2011 17:42:46 -0400 |
parents | |
children | |
files | qseq2fastq/README qseq2fastq/qseq2fastq.pl qseq2fastq/qseq2fastq.xml qseq2fastq/qseq_test.fastq qseq2fastq/qseq_test.txt |
diffstat | 5 files changed, 380 insertions(+), 0 deletions(-) |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/qseq2fastq/README Tue Jun 07 17:42:46 2011 -0400
@@ -0,0 +1,62 @@
+Analyzing next-generation sequencing data: Convert from qseq to fastqsanger
+
+HISTORY
+
+This tool was uploaded to the community site at http://community.g2.bx.psu.edu by
+Vipin T Sreedharan as version 0.1
+
+Ross Lazarus added a simple Galaxy functional test, some additional
+documentation in the help section and changed the input data format to interval because this is
+what the current sniffers will determine as the datatype of any uploaded qseq files if 'autodetect' is
+used (which is how most users are likely to do it!)
+
+This is likely to be added to the default distribution if there's enough interest.
+Until then, you may need to move the test files (qseq_test.*) into your test-data directory
+so Galaxy can find them, if you want to run the functional tests, e.g. as:
+sh run_functional_tests -id qseq2fastq
+from your Galaxy root, which should run this tool's test only
+if you want to make sure it's all working right.
+
+AFAIK the output is already groomed so is set to fastqsanger - but I honestly don't know if Illumina
+output files of this format are always so clean - if anyone knows, please let me (or Vipin) know
+on the galaxy-dev list so we can make that clear for the next version
+
+CONTENTS
+
+qseq2fastq.xml: Tool configuration file.
+
+qseq2fastq.pl: The file converter program written in Perl.
+
+qseq_test.txt: A small qseq file for testing -> [your galaxy root]/test-data/
+
+qseq_test.fastq: Output fastqsanger file for testing -> [your galaxy root]/test-data/
+
+LICENSE
+
+Vipin's original did not mention a license,
+so to keep things 'simple',
+I'll add the same license as the Galaxy distribution to all the materials provided in
+this source distribution as at October 2010 unless
+there's some objection...
+
+Copyright (c) 2005 Pennsylvania State University
+
+Permission is hereby granted, free of charge, to any person obtaining
+a copy of this software and associated documentation files (the
+"Software"), to deal in the Software without restriction, including
+without limitation the rights to use, copy, modify, merge, publish,
+distribute, sublicense, and/or sell copies of the Software, and to
+permit persons to whom the Software is furnished to do so, subject to
+the following conditions:
+
+The above copyright notice and this permission notice shall be
+included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND,
+EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
+IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY
+CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/qseq2fastq/qseq2fastq.pl Tue Jun 07 17:42:46 2011 -0400
@@ -0,0 +1,51 @@
+#!/usr/bin/perl -w
+use strict;
+use Carp;
+
+my $usage = q(
+qseq2fastq.pl - a script to convert a qseq file into a fastq file with sanger-style ASCII q-score encoding
+USAGE: qseq2fastq.pl <qseq.txt file> <output file>
+);
+
+if (scalar(@ARGV) != 2) {
+    print $usage;
+    exit;
+}
+
+my $in_file = $ARGV[0];
+my $output_fastq_file = $ARGV[1];
+
+my $qfilter = "";
+open(OUTFASTAQFILE, "> $output_fastq_file") || die "Error: Couldn't open $output_fastq_file for writing\n";
+
+open(INFILE, "< $in_file") || die "Error: Couldn't open $in_file\n"; # parentheses so that || applies to open(), not to the filename string
+while(<INFILE>)
+{
+    chomp;
+    my @this_line = split/\t/, $_;
+    croak("Error: invalid column number in $in_file\n") unless(scalar(@this_line) == 11);
+    if($this_line[10] == 1) {
+        $qfilter = "Y";
+    } else {
+        $qfilter = "N";
+    }
+    # Convert quality scores
+    my @quality_array = split(//, $this_line[9]);
+    my $phred_quality_string = "";
+    # convert each char to Phred quality score
+    foreach my $this_char (@quality_array){
+        my $phred_quality = ord($this_char) - 64; # convert illumina scaled phred char to phred quality score
+        my $phred_char = chr($phred_quality + 33); # convert phred quality score into phred char (sanger style)
+        $phred_quality_string = $phred_quality_string . $phred_char;
+    }
+    # replace "." gaps with N
+    $this_line[8] =~ s/\./N/g;
+    # output line
+    print OUTFASTAQFILE "@" . $this_line[2] . ":" . $this_line[3] . ":" . $this_line[4] . ":" . $this_line[5] . ":" . $qfilter . "\n" . # header line
+    $this_line[8] . "\n" . # output sequence
+    "+" . $this_line[2] . ":" . $this_line[3] . ":" . $this_line[4] . ":" . $this_line[5] . ":" . $qfilter . "\n" . # header line
+    $phred_quality_string . "\n"; # output quality string
+}
+close INFILE;
+close OUTFASTAQFILE;
+exit;
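The per-record logic of qseq2fastq.pl (filter flag, read name from columns 3 to 6, "." replaced by N, quality shifted from offset 64 to offset 33) can be sketched in a few lines of Python. This is only an illustration of the same steps, not part of the changeset; the function name `qseq_line_to_fastq` and the shortened demo record are made up for the example.

```python
def qseq_line_to_fastq(line: str) -> str:
    """Convert one tab-separated QSEQ line into a four-line FASTQ record."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 tab-separated columns, got {len(fields)}")

    # Column 11 is the filter flag: 1 means the read passed filtering.
    passed = "Y" if fields[10] == "1" else "N"

    # Read name: lane, tile, x, y (columns 3-6) followed by the filter flag.
    name = ":".join(fields[2:6] + [passed])

    # "." marks an uncalled base in QSEQ; FASTQ uses "N".
    sequence = fields[8].replace(".", "N")

    # Re-encode each quality character from offset 64 to offset 33.
    quality = "".join(chr(ord(ch) - 64 + 33) for ch in fields[9])

    return f"@{name}\n{sequence}\n+{name}\n{quality}\n"


if __name__ == "__main__":
    # Shortened, made-up record in the layout of qseq_test.txt (hypothetical data).
    demo = "\t".join(["HWI-EAS121", "1", "1", "1", "0", "792", "0", "1",
                      "CCCTT.T.T", "aBBBBBBBB", "0"])
    print(qseq_line_to_fastq(demo), end="")
```

Unlike the Perl script, the sketch compares the filter column as a string rather than numerically, which is an assumption that the column only ever holds 0 or 1.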
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/qseq2fastq/qseq2fastq.xml Tue Jun 07 17:42:46 2011 -0400
@@ -0,0 +1,67 @@
+<tool id="qseq2fastq" name="qseq_to_fastq" version="0.2">
+  <description>Illumina HiSeq QSEQ output to FASTQ format</description>
+  <command interpreter="perl">qseq2fastq.pl
+    $qseq_input
+    $fastq_file
+  </command>
+  <inputs>
+    <param format="interval" name="qseq_input" type="data" label="Illumina QSEQ file from your current history"
+      help="File in QSEQ format, see below"/>
+  </inputs>
+  <outputs>
+    <data format="fastqsanger" name="fastq_file" label="FASTQ file"/>
+  </outputs>
+<tests>
+<test>
+<param name='qseq_input' value='qseq_test.txt' ftype='interval' />
+<output name='fastq_file' file='qseq_test.fastq' ftype='fastqsanger' />
+</test>
+</tests>
+
+  <help>
+**What it does**
+
+This tool converts Illumina QSEQ files into Phred FASTQ files.
+
+Convert an Illumina QSEQ file into Phred FASTQ format in your Galaxy history for downstream tools that require fastq.
+Typically the output would be aligned with BWA or handled by other Galaxy SRS tools.
+
+--------------
+
+**Examples**
+
+- The following data in QSEQ file format::
+
+    HWI-EAS431 1 3 100 1792 1317 0 1 ................................A...G..............A..GG....G............... BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+
+- Will be converted to FASTQ file format::
+
+    @3:100:1792:1317:N
+    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNANNNGNNNNNNNNNNNNNNANNGGNNNNGNNNNNNNNNNNNNNN
+    +3:100:1792:1317:N
+    ############################################################################
+
+--------------
+
+**About formats**
+
+**QSEQ format** QSEQ files are the output of the Illumina pipeline. These files contain the sequence, corresponding qualities, as well as lane, tile and X/Y position of clusters.
+
+According to the Illumina manual, qseq files have the following format:
+
+(1) Machine name: (hopefully) unique identifier of the sequencer.
+(2) Run number: (hopefully) unique number to identify the run on the sequencer.
+(3) Lane number: positive integer (currently 1-8).
+(4) Tile number: positive integer.
+(5) X: x coordinate of the spot. Integer (can be negative).
+(6) Y: y coordinate of the spot. Integer (can be negative).
+(7) Index: positive integer. No indexing should have a value of 1.
+(8) Read Number: 1 for single reads; 1 or 2 for paired ends.
+(9) Sequence
+(10) Quality: the calibrated quality string.
+(11) Filter: Did the read pass filtering? 0 - No, 1 - Yes.
+
+
+**FASTQ format** A FASTQ file normally uses four lines per sequence. Line 1 begins with a '@' character and is followed by a sequence identifier and an optional description (like a FASTA title line). Line 2 is the raw sequence letters. Line 3 begins with a '+' character and is optionally followed by the same sequence identifier (and any description) again. Line 4 encodes the quality values for the sequence in Line 2, and must contain the same number of symbols as letters in the sequence.
+  </help>
+</tool>
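The eleven-field QSEQ layout listed in the help text maps naturally onto a named record. The Python sketch below is one possible way to express that mapping; the class `QseqRecord`, the function `parse_qseq_line` and the shortened demo line are hypothetical names and data chosen for the example, and it assumes the fields are tab-separated, as the Perl script does.

```python
from typing import NamedTuple


class QseqRecord(NamedTuple):
    """One QSEQ line, with fields in the order given in the tool help."""
    machine: str
    run: str
    lane: str
    tile: str
    x: str
    y: str
    index: str
    read_number: str
    sequence: str
    quality: str
    passed_filter: bool


def parse_qseq_line(line: str) -> QseqRecord:
    """Split a tab-separated QSEQ line into its eleven named fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 11:
        raise ValueError(f"expected 11 tab-separated fields, got {len(fields)}")
    # Field 11 is the filter flag: "1" means the read passed filtering.
    return QseqRecord(*fields[:10], passed_filter=(fields[10] == "1"))


if __name__ == "__main__":
    # Shortened, made-up line for illustration only.
    demo = "\t".join(["HWI-EAS431", "1", "3", "100", "1792", "1317",
                      "0", "1", "A..G", "BBBB", "0"])
    record = parse_qseq_line(demo)
    print(record.lane, record.tile, record.x, record.y, record.passed_filter)
```

Keeping the coordinates as strings avoids any reformatting of negative values when they are copied into a read name.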
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/qseq2fastq/qseq_test.fastq Tue Jun 07 17:42:46 2011 -0400
@@ -0,0 +1,160 @@
+@1:1:0:652:N
+TACTANGNTNNNNNNNNNNNNNNNNNNNNNNNNNNNTNTA
++1:1:0:652:N
+########################################
+@1:1:0:792:N
+CCCTTNTNTNNNNNNNNNNNNNNNNNNNNNNNNNNNANCC
++1:1:0:792:N
+B#######################################
+@1:1:0:2004:N
+AGCTTNANANNNNNNNNNNNNNNNNNNNNNNNNNNNANGT
++1:1:0:2004:N
+########################################
+@1:1:0:1386:N
+TACTGNGNGNNNNNNNNNNNNNNNNNNNNNNNNNNNCNAG
++1:1:0:1386:N
+########################################
+@1:1:0:1069:N
+CTGTTNTNGNNNNNNNNNNNNNNNNNNNNNNNNNNNGNCA
++1:1:0:1069:N
+########################################
+@1:1:0:704:N
+AGCTTNCNTNNNNNNNNNNNNNNCNNNNNNNNNNNNGNAG
++1:1:0:704:N
+9#######################################
+@1:1:0:462:N
+CTGTTNCNTNNNNNNNNNNNANNCNNNNNNNNNNNNGNAG
++1:1:0:462:N
+########################################
+@1:1:0:99:N
+CTGTTNCNTNNNNNNNNNNNANNCNNNNNNNNNNNGGNCG
++1:1:0:99:N
+B#######################################
+@1:1:0:297:N
+GCTTANCNGNNNNNNNNNNNTNTTNNNNNNNNNNTCGNGA
++1:1:0:297:N
+<#######################################
+@1:1:0:1097:N
+CCCTGNGCGNNNNNNNNNNNTNTTNNNNNNNNNNTCCTAG
++1:1:0:1097:N
+B#######################################
+@1:1:0:1739:N
+AGCTANTAANNNNNNNNNNNTNGANNNNNNNNNNGTTATC
++1:1:0:1739:N
+BC######################################
+@1:1:0:18:N
+CCGTTNCTGNNNNNNNNNNNGNCTNNNNNNNNNNGCTGAG
++1:1:0:18:N
+########################################
+@1:1:0:90:N
+GCTTCNACGNNNNNNNNNNNTNTTNNNNNNNNNNTCCTCT
++1:1:0:90:N
+########################################
+@1:1:0:123:N
+ATCTTNCCTNNNNNNNNNNNCNTTNNNNNNNNNNCCTCTC
++1:1:0:123:N
+########################################
+@1:1:0:188:N
+GCTTANCAGNNNNNNNNNNNGNTTNNNNNNNNNNGCCGAG
++1:1:0:188:N
+@#######################################
+@1:1:0:254:N
+GCTTANCCGNNNNNNNNNNNTNTTNNNNNNNNNNTCCTAG
++1:1:0:254:N
+########################################
+@1:1:0:268:N
+GCTTANCAGNNNNNNNNNNNGNTTNNNNNNNNNNGCCGAG
++1:1:0:268:N
+########################################
+@1:1:0:289:N
+GCTTANCCGNNNNNNNNNNNTNTTNNNNNNNNNNTCCTAT
++1:1:0:289:N
+########################################
+@1:1:0:447:N
+CCTTCNCCGNNNNNNNNNNNTNTTNNNNNNNNNNTCCTAG
++1:1:0:447:N
+########################################
+@1:1:0:505:N
+CTATANCCGNNNNNNNNNNNCNCCNNNNNNNNNNCGGCAT
++1:1:0:505:N
+########################################
+@1:1:0:570:N
+GAATANGTGNNNNNNNNNNNTNTCNNNNNNNNNNCCCCAA
++1:1:0:570:N
+########################################
+@1:1:0:643:N
+GTATTNCCTNNNNNNNNNNNANCCNNNNNNNNNNAGGCAG
++1:1:0:643:N
+?#######################################
+@1:1:0:773:N
+GCTTANCCGNNNNNNNNNNNTNTTNNNNNNNNNNTCCTAG
++1:1:0:773:N
+9#######################################
+@1:1:0:789:N
+GTTCTNGAANNNNNNNNNNNGNCGNNNNNNNNNNCAGTTT
++1:1:0:789:N
+########################################
+@1:1:0:799:N
+CCTTCNCCGNNNNNNNNNNNTNTTNNNNNNNNNNTCCTCT
++1:1:0:799:N
+########################################
+@1:1:0:867:N
+GTATANCAANNNNNNNNNNNTNTTNNNNNNNNNNTTCATC
++1:1:0:867:N
+B#######################################
+@1:1:0:888:N
+CTTTCNTGGNNNNNNNNNNNANAANNNNNNNNNNTTAACC
++1:1:0:888:N
+########################################
+@1:1:0:1092:N
+GTATANACGNNNNNNNNNNNTNATNNNNNNNNNNTCATTT
++1:1:0:1092:N
+B#######################################
+@1:1:0:1129:N
+GTATGNTTANNNNNNNNNNNCNTTNNNNNNNNNNACTTAA
++1:1:0:1129:N
+########################################
+@1:1:0:1206:N
+GCGTCNCCTNNNNNNNNNNNTNTTNNNNNNNNNNTCCTAG
++1:1:0:1206:N
+########################################
+@1:1:0:1229:N
+GNATGNATANNNNNNNNNNNCNTCNNNNNNNNNNCGGCAG
++1:1:0:1229:N
+########################################
+@1:1:0:1268:N
+TACTCNTCTNNNNNNNNNNNANCGNNNNNNNNNNAATTGG
++1:1:0:1268:N
+B#######################################
+@1:1:0:1349:N
+GTATTNGTANNNNNNNNNNNCNCTNNNNNNNNNNTCCCCA
++1:1:0:1349:N
+########################################
+@1:1:0:1382:N
+GCTTCNCCTNNNNNNNNNNNTNTTNNNNNNNNNNTCCTCT
++1:1:0:1382:N
+?#######################################
+@1:1:0:1415:N
+GATTCNCCTNNNNNNNNNNNCNTTNNNNNNNNNNCCTCGC
++1:1:0:1415:N
+########################################
+@1:1:0:1422:N
+CCCTCNTGTNNNNNNNNNNNTNTTNNNNNNNNNNTTTCTT
++1:1:0:1422:N
+B#######################################
+@1:1:0:1613:N
+TCTTCNCCTNNNNNNNNNNNTNTTNNNNNNNNNNTCCTCT
++1:1:0:1613:N
+:#######################################
+@1:1:0:1621:N
+CCTCTNCGCNNNNNNNNNNNANTTNNNNNNNNNNTCTATT
++1:1:0:1621:N
+########################################
+@1:1:0:1715:N
+GTATANCCGNNNNNNNNNNNCNCCNNNNNNNNNNTCCCAC
++1:1:0:1715:N
+A#######################################
+@1:1:0:1793:N
+GTATCNTGTNNNNNNNNNNNGNTTNNNNNNNNNNTAGTTT
++1:1:0:1793:N
+B#######################################
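A fixture such as qseq_test.fastq can be sanity-checked against the four-line FASTQ layout described in the tool help: an '@' line, the sequence, a '+' line that optionally repeats the identifier, and a quality string of the same length as the sequence. The Python sketch below shows one way to do that; `read_fastq` is a hypothetical helper, not something shipped with the tool, and the demo record is shortened and made up.

```python
from typing import Iterator, List, Tuple


def read_fastq(lines: List[str]) -> Iterator[Tuple[str, str, str]]:
    """Yield (identifier, sequence, quality) from four-line FASTQ records."""
    if len(lines) % 4 != 0:
        raise ValueError("FASTQ input must contain a multiple of four lines")
    for start in range(0, len(lines), 4):
        header, sequence, plus, quality = (s.rstrip("\n") for s in lines[start:start + 4])
        number = start // 4 + 1
        if not header.startswith("@"):
            raise ValueError(f"record {number}: line 1 must start with '@'")
        if not plus.startswith("+"):
            raise ValueError(f"record {number}: line 3 must start with '+'")
        # The identifier after '+' is optional, but must match if present.
        if plus[1:] and plus[1:] != header[1:]:
            raise ValueError(f"record {number}: identifiers on lines 1 and 3 differ")
        if len(sequence) != len(quality):
            raise ValueError(f"record {number}: sequence and quality lengths differ")
        yield header[1:], sequence, quality


if __name__ == "__main__":
    # Shortened, made-up record in the style of qseq_test.fastq.
    demo = ["@1:1:0:18:N", "CCGTN", "+1:1:0:18:N", "#####"]
    for identifier, sequence, quality in read_fastq(demo):
        print(identifier, len(sequence), len(quality))
```

Because `read_fastq` is a generator, the checks only run as records are consumed, so wrap the call in `list(...)` to validate a whole file at once.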
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/qseq2fastq/qseq_test.txt Tue Jun 07 17:42:46 2011 -0400
@@ -0,0 +1,40 @@
+HWI-EAS121 1 1 1 0 652 0 1 TACTA.G.T...........................T.TA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 792 0 1 CCCTT.T.T...........................A.CC aBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 2004 0 1 AGCTT.A.A...........................A.GT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 1386 0 1 TACTG.G.G...........................C.AG BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 1069 0 1 CTGTT.T.G...........................G.CA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 704 0 1 AGCTT.C.T..............C............G.AG XBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 462 0 1 CTGTT.C.T...........A..C............G.AG BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 99 0 1 CTGTT.C.T...........A..C...........GG.CG aBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 297 0 1 GCTTA.C.G...........T.TT..........TCG.GA [BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 1097 0 1 CCCTG.GCG...........T.TT..........TCCTAG aBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 1739 0 1 AGCTA.TAA...........T.GA..........GTTATC abBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 18 0 1 CCGTT.CTG...........G.CT..........GCTGAG BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 90 0 1 GCTTC.ACG...........T.TT..........TCCTCT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 123 0 1 ATCTT.CCT...........C.TT..........CCTCTC BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 188 0 1 GCTTA.CAG...........G.TT..........GCCGAG _BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 254 0 1 GCTTA.CCG...........T.TT..........TCCTAG BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 268 0 1 GCTTA.CAG...........G.TT..........GCCGAG BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 289 0 1 GCTTA.CCG...........T.TT..........TCCTAT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 447 0 1 CCTTC.CCG...........T.TT..........TCCTAG BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 505 0 1 CTATA.CCG...........C.CC..........CGGCAT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 570 0 1 GAATA.GTG...........T.TC..........CCCCAA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 643 0 1 GTATT.CCT...........A.CC..........AGGCAG ^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 773 0 1 GCTTA.CCG...........T.TT..........TCCTAG XBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 789 0 1 GTTCT.GAA...........G.CG..........CAGTTT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 799 0 1 CCTTC.CCG...........T.TT..........TCCTCT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 867 0 1 GTATA.CAA...........T.TT..........TTCATC aBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 888 0 1 CTTTC.TGG...........A.AA..........TTAACC BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 1092 0 1 GTATA.ACG...........T.AT..........TCATTT aBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 1129 0 1 GTATG.TTA...........C.TT..........ACTTAA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 1206 0 1 GCGTC.CCT...........T.TT..........TCCTAG BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 1229 0 1 G.ATG.ATA...........C.TC..........CGGCAG BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 1268 0 1 TACTC.TCT...........A.CG..........AATTGG aBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 1349 0 1 GTATT.GTA...........C.CT..........TCCCCA BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 1382 0 1 GCTTC.CCT...........T.TT..........TCCTCT ^BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 1415 0 1 GATTC.CCT...........C.TT..........CCTCGC BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 1422 0 1 CCCTC.TGT...........T.TT..........TTTCTT aBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 1613 0 1 TCTTC.CCT...........T.TT..........TCCTCT YBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 1621 0 1 CCTCT.CGC...........A.TT..........TCTATT BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 1715 0 1 GTATA.CCG...........C.CC..........TCCCAC `BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0
+HWI-EAS121 1 1 1 0 1793 0 1 GTATC.TGT...........G.TT..........TAGTTT aBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB 0