view SMART/galaxy/WrappGetLetterDistribution.xml @ 44:5f796c5c579f

Uploaded
author m-zytnicki
date Wed, 18 Sep 2013 08:32:38 -0400
parents 2c0c0a89fad7
children
line wrap: on
line source

<tool id="getLetterDistribution1" name="get letter distribution">
    <description>Calculate distribution for each nucleotide per position for all short reads</description>
	<requirements>
		<requirement type="set_environment">PYTHONPATH</requirement>
	</requirements>
    <command interpreter="python">
	    WrappGetLetterDistribution.py -i $inputFileName
	#if $formatType.FormatInputFileName == 'fasta':
		-f fasta
	#else :
		-f fastq
	#end if
	-c $ouputFileNameCSV -a $ouputFileNamePNG1 -b $ouputFileNamePNG2
    </command>
    <inputs>
             <conditional name="formatType">
      			<param name="FormatInputFileName" type="select" label="Input File Format">
	        		<option value="fasta">fasta</option>
       				<option value="fastq" selected="true">fastq</option>
      			</param>
      			<when value="fasta">
             			<param name="inputFileName" format="fasta" type="data" label="Fasta Input File"/>
      			</when>
      			<when value="fastq">
             			<param name="inputFileName" format="fastq" type="data" label="Fastq Input File"/>
      			</when>
             </conditional>
    </inputs>
        
    <outputs>
               	<data name="ouputFileNameCSV" format="tabular" label="[get letter distribution] CSV file"/>
               	<data name="ouputFileNamePNG1" format="png" label="[get letter distribution] PNG file 1"/>
               	<data name="ouputFileNamePNG2" format="png" label="[get letter distribution] PNG file 2"/>
    </outputs>
    <tests>
    	<test>
            <param name="FormatInputFileName" value="fastq" />
            <param name="inputFileName" value="short_fastq.fastq" />
            <output name="outputFileNameCSV" file="exp_getletterdistribution_short_fastq.csv" />     
        </test>
    </tests>

	<help>
The script gets the nucleotide distribution of the input sequence list. It outputs two files. The first file shows the nucleotide distribution of the data. More precisely, a point (*x*, *y*) on the curve **A** shows that *y* sequences have *x* % of **A**.
  
The second plot shows the average nucleotide distribution for each position of the read. You can use it to detect a bias in the first nucleotides, for instance. A point *x*, *y* on the curve **A** shows that at the position *x*, there are *y*% of **A**. A point (*x*, *y*) on the curve **#** tells you that *y* % of the sequences contain not less than *x* nucleotides. By definition, this latter line is a decreasing function. It usually explains why the tail of the other curves are sometimes erratic: there are few sequences.
	</help>
</tool>