view deseq-hts_1.0/galaxy/deseq.xml @ 1:8ab01cc29c4b draft

Uploaded
author vipints
date Wed, 27 Jun 2012 15:35:11 -0400
parents 94a108763d9e
children e27b4f7811c2
line wrap: on
line source

<tool id="deseq-hts" name="DESeq" version="1.6.1">
  <description>Determines differentially expressed transcripts from read alignments</description>
  <command> 
deseq-hts/src/deseq-hts.sh $anno_input_selected $deseq_out $deseq_out.extra_files_path/gene_map.mat 
#for $i in $replicate_groups
#for $j in $i.replicates
$j.bam_alignment:#slurp
#end for

#end for
    >> $Log_File </command>
  <inputs>
	<param format="gff3" name="anno_input_selected" type="data" label="Genome annotation in GFF3 file" help="A tab delimited format for storing sequence features and annotations"/>
   <repeat name="replicate_groups" title="Replicate group" min="2">
     <repeat name="replicates" title="Replicate">
      <param format="bam" name="bam_alignment" type="data" label="BAM alignment file" help="BAM alignment file. Can be generated from SAM files using the SAM Tools."/>
     </repeat>
   </repeat>
  </inputs>

  <outputs>
    <data format="txt" name="deseq_out" label="DESeq result"/>
    <data format="txt" name="Log_File" label="DESeq log file"/>
  </outputs>

  <tests>
    <test> 
	command:
	./deseq-hts.sh ../test_data/deseq_c_elegans_WS200-I-regions.gff3 ../test_data/deseq_c_elegans_WS200-I-regions_deseq.txt ../test_data/genes.mat ../test_data/deseq_c_elegans_WS200-I-regions-SRX001872.bam ../test_data/deseq_c_elegans_WS200-I-regions-SRX001875.bam
	
      <param name="anno_input_selected" value="deseq_c_elegans_WS200-I-regions.gff3" ftype="gff3" />
      <param name="bam_alignments1" value="deseq_c_elegans_WS200-I-regions-SRX001872.bam" ftype="bam" />
      <param name="bam_alignments2" value="deseq_c_elegans_WS200-I-regions-SRX001875.bam" ftype="bam" />
      <output name="deseq_out" file="deseq_c_elegans_WS200-I-regions_deseq.txt" />
    </test>
  </tests> 

  <help>

.. class:: infomark

**What it does** 

`DESeq` is a tool for differential expression testing of RNA-Seq data.


**Inputs**

`DESeq` requires three input files to run:

1. Annotation file in GFF3, containing the necessary information about the transcripts that are to be quantified.
2. The BAM alignment files grouped into replicate groups, each containing several replicates. BAM files store the read alignments in a compressed format. They can be generated using the `SAM-to-BAM` tool in the NGS: SAM Tools section. (The script will also work with only two groups containing only a single replicate each. However, this analysis has less statistical power and is  therefor not recommended.)

**Output**

`DESeq` generates a text file containing the gene name and the p-value.

------

**Licenses**

If **DESeq** is used to obtain results for scientific publications it
should be cited as [1]_.

**References** 

.. [1] Anders, S and Huber, W (2010): `Differential expression analysis for sequence count data`_. 

.. _Differential expression analysis for sequence count data: http://dx.doi.org/10.1186/gb-2010-11-10-r106

------

.. class:: infomark

**About formats**


**GFF3 format** General Feature Format is a format for describing genes
and other features associated with DNA, RNA and protein
sequences. GFF3 lines have nine tab-separated fields:

1. seqid - The name of a chromosome or scaffold.
2. source - The program that generated this feature.
3. type - The name of this type of feature. Some examples of standard feature types are "gene", "CDS", "protein", "mRNA", and "exon". 
4. start - The starting position of the feature in the sequence. The first base is numbered 1.
5. stop - The ending position of the feature (inclusive).
6. score - A score between 0 and 1000. If there is no score value, enter ".".
7. strand - Valid entries include '+', '-', or '.' (for don't know/care).
8. phase - If the feature is a coding exon, frame should be a number between 0-2 that represents the reading frame of the first base. If the feature is not a coding exon, the value should be '.'.
9. attributes - All lines with the same group are linked together into a single item.

For more information see http://www.sequenceontology.org/gff3.shtml

**SAM/BAM format** The Sequence Alignment/Map (SAM) format is a
tab-limited text format that stores large nucleotide sequence
alignments. BAM is the binary version of a SAM file that allows for
fast and intensive data processing. The format specification and the
description of SAMtools can be found on
http://samtools.sourceforge.net/.

------

DESeq-hts Wrapper Version 0.3 (Feb 2012)

</help>
</tool>