comparison data_manager/rna_star_index_builder.xml @ 1:cdc4d8a998e1 draft

planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/data_managers/data_manager_star_index_builder commit 0d434bca5083e908114d93e11094e48f49b98ed1
author iuc
date Fri, 21 Apr 2017 12:36:31 -0400
parents
children 6ef6520f14fc
0:e4b87a00b1df 1:cdc4d8a998e1
1 <tool id="rna_star_index_builder_data_manager" name="rnastar index2" tool_type="manage_data" version="0.0.4" profile="17.01">
2 <description>builder</description>
3
4 <macros>
5 <import>macros.xml</import>
6 </macros>
7
8 <expand macro="requirements" />
9
10 <command><![CDATA[
11 #import json, os
12 #set params = json.loads( open( str($out_file) ).read() )
13 #set target_directory = $params[ 'output_data' ][0]['extra_files_path'].encode('ascii', 'replace')
14 #set subdir = os.path.basename(target_directory)
15
16 mkdir -p '${target_directory}/${subdir}' &&
17
18 STAR
19 --runMode genomeGenerate
20 --genomeFastaFiles '${all_fasta_source.fields.path}'
21 --genomeDir '${target_directory}/${subdir}'
22 #if str($GTFconditional.GTFselect) == "withGTF":
23 --sjdbGTFfile '${GTFconditional.sjdbGTFfile}'
24 --sjdbOverhang '${GTFconditional.sjdbOverhang}'
25 #end if
26 --runThreadN \${GALAXY_SLOTS:-2} &&
27
28 python ${__tool_directory__}/rna_star_index_builder.py
29 --config-file '${out_file}'
30 --value '${all_fasta_source.fields.value}'
31 --dbkey '${all_fasta_source.fields.dbkey}'
32 #if $name:
33 --name '$name'
34 #else
35 --name '${all_fasta_source.fields.name}'
36 #end if
37 #if str($GTFconditional.GTFselect) == "withGTF":
38 --withGTF 1
39 #end if
40 --data-table 'rnastar_index2'
41 --subdir '${subdir}'
42 ]]></command>
43 <inputs>
44 <param name="all_fasta_source" type="select" label="Source FASTA Sequence">
45 <options from_data_table="all_fasta"/>
46 </param>
47 <param name="name"
48 type="text"
49 value=""
50 label="Informative name for sequence index"
51 help="By using different settings, you may have several indices per reference genome. Give an appropriate description to the index to distinguish between indices"/>
52 <conditional name="GTFconditional">
53 <param name="GTFselect" type="select" label="Reference genome with or without an annotation" help="Select whether to build the index with a GTF file (if not, you can specify one afterward).">
54 <option value="withoutGTF">use genome reference without builtin gene-model</option>
55 <option value="withGTF">use genome reference with builtin gene-model</option>
56 </param>
57 <when value="withGTF">
58 <param argument="--sjdbGTFfile" type="data" format="gff3,gtf" label="Gene model (gff3,gtf) file for splice junctions" optional="false" help="Exon junction information for mapping splices"/>
59 <param argument="--sjdbOverhang" type="integer" min="1" value="100" label="Length of the genomic sequence around annotated junctions" help="Used in constructing the splice junctions database. Ideal value is ReadLength-1"/>
60 </when>
61 <when value="withoutGTF" />
62 </conditional>
63 </inputs>
64
65 <outputs>
66 <data name="out_file" format="data_manager_json"/>
67 </outputs>
68
69 <!-- not available in planemo at the moment of writing
70 <tests>
71 <test>
72 <param name="all_fasta_source" value="phiX.fa"/>
73 <param name="sequence_name" value="phiX"/>
74 <param name="sequence_id" value="minimal-settings"/>
75 <param name="modelformat" value="None"/>
76
77 <output name="out_file" file="test_star_01.data_manager_json"/>
78 </test>
79 </tests>
80 -->
81
82 <help>
83
84 .. class:: infomark
85
86 <![CDATA[
87 *What it does*
88
89 This is a Galaxy data manager for the RNA STAR gap-aware RNA aligner.
90
91 Please read the fine manual; the manual and the Google group are the places to learn about the options above.
92
93 *Memory requirements*
94
95 To run efficiently, RNA-STAR requires enough free memory to
96 hold the SA-indexed reference genome in RAM. For Human Genome hg19 this
97 index is about 27GB and running RNA-STAR requires approximately 30GB of RAM.
98 For custom genomes, the rule of thumb is to multiply the size of the
99 reference FASTA file by 9 to estimate the required amount of RAM.
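
The rule of thumb above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical and not part of STAR or of this data manager, and the factor of 9 is simply the value quoted above.

```python
import os


def estimate_star_ram_bytes(fasta_size_bytes, factor=9):
    """Rough RAM estimate: FASTA size multiplied by an empirical factor.

    The default factor of 9 is the rule of thumb from the help text,
    not a value reported by STAR itself.
    """
    if fasta_size_bytes < 0:
        raise ValueError("fasta_size_bytes must be non-negative")
    return fasta_size_bytes * factor


def estimate_star_ram_for_file(fasta_path, factor=9):
    """Apply the same estimate to a FASTA file on disk."""
    return estimate_star_ram_bytes(os.path.getsize(fasta_path), factor)


# A hypothetical 3 GB reference FASTA
print(estimate_star_ram_bytes(3 * 10**9) / 10**9)  # prints 27.0
```

The result is an estimate only; actual memory use depends on the genome and on the STAR parameters used.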
100
101 *Note on sjdbOverhang*
102
103 From https://groups.google.com/forum/#!topic/rna-star/h9oh10UlvhI::
104
105 James is right, using large enough --sjdbOverhang is safer and should not generally cause any problems with reads of varying length. If your reads are very short, <50b, then I would strongly recommend using optimum --sjdbOverhang=mateLength-1
106 By mate length I mean the length of one of the ends of the read, i.e. it's 100 for 2x100b PE or 1x100b SE. For longer reads you can simply use generic --sjdbOverhang 100.
107 It is a bit confusing because of the way I named this parameter. --sjdbOverhang Noverhang is only used at the genome generation step for constructing the reference sequence out of the annotations.
108 Basically, the Noverhang exonic bases from the donor site and Noverhang exonic bases from the acceptor site are spliced together for each of the junctions, and these spliced sequences are added to the genome sequence.
109
110 At the mapping stage, the reads are aligned to both genomic and splice sequences simultaneously. If a read maps to one of spliced sequences and crosses the "junction" in the middle of it, the coordinates of two spliced pieces are translated back to genomic space and added to the collection of mapped pieces, which are then all "stitched" together to form the final alignment. Since in the process of "maximal mapped length" search the read is split into pieces of no longer than --seedSearchStartLmax (=50 by default) bases, even if the read (mate) is longer than --sjdbOverhang, it can still be mapped to the spliced reference, as long as --sjdbOverhang > --seedSearchStartLmax.
111
112 Cheers
113 Alex
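
The advice quoted above can be summarised as a small helper. This is a sketch under stated assumptions: the function name and its parameters are hypothetical, the cutoff of 50 bases and the generic value of 100 come from the quoted post, and the lower bound mirrors the minimum of 1 that this tool accepts for --sjdbOverhang.

```python
def suggest_sjdb_overhang(mate_length, generic=100, short_read_cutoff=50):
    """Suggest a --sjdbOverhang value from the length of one mate.

    Very short reads (below short_read_cutoff) get mate_length - 1;
    longer reads get the generic value, as suggested in the quoted post.
    """
    if mate_length < 2:
        # mate_length - 1 must be at least 1 to be a usable overhang
        raise ValueError("mate_length must be at least 2")
    if mate_length < short_read_cutoff:
        return mate_length - 1
    return generic


print(suggest_sjdb_overhang(36))   # prints 35
print(suggest_sjdb_overhang(100))  # prints 100
```

Using mate_length - 1 for longer reads as well is equally consistent with the quoted advice; the generic value is chosen here only to keep the sketch short.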
114
115 *Note on gene model requirements for splice junctions*
116
117 From https://groups.google.com/forum/#!msg/rna-star/3Y_aaTuzBrE/lUylTB8h5vMJ::
118
119 When you generate a genome with annotations, you need to specify --sjdbOverhang value, which ideally should be equal to (oneMateLength-1), or you could use a generic value of ~100.
120
121 Your gtf lines look fine to me. STAR needs 3 features from a GTF file:
122 1. Chromosome names in col.1 that agree with chromosome names in genome .fasta files. If you have "chr2L" names in the genome .fasta files, and "2L" in the .gtf file, then you need to use --sjdbGTFchrPrefix chr option.
123 2. 'exon' in col.3 for the exons of all transcripts (this name can be changed with --sjdbGTFfeatureExon)
124 3. 'transcript_id' attribute that assigns each exon to a transcript (this name can be changed with --sjdbGTFtagExonParentTranscript)
125
126 Cheers
127 Alex
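
The three requirements listed above can be checked before building an index. The following is a minimal sketch, not something STAR or this data manager provides: the function name, its arguments, and the problem messages are all hypothetical, and the `chr_prefix` argument only imitates the effect of --sjdbGTFchrPrefix.

```python
import re


def check_gtf_lines(gtf_lines, fasta_chrom_names, chr_prefix=""):
    """Return a list of (line_number, message) problems found in GTF lines.

    Checks only the three points from the quoted post: chromosome names,
    presence of 'exon' features, and a transcript_id attribute on each exon.
    """
    chroms = set(fasta_chrom_names)
    problems = []
    exon_count = 0
    for lineno, line in enumerate(gtf_lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split("\t")
        if len(fields) < 9:
            problems.append((lineno, "fewer than 9 tab-separated columns"))
            continue
        if fields[2] != "exon":
            continue  # only exon features matter for this check
        exon_count += 1
        if chr_prefix + fields[0] not in chroms:
            problems.append((lineno, "chromosome not in FASTA: " + fields[0]))
        if not re.search(r'\btranscript_id\s+"[^"]+"', fields[8]):
            problems.append((lineno, "exon without transcript_id"))
    if exon_count == 0:
        problems.append((0, "no 'exon' features found"))
    return problems


example = '2L\tsrc\texon\t1\t100\t.\t+\t.\tgene_id "g1"; transcript_id "t1";'
print(check_gtf_lines([example], ["chr2L"], chr_prefix="chr"))  # prints []
```

Calling the same check without `chr_prefix` reports a chromosome-name mismatch, which is the "chr2L" versus "2L" situation described in point 1.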
128
129 **Notice:** If you leave name, description, or id blank, it will be generated automatically.
130 ]]>
131 </help>
132 <expand macro="citations" />
133 </tool>