planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/data_managers/data_manager_star_index_builder commit 3265247e909410db2a6d6087a2c0d3a9885c120c-dirty
author | iuc |
---|---|
date | Wed, 23 Nov 2016 17:55:57 -0500 |

*What it does*

This is a Galaxy data manager for the RNA STAR gap-aware RNA aligner. It's a hack of Dan Blankenberg's BWA data manager
and works on any FASTA file you have already downloaded with the all fasta data manager: start there!
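
For orientation, here is a minimal Python sketch of the kind of STAR genome-generation command a data manager like this one runs. It is an illustrative assumption, not this data manager's actual script. The function name `build_genome_generate_args` and the file paths are hypothetical. The options `--runMode genomeGenerate`, `--genomeDir`, `--genomeFastaFiles`, `--runThreadN`, `--sjdbGTFfile` and `--sjdbOverhang` are standard STAR options.

```python
from typing import List, Optional


def build_genome_generate_args(
    genome_dir: str,
    fasta_path: str,
    threads: int = 4,
    gtf_path: Optional[str] = None,
    sjdb_overhang: int = 100,
) -> List[str]:
    """Assemble (but do not run) a STAR genomeGenerate command line.

    In this sketch the splice junction options are only added when a GTF
    annotation is supplied (hypothetical helper, not the data manager's code).
    """
    args = [
        "STAR",
        "--runMode", "genomeGenerate",
        "--genomeDir", genome_dir,
        "--genomeFastaFiles", fasta_path,
        "--runThreadN", str(threads),
    ]
    if gtf_path is not None:
        # Annotated index: splice junctions come from the GTF file.
        args += ["--sjdbGTFfile", gtf_path, "--sjdbOverhang", str(sjdb_overhang)]
    return args


if __name__ == "__main__":
    # Hypothetical paths, for illustration only.
    cmd = build_genome_generate_args("star_index", "genome.fa", gtf_path="genes.gtf")
    print(" ".join(cmd))
    # STAR --runMode genomeGenerate --genomeDir star_index --genomeFastaFiles genome.fa --runThreadN 4 --sjdbGTFfile genes.gtf --sjdbOverhang 100
```

To actually run STAR, the list could be passed to `subprocess.run(args, check=True)`. The sketch stops at building the arguments, so it can be checked without STAR installed.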

Warning: this is not well tested, and there are some complexities to do with splice junction annotation in RNA STAR
indexes. Feedback is welcome. Send code.

Note: you currently need a small patch to prevent an error when you try to generate splice junction indexes. The patch is described at
https://bitbucket.org/galaxy/galaxy-central/pull-request/510/fix-for-data-manager-failure-to-update-a#comment-3265356

Please read the fine STAR manual: it and the rna-star Google group are the places to learn about the options above.

*Note on sjdbOverhang*

From https://groups.google.com/forum/#!topic/rna-star/h9oh10UlvhI::

    James is right, using large enough --sjdbOverhang is safer and should not generally cause any problems with reads of varying length. If your reads are very short, <50b, then I would strongly recommend using optimum --sjdbOverhang=mateLength-1
    By mate length I mean the length of one of the ends of the read, i.e. it's 100 for 2x100b PE or 1x100b SE. For longer reads you can simply use generic --sjdbOverhang 100.
    It is a bit confusing because of the way I named this parameter. --sjdbOverhang Noverhang is only used at the genome generation step for constructing the reference sequence out of the annotations.
    Basically, the Noverhang exonic bases from the donor site and Noverhang exonic bases from the acceptor site are spliced together for each of the junctions, and these spliced sequences are added to the genome sequence.

    At the mapping stage, the reads are aligned to both genomic and splice sequences simultaneously. If a read maps to one of spliced sequences and crosses the "junction" in the middle of it, the coordinates of two spliced pieces are translated back to genomic space and added to the collection of mapped pieces, which are then all "stitched" together to form the final alignment. Since in the process of "maximal mapped length" search the read is split into pieces of no longer than --seedSearchStartLmax (=50 by default) bases, even if the read (mate) is longer than --sjdbOverhang, it can still be mapped to the spliced reference, as long as --sjdbOverhang > --seedSearchStartLmax.

    Cheers
    Alex
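
The quoted advice can be condensed into a few small helpers. This is a sketch of one reading of the explanation above, not STAR's own code, and STAR's implementation will differ in detail. The function names are hypothetical. The 50-base cut-off for "very short" reads and the default `--seedSearchStartLmax` of 50 come from the quote. `spliced_junction_sequence` assumes 1-based, inclusive intron coordinates and a genome held as a plain string.

```python
GENERIC_SJDB_OVERHANG = 100
SEED_SEARCH_START_LMAX = 50  # default --seedSearchStartLmax, as stated in the quote


def recommended_sjdb_overhang(mate_length: int) -> int:
    """Suggest --sjdbOverhang from the length of ONE mate (100 for 2x100b PE or 1x100b SE)."""
    if mate_length < 2:
        raise ValueError("mate_length must be at least 2")
    if mate_length < 50:
        # Very short reads (<50b): use the optimum, mateLength - 1.
        return mate_length - 1
    # Longer reads: the generic value is fine.
    return GENERIC_SJDB_OVERHANG


def reaches_spliced_reference(mate_length: int, sjdb_overhang: int,
                              seed_search_start_lmax: int = SEED_SEARCH_START_LMAX) -> bool:
    """Simplified reading of the quote: can a mate still use the spliced sequences?"""
    if mate_length - 1 <= sjdb_overhang:
        # A junction-crossing mate has at most mateLength - 1 bases on either
        # side, so the whole mate fits on one spliced sequence.
        return True
    # Longer mates are split into seeds of at most seedSearchStartLmax bases,
    # which still fit as long as the overhang is larger than that.
    return sjdb_overhang > seed_search_start_lmax


def spliced_junction_sequence(genome: str, intron_start: int, intron_end: int,
                              overhang: int) -> str:
    """Join `overhang` exonic bases from the donor and acceptor sides of one intron.

    intron_start and intron_end are 1-based and inclusive (hypothetical convention).
    """
    donor = genome[max(0, intron_start - 1 - overhang):intron_start - 1]
    acceptor = genome[intron_end:intron_end + overhang]
    return donor + acceptor


if __name__ == "__main__":
    genome = "ACGTA" + "GTCCCAG" + "CCGGT"  # exon, intron (positions 6-12), exon
    print(recommended_sjdb_overhang(36))                # 35
    print(reaches_spliced_reference(150, 100))          # True
    print(spliced_junction_sequence(genome, 6, 12, 3))  # GTACCG
```

The first two helpers keep the recommendation and the seed-length condition separate. This makes it easy to see why a generic overhang of 100 is fine for long reads: 100 exceeds the default seed length of 50.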

*Note on gene model requirements for splice junctions*

From https://groups.google.com/forum/#!msg/rna-star/3Y_aaTuzBrE/lUylTB8h5vMJ::

    When you generate a genome with annotations, you need to specify --sjdbOverhang value, which ideally should be equal to (oneMateLength-1), or you could use a generic value of ~100.

    Your gtf lines look fine to me. STAR needs 3 features from a GTF file:
    1. Chromosome names in col.1 that agree with chromosome names in genome .fasta files. If you have "chr2L" names in the genome .fasta files, and "2L" in the .gtf file, then you need to use --sjdbGTFchrPrefix chr option.
    2. 'exon' in col.3 for the exons of all transcripts (this name can be changed with --sjdbGTFfeatureExon)
    3. 'transcript_id' attribute that assigns each exon to a transcript (this name can be changed with --sjdbGTFtagExonParentTranscript)

    Cheers
    Alex
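
The following minimal sketch makes the three GTF requirements concrete. It keeps only `exon` rows and groups them by `transcript_id`. It also adds an optional chromosome prefix so that names match the FASTA, then reports each gap between consecutive exons of a transcript as an intron. Everything here is an assumption for illustration: the function and parameter names, the choice to skip chromosomes absent from the FASTA, and the 1-based inclusive intron coordinates. The parameters only mirror `--sjdbGTFchrPrefix`, `--sjdbGTFfeatureExon` and `--sjdbGTFtagExonParentTranscript`. This is not how STAR itself parses GTF files.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set, Tuple


def gtf_attribute(field: str, tag: str) -> Optional[str]:
    """Return the value of `tag` from a GTF attribute column such as: transcript_id "T1";"""
    for item in field.split(";"):
        parts = item.strip().split(" ", 1)
        if len(parts) == 2 and parts[0] == tag:
            return parts[1].strip().strip('"')
    return None


def junctions_from_gtf(
    gtf_lines: Iterable[str],
    fasta_chroms: Set[str],
    chr_prefix: str = "",               # mirrors --sjdbGTFchrPrefix
    exon_feature: str = "exon",         # mirrors --sjdbGTFfeatureExon
    parent_tag: str = "transcript_id",  # mirrors --sjdbGTFtagExonParentTranscript
) -> Set[Tuple[str, int, int, str]]:
    """Return (chrom, intron_start, intron_end, strand), 1-based inclusive."""
    exons: Dict[Tuple[str, str], List[Tuple[int, int]]] = defaultdict(list)
    strands: Dict[Tuple[str, str], str] = {}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        # Requirement 2: only rows whose column 3 is the exon feature.
        if len(cols) < 9 or cols[2] != exon_feature:
            continue
        # Requirement 1: column 1 (plus prefix) must name a chromosome in the FASTA.
        chrom = chr_prefix + cols[0]
        if chrom not in fasta_chroms:
            continue
        # Requirement 3: the parent transcript attribute groups exons together.
        transcript = gtf_attribute(cols[8], parent_tag)
        if transcript is None:
            continue
        key = (chrom, transcript)
        exons[key].append((int(cols[3]), int(cols[4])))
        strands[key] = cols[6]

    junctions: Set[Tuple[str, int, int, str]] = set()
    for key, blocks in exons.items():
        blocks.sort()
        for (_, end1), (start2, _) in zip(blocks, blocks[1:]):
            if start2 > end1 + 1:  # a gap between consecutive exons is an intron
                junctions.add((key[0], end1 + 1, start2 - 1, strands[key]))
    return junctions


if __name__ == "__main__":
    gtf = [
        '2L\tdemo\texon\t1\t5\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
        '2L\tdemo\texon\t13\t17\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    ]
    print(junctions_from_gtf(gtf, {"chr2L"}, chr_prefix="chr"))  # {('chr2L', 6, 12, '+')}
    print(junctions_from_gtf(gtf, {"chr2L"}))                    # set()
```

The second call shows the "chr2L" versus "2L" mismatch from point 1. Without the prefix, no chromosome matches the FASTA, so no junctions are found.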