diff temp.xml @ 17:e7d3dc3e0ec9 draft
planemo upload for repository https://github.com/portiahollyoak/Tools commit 22724a75342e3097cd0976e4bfcfe7a19308ac4f-dirty
author | portiahollyoak |
---|---|
date | Wed, 15 Jun 2016 10:30:06 -0400 |
parents | c613f8c96e6d |
children | e198b686bfe4 |
--- a/temp.xml Tue Jun 14 04:36:25 2016 -0400
+++ b/temp.xml Wed Jun 15 10:30:06 2016 -0400
@@ -1,4 +1,4 @@
-<tool id ="run_TEMP" name="TEMP" version="0.1.8">
+<tool id ="run_TEMP" name="TEMP" version="0.2.0">
 <description></description>
 <requirements>
 <!-- The following are classical toolshed packages and should be removed
@@ -25,6 +25,9 @@
 -s $__tool_directory__/scripts
 -r "$consensus_te_seqs"
 -t "$te_locations"
+ #if $te_families:
+ -u "$te_families"
+ #end if
 -m "$mismatches"
 -f "$median_insertsize"
 -c \${GALAXY_SLOTS:-2} &&
@@ -44,10 +47,11 @@
 <param format="bam" name="alignment" type="data" label="Alignment bam file"/>
 <param format="twobit" name="reference2bit" type="data" label="Reference twobit file"/>
 <param format="fasta" name="consensus_te_seqs" type="data" label="Consensus TE Seqs fasta file"/>
-<param format="bed" name="te_locations" type="data" label="TE Locations bed file"/>
+<param format="bed" name="te_locations" type="data" label="TE Annotations bed file"/>
+<param format="tabular" name="te_families" type="data" optional="True" label="TE Identifiers and Families"/>
 <param name="median_insertsize" value="" type="integer" label="Median Insert Length"/>
 <param name="mismatches" min="0" max="5" type="integer" value="3" label="Allow this many mismatches when aligning to TEs"/>
-<param name="minimum_score_difference" type="integer" min="10" max="37" value="30" label="Minimum score difference between optimal and suboptimal alignment to consider read uniquely mapped"/>
+<param name="minimum_score_difference" type="integer" min="10" max="37" value="30" label="Minimum difference between mapping scores"/>
 </inputs>
 <outputs>
 <data format="bed" type="data" name="insertion_summary" label="${alignment.element_identifier} Insertions" />
@@ -76,38 +80,47 @@
 Author: Jiali Zhuang (jiali.zhuang@umassmed.edu) and Jie Wang (jie.wangj@umassmed.edu)
 Weng Lab, University of Massachusetts Medical School, Worcester, MA, USA
 
+*Input files/variables*
+-------------------------
+* Alignment file in BAM format
+* Reference genome used in aligning, in fasta or twobit format.
+* Transposable Elements' Consensus Sequences in fasta format.
+* Annotations of TEs in the reference genome in bed format.
+* TE Identifiers and Families (optional) - A file containing, in the first column, the TE names/identifiers from the consensus sequences file and, in the second column, their respective TE family names as used in the TE annotations file. When supplied, a detected insertion that overlaps an annotated TE of the same family is excluded from the results.
+* Median Insert Length
+* Number of Mismatches allowed (default 3)
+* Minimum difference between mapping scores. The minimum difference in scores between the optimal and suboptimal alignments to consider a read uniquely mapped.
 
-Output files
--------------
+*Output files*
+-----------------
+* **In the Insertions output file there are 14 columns:**
+* Column 1: The chromosome where the detected insertion happens.
+* Column 2: The coordinate of the start position of the detected insertion.
+* Column 3: The coordinate of the end position of the detected insertion.
+* Column 4: The TE family that the detected insertion belongs to.
+* Column 5: The direction of the insertion. “Plus” means that the TE is integrated with the plus strand of the genome while “minus” means the TE is integrated with the minus strand.
+* Column 6: The class of the insertion. “1p1” means that the detected insertion is supported by reads at both sides. “2p” means the detected insertion is supported by more than 1 read at only 1 side. “Singleton” means the detected insertion is supported by only 1 read at 1 side.
+* Column 7: The total number of read pairs that support the detected insertion.
+* Column 8: The estimated population frequency of the detected insertion.
+* Columns 9 & 10: The coordinate of a junction and the number of reads supporting it. If the junction is not found, column 9 will be the arithmetic mean of the start and end coordinates and column 10 will have the value 0.
+* Columns 11 & 12: Same as Columns 9 & 10 except for the junction on the other strand.
+* Column 13: The number of reads supporting the detected insertion at the 5’ end of the TE (not including junction-spanning reads).
+* Column 14: The number of reads supporting the detected insertion at the 3’ end of the TE (not including junction-spanning reads).
 
-For TE insertion analysis there are 14 columns in the summary file::
-
-    Column 1: The chromosome where the detected insertion happens.
-    Column 2: The coordinate of the start position of the detected insertion.
-    Column 3: The coordinate of the end position of the detected insertion.
-    Column 4: The TE family that the detected insertion belongs to.
-    Column 5: The direction of the insertion. “Plus” means that the TE is integrated with the plus strand of the genome while “minus” means the TE is integrated with the minus strand.
-    Column 6: The class of the insertion. “1p1” means that the detected insertion is supported by reads at both sides. “2p” means the detected insertion is supported by more than 1 read at only 1 side. “Singleton” means the detected insertion is supported by only 1 read at 1 side.
-    Column 7: The total number of read pairs that support the detected insertion.
-    Column 8: The estimated population frequency of the detected insertion.
-    Columns 9 & 10: The coordinate of a junction and the number of the reads supporting it. If the junction is not found column 9 will be the arithmetic mean of the start and end coordinates and column 10 will have the value 0.
-    Columns 11 & 12: Same as Columns 9 & 10 except for the junction on the other strand.
-    Column 13: The number of reads supporting the detected insertion at the 5’ end of the TE (not including junction spanning reads).
-    Column 13: The number of reads supporting the detected insertion at the 3’ end of the TE (not including junction spanning reads).
+-----
 
-For TE absence analysis there are 9 columns in the summary file::
-
-    Column 1: The chromosome where the detected absence happens.
-    Column 2: The coordinate of the start position of the detected absence.
-    Column 3: The coordinate of the end position of the detected absence.
-    Column 4: The TE family that the detected insertion belongs to.
-    Column 5: Junctions at 5’ of the excised TE. The two numbers are the coordinates of the junctions on the two strands.
-    Column 6: Junctions at 3’ of the excised TE. The two numbers are the coordinates of the junctions on the two strands.
-    Column 7: The number of reads supporting the absence.
-    Column 8: The number of reads supporting the reference (no absence).
-    Column 9: Estimated population frequency of the detected absence event.
+* **In the Absences output file there are 9 columns:**
+* Column 1: The chromosome where the detected absence happens.
+* Column 2: The coordinate of the start position of the detected absence.
+* Column 3: The coordinate of the end position of the detected absence.
+* Column 4: The TE family that the absent TE belongs to.
+* Column 5: Junctions at 5’ of the excised TE. The two numbers are the coordinates of the junctions on the two strands.
+* Column 6: Junctions at 3’ of the excised TE. The two numbers are the coordinates of the junctions on the two strands.
+* Column 7: The number of reads supporting the absence.
+* Column 8: The number of reads supporting the reference (no absence).
+* Column 9: Estimated population frequency of the detected absence event.
 ]]>
 </help>
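The help text says that when the optional TE Identifiers and Families file is supplied, a detected insertion overlapping an annotated TE of the same family is excluded. The Python sketch below re-applies that rule as a post-processing step on an Insertions file, to make the column layout and the filter concrete. It is not TEMP's own implementation. The function and class names are hypothetical, and it assumes that column 4 of the Insertions file holds the consensus TE identifier, that column 4 of the annotations BED holds the family name, and that intervals follow half-open BED coordinates.

```python
import csv
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


# Hypothetical record for one row of the 14-column Insertions file.
@dataclass
class Insertion:
    chrom: str      # column 1
    start: int      # column 2
    end: int        # column 3
    te: str         # column 4, assumed to be the consensus TE identifier
    row: List[str]  # all original columns, so kept rows can be written back unchanged


def parse_insertions(lines: Iterable[str]) -> List[Insertion]:
    """Parse tab-separated Insertions rows; skip short rows, comments and any header row."""
    records = []
    for row in csv.reader(lines, delimiter="\t"):
        # A header (if present) is detected by a non-integer start column.
        if len(row) < 14 or row[0].startswith("#") or not row[1].isdigit():
            continue
        records.append(Insertion(row[0], int(row[1]), int(row[2]), row[3], row))
    return records


def load_families(lines: Iterable[str]) -> Dict[str, str]:
    """Map TE identifier (column 1) to TE family name (column 2)."""
    families = {}
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) >= 2 and not row[0].startswith("#"):
            families[row[0]] = row[1]
    return families


def load_annotations(lines: Iterable[str]) -> List[Tuple[str, int, int, str]]:
    """Read (chrom, start, end, family) from BED lines; assumes the family is in column 4."""
    annotations = []
    for row in csv.reader(lines, delimiter="\t"):
        if len(row) < 4 or row[0].startswith(("#", "track", "browser")):
            continue
        annotations.append((row[0], int(row[1]), int(row[2]), row[3]))
    return annotations


def overlaps(a_start: int, a_end: int, b_start: int, b_end: int) -> bool:
    """Overlap test for half-open [start, end) intervals, as used by BED."""
    return a_start < b_end and b_start < a_end


def filter_by_family(
    insertions: List[Insertion],
    families: Dict[str, str],
    annotations: List[Tuple[str, int, int, str]],
) -> List[Insertion]:
    """Drop insertions that overlap an annotated TE of the same family."""
    kept = []
    for ins in insertions:
        # Fall back to the identifier itself when it is missing from the families table.
        family = families.get(ins.te, ins.te)
        clash = any(
            chrom == ins.chrom and name == family and overlaps(ins.start, ins.end, start, end)
            for chrom, start, end, name in annotations
        )
        if not clash:
            kept.append(ins)
    return kept
```

For example, with hypothetical Insertions rows at chr2L:100-150 and chr2L:5000-5050 for `jockey`, a families line mapping `jockey` to `Jockey`, and an annotation `chr2L 120 400 Jockey`, `filter_by_family` drops the first row and keeps the second. The linear scan over annotations keeps the sketch short; for genome-wide annotation files, a per-chromosome sorted index would be the more practical choice.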