comparison combine_cnt_tables.xml @ 0:e979cb57a5d5 draft default tip

"planemo upload for repository https://github.com/McIntyre-Lab/BayesASE/tree/main/galaxy commit 9b70598ef46a73632d9e0fa0c6ce6776fb5e9d6a"
author malex
date Thu, 14 Jan 2021 21:51:36 +0000
-1:000000000000 0:e979cb57a5d5
1 <tool id="combine_cnt_tables" name="Combine ASE Count Tables:" version="21.1.13">
2 <description>sum technical replicates</description>
3 <macros>
4 <import>macros.xml</import>
5 </macros>
6 <expand macro="requirements" />
7 <command><![CDATA[
8 mkdir outputs &&
9 cd outputs &&
10 combine_count_tables.py
11 --design='$design'
12 --bed='$bed'
13 --collection_identifiers="${",".join($collection.keys())}"
14 --collection_filenames="${",".join(map(str, $collection))}"
15 --begin='$begin'
16 --end='$end'
17 $sim
18 --outdir=`pwd`
19 --outdesign='$output_designFile'
20 ]]></command>
21 <inputs>
22 <param name="design" type="data" format="tabular,tsv" label="Design File" help="Select the Alignment Design File. The design file must be sorted by biological replicate (sampleID)."/>
23 <param name="bed" type="data" format="tabular" label="Bed File" help="Select the input BED file. Do NOT use the reformatted BED file created in the Align and Count workflow"/>
24 <param name="collection" type="data_collection" collection_type="list" label="ASE Count Tables" help="Select collection containing ASE count tables"/>
25 <param name="begin" type="text" label="Start" help="Enter start point in design file, see below for description [OPTIONAL]"/>
26 <param name="end" type="text" label="End" help="Enter end point in design file, see below for description [OPTIONAL]"/>
27 <param name="sim" type="boolean" checked="false" truevalue="--sim" falsevalue="" label="Simulated Dataset" help="Select if the dataset is generated from simulated reads"/>
28 </inputs>
29 <outputs>
30 <collection name="split_output" type="list" label="${tool.name} on ${on_string}: Combined and Summed ASE Table">
31 <discover_datasets pattern="(?P&lt;name&gt;.*)" ext="tabular" sort_by="reverse_filename" directory="outputs" />
32 </collection>
33 <data name="output_designFile" format="tabular" label="${tool.name} on ${on_string}: Sample Design File" />
34 </outputs>
35 <tests>
36 <test>
37 <param name="design" value="summarize_counts_testdata/alignment_design_test_file.tsv" ftype="tsv"/>
38 <param name="bed" value="align_and_counts_test_data/BASE_testData_BEDfile.bed" ftype="bed"/>
39 <param name="collection" value="summarize_counts_testdata/ASE_counts_tables" ftype="data_collection"/>
40 <output name="output_designFile" file="summarize_counts_testdata/sample_design_file.tabular" ftype="tabular"/>
41 <output_collection name="split_output" type="list">
42 <element name="FEATURE_ID">
43 <assert_contents>
44 <has_text_matching expression="Combine_counts_output"/>
45 </assert_contents>
46 </element>
47 </output_collection>
48 </test>
49 </tests>
50 <help><![CDATA[
51 **Tool Description**
52
53 The Combine ASE Counts Tables tool sums the input ASE Counts tables across the technical replicates specified in the Alignment Design File.
54 Technical replicates within the same biological replicate are combined for each comparate and the average per nucleotide (APN) value is calculated for each feature.
55
56 The tool outputs a single ASE counts file, including APN values, for each sample replicate.
57 The tool also outputs a Sample Design File that is required for the Summarization and Filter ASE Counts Tables tool that follows.
58
59 APN_total = ([total alignment count] x [readLength]) / [genic feature length]
60 APN_both = ([reads mapping equally to both parental genomes] x [readLength]) / [genic feature length]
61
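The two APN formulas above share one shape, so a minimal Python sketch can cover both. The function name ``apn`` and the numbers used (a 1,500 nt feature, 150 nt reads) are illustrative assumptions, not values taken from the tool or its test data.

```python
def apn(read_count: int, read_length: int, feature_length: int) -> float:
    """Average per nucleotide: (read count x read length) / feature length."""
    if feature_length <= 0:
        raise ValueError("feature_length must be positive")
    return read_count * read_length / feature_length


# Hypothetical feature: 1,500 nt long, covered by 150 nt reads.
apn_total = apn(read_count=200, read_length=150, feature_length=1500)
apn_both = apn(read_count=120, read_length=150, feature_length=1500)
print(apn_total, apn_both)  # 20.0 12.0
```

Passing the total alignment count gives APN_total; passing only the reads that map equally to both parental genomes gives APN_both.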
62
63 -------------------------------------------------------------------------------------------
64
65 **Inputs**
66
67 There are three required inputs for this tool.
68
69 **Alignment Design File [REQUIRED]**
70
71 **TIP**: Check that the design file is in the correct format by using the *Check Align Design file* tool.
72
73 The Alignment Design File is required as input and must have the following format:
74
75 Example design file::
76
77 G1 G2 sampleID fqName fqExtension techRep readLength
78 W1118 W55 mel_W55_Mated_E1 mel_W55_Mated_E1_R1 .fq 1 150
79 W1118 W55 mel_W55_Mated_E1 mel_W55_Mated_E1_R2 .fq 2 150
80 W1118 W55 mel_W55_Mated_E1 mel_W55_Mated_E1_R3 .fq 3 150
81
82 In the example design file above, one summed ASE Counts table would be generated from the three technical replicates belonging to biological replicate 1 (E1).
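
The summing step described above can be sketched in Python as a group-by on sampleID. This is only an illustration under stated assumptions: the function name ``sum_technical_replicates``, the in-memory data layout, and the count values are hypothetical and do not describe how ``combine_count_tables.py`` is implemented.

```python
# Design rows as (sampleID, fqName), taken from the example design file.
design_rows = [
    ("mel_W55_Mated_E1", "mel_W55_Mated_E1_R1"),
    ("mel_W55_Mated_E1", "mel_W55_Mated_E1_R2"),
    ("mel_W55_Mated_E1", "mel_W55_Mated_E1_R3"),
]

# Hypothetical counts per technical replicate:
# fqName -> {feature_id: {count_column: count}}.
counts_by_fq = {
    "mel_W55_Mated_E1_R1": {"CG8920": {"BOTH_EXACT": 5, "SAM_A_ONLY_EXACT": 1}},
    "mel_W55_Mated_E1_R2": {"CG8920": {"BOTH_EXACT": 7, "SAM_A_ONLY_EXACT": 0}},
    "mel_W55_Mated_E1_R3": {"CG8920": {"BOTH_EXACT": 3, "SAM_A_ONLY_EXACT": 2}},
}


def sum_technical_replicates(design_rows, counts_by_fq):
    """Sum every count column over the technical replicates of each sampleID."""
    summed = {}
    for sample_id, fq_name in design_rows:
        features = summed.setdefault(sample_id, {})
        for feature_id, columns in counts_by_fq[fq_name].items():
            totals = features.setdefault(feature_id, {})
            for column, value in columns.items():
                totals[column] = totals.get(column, 0) + value
    return summed


summed = sum_technical_replicates(design_rows, counts_by_fq)
print(summed["mel_W55_Mated_E1"]["CG8920"])
# {'BOTH_EXACT': 15, 'SAM_A_ONLY_EXACT': 3}
```

The three technical replicates collapse into one table for biological replicate E1, which is the table the APN values would then be calculated from.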
83
84 **NOTE:** If using simulated reads, include the technical replicate column but label all technical replicates with the same number.
85
86 Example design file for simulated data ::
87
88 G1 G2 sampleID fqName fqExtension techRep readLength
89 W1118 W55 W55_M_1 SRR1989586_1 .fq 1 96
90 W1118 W55 W55_M_2 SRR1989588_1 .fq 1 96
91 W1118 W55 W55_V_1 SRR1989592_1 .fq 1 96
92 W1118 W55 W55_V_2 SRR1989594_1 .fq 1 96
93
94 **BED File [REQUIRED]**
95
96 The user-supplied BED file containing locations of genic features. Do **not** use the reformatted BED file created in the Align and Count workflow where feature ID is in the first column and chromosome name is in the last. Chromosome name must be in the first column.
97
98 **ASE Count Tables [REQUIRED]**
99
100 A collection of ASE Count Tables generated by the Sam Compare with Feature Tool.
101
102
103 **Start [OPTIONAL]**
104
105 Enter the row number in the Alignment Design File at which to start. Use this option to analyze only a subset of the FASTQ files listed in the design file.
106
107 **End [OPTIONAL]**
108
109 Enter the row number in the Alignment Design File at which to end.
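
As a rough illustration of how Start and End narrow the design file, the sketch below selects a block of rows. It assumes that row numbers are 1-based, count data rows only (not the header), and that both bounds are inclusive. The tool's actual convention is not stated here, so treat these as assumptions, along with the hypothetical function name ``select_rows``.

```python
def select_rows(rows, begin=None, end=None):
    """Return rows begin..end (assumed 1-based, inclusive); None means no bound."""
    start_index = 0 if begin is None else begin - 1
    # A Python slice excludes its stop index, so a 1-based inclusive end
    # can be used directly as the stop value.
    return rows[start_index:end]


# fqName values from the example design file for simulated data.
fq_names = ["SRR1989586_1", "SRR1989588_1", "SRR1989592_1", "SRR1989594_1"]

print(select_rows(fq_names, begin=2, end=3))  # ['SRR1989588_1', 'SRR1989592_1']
print(select_rows(fq_names))                  # all four rows
```

Leaving both values empty keeps every row, which matches the fields being optional.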
110
111 **Simulated Dataset**
112
113 Select if the input dataset is simulated.
114
115 ------------------------------------------------------------------------------------------------------
116
117 **Output**
118
119 The tool generates two outputs:
120
121 (1) A TSV file for each sample replicate containing the summed ASE Counts and the APN values for the uniquely mapping reads per feature.
122
123 Example of a Combined ASE Count Table::
124
125
126 +------------+-------------------+-------------------+------------+------------------+----------------+----------------+--------------------------+-------------------------+-------------------------+--------------------------+--------------------+--------------------+
127 |Feature_ID |APN_both |APN_total_reads | BOTH_EXACT |BOTH_INEXACT_EQUAL|SAM_A_ONLY_EXACT|SAM_B_ONLY_EXACT| SAM_A_EXACT_SAM_B_INEXACT|SAM_B_EXACT_SAM_A_INEXACT|SAM_A_ONLY_SINGLE_INEXACT|SAM_B_ONLY_SINGLE_INEXACT |SAM_A_INEXACT_BETTER|SAM_B_INEXACT_BETTER|
128 +============+===================+===================+============+==================+================+================+==========================+=========================+=========================+==========================+====================+====================+
129 | l(1)G0196 |10.255101044615834 |12.723420872791175 | 721 |1476 |120 |173 |0 | 2 |96 |136 |0 |2 |
130 +------------+-------------------+-------------------+------------+------------------+----------------+----------------+--------------------------+-------------------------+-------------------------+--------------------------+--------------------+--------------------+
131 | CG8920 |7.0372442219932285 |8.62888267334020 | 207 |293 |31 |62 |0 | 0 |8 |12 |0 |0 |
132 +------------+-------------------+-------------------+------------+------------------+----------------+----------------+--------------------------+-------------------------+-------------------------+--------------------------+--------------------+--------------------+
133
134
135 (2) A Sample Design File containing the names of the biological replicates that were summed in the generated ASE Counts Table collection.
136 G1 and G2 refer to the names of the parental genomes.
137
138
139 Example Sample Design File:
140
141 +------+-----+------------------+----------------+
142 | G1 | G2 | sampleID | comparate |
143 +======+=====+==================+================+
144 |W1118 | W55 | mel_W55_M_1 | mel_W55_M |
145 +------+-----+------------------+----------------+
146 |W1118 | W55 | mel_W55_M_2 | mel_W55_M |
147 +------+-----+------------------+----------------+
148 |W1118 | W55 | mel_W55_V_1 | mel_W55_V |
149 +------+-----+------------------+----------------+
150 |W1118 | W55 | mel_W55_V_2 | mel_W55_V |
151 +------+-----+------------------+----------------+
152
153
154
155 ]]></help>
156 <citations>
157 <citation type="bibtex">@ARTICLE{Miller20BASE,
158 author = {Brecca Miller and Alison M. Morse and Elyse Borgert and Zihao Liu and Kelsey Sinclair and Gavin Gamble and Fei Zou and Jeremy Newman and Luis Leon Novello and Fabio Marroni and Lauren M. McIntyre},
159 title = {Testcrosses are an efficient strategy for identifying cis regulatory variation: Bayesian analysis of allele imbalance among conditions (BASE)},
160 journal = {????},
161 year = {submitted for publication}
162 }</citation>
163 </citations>
164 </tool>