comparison sam_compare_w_feature.xml @ 0:e979cb57a5d5 draft default tip

"planemo upload for repository https://github.com/McIntyre-Lab/BayesASE/tree/main/galaxy commit 9b70598ef46a73632d9e0fa0c6ce6776fb5e9d6a"
author malex
date Thu, 14 Jan 2021 21:51:36 +0000
-1:000000000000 0:e979cb57a5d5
1 <tool id="sam_compare_w_feature" name="Compare SAM Files and Create ASE Counts Tables" version="21.1.13">
2 <description> containing read counts for each parental genome </description>
3 <macros>
4 <import>macros.xml</import>
5 </macros>
6 <expand macro="requirements"/>
7 <command><![CDATA[
8 sam_compare_w_feature.py
9 --fastq=$fastq
10 --sama=$sama
11 --samb=$samb
12 --feature=$feature
13 --length=`awk '{if(NR%4==2 && length) {count++; bases += length}} END {print bases/count}' ${fastq} | awk '{printf "%.0f", $1}'`
14 $nofqids
15 --counts=$counts
16 --totals=$totals
17 ]]></command>
18 <inputs>
19 <param name="fastq" type="data" format="fastq" label="FASTQ File" help="Select the FASTQ file used to generate the 2 SAM files [REQUIRED]"/>
20 <param name="sama" type="data" format="tabular" label="SAM file with G1 reference" help="Select SAM file aligned to updated genome1 containing feature ID in RNAME field [REQUIRED]"/>
21 <param name="samb" type="data" format="tabular" label="SAM file with G2 reference" help="Select SAM file aligned to updated genome2 containing feature ID in RNAME field [REQUIRED]"/>
22 <param name="feature" type="data" format="tabular" label="Reformatted BED file" help="Select BED file containing features to assign reads to, with feature ID names in first column. Can be a .tsv or .bed [REQUIRED]"/>
23 <param name="nofqids" type="boolean" checked="false" truevalue="--nofqids" falsevalue="" label="Skip FASTQ ID check" help="Select to skip checking SAM QNAME values against the FASTQ sequence IDs. Saves time if the IDs are already known to match."/>
24 </inputs>
25 <outputs>
26 <data name="counts" format="tsv" label="SAM Compare with Feature on ${on_string}: Create ASE Counts Tables" default_identifier_source="sama"/>
27 <data name="totals" format="tsv" label="SAM Compare with Feature on ${on_string}: Create ASE Totals Tables" default_identifier_source="sama"/>
28 </outputs>
29 <tests>
30 <test>
31 <param name="fastq" ftype="data" value="align_and_counts_test_data/W55_M_1_1.fastq"/>
32 <param name="sama" ftype="data" value="align_and_counts_test_data/W1118_G1_create_new_SAM_file_with_features_BASE_test_data.sam"/>
33 <param name="samb" ftype="data" value="align_and_counts_test_data/W55_G2_create_new_SAM_file_with_features_BASE_test_data.sam" />
34 <param name="feature" ftype="data" value="align_and_counts_test_data/reformat_BED_file_for_BASE.bed" />
35 <param name="nofqids" ftype="boolean" value="--nofqids" />
36 <output name="counts" file="align_and_counts_test_data/ASE_counts_table_BASE_test_data.tsv" />
37 <output name="totals" file="align_and_counts_test_data/ASE_totals_table_BASE_test_data.tsv" />
38 </test>
39 </tests>
40 <help><![CDATA[
41 **Tool Description**
42
43 This tool compares the read mapping across two SAM files generated by mapping a FASTQ file to two genotype-specific references.
44 The tool creates an ASE Counts Table for each FASTQ file by comparing the two SAM files with genic features (created by the *Create New SAM file* tool), one aligned to each genotype-specific reference.
45 Reads in each SAM file are processed and compared to determine whether a given read maps better to one genome or equally well to both.
46 The results are also displayed in a read count summary table.
47
48 **Input**
49
50 **FASTQ file [REQUIRED]**
51
52 The tool requires the FASTQ file used to generate the 2 SAM files. The FASTQ file is used to determine the APN (average number of reads per nucleotide).
53
54 **SAM files [REQUIRED]**
55
56 The tool requires the two SAM files of uniquely mapping reads output by the *Create New SAM file* tool.
57
58 (1) SAM file containing uniquely mapping reads that overlap with features of interest for updated reference genome G1 (SAM A).
59 (2) SAM file containing uniquely mapping reads that overlap with features of interest for updated reference genome G2 (SAM B).
60
61 **4-column BED file [REQUIRED]**
62
63 A four-column BED file containing genic features in the 1st column and chromosome name in the 4th column.
64 This input BED file can be created using the *Reformat BED file* tool.
65
66 Example input BED File::
67
68 +---------------+-----------+------------+------------+
69 | name | start | end | chrom |
70 +===============+===========+============+============+
71 | featureA | 2345 | 2899 | 2R |
72 +---------------+-----------+------------+------------+
73
74 **Output**
75 The tool generates 2 output TSV files:
76
77 1. An ASE counts table containing, for each feature listed in the input BED file, the number of uniquely mapped reads in each mapping category
78
79 Example of ASE Counts Table
80
81 +---------------+------------+------------------+----------------+----------------+--------------------------+-------------------------+-------------------------+--------------------------+--------------------+--------------------+
82 |Feature_ID | BOTH_EXACT |BOTH_INEXACT_EQUAL|SAM_A_ONLY_EXACT|SAM_B_ONLY_EXACT| SAM_A_EXACT_SAM_B_INEXACT|SAM_B_EXACT_SAM_A_INEXACT|SAM_A_ONLY_SINGLE_INEXACT|SAM_B_ONLY_SINGLE_INEXACT |SAM_A_INEXACT_BETTER|SAM_B_INEXACT_BETTER|
83 +===============+============+==================+================+================+==========================+=========================+=========================+==========================+====================+====================+
84 | l(1)G0196 | 4 |2 | 0 | 0 | 0 | 0 |0 | 0 | 1 | 1 |
85 +---------------+------------+------------------+----------------+----------------+--------------------------+-------------------------+-------------------------+--------------------------+--------------------+--------------------+
86 | CG8920 |0 | 1 | 0 | 0 | 0 | 0 | 0 |0 |0 |0 |
87 +---------------+------------+------------------+----------------+----------------+--------------------------+-------------------------+-------------------------+--------------------------+--------------------+--------------------+
88 |CG10932 | 0 | 1 | 0 | 0 | 0 | 0 |0 |0 |0 | 0 |
89 +---------------+------------+------------------+----------------+----------------+--------------------------+-------------------------+-------------------------+--------------------------+--------------------+--------------------+
90 |Mapmodulin | 2 |1 |0 |0 | 0 | 0 | 0 | 0 | 0 | 1 |
91 +---------------+------------+------------------+----------------+----------------+--------------------------+-------------------------+-------------------------+--------------------------+--------------------+--------------------+
92
93
94 ASE Counts Table headers:
95
96 (1) BOTH_EXACT: Number of reads that mapped uniquely to both genomes with an exact match
97 (2) BOTH_INEXACT_EQUAL: Number of reads that mapped equally well to both genomes (i.e., did not map preferentially to either parental genome), with an inexact match
98 (3) {SAM_A/B}_ONLY_EXACT: Number of reads that mapped to only one of the two genomes, with an exact match
99 (4) SAM_A_EXACT_SAM_B_INEXACT: Number of reads that mapped exactly to genome 1 and inexactly to genome 2
100 (5) SAM_B_EXACT_SAM_A_INEXACT: Number of reads that mapped exactly to genome 2 and inexactly to genome 1
101 (6) {SAM_A/B}_ONLY_SINGLE_INEXACT: Number of reads that mapped to only one genome, with an inexact match
102 (7) {SAM_A/B}_INEXACT_BETTER: Number of reads that mapped inexactly to both genomes but better to one genome than the other
103
104
105
106 2. An ASE totals table containing a summary of the reads aligning to the G1 and G2 references
107
108 Example of ASE Totals Table::
109
110 Count totals:
111 1: a_single_exact 0
112 2: a_single_inexact 0
113 3: a_multi_exact 0
114 4: a_multi_inexact 0
115 5: b_single_exact 0
116 6: b_single_inexact 0
117 7: b_multi_exact 0
118 8: b_multi_inexact 0
119 9: both_single_exact_same 0
120 10: both_single_exact_diff 6
121 11: both_single_inexact_same 0
122 12: both_single_inexact_diff 8
123 13: both_inexact_diff_equal 5
124 14: both_inexact_diff_a_better 1
125 15: both_inexact_diff_b_better 2
126 16: both_multi_exact 0
127 17: both_multi_inexact 0
128 18: a_single_exact_b_single_inexact 0
129 19: a_single_inexact_b_single_exact 0
130 20: a_single_exact_b_multi_exact 0
131 21: a_multi_exact_b_single_exact 0
132 22: a_single_exact_b_multi_inexact 0
133 23: a_multi_inexact_b_single_exact 0
134 24: a_single_inexact_b_multi_exact 0
135 25: a_multi_exact_b_single_inexact 0
136 26: a_single_inexact_b_multi_inexact 0
137 27: a_multi_inexact_b_single_inexact 0
138 28: a_multi_exact_b_multi_inexact 0
139 29: a_multi_inexact_b_multi_exact 0
140 30: total_count 14
141
142
143 ]]></help>
144 <citations>
145 <citation type="bibtex">@ARTICLE{Miller20BASE,
146 author = {Brecca Miller, Alison M. Morse, Elyse Borgert, Zihao Liu, Kelsey Sinclair, Gavin Gamble, Fei Zou, Jeremy Newman, Luis Leon Novello, Fabio Marroni, Lauren M. McIntyre},
147 title = {Testcrosses are an efficient strategy for identifying cis regulatory variation: Bayesian analysis of allele imbalance among conditions (BASE)},
148 journal = {????},
149 year = {submitted for publication}
150 }</citation>
151 </citations>
152 </tool>
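
The `--length` argument in the command block is filled by an awk pipeline that averages the lengths of the non-empty sequence lines in the FASTQ file (every fourth line, starting at line 2) and rounds the result to a whole number. A minimal Python sketch of the same calculation is given here for readers who want to check the value outside Galaxy. The function name `mean_read_length` and the two sample records are hypothetical and chosen only for illustration; they are not part of the tool.

```python
import io


def mean_read_length(fastq_lines):
    """Mean length of the non-empty sequence lines, rounded like printf '%.0f'.

    Assumes a plain four-line-per-record FASTQ layout, as the awk one-liner does.
    """
    count = 0
    bases = 0
    for line_number, line in enumerate(fastq_lines, start=1):
        # Sequence lines are lines 2, 6, 10, ... of the file (NR % 4 == 2 in awk).
        if line_number % 4 != 2:
            continue
        sequence = line.rstrip("\n")
        # Skip empty sequence lines, mirroring the "&& length" test in awk.
        if sequence:
            count += 1
            bases += len(sequence)
    if count == 0:
        # The awk version would divide by zero here; fail explicitly instead.
        raise ValueError("no sequence lines found")
    # Format with zero decimals, as the second awk call does with "%.0f".
    return int(format(bases / count, ".0f"))


# Hypothetical two-record FASTQ used only to exercise the function.
example = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIIIIII\n")
print(mean_read_length(example))
```

Unlike the awk pipeline, this sketch raises an error when the file contains no sequence lines, which makes an empty or malformed FASTQ input easier to spot.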