comparison check_for_lost_reads.xml @ 0:e979cb57a5d5 draft default tip

"planemo upload for repository https://github.com/McIntyre-Lab/BayesASE/tree/main/galaxy commit 9b70598ef46a73632d9e0fa0c6ce6776fb5e9d6a"
author malex
date Thu, 14 Jan 2021 21:51:36 +0000
-1:000000000000 0:e979cb57a5d5
1 <tool id="check_for_lost_reads" name="Check for lost reads" version="21.1.13">
2 <description>verify starting FASTQ read number equals read number after running BWASplitSAM tool</description>
3 <macros>
4 <import>macros.xml</import>
5 </macros>
6 <expand macro="requirements"/>
7 <command><![CDATA[
8 check_lost_reads.py
9 --alnSum1='$alnSum1'
10 --alnSum2='$alnSum2'
11 --fq='$fq'
12 --out='$out'
13 ]]></command>
14 <inputs>
15 <param name="alnSum1" type="data" format="tabular" label="BWASplitSAM Alignment Summary G1" help="The G1 alignment summary file [from BWASplitSAM tool] for updated genome1 containing all read types [Required]"/>
16 <param name="alnSum2" type="data" format="tabular" label="BWASplitSAM Alignment Summary G2" help="The G2 alignment summary file [from BWASplitSAM tool] for updated genome2 containing all read types [Required]"/>
17 <param name="fq" type="data" format="fastq" label="FASTQ file" help="The FASTQ file used to generate the alignments selected above [Required]"/>
18 </inputs>
19 <outputs>
20 <data format="tabular" name="out" label="${tool.name} on ${on_string}: Check start readNum = alignment readNum"/>
21 </outputs>
22 <tests>
23 <test>
24 <param name="alnSum1" ftype="tabular" value="align_and_counts_test_data/W1118_G1_BWASplitSAM_summary.tabular"/>
25 <param name="alnSum2" ftype="tabular" value="align_and_counts_test_data/W55_G2_BWASplitSAM_summary.tabular"/>
26 <param name="fq" ftype="fastq" value="align_and_counts_test_data/W55_M_1_1.fastq"/>
27 <output name="out" file="align_and_counts_test_data/check_for_lost_reads_BASE_test_data.tabular" />
28 </test>
29 </tests>
30 <help><![CDATA[
31 **Tool Description**
32
33 This tool checks that all reads in the starting FASTQ file are accounted for in the G1 and G2 SAM files after running the BWASplitSAM tool.
34 The read count in the input FASTQ file is compared to the 'count_total_reads' column in the summary of aligned reads TSV files generated by the BWASplitSAM tool.
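The comparison described above can be sketched in Python. This is only an illustration and not the code in check_lost_reads.py: the helper names `count_fastq_reads` and `total_reads_from_summary` are hypothetical, the FASTQ file is assumed to use exactly four lines per read, and the summary TSV is assumed to hold a single data row with a 'count_total_reads' column. The in-memory inputs are made-up stand-ins for real files.

```python
import csv
import io


def count_fastq_reads(handle):
    """Count reads, assuming exactly four non-empty lines per FASTQ record."""
    n_lines = sum(1 for line in handle if line.strip())
    if n_lines % 4 != 0:
        raise ValueError("FASTQ line count is not a multiple of 4")
    return n_lines // 4


def total_reads_from_summary(handle):
    """Return 'count_total_reads' from the first data row of a summary TSV."""
    reader = csv.DictReader(handle, delimiter="\t")
    row = next(reader)
    return int(row["count_total_reads"])


# Illustrative in-memory inputs (assumed data, not taken from the tool's tests).
fastq_text = "@r1\nACGT\n+\nIIII\n@r2\nTTGA\n+\nIIII\n"
summary_text = "Name\tcount_total_reads\tcount_mapped_read\nsample\t2\t2\n"

start_read_num = count_fastq_reads(io.StringIO(fastq_text))
read_num_g1 = total_reads_from_summary(io.StringIO(summary_text))
print(start_read_num, read_num_g1)
```

With real data, the same two helpers would be given open file handles for the FASTQ file and for each of the G1 and G2 summary files.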
35
36
37 **Input**
38 The tool requires three input files:
39
40 (1) The output summary TSV file generated from the BWASplitSAM tool for the updated genome1 (G1) SAM file
41 (2) The output summary TSV file generated from the BWASplitSAM tool for the updated genome2 (G2) SAM file
42 (3) The FASTQ file used to generate the above G1 and G2 SAM files, from which the number of starting reads is calculated
43
44 Example summary TSV file from BWASplitSAM script:
45
46 +--------------+-------------------+-----------------------------------+---------------------+-------------------+----------------------+---------------------+------------------+
47 | Name         | count_total_reads | count_mapped_read_opposite_strand | count_unmapped_read | count_mapped_read | count_ambiguous_read | count_chimeric_read | count_notprimary |
48 +==============+===================+===================================+=====================+===================+======================+=====================+==================+
49 | dataset_2216 | 14                | 5                                 | 0                   | 9                 | 0                    | 0                   | 0                |
50 +--------------+-------------------+-----------------------------------+---------------------+-------------------+----------------------+---------------------+------------------+
51
52
53 **Output**
54
55 A TSV file containing:
56 (1) starting read counts in the FASTQ file [start_read_num]
57 (2) read counts in the G1 alignment [readNum_G1]
58 (3) read counts in the G2 alignment [readNum_G2]
59 (4) indicator flag for whether the starting count = G1 count [flag_start_readNum_eq_readNum_G1]
60 (5) indicator flag for whether the starting count = G2 count [flag_start_readNum_eq_readNum_G2]
61
62 Sample output TSV file:
63
64 +--------------+----------------+------------+------------+----------------------------------+----------------------------------+
65 | fqName       | start_read_num | readNum_G1 | readNum_G2 | flag_start_readNum_eq_readNum_G1 | flag_start_readNum_eq_readNum_G2 |
66 +==============+================+============+============+==================================+==================================+
67 | dataset_2216 | 14             | 14         | 14         | 1                                | 1                                |
68 +--------------+----------------+------------+------------+----------------------------------+----------------------------------+
69
70 Columns are::
71
72 ◦ fqName: The name of the FASTQ file
73 ◦ start_read_num: The total number of reads in the FASTQ file
74 ◦ readNum_G1: The total number of reads in the summary TSV file output from BWASplitSAM for updated parental genome 1 (G1)
75 ◦ readNum_G2: The total number of reads in the summary TSV file output from BWASplitSAM for updated parental genome 2 (G2)
76 ◦ flag_start_readNum_eq_readNum_{G1/G2}: 0/1 indicator flag where “1” means that the number of reads in the FASTQ file matches the total read number in the G1 or G2 BWASplitSAM summary file, and “0” means that it does not.
77
78 In the above example, flag_start_readNum_eq_readNum_G1 and flag_start_readNum_eq_readNum_G2 are both 1, indicating all reads are accounted for.
79
80 The BayesASE align and count workflow should be rerun if either flag_start_readNum_eq_readNum_G1 or flag_start_readNum_eq_readNum_G2 is 0.
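As a rough sketch of how the output row and the rerun decision relate, the flags can be derived from the three read counts as shown below. The function names `build_check_row` and `needs_rerun` are hypothetical and are not part of the tool; the first call uses the values from the sample output above, and the second uses an invented mismatch for contrast.

```python
def build_check_row(fq_name, start_read_num, read_num_g1, read_num_g2):
    """Assemble one output row with 0/1 equality flags."""
    return {
        "fqName": fq_name,
        "start_read_num": start_read_num,
        "readNum_G1": read_num_g1,
        "readNum_G2": read_num_g2,
        "flag_start_readNum_eq_readNum_G1": int(start_read_num == read_num_g1),
        "flag_start_readNum_eq_readNum_G2": int(start_read_num == read_num_g2),
    }


def needs_rerun(row):
    """True when either flag is 0, i.e. some reads are unaccounted for."""
    return not (row["flag_start_readNum_eq_readNum_G1"]
                and row["flag_start_readNum_eq_readNum_G2"])


row = build_check_row("dataset_2216", 14, 14, 14)   # values from the sample output
lost = build_check_row("dataset_2216", 14, 14, 13)  # hypothetical mismatch in G2
print(needs_rerun(row), needs_rerun(lost))
```

Keeping the flags as integers rather than booleans matches the 0/1 values shown in the sample output table.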
81
82 ]]></help>
83 <citations>
84 <citation type="bibtex">@ARTICLE{Miller20BASE,
85 author = {Brecca Miller and Alison M. Morse and Elyse Borgert and Zihao Liu and Kelsey Sinclair and Gavin Gamble and Fei Zou and Jeremy Newman and Luis Leon Novello and Fabio Marroni and Lauren M. McIntyre},
86 title = {Testcrosses are an efficient strategy for identifying cis regulatory variation: Bayesian analysis of allele imbalance among conditions (BASE)},
87 journal = {????},
88 year = {submitted for publication}
89 }</citation>
90 </citations>
91 </tool>