view check_samcomp_for_lost_reads.xml @ 0:e979cb57a5d5 draft default tip

"planemo upload for repository https://github.com/McIntyre-Lab/BayesASE/tree/main/galaxy commit 9b70598ef46a73632d9e0fa0c6ce6776fb5e9d6a"
author malex
date Thu, 14 Jan 2021 21:51:36 +0000

<tool id="check_samcomp_for_lost_reads" name="Check SAM Compare Output" version="21.1.13">
    <description> - check read numbers in and out of Compare SAM Files and Create ASE Counts Tables tool</description>
    <macros>
      <import>macros.xml</import>
    </macros>
    <expand macro="requirements" />
    <command><![CDATA[
    check_samcomp_lost_reads.py
    --summary1='$sum1'
    --summary2='$sum2'
    --ase_names='$ase.element_identifier'
    --ase='$ase'
    --out='$out'
]]></command>
    <inputs>
        <param name="sum1" type="data" format="tabular" label="Remove Reads Summary file for updated genome 1 (G1)" help="Select the summary file containing the read counts after dropping non-overlapping reads [Required]"/>
        <param name="sum2" type="data" format="tabular" label="Remove Reads Summary file for updated genome 2 (G2)" help="Select the summary file containing the read counts after dropping non-overlapping reads [Required]"/>
        <param name="ase" type="data" format="tsv" label="ASE Totals Table" help="Select the ASE Totals table containing the read counts generated by the Compare SAM Files and Create ASE Counts Tables tool [Required]"/>
    </inputs>
    <outputs>
        <data name="out" format="tabular" label="${tool.name} on ${on_string}: Check SAM Compare Output"/>
    </outputs>
    <tests>
        <test>
            <param name="sum1" ftype="data"      value="align_and_counts_test_data/number_rows_left_after_removed_reads.tabular"/>
            <param name="sum2" ftype="data"      value="align_and_counts_test_data/number_rows_left_after_removed_reads_G2.tabular"/>
            <param name="ase"  ftype="data"      value="align_and_counts_test_data/ASE_totals_table_BASE_test_data.tsv" />
            <output name="out"     file="align_and_counts_test_data/check_SAM_compare_for_lost_reads_BASE_test_data.tabular" />
        </test>
    </tests>
    <help><![CDATA[
**Tool Description**

The Check SAM Compare Output tool checks that the number of reads going into the Compare SAM Files and Create ASE Counts Tables tool matches the number coming out.
The total of all reads mapped to each feature should be the sum of all unique reads mapped to that feature from the two initial alignment files.
This implies that the total must be at least the number of reads mapping to one genome and no more than the sum of reads mapping to both genomes.
Read counts in the ASE Totals file outside this range indicate that the Compare SAM Files and Create ASE Counts Tables tool should be rerun.
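
The range check described above can be sketched in a few lines of Python. This is a minimal illustration, not the code of check_samcomp_lost_reads.py: the function name and arguments are hypothetical, and the bounds are taken directly from this description (minimum of the two unique read counts up to their sum), which is an assumption about what the script actually computes.

```python
def flag_readnum_in_range(uniq_g1: int, uniq_g2: int, total_count: int) -> int:
    """Return 1 if total_count lies within [min, sum] of the unique read counts, else 0.

    Hypothetical helper: the bounds follow the tool description and are an
    assumption about the real script's logic.
    """
    lower = min(uniq_g1, uniq_g2)   # at least the reads mapping to one genome
    upper = uniq_g1 + uniq_g2       # at most the reads mapping to both genomes
    return 1 if lower <= total_count <= upper else 0


# 14 unique reads per genome and an ASE total of 8: reads were lost.
print(flag_readnum_in_range(14, 14, 8))
# Same inputs with an ASE total of 20: the total falls inside the range.
print(flag_readnum_in_range(14, 14, 20))
```

With 14 unique reads for each genome, any ASE total from 14 through 28 passes this sketch, and anything outside that range is flagged with 0.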

**This tool takes the following input:**

        (1) Remove Reads Summary for G1 - the summary file generated from the Remove Reads tool containing the number of rows left after non-overlapping reads were removed for G1
        (2) Remove Reads Summary for G2 - the summary file generated from the Remove Reads tool containing the number of rows left after non-overlapping reads were removed for G2
        (3) ASE Totals Table - contains read counts generated by the Compare SAM Files and Create ASE Counts Tables tool

An example Remove Reads summary file:

    +---------------+---------------------------+---------------------+
    |   fqNa        |  number_overlapping_rows  | total_number_rows   |
    +===============+===========================+=====================+
    | dataset_2215  |   918                     |     919             |
    +---------------+---------------------------+---------------------+


An example of an ASE Totals file::

        Count totals:
    1:	a_single_exact	0
    2:	a_single_inexact 0
    3:	a_multi_exact	0
    4:	a_multi_inexact	0
    5:	b_single_exact	0
    6:	b_single_inexact 0
    7:	b_multi_exact	0
    8:	b_multi_inexact	0
    9:	both_single_exact_same	0
    10:	both_single_exact_diff	6
    11:	both_single_inexact_same  0
    12:	both_single_inexact_diff  8
    13:	both_inexact_diff_equal	5
    14:	both_inexact_diff_a_better  1
    15:	both_inexact_diff_b_better	2
    16:	both_multi_exact	0
    17:	both_multi_inexact	0
    18:	a_single_exact_b_single_inexact	0
    19:	a_single_inexact_b_single_exact	0
    20:	a_single_exact_b_multi_exact	0
    21:	a_multi_exact_b_single_exact	0
    22:	a_single_exact_b_multi_inexact	0
    23:	a_multi_inexact_b_single_exact	0
    24:	a_single_inexact_b_multi_exact	0
    25:	a_multi_exact_b_single_inexact	0
    26:	a_single_inexact_b_multi_inexact 0
    27:	a_multi_inexact_b_single_inexact 0
    28:	a_multi_exact_b_multi_inexact	0
    29:	a_multi_inexact_b_multi_exact	0
    30:	total_count	14
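
For readers who want to pull the final count out of a file like the one above by hand, the following is a minimal Python sketch. The function name is hypothetical, and it assumes only that the total_count line ends with the label followed by an integer, separated by whitespace; it is not taken from check_samcomp_lost_reads.py.

```python
import io


def read_total_count(handle) -> int:
    """Return the integer on the total_count line of an ASE Totals file.

    Hypothetical helper: assumes the line ends with "total_count <integer>"
    separated by whitespace; any leading index field is ignored.
    """
    for line in handle:
        fields = line.split()
        if len(fields) >= 2 and fields[-2] == "total_count":
            return int(fields[-1])
    raise ValueError("no total_count line found")


# A shortened stand-in for the example file shown above.
example = io.StringIO(
    "Count totals:\n"
    "29:\ta_multi_inexact_b_multi_exact\t0\n"
    "30:\ttotal_count\t14\n"
)
print(read_total_count(example))
```

Matching on the last two whitespace-separated fields keeps the sketch tolerant of the mixed tab and space separators seen in the example.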

**This tool will output a tabular file containing the following columns:**

        (1) fqName
        (2) min_uniq_g1_uniq_g2: The minimum number of unique reads of the two BWA files
        (3) sum_uniq_g1_uniq_g2: The sum of the unique reads in the two BWA files
        (4) total_counts_ase_table: The final total count in the ASE totals file (should be between (2) and (3) doubled for the check to be successful)
        (5) flag_readnum_in_range: A 0/1 indicator flag that is equal to 1 if the check was successful or 0 if the check was unsuccessful

An example of an unsuccessful output file where reads were lost:

    +---------------+---------------------+--------------------------------------+----------------------+----------------------+
    |   fqName      |min_uniq_g1_uniq_g2  | sum_uniq_g1_uniq_g2                  |total_counts_ase_table| flag_readnum_in_range|
    +===============+=====================+======================================+======================+======================+
    | name_of_fq    |  14                 |   28                                 |    8                 |    0                 |
    +---------------+---------------------+--------------------------------------+----------------------+----------------------+

    ]]></help>

    <citations>
            <citation type="bibtex">@ARTICLE{Miller20BASE,
            author = {Brecca Miller and Alison M. Morse and Elyse Borgert and Zihao Liu and Kelsey Sinclair and Gavin Gamble and Fei Zou and Jeremy Newman and Luis Leon Novello and Fabio Marroni and Lauren M. McIntyre},
            title = {Testcrosses are an efficient strategy for identifying cis regulatory variation: Bayesian analysis of allele imbalance among conditions (BASE)},
            journal = {????},
            year = {submitted for publication}
            }</citation>
        </citations>
</tool>