view sam_compare_w_feature.xml @ 0:e979cb57a5d5 draft default tip

"planemo upload for repository https://github.com/McIntyre-Lab/BayesASE/tree/main/galaxy commit 9b70598ef46a73632d9e0fa0c6ce6776fb5e9d6a"
author malex
date Thu, 14 Jan 2021 21:51:36 +0000
parents
children
line wrap: on
line source

<tool id="sam_compare_w_feature" name="Compare SAM Files and Create ASE Counts Tables" version="21.1.13">
    <description> containing read counts for each parental genome </description>
    <macros>
        <import>macros.xml</import>
    </macros>
    <expand macro="requirements"/>
    <command><![CDATA[
    sam_compare_w_feature.py
    --fastq=$fastq
    --sama=$sama
    --samb=$samb
    --feature=$feature
    --length=`awk '{if(NR%4==2 && length) {count++; bases += length}} END {print bases/count}' ${fastq} | awk '{printf "%.0f", $1}'`
    $nofqids
    --counts=$counts
    --totals=$totals
]]></command>
    <inputs>
        <param name="fastq" type="data" format="fastq" label="FASTQ File" help="Select the FASTQ file used to generate the 2 SAM files [REQUIRED]"/>
        <param name="sama" type="data" format="tabular" label="SAM file with G1 reference" help="Select SAM file aligned to updated genome1 containing feature ID in RNAME field [REQUIRED]"/>
        <param name="samb" type="data" format="tabular" label="SAM file with G2 reference" help="Select SAM file for aligned to updated genome2 containing feature ID in RNAME field [REQUIRED]"/>
        <param name="feature" type="data" format="tabular" label="Reformatted BED file" help="Select BED file containing features to assign reads to, with feature ID names in first column. Can be a .tsv or .bed [REQUIRED]"/>
        <param name="nofqids" type="boolean" checked="false" truevalue="--nofqids" falsevalue="" label="Check FASTQ IDs" help="Select to skip checking SAM QNAME against the fastq sequence IDs. Saves time if already known to be good."/>
    </inputs>
    <outputs>
        <data name="counts" format="tsv" label="SAM Compare with Feature on ${on_string}: Create ASE Counts Tables" default_identifier_source="sama"/>
        <data name="totals" format="tsv" label="SAM Compare with Feature on ${on_string}: Create ASE Totals Tables" default_identifier_source="sama"/>
    </outputs>
    <tests>
        <test>
            <param name="fastq" ftype="data"      value="align_and_counts_test_data/W55_M_1_1.fastq"/>
            <param name="sama"  ftype="data"     value="align_and_counts_test_data/W1118_G1_create_new_SAM_file_with_features_BASE_test_data.sam"/>
            <param name="samb"  ftype="data"     value="align_and_counts_test_data/W55_G2_create_new_SAM_file_with_features_BASE_test_data.sam" />
            <param name="feature" ftype="data"   value="align_and_counts_test_data/reformat_BED_file_for_BASE.bed" />
            <param name="nofqids" ftype="boolean"   value="--nofqids" />
            <output name="counts" file="align_and_counts_test_data/ASE_counts_table_BASE_test_data.tsv" />
            <output name="totals" file="align_and_counts_test_data/ASE_totals_table_BASE_test_data.tsv" />
        </test>
    </tests>
    <help><![CDATA[
**Tool Description**

This tool compares the read mapping across two SAM files generated by mapping a FASTQ file to 2 genotype specific references.
The tool creates an ASE Counts Table for each FASTQ file by comparing the SAM files with genic features (created by the Create new SAM file tool) that were aligned to both genotype specific references. 
Reads in each SAM file are processed and compared to determine whether a given read maps better to one genome or equally well to both. 
The results are also displayed in a read count summary table.

**Input**

**FASTQ file [REQUIRED]**

The tool requires the FASTQ file used to generate the 2 SAM files. The FASTQ file is used to determine the APN (average number of reads per nucleotide).

**SAM files [Required]**

The tool requires the two SAM files of uniquely mapping reads output by the *Create New SAM file* tool.

(1) SAM file containing uniquely mapping reads that overlap with features of interest for updated reference genome G1 (SAM A).
(2) SAM file containing uniquely mapping reads that overlap with features of interest for updated reference genome G2 (SAM B).

**4- column BED file [Required]**

A four column BED file containing genic features in the 1st column and chromosome name in the 4th column.
This input BED file can be created using the *Reformat BED file* tool.

Example input BED File::

    +---------------+-----------+------------+------------+
    |   name        | start     |  end       |    chrom   |
    +===============+===========+============+============+
    |  featureA     |   2345    |   2899     |    2R      |
    +---------------+-----------+------------+------------+

**Output**
The tools generates 2 output TSV files:

1. An ASE counts table containing the orientation that unique reads mapped to each feature listed in the input Feature/BED

Example of ASE Counts Table

    +---------------+------------+------------------+----------------+----------------+--------------------------+-------------------------+-------------------------+--------------------------+--------------------+--------------------+
    |Feature_ID     | BOTH_EXACT |BOTH_INEXACT_EQUAL|SAM_A_ONLY_EXACT|SAM_B_ONLY_EXACT| SAM_A_EXACT_SAM_B_INEXACT|SAM_B_EXACT_SAM_A_INEXACT|SAM_A_ONLY_SINGLE_INEXACT|SAM_B_ONLY_SINGLE_INEXACT |SAM_A_INEXACT_BETTER|SAM_B_INEXACT_BETTER|
    +===============+============+==================+================+================+==========================+=========================+=========================+==========================+====================+====================+
    | l(1)G0196     |  4         |2                 |   0            |    0           |    0                     |    0                    |0                        |                 0        |    1               | 1                  |
    +---------------+------------+------------------+----------------+----------------+--------------------------+-------------------------+-------------------------+--------------------------+--------------------+--------------------+
    | CG8920        |0           | 1                | 0              | 0              |  0                       | 0                       | 0                       |0                         |0                   |0                   |
    +---------------+------------+------------------+----------------+----------------+--------------------------+-------------------------+-------------------------+--------------------------+--------------------+--------------------+
    |CG10932        |  0         | 1                | 0              |  0             | 0                        | 0                       |0                        |0                         |0                   | 0                  |
    +---------------+------------+------------------+----------------+----------------+--------------------------+-------------------------+-------------------------+--------------------------+--------------------+--------------------+
    |Mapmodulin     | 2          |1                 |0               |0               | 0                        | 0                       | 0                       | 0                        | 0                  |  1                 |
    +---------------+------------+------------------+----------------+----------------+--------------------------+-------------------------+-------------------------+--------------------------+--------------------+--------------------+


ASE Counts Table headers:

    (1) BOTH_EXACT: Number of reads that mapped uniquely to both genomes, and it was an exact match
    (2) BOTH_INEXACT_EQUAL: Number of reads that mapped equally to both genomes (ie did not preferentially to a certain parental genome), but it was an inexact match
    (3) {SAM_A/B}_ONLY_EXACT: Reads that mapped preferentially to only one of the two genomes and it was an exact match
    (4) SAM_A_EXACT_SAM_B_INEXACT: Reads that mapped exactly to  genome 1 and inexactly to parental genome 2
    (5) SAM_B_EXACT_SAM_A_INEXACT: Reads that mapped exactly to genome 2 and inexactly to genome 1
    (6) {SAM_A/B}_ONLY_SINGLE_INEXACT: Reads that mapped preferentially to only onegenome, and it was in an inexact manner
    (7) {SAM_A/B}_INEXACT_BETTER: Reads that mapped preferentially to their opposite parent, but the match is inexact



2. An ASE totals table containing a summary of the reads aligning to the the G1 and G2 references

Example of ASE Totals Table::

           Count totals:
    1:	a_single_exact	0
    2:	a_single_inexact	0
    3:	a_multi_exact	0
    4:	a_multi_inexact	0
    5:	b_single_exact	0
    6:	b_single_inexact	0
    7:	b_multi_exact	0
    8:	b_multi_inexact	0
    9:	both_single_exact_same	0
    10:	both_single_exact_diff	6
    11:	both_single_inexact_same	0
    12:	both_single_inexact_diff	8
    13:	both_inexact_diff_equal	5
    14:	both_inexact_diff_a_better	1
    15:	both_inexact_diff_b_better	2
    16:	both_multi_exact	0
    17:	both_multi_inexact	0
    18:	a_single_exact_b_single_inexact	0
    19:	a_single_inexact_b_single_exact	0
    20:	a_single_exact_b_multi_exact	0
    21:	a_multi_exact_b_single_exact	0
    22:	a_single_exact_b_multi_inexact	0
    23:	a_multi_inexact_b_single_exact	0
    24:	a_single_inexact_b_multi_exact	0
    25:	a_multi_exact_b_single_inexact	0
    26:	a_single_inexact_b_multi_inexact	0
    27:	a_multi_inexact_b_single_inexact	0
    28:	a_multi_exact_b_multi_inexact	0
    29:	a_multi_inexact_b_multi_exact	0
    30:	total_count	14


    ]]></help>
    <citations>
            <citation type="bibtex">@ARTICLE{Miller20BASE,
            author = {Brecca Miller, Alison M. Morse, Elyse Borgert, Zihao Liu, Kelsey Sinclair, Gavin Gamble, Fei Zou, Jeremy Newman, Luis Leon Novello, Fabio Marroni, Lauren M. McIntyre},
            title = {Testcrosses are an efficient strategy for identifying cis regulatory variation: Bayesian analysis of allele imbalance among conditions (BASE)},
            journal = {????},
            year = {submitted for publication}
            }</citation>
        </citations>
</tool>