<tool id="combine_cnt_tables" name="Combine ASE Count Tables:" version="21.1.13">
    <description>sum technical replicates</description>
    <macros>
      <import>macros.xml</import>
    </macros>
    <expand macro="requirements" />
    <command><![CDATA[
    mkdir outputs &&
    cd outputs &&
    combine_count_tables.py
    --design='$design'
    --bed='$bed'
    --collection_identifiers="${",".join($collection.keys())}"
    --collection_filenames="${",".join(map(str, $collection))}"
    --begin='$begin'
    --end='$end'
    $sim
    --outdir=`pwd`
    --outdesign='$output_designFile'
]]></command>
    <inputs>
        <param name="design" type="data" format="tabular,tsv" label="Design File" help="Select the Alignment Design File. The design file must be sorted by biological replicate (sampleID)."/>
        <param name="bed" type="data" format="tabular" label="Bed File" help="Select the input BED file. Do NOT use the reformatted BED file created in the Align and Count workflow."/>
        <param name="collection" type="data_collection" collection_type="list" label="ASE Count Tables" help="Select the collection containing the ASE count tables."/>
        <param name="begin" type="text" label="Start" help="Enter start point in design file, see below for description [OPTIONAL]"/>
        <param name="end" type="text" label="End" help="Enter end point in design file, see below for description [OPTIONAL]"/>
        <param name="sim" type="boolean" checked="false" truevalue="--sim" falsevalue="" label="Simulated Dataset" help="Select if the dataset is generated from simulated reads"/>
    </inputs>
    <outputs>
      <collection name="split_output" type="list" label="${tool.name} on ${on_string}: Combined and Summed ASE Table">
        <discover_datasets pattern="(?P&lt;name&gt;.*)" ext="tabular" sort_by="reverse_filename" directory="outputs" />
      </collection>
      <data name="output_designFile" format="tabular" label="${tool.name} on ${on_string}: Sample Design File" />
    </outputs>
    <tests>
        <test>
            <param name="design" value="summarize_counts_testdata/alignment_design_test_file.tsv" ftype="tsv"/>
            <param name="bed" value="align_and_counts_test_data/BASE_testData_BEDfile.bed" ftype="bed"/>
            <param name="collection" value="summarize_counts_testdata/ASE_counts_tables" ftype="data_collection"/>
            <output name="output_designFile" file="summarize_counts_testdata/sample_design_file.tabular" ftype="tabular"/>
            <output_collection name="split_output" type="list">
              <element name="FEATURE_ID">
                <assert_contents>
                  <has_text_matching expression="Combine_counts_output"/>
                </assert_contents>
              </element>
            </output_collection>
        </test>
    </tests>
    <help><![CDATA[
**Tool Description**

The Combine ASE Count Tables tool sums the input ASE count tables across the technical replicates specified in the Alignment Design File.
Technical replicates within the same biological replicate are combined for each comparate, and the average per nucleotide (APN) value is calculated for each feature.

The tool outputs a single ASE counts table, including APN values, for each biological replicate.
The tool also outputs a Sample Design File, which is required by the Summarization and Filter ASE Counts Tables tool that follows.

APN_total = ([total alignment count] x [readLength]) / (genic feature length)

APN_both = ([reads mapping equally to both parental genomes] x [readLength]) / (genic feature length)
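
As a worked illustration of the APN formulas above, the following minimal Python sketch computes both values for a single feature. The function name ``apn`` and all numbers are made up for illustration; this is not the tool's own code.

```python
def apn(read_count: int, read_length: int, feature_length: int) -> float:
    """Average per nucleotide: (read count x read length) / feature length."""
    if feature_length <= 0:
        raise ValueError("feature_length must be positive")
    return read_count * read_length / feature_length


# Hypothetical feature: 3,000 bp long, covered by 200 aligned reads of 150 bp.
apn_total = apn(read_count=200, read_length=150, feature_length=3000)

# Hypothetically, 120 of those reads map equally well to both parental genomes.
apn_both = apn(read_count=120, read_length=150, feature_length=3000)

print(apn_total, apn_both)
```

Because only a subset of the aligned reads maps equally to both parental genomes, APN_both should not exceed APN_total for the same feature.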


-------------------------------------------------------------------------------------------

**Inputs**

There are three required inputs for this tool.

**Alignment Design File [REQUIRED]**

**TIP**: Check if the design file is in the correct format by using the *Check Align Design file* tool

The Alignment Design File is required as input and must have the following format:

Example design file::

	G1	G2	sampleID		fqName			fqExtension	techRep	readLength
	W1118	W55	mel_W55_Mated_E1	mel_W55_Mated_E1_R1	.fq		1	150
	W1118	W55	mel_W55_Mated_E1	mel_W55_Mated_E1_R2	.fq		2	150
	W1118	W55	mel_W55_Mated_E1	mel_W55_Mated_E1_R3	.fq		3	150

In the example design file above, one summed ASE Counts table would be generated from the three technical replicates belonging to biological replicate 1 (E1).
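
The summing step can be sketched in Python as follows. This is only an illustration under stated assumptions, not the tool's implementation: the function name ``sum_technical_replicates``, the nested-dictionary layout and the read counts are hypothetical, while the sample and FASTQ names are taken from the example design file above.

```python
from collections import defaultdict

# (sampleID, fqName) pairs, as in the example design file.
design_rows = [
    ("mel_W55_Mated_E1", "mel_W55_Mated_E1_R1"),
    ("mel_W55_Mated_E1", "mel_W55_Mated_E1_R2"),
    ("mel_W55_Mated_E1", "mel_W55_Mated_E1_R3"),
]

# Made-up counts per technical replicate: fqName -> feature ID -> column -> reads.
counts = {
    "mel_W55_Mated_E1_R1": {"CG8920": {"BOTH_EXACT": 10, "SAM_A_ONLY_EXACT": 1}},
    "mel_W55_Mated_E1_R2": {"CG8920": {"BOTH_EXACT": 20, "SAM_A_ONLY_EXACT": 2}},
    "mel_W55_Mated_E1_R3": {"CG8920": {"BOTH_EXACT": 30, "SAM_A_ONLY_EXACT": 3}},
}


def sum_technical_replicates(design_rows, counts):
    """Sum every count column over the technical replicates of each sampleID."""
    summed = defaultdict(lambda: defaultdict(lambda: defaultdict(int)))
    for sample_id, fq_name in design_rows:
        for feature_id, columns in counts[fq_name].items():
            for column, reads in columns.items():
                summed[sample_id][feature_id][column] += reads
    # Convert the nested defaultdicts to plain dicts for easier inspection.
    return {
        sample_id: {feature_id: dict(cols) for feature_id, cols in features.items()}
        for sample_id, features in summed.items()
    }


combined = sum_technical_replicates(design_rows, counts)
print(combined)
```

Grouping on sampleID is why the design file must keep all technical replicates of a biological replicate under the same sampleID.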

**NOTE:** If using simulated reads, include the technical replicate column, but give every row the same technical replicate number.

Example design file for simulated data::

	G1     G2   sampleID  fqName        fqExtension  techRep  readLength
	W1118  W55  W55_M_1   SRR1989586_1  .fq          1        96
	W1118  W55  W55_M_2   SRR1989588_1  .fq          1        96
	W1118  W55  W55_V_1   SRR1989592_1  .fq          1        96
	W1118  W55  W55_V_2   SRR1989594_1  .fq          1        96

**BED File [REQUIRED]**

The user-supplied BED file contains the locations of the genic features. Do **not** use the reformatted BED file created in the Align and Count workflow, in which the feature ID is in the first column and the chromosome name is in the last. The chromosome name must be in the first column.

**ASE Count Tables [REQUIRED]**

A collection of ASE Count Tables generated by the Sam Compare with Feature Tool.


**Start [OPTIONAL]**

Enter the row number in the Alignment Design File at which to start. Use this option to analyze only a subset of the FASTQ files listed in the design file.

**End [OPTIONAL]**

Enter the row number in the Alignment Design File at which to end.

**Simulated Dataset**

Select if the input dataset is simulated.

------------------------------------------------------------------------------------------------------

**Output**

The tool generates two outputs:

(1) A TSV file for each biological replicate containing the summed ASE counts and the APN values for the uniquely mapping reads per feature.

Example of a Combined ASE Count Table::


    +------------+-------------------+-------------------+------------+------------------+----------------+----------------+--------------------------+-------------------------+-------------------------+--------------------------+--------------------+--------------------+
    |Feature_ID  |APN_both           |APN_total_reads    | BOTH_EXACT |BOTH_INEXACT_EQUAL|SAM_A_ONLY_EXACT|SAM_B_ONLY_EXACT| SAM_A_EXACT_SAM_B_INEXACT|SAM_B_EXACT_SAM_A_INEXACT|SAM_A_ONLY_SINGLE_INEXACT|SAM_B_ONLY_SINGLE_INEXACT |SAM_A_INEXACT_BETTER|SAM_B_INEXACT_BETTER|
    +============+===================+===================+============+==================+================+================+==========================+=========================+=========================+==========================+====================+====================+
    | l(1)G0196  |10.255101044615834 |12.723420872791175 | 721        |1476              |120             |173             |0                         | 2                       |96                       |136                       |0                   |2                   |
    +------------+-------------------+-------------------+------------+------------------+----------------+----------------+--------------------------+-------------------------+-------------------------+--------------------------+--------------------+--------------------+
    | CG8920     |7.0372442219932285 |8.62888267334020   | 207        |293               |31              |62              |0                         | 0                       |8                        |12                        |0                   |0                   |
    +------------+-------------------+-------------------+------------+------------------+----------------+----------------+--------------------------+-------------------------+-------------------------+--------------------------+--------------------+--------------------+


(2) A Sample Design File containing the names of the biological replicates that were summed in the generated ASE Counts Table collection.
G1 and G2 refer to the names of the parental genomes.


Example Sample Design File:

  +------+-----+------------------+----------------+
  | G1   | G2  | sampleID         | comparate      |
  +------+-----+------------------+----------------+
  |W1118 | W55 | mel_W55_M_1      | mel_W55_M      |
  +------+-----+------------------+----------------+
  |W1118 | W55 | mel_W55_M_2      | mel_W55_M      |
  +------+-----+------------------+----------------+
  |W1118 | W55 | mel_W55_V_1      | mel_W55_V      |
  +------+-----+------------------+----------------+
  |W1118 | W55 | mel_W55_V_2      | mel_W55_V      |
  +------+-----+------------------+----------------+


    ]]></help>
    <citations>
            <citation type="bibtex">@ARTICLE{Miller20BASE,
            author = {Brecca Miller and Alison M. Morse and Elyse Borgert and Zihao Liu and Kelsey Sinclair and Gavin Gamble and Fei Zou and Jeremy Newman and Luis Leon Novello and Fabio Marroni and Lauren M. McIntyre},
            title = {Testcrosses are an efficient strategy for identifying cis regulatory variation: Bayesian analysis of allele imbalance among conditions (BASE)},
            journal = {????},
            year = {submitted for publication}
            }</citation>
        </citations>
</tool>
author	malex
date	Thu, 14 Jan 2021 21:51:36 +0000