<?xml version="1.0" encoding="UTF-8"?>
<!-- galaxy version 16.04 -->
<tool id="fsd" name="Duplex Sequencing Analysis: fsd" version="0.0.15">
    <description>Family size distribution (FSD) of tags</description>
    <requirements>
        <requirement type="package" version="2.7">python</requirement>
        <requirement type="package" version="1.4">matplotlib</requirement>
    </requirements>

    <command>
        python2 "$__tool_directory__/fsd.py" --inputFile1 "$file1" --inputName1 "$file1.name" --inputFile2 "$file2" --inputName2 "$file2.name" --inputFile3 "$file3" --inputName3 "$file3.name" --inputFile4 "$file4" --inputName4 "$file4.name" --sep "$separator" --output_pdf "$output_pdf" --output_csv "$output_csv"
    </command>
    <inputs>
        <param name="file1" type="data" format="tabular" label="Dataset 1: input tags" optional="false"/>
        <param name="file2" type="data" format="tabular" label="Dataset 2: input tags" optional="true"  />
        <param name="file3" type="data" format="tabular" label="Dataset 3: input tags" optional="true" />
        <param name="file4" type="data" format="tabular" label="Dataset 4: input tags" optional="true"  help="Input in tabular format with the family size, the tag and the direction of the strand ('ab' or 'ba') for each family. File names can have at most 34 characters!"/>
        <param name="separator" type="text" label="Separator of the CSV file." help="can be a single character" value=","/>
    </inputs>
    <outputs>
        <data name="output_pdf" format="pdf" />
        <data name="output_csv" format="csv"/>
    </outputs>
    <!--  <tests>
        <test>
            <param name="file1" value="Test_data.tabular"/>
            <param name="file2" value="None"/>
            <param name="file3" value="None"/>
            <param name="file4" value="None"/>
            <output name="output_pdf" file="output_file.pdf"/>
            <output name="output_csv" file="output_file.csv"/>
        </test>
    </tests>
    -->
    <help> <![CDATA[

**What it does**

    This tool creates a family size distribution (FSD) of the tags. The distribution is split into tags that have only the forward (ab) strand, only the reverse (ba) strand, or both strands (ab+ba) of the DCS. A family size distribution without this split is created as well. If multiple files are provided as input, the distribution without the split shows all datasets in one plot, and a separate distribution split into single ab strands, single ba strands and DCSs is produced for each dataset.


**Input**

    This tool expects a tabular file with the tags of all families, their sizes and information about forward (ab) and reverse (ba) strands.

    **!!! File names can have at most 34 characters !!!**

    +-----+----------------------------+----+
    | 1   | AAAAAAAAAAAATGTTGGAATCTT   | ba |
    +-----+----------------------------+----+
    | 10  | AAAAAAAAAAAGGCGGTCCACCCC   | ab |
    +-----+----------------------------+----+
    | 28  | AAAAAAAAAAATGGTATGGACCGA   | ab |
    +-----+----------------------------+----+
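
    As an illustration of how input in this format could be tallied, the following is a minimal Python sketch. It is not the code of fsd.py. The function name ``family_size_distribution``, the tab separator and the rule that a tag seen with both directions counts as ab+ba are assumptions made for this example.

```python
from collections import Counter, defaultdict


def family_size_distribution(lines, sep="\t"):
    """Tally family sizes overall and split by strand category.

    Assumes each line holds: family size, tag, strand direction ('ab' or 'ba').
    This is a hypothetical helper, not part of fsd.py.
    """
    overall = Counter()
    strands_by_tag = defaultdict(dict)  # tag -> {direction: family size}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip empty lines
        size, tag, direction = line.split(sep)
        size = int(size)
        overall[size] += 1
        strands_by_tag[tag][direction] = size

    # Assumption: a tag observed with both directions forms a duplex (ab+ba).
    split = {"ab": Counter(), "ba": Counter(), "ab+ba": Counter()}
    for strands in strands_by_tag.values():
        category = "ab+ba" if len(strands) == 2 else next(iter(strands))
        for size in strands.values():
            split[category][size] += 1
    return overall, split


# The three example rows from the table above.
rows = [
    "1\tAAAAAAAAAAAATGTTGGAATCTT\tba",
    "10\tAAAAAAAAAAAGGCGGTCCACCCC\tab",
    "28\tAAAAAAAAAAATGGTATGGACCGA\tab",
]
overall, split = family_size_distribution(rows)
```

    Here ``overall`` maps each family size to the number of families of that size, and ``split`` holds the same kind of counts per category (ab, ba, ab+ba).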


**Output**

    The output is a PDF file with the plot and a CSV with the data of the plot.


**About Author**

    Author: Monika Heinzl

    Department: Institute of Bioinformatics, Johannes Kepler University Linz, Austria

    Contact: monika.heinzl@edumail.at

        ]]>

    </help>
    <citations>
        <citation type="bibtex">
            @misc{duplex,
            author = {Heinzl, Monika},
            year = {2018},
            title = {Development of algorithms for the analysis of duplex sequencing data}
         }
        </citation>
    </citations>
</tool>
<!--
author: mheinzl
date: Wed, 23 May 2018 14:56:37 -0400
parents: 9033fd840986
children: 2a2308390e8f
-->