view signature_score.xml @ 1:01b2c4fcada8 draft

planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/master/tools/gsc_signature_score commit 06c8d40814f68cbf4d24b2ea70a11407bc40d072
author artbio
date Mon, 24 Jun 2019 19:17:53 -0400
parents baf7a079bce0
children e08419b8ec24

<tool id="signature_score" name="Compute signature scores" version="0.9.0">
    <description>in single cell RNAseq</description>
    <requirements>
        <requirement type="package" version="1.3.2=r3.3.2_0">r-optparse</requirement>
        <requirement type="package" version="2.2.1=r3.3.2_0">r-ggplot2</requirement>
        <requirement type="package" version="2.3=r3.3.2_0">r-gridextra</requirement>
        <requirement type="package" version="1.6.9=r3.3.2_0">r-psych</requirement>
    </requirements>
    <stdio>
        <exit_code range="1:" level="fatal" description="Tool exception" />
    </stdio>
    <command detect_errors="exit_code"><![CDATA[ 
        Rscript '$__tool_directory__/signature_score.R'
            --input '$input'
            --sep 
            #if $sep == 'tab':
              'tab'
            #elif $sep == 'comma':
              'comma'
            #end if
            --genes '$genes'
            --percentile_threshold '$threshold'
            --output '$output'
            --stats '$stats'
            --pdf '$pdf'
]]></command>
    <inputs>
        <param name="input" type="data" format="txt,tabular" label="Raw counts of expression data"/>
        <param name="sep" type="select" label="Indicate column separator">
            <option value="tab" selected="true">Tabs</option>
            <option value="comma">Comma</option>
        </param>
        <param name="genes" type="text" value="" label="Comma-separated list of genes to include in signature"
               help="Comma-separated list of genes to include in the signature, e.g. &quot;ZNF454,GAPDH,LAIR1,ACAD9,CHTOP&quot;" />
        <param name="threshold" type="float" value="20.0" label="Threshold to keep a proposed gene in the effective signature"
               help="Signature genes that are not expressed in at least this percentage of cells are excluded from the effective signature" />
    </inputs>
    <outputs>
        <data name="pdf" format="pdf" label="Signature plots from ${on_string}" />
        <data name="output" format="tabular" label="Signature scores from ${on_string}" />
        <data name="stats" format="tabular" label="Gene statistics from ${on_string}" />
    </outputs>
    <tests>
        <test>
            <param name="input" value="gene_filtered_input.tsv" ftype="txt"/>
            <param name="sep" value='tab' />
            <param name="genes" value="ZNF454,GAPDH,LAIR1,ACAD9,CHTOP" />
            <output name="pdf" file="signature.pdf" ftype="pdf" compare="sim_size" delta="200" />
            <output name="output" file="signature.tsv" ftype="tabular"/>
            <output name="stats" file="gene_stats.tsv" ftype="tabular"/>
        </test>
    </tests>
    <help>

**What it does**

The tool takes a table of *normalized* gene expression values, with genes in rows and single-cell RNAseq libraries (cells) in columns,
and a comma-separated list of genes, and returns a table containing the geometric mean of the expression of these
genes for each cell/library. This geometric mean is used as the gene signature score of the cell/library.
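
As a rough illustration of the score described above, the geometric mean of a signature can be sketched in Python as follows. This is not the code of signature_score.R: the function name ``signature_score`` and the toy values are hypothetical, and the sketch assumes strictly positive expression values (how zero values are treated is not covered here).

```python
import math


def signature_score(values):
    """Geometric mean of strictly positive expression values (hypothetical helper)."""
    # The geometric mean is computed as exp(mean(log(x)))
    logs = [math.log(v) for v in values]
    return math.exp(sum(logs) / len(logs))


# Made-up expression values of three signature genes in one cell
cell = [2.0, 8.0, 4.0]
score = signature_score(cell)
print(round(score, 6))
```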


**Inputs**

A table of comma-separated (csv) or tab-separated (tsv) values.
Gene names should be in the first column and cell names should be in the first row.
Note that in some csv files the header of the gene column is omitted, so that
the first row has one item fewer than the other rows. The tool recognises and
handles this situation.

A comma-separated list of the genes that should contribute to the signature score of each cell/library.

The expression threshold that a signature gene (from the aforementioned gene list) must meet to be
taken into account when computing signature scores. By default, a gene is taken into account if
it is expressed in at least 20% of cells/libraries.
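
The threshold rule can be pictured with the minimal Python sketch below. It is only an illustration, not the tool's implementation: the helper ``effective_signature`` and the example table are hypothetical, and it assumes that "expressed" means a value strictly greater than zero.

```python
def effective_signature(expression, genes, threshold=20.0):
    """Keep genes detected in at least `threshold` percent of cells (hypothetical helper)."""
    kept = []
    for gene in genes:
        values = expression.get(gene)
        if not values:
            continue  # gene absent from the table
        # Assumption: a gene counts as expressed in a cell when its value is above zero
        percent = 100.0 * sum(1 for v in values if v > 0) / len(values)
        if percent >= threshold:
            kept.append(gene)
    return kept


# Made-up table mapping each gene to its expression values across five cells
expression = {
    "GAPDH": [5.0, 3.0, 0.0, 7.0, 2.0],  # detected in 4 of 5 cells
    "LAIR1": [0.0, 0.0, 0.0, 0.0, 1.0],  # detected in 1 of 5 cells
    "CHTOP": [0.0, 0.0, 0.0, 0.0, 0.0],  # never detected
}
kept = effective_signature(expression, ["GAPDH", "LAIR1", "CHTOP", "ZNF454"])
print(kept)
```

With the default threshold of 20.0, a gene detected in exactly 20% of cells is kept in this sketch, matching the "at least" wording above.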

**Outputs**

The tool returns a table "signature scores" that contains

* the cell/library identifier
* the signature score for this identifier
* the rate of the signature score, i.e. whether it is higher (HIGH) or lower (LOW) than the average signature score in all cells/libraries
* the number of genes detected in the cell/library
* the total number of aligned reads in the cell/library
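
The HIGH/LOW rate listed above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the tool's code: the helper ``rate_scores`` and the cell names are hypothetical, "average" is taken to be the arithmetic mean, and a score exactly equal to the average is labelled LOW.

```python
def rate_scores(scores):
    """Label each score HIGH or LOW relative to the mean score (hypothetical helper)."""
    mean = sum(scores.values()) / len(scores)
    # Assumption: a score exactly equal to the mean is labelled LOW
    return {cell: ("HIGH" if s > mean else "LOW") for cell, s in scores.items()}


# Made-up signature scores for three cells
scores = {"cell_1": 4.0, "cell_2": 1.0, "cell_3": 2.5}
rates = rate_scores(scores)
print(rates)
```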

    </help>
    <citations>
        <citation type="bibtex">
        @Manual{,
             title = {R: A Language and Environment for Statistical Computing},
             author = {{R Core Team}},
             organization = {R Foundation for Statistical Computing},
             address = {Vienna, Austria},
             year = {2014},
             url = {http://www.R-project.org/},
        }
        </citation>
    </citations>
</tool>