view signature_score.xml @ 4:31da579595c5 draft default tip

planemo upload for repository https://github.com/ARTbio/tools-artbio/tree/main/tools/gsc_signature_score commit 84220e72e5d52cc9265b7eec20e6826e04d91f1f
author artbio
date Sun, 10 Dec 2023 04:11:16 +0000
parents 3351ca630a01
children

<tool id="signature_score" name="Compute signature scores" version="2.3.9+galaxy0">
    <description>in single cell RNAseq</description>
    <requirements>
        <requirement type="package" version="1.7.3">r-optparse</requirement>
        <requirement type="package" version="3.4.4">r-ggplot2</requirement>
        <requirement type="package" version="2.3">r-gridextra</requirement>
        <requirement type="package" version="2.3.9">r-psych</requirement>
    </requirements>
    <stdio>
        <exit_code range="1:" level="fatal" description="Tool exception" />
    </stdio>
    <command detect_errors="exit_code"><![CDATA[
        Rscript '$__tool_directory__/signature_score.R'
            --input '$input'
            --sep 
            #if $sep == 'tab':
              'tab'
            #elif $sep == 'comma':
              'comma'
            #end if
            --genes '$genes'
            --percentile_threshold '$threshold'
            --output '$output'
            --stats '$stats'
            --pdf '$pdf'
            --correlations '$correlations'
            --covariances '$covariances'
]]></command>
    <inputs>
        <param name="input" type="data" format="txt,tabular" label="Raw counts of expression data"/>
        <param name="sep" type="select" label="Indicate column separator">
            <option value="tab" selected="true">Tabs</option>
            <option value="comma">Comma</option>
        </param>
        <param name="genes" type="text" value="" label="Comma-separated list of genes to include in signature"
               help="Comma-separated list of genes to include in the signature, e.g. &quot;ZNF454,GAPDH,LAIR1,ACAD9,CHTOP&quot;" />
        <param name="threshold" type="float" value="20.0" label="Threshold to keep a proposed gene in the effective signature"
               help="Signature genes that are not expressed in at least this percentage of cells are excluded from the effective signature" />
    </inputs>
    <outputs>
        <data name="pdf" format="pdf" label="Signature plots from ${on_string}" />
        <data name="output" format="tabular" label="Signature scores from ${on_string}" />
        <data name="stats" format="tabular" label="Gene statistics from ${on_string}" />
        <data name="correlations" format="tabular" label="Signature genes correlations" />
        <data name="covariances" format="tabular" label="Signature genes covariances" />
    </outputs>
    <tests>
        <test>
            <param name="input" value="gene_filtered_input.tsv" ftype="txt"/>
            <param name="sep" value='tab' />
            <param name="genes" value="ZNF454,GAPDH,LAIR1,ACAD9,CHTOP" />
            <output name="pdf" file="signature.pdf" ftype="pdf" compare="sim_size" delta="200" />
            <output name="output" file="signature.tsv" ftype="tabular"/>
            <output name="stats" file="gene_stats.tsv" ftype="tabular"/>
            <output name="correlations" file="correlations.tsv" ftype="tabular"/>
            <output name="covariances" file="covariances.tsv" ftype="tabular"/>
        </test>
    </tests>
    <help>

**What it does**

The tool takes a table of *normalized* gene expression values, with genes in rows and single-cell
RNA-seq libraries in columns, and a comma-separated list of genes. It returns a table giving, for each
cell/library, the geometric mean of the expression of these genes. This geometric mean is taken as the
signature score of that cell/library.
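
As an illustration of the score described above, here is a minimal Python sketch of a per-cell geometric mean. It is not the tool's R code: the function names, the nested-dictionary layout and the toy values are hypothetical, and returning 0 when a gene has a zero value is an assumption about zero handling that the tool may not share.

```python
import math

def geometric_mean(values):
    """Geometric mean of non-negative numbers, computed in log space."""
    values = list(values)
    if not values:
        raise ValueError("at least one value is required")
    if any(v == 0 for v in values):
        # Assumption: a single zero makes the product, and hence the mean, zero.
        return 0.0
    return math.exp(sum(math.log(v) for v in values) / len(values))

def signature_scores(expression, signature_genes):
    """Return {cell: geometric mean of the signature genes in that cell}.

    `expression` maps gene name -> {cell name -> normalized value}.
    """
    cells = list(next(iter(expression.values())))
    return {
        cell: geometric_mean(expression[gene][cell] for gene in signature_genes)
        for cell in cells
    }

# Hypothetical toy data: two signature genes measured in two cells.
expression = {
    "GAPDH": {"cell_1": 4.0, "cell_2": 2.0},
    "CHTOP": {"cell_1": 9.0, "cell_2": 8.0},
}
scores = signature_scores(expression, ["GAPDH", "CHTOP"])
```

With these toy values, the score of cell_1 is the square root of 4 x 9, and that of cell_2 the square root of 2 x 8.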


**Inputs**

A table of comma-separated (csv) or tab-separated (tsv) values.
Gene names should be in the first column and cell names should be in the first row.
Note that in a number of csv files, the header of the gene column is omitted, so that
the first row has one item fewer than the other rows. The tool recognises and handles
this situation.
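
The header rule above can be sketched in Python as follows. This only illustrates the idea and is not how the tool's R script is written: the function `read_expression_table` and the sample text are hypothetical.

```python
import csv
import io

def read_expression_table(text, delimiter=","):
    """Parse a delimited table into (cell_names, {gene: [values]})."""
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    rows = [row for row in reader if row]
    header, body = rows[0], rows[1:]
    if body and len(header) == len(body[0]) - 1:
        # Gene-column header omitted: every header item is a cell name.
        cells = header
    else:
        # The header carries a label for the gene column; drop it.
        cells = header[1:]
    table = {row[0]: [float(value) for value in row[1:]] for row in body}
    return cells, table

# Hypothetical example: the header row lacks the gene-column label.
csv_text = "cell_1,cell_2\nGAPDH,4.0,2.0\nCHTOP,9.0,8.0\n"
cells, table = read_expression_table(csv_text)
```

Both header layouts yield the same list of cell names, so downstream steps do not need to know which one was used.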

A comma-separated list of the genes that should contribute to the signature score of each cell/library.

The expression threshold that signature genes (the aforementioned gene list) must reach to be
taken into account when computing signature scores. By default, genes are taken into account if
they are expressed in at least 20% of cells/libraries.
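
A minimal Python sketch of this filtering step is given below. It makes two assumptions that the text does not confirm: that "expressed" means a value greater than zero, and that listed genes absent from the table are silently skipped. The function names and toy values are hypothetical.

```python
def detection_percentage(values):
    """Percentage of cells in which a gene has a non-zero value."""
    values = list(values)
    detected = sum(1 for value in values if value > 0)
    return 100.0 * detected / len(values)

def effective_signature(expression, signature_genes, threshold=20.0):
    """Keep the signature genes detected in at least `threshold` % of cells."""
    return [
        gene for gene in signature_genes
        # Assumption: genes missing from the table are skipped.
        if gene in expression
        and detection_percentage(expression[gene]) >= threshold
    ]

# Hypothetical toy data: five cells, so each cell accounts for 20 %.
expression = {
    "GAPDH": [3.0, 1.0, 0.0, 2.0, 5.0],   # detected in 80 % of cells
    "LAIR1": [0.0, 0.0, 0.0, 0.0, 1.0],   # detected in 20 % of cells
    "ZNF454": [0.0, 0.0, 0.0, 0.0, 0.0],  # never detected
}
kept = effective_signature(expression, ["GAPDH", "LAIR1", "ZNF454"])
```

Here GAPDH and LAIR1 are kept, the latter because it sits exactly on the 20% threshold, while ZNF454 is dropped.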

**Outputs**

The tool returns a table "signature scores" that contains

* the cell/library identifier
* the signature score for this identifier
* the rate of the signature score, i.e. whether it is higher (HIGH) or lower (LOW) than the average signature score in all cells/libraries
* the number of genes detected in the cell/library
* the total number of aligned reads in the cell/library

The tool also returns several tables and a PDF:

* signature genes covariance matrix
* signature genes correlation matrix
* gene statistics (mean expression, standard deviation, variance, percentage of detection)
* a PDF with violin plots of the signature scores of the transcriptomes, split by category (HIGH/LOW)
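
The HIGH/LOW rate used in the outputs above can be sketched in Python as follows. The function `rate_scores` and the toy scores are hypothetical, and labelling a score exactly equal to the average as LOW is an assumption, since the text does not say how ties are handled.

```python
from statistics import mean

def rate_scores(scores):
    """Label each score HIGH if above the mean of all scores, else LOW."""
    average = mean(scores.values())
    return {
        # Assumption: a score equal to the average is labelled LOW.
        cell: "HIGH" if score > average else "LOW"
        for cell, score in scores.items()
    }

# Hypothetical signature scores for three cells.
scores = {"cell_1": 6.0, "cell_2": 4.0, "cell_3": 1.0}
rates = rate_scores(scores)
```

The average of the three toy scores is 11/3, so cell_1 and cell_2 are rated HIGH and cell_3 is rated LOW.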


    </help>
    <citations>
        <citation type="bibtex">
        @Manual{,
             title = {R: A Language and Environment for Statistical Computing},
             author = {{R Core Team}},
             organization = {R Foundation for Statistical Computing},
             address = {Vienna, Austria},
             year = {2014},
             url = {http://www.R-project.org/},
        }
        </citation>
    </citations>
</tool>