Mercurial > repos > ebi-gxa > seurat_find_variable_genes

<tool id="seurat_find_variable_genes" name="Seurat FindVariableGenes" profile="18.01" version="@SEURAT_VERSION@+galaxy0">
    <description>identify variable genes</description>
    <macros>
        <import>seurat_macros.xml</import>
    </macros>
    <expand macro="requirements" />
    <expand macro="version" />
    <command detect_errors="exit_code"><![CDATA[
seurat-find-variable-genes.R

@INPUT_OBJECT@
#if $mean:
    --mean-function '$mean'
#end if
#if $selection_method
  --selection-method '$selection_method'
#end if
#if $disp:
--dispersion-function $disp
#end if
#if $xlow:
--x-low-cutoff $xlow
#end if
#if $xhigh:
--x-high-cutoff $xhigh
#end if
#if $ylow:
--y-low-cutoff $ylow
#end if
#if $yhigh:
--y-high-cutoff $yhigh
#end if
@OUTPUT_OBJECT@
--output-text-file '$output_tab'
]]></command>

    <inputs>
        <expand macro="input_object_params"/>
        <expand macro="output_object_params"/>
        <param label="Number of features" optional="true" name="nfeatures" argument="--nfeatures" type="integer" help="Number of features to return."/>
        <param name="mean" argument="--mean-function" type="text" optional="True" label="Mean function" help="Function to compute x-axis value (average expression). Default is to take the mean of the detected (i.e. non-zero) values."/>
        <param name="disp" argument="--dispersion-function" type="text" optional="True" label="Dispersion function" help="Function to compute y-axis value (dispersion). Default is to take the standard deviation of all values." />
        <param name="xlow" argument="--x-low-cutoff" type="float" optional="True" label="X-axis low cutoff" help="Bottom cutoff on x-axis (mean) for identifying variable genes."/>
        <param name="xhigh" argument="--x-high-cutoff" type="float" optional="True" label="X-axis high cutoff" help="Top cutoff on x-axis (mean) for identifying variable genes."/>
        <param name="ylow" argument="--y-low-cutoff" type="float" optional="True" label="Y-axis low cutoff" help="Bottom cutoff on y-axis (dispersion) for identifying variable genes."/>
        <param name="yhigh" argument="--y-high-cutoff" type="float" optional="True" label="Y-axis high cutoff" help="Top cutoff on y-axis (dispersion) for identifying variable genes."/>
        <param label="Selection method" optional="true" name="selection_method" argument="--selection-method" type="select" help="How to choose top variable features. Choose one of: 'vst', 'mvp', disp.">
          <option value="vst" selected="true">vst</option>
          <option value="mbp">mbp</option>
          <option value="disp">disp</option>
        </param>
    </inputs>

    <outputs>
        <expand macro="output_files"/>
        <data name="output_tab" format="tabular" from_work_dir="*.tab" label="${tool.name} on ${on_string}: Variable genes tabular file"/>
    </outputs>

    <tests>
        <test>
            <param name="rds_seurat_file" ftype="rdata" value="E-MTAB-6077-3k_features_90_cells-normalised.rds"/>
            <output name="rds_seurat_file" ftype="rdata">
              <assert_contents>
                <has_size value="2720178" delta="200000"/>
              </assert_contents>
            </output>
        </test>
    </tests>
    <help><![CDATA[
.. class:: infomark

**What it does**

This tool identifies genes that are outliers on a 'mean variability plot'. First, uses
a function to calculate average expression (mean.function) and dispersion (dispersion.function)
for each gene. Next, divides genes into num.bin (deafult 20) bins based on
their average expression, and calculates z-scores for dispersion within each
bin. The purpose of this is to identify variable genes while controlling for
the strong relationship between variability and average expression.

For the mean.var.plot method: Exact parameter settings may vary empirically from
dataset to dataset, and based on visual inspection of the plot. Setting the
y.cutoff parameter to 2 identifies features that are more than two standard deviations
away from the average dispersion within a bin. The default X-axis function is the
mean expression level, and for Y-axis it is the log(Variance/mean). All
mean/variance calculations are not performed in log-space, but the results are
reported in log-space - see relevant functions for exact details.

@SEURAT_INTRO@

-----

**Inputs**

    * Seurat RDS object
    * Mean function. Function to compute x-axis value (average expression). Default is to take the mean of the detected (i.e. non-zero) values.
    * Dispersion function. Function to compute y-axis value (dispersion). Default is to take the standard deviation of all values.
    * Bottom cutoff on x-axis for identifying variable genes.
    * Top cutoff on x-axis for identifying variable genes.
    * Bottom cutoff on y-axis for identifying variable genes.
    * Top cutoff on y-axis for identifying variable genes.

-----

**Outputs**

    * Seurat RDS object. Places variable genes in object@var.genes. The result of all analysis is stored in object@hvg.info
    * Tabular file of variable genes

.. _Seurat: https://www.nature.com/articles/nbt.4096
.. _Satija Lab: https://satijalab.org/seurat/

@VERSION_HISTORY@
]]></help>
      <expand macro="citations" />
</tool>
author	ebi-gxa
date	Sat, 02 Mar 2024 10:41:13 +0000
parents	5be60722113a
children