Galaxy | Tool Preview

TRANSIT Gumbel (version 3.0.2+galaxy1)
If set to 'Batch', TRANSIT will run separately on each input file and produce one output per file. If set to 'Replicates', TRANSIT will run once on all the input files together.

What it does

The Gumbel method can be used to determine which genes are essential in a single condition. It performs a gene-by-gene analysis of the insertions at TA sites within each gene, makes a call based on the longest consecutive run of TA sites without insertions in the gene, and calculates the probability of observing such a run using a Bayesian model.
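The method takes its name from the Gumbel (extreme-value) distribution, which approximates the length of the longest run of non-insertions in a gene. The sketch below is a minimal illustration, assuming each TA site in a non-essential gene is hit independently with a hypothetical per-site insertion probability `theta`; it computes only the approximate probability of a run at least `r` sites long among `n` sites. TRANSIT's full Bayesian model, which draws posterior samples (see `-s` and `-b`), is not reproduced here.

```python
import math


def longest_run_tail_prob(r: int, n: int, theta: float) -> float:
    """Approximate P(R >= r), where R is the longest run of non-insertions
    among n TA sites, each hit independently with probability theta.

    `theta` is a hypothetical name for the per-site insertion probability
    in a non-essential gene; it is not a TRANSIT parameter.
    """
    if n <= 0:
        raise ValueError("n must be positive")
    if not 0.0 < theta < 1.0:
        raise ValueError("theta must be strictly between 0 and 1")
    p = 1.0 - theta                        # chance that a TA site has no insertion
    sigma = 1.0 / math.log(1.0 / p)        # Gumbel scale
    mu = math.log(n * theta) * sigma       # Gumbel location: log_{1/p}(n * theta)
    cdf = math.exp(-math.exp(-(r - mu) / sigma))  # approximate P(R < r)
    return 1.0 - cdf


# A 10-site gap in a 20-site gene is unlikely when half the sites are hit,
# while a 2-site gap is expected.
print(longest_run_tail_prob(10, 20, 0.5))
print(longest_run_tail_prob(2, 20, 0.5))
```

A small tail probability for the observed run is the kind of evidence that pushes a gene toward an Essential call.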

Note: Intended only for Himar1 datasets.


Input files for Gumbel need to be:


Optional Arguments:
-s <integer> := Number of samples. Default: -s 10000
-b <integer> := Number of burn-in samples. Default: -b 500
-m <integer> := Smallest read-count to consider. Default: -m 1
-t <integer> := Trims all but every t-th value. Default: -t 1
-r <string> := How to handle replicates: Sum or Mean. Default: -r Mean
--iN <float> := Ignore TAs occurring at the given fraction of the N terminus. Default: -iN 0.0
--iC <float> := Ignore TAs occurring at the given fraction of the C terminus. Default: -iC 0.0
-n <string> := Determines which normalization method to use. Default: -n TTR
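Several of these options describe simple data transformations. The sketch below shows one plausible reading of `-r`, `-m`, `--iN`/`--iC`, and `-b`/`-t`; the function names and the exact semantics (for example, that a site counts as an insertion only when its read count reaches `-m`, and that `-t` thins the retained posterior samples) are assumptions, not TRANSIT's code.

```python
from typing import List, Sequence


def combine_replicates(replicates: Sequence[Sequence[float]], how: str = "Mean") -> List[float]:
    """Merge per-TA-site read counts across replicates, as with -r Sum|Mean."""
    if how not in ("Sum", "Mean"):
        raise ValueError("how must be 'Sum' or 'Mean'")
    merged = []
    for site_counts in zip(*replicates):   # one tuple of counts per TA site
        total = sum(site_counts)
        merged.append(total if how == "Sum" else total / len(site_counts))
    return merged


def insertion_flags(counts: Sequence[float], min_count: int = 1) -> List[bool]:
    """Treat a TA site as having an insertion only if its count reaches -m."""
    return [c >= min_count for c in counts]


def trim_termini(ta_positions: Sequence[int], start: int, end: int, strand: str = "+",
                 ignore_n: float = 0.0, ignore_c: float = 0.0) -> List[int]:
    """Drop TA sites in the first ignore_n / last ignore_c fraction of the ORF (--iN/--iC)."""
    length = end - start + 1
    n_cut, c_cut = ignore_n * length, ignore_c * length
    # On the minus strand the N terminus sits at the high coordinate.
    low_cut, high_cut = (n_cut, c_cut) if strand == "+" else (c_cut, n_cut)
    return [pos for pos in ta_positions if start + low_cut <= pos <= end - high_cut]


def burn_and_thin(samples: Sequence[float], burn_in: int = 500, thin: int = 1) -> List[float]:
    """Discard the first burn_in samples (-b), then keep every thin-th one (-t)."""
    return list(samples[burn_in::thin])


print(combine_replicates([[0, 2, 4], [2, 2, 0]], "Sum"))
print(trim_termini([10, 20, 50, 90, 95], 1, 100, "+", 0.1, 0.1))
```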


Column Header   Column Definition
Orf             Gene ID.
Name            Gene Name.
Desc            Gene Description.
k               Number of Transposon Insertions Observed within the ORF.
n               Total Number of TA dinucleotides within the ORF.
r               Length of the Maximum Run of Non-Insertions (in units of TA dinucleotides).
s               Span of nucleotides for the Maximum Run of Non-Insertions.
zbar            Posterior Probability of Essentiality.
State Call      Essentiality call for the gene. Depends on FDR-corrected thresholds. E = Essential, U = Uncertain, NE = Non-Essential, S = too short.
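To make the k, n, r, and s columns concrete, here is a minimal sketch that computes them for one gene from its TA-site coordinates and read counts. It assumes a site with a count above zero has an insertion (matching the default `-m 1`) and, as an illustrative convention, measures s from the first to the last TA site of the longest empty run plus 2 bases for the final dinucleotide; TRANSIT's exact span convention may differ. The function name `gene_run_stats` is hypothetical.

```python
from typing import Sequence, Tuple


def gene_run_stats(ta_positions: Sequence[int], counts: Sequence[float]) -> Tuple[int, int, int, int]:
    """Return (k, n, r, s) for one gene from its TA-site coordinates and read counts."""
    if len(ta_positions) != len(counts):
        raise ValueError("need one read count per TA site")
    n = len(ta_positions)                    # TA dinucleotides in the ORF
    k = sum(1 for c in counts if c > 0)      # TA sites with at least one read
    best_len, best_span = 0, 0
    run_start = None                         # index of the first site in the current empty run
    for i, c in enumerate(counts):
        if c > 0:
            run_start = None                 # an insertion ends the current run
            continue
        if run_start is None:
            run_start = i
        run_len = i - run_start + 1
        if run_len > best_len:
            best_len = run_len
            # +2 so the span covers the last TA dinucleotide of the run
            best_span = ta_positions[i] - ta_positions[run_start] + 2
    return k, n, best_len, best_span


print(gene_run_stats([100, 110, 130, 160, 200, 230], [5, 0, 0, 0, 3, 0]))  # (2, 6, 3, 52)
```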

Note: Technically, Bayesian models yield posterior probabilities, not p-values (a concept from the frequentist framework). However, we have implemented a method for computing an approximate false-discovery rate (FDR) that serves a similar purpose: it determines significance thresholds on the posterior probabilities that are corrected for multiple testing. The thresholds actually used are reported in the header of the output file (they are near 1 for essentials and near 0 for non-essentials). Many genes can score between the two thresholds (t1 < zbar < t2); this reflects intrinsic uncertainty due to low read counts, sparse insertion density, or small genes. If the insertion density is too low (below roughly 30%), the method may not work as well and may call an unusually large number of genes Uncertain or Essential.
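The thresholding described in this note can be sketched with a common Bayesian FDR construction: rank genes by how likely a given call is to be wrong, and accept the largest set whose average error probability stays at or below a target level `alpha`. This is a minimal sketch, not TRANSIT's implementation; `alpha`, the `too_short` flags, and the function names are assumptions for illustration, and TRANSIT's procedure for choosing t1 and t2 may differ in detail.

```python
from typing import List, Optional, Sequence


def max_error_cutoff(errors: Sequence[float], alpha: float = 0.05) -> Optional[float]:
    """Largest error value e such that the genes with error <= e have mean error <= alpha."""
    ordered = sorted(errors)
    cutoff: Optional[float] = None
    total = 0.0
    for i, e in enumerate(ordered, start=1):
        total += e
        if i < len(ordered) and ordered[i] == e:
            continue  # only cut between distinct values, so tied genes are called together
        if total / i <= alpha:
            cutoff = e
        else:
            break  # the running mean of sorted errors can only grow from here
    return cutoff


def call_genes(zbars: Sequence[float], too_short: Sequence[bool], alpha: float = 0.05) -> List[str]:
    """Assign E / NE / U / S calls from posterior probabilities of essentiality."""
    eligible = [z for z, short in zip(zbars, too_short) if not short]
    # For an Essential call the error probability is 1 - zbar; for Non-Essential it is zbar.
    ess_cut = max_error_cutoff([1.0 - z for z in eligible], alpha)
    ne_cut = max_error_cutoff(eligible, alpha)
    calls = []
    for z, short in zip(zbars, too_short):
        if short:
            calls.append("S")
        elif ess_cut is not None and 1.0 - z <= ess_cut:
            calls.append("E")   # zbar >= t2, with t2 = 1 - ess_cut
        elif ne_cut is not None and z <= ne_cut:
            calls.append("NE")  # zbar <= t1, with t1 = ne_cut
        else:
            calls.append("U")   # t1 < zbar < t2
    return calls


print(call_genes([0.99, 0.98, 0.5, 0.01, 0.02, 0.9], [False] * 5 + [True]))
# ['E', 'E', 'U', 'NE', 'NE', 'S']
```

Because the essential and non-essential thresholds are chosen independently, genes with intermediate posteriors fall into neither set and are reported as Uncertain, which is the behavior the note describes.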

More Information

See TRANSIT documentation