Galaxy | Tool Preview

TransDecoder (version 5.5.0+galaxy2)

What it does

TransDecoder identifies candidate coding regions within transcript sequences, such as those generated by de novo RNA-Seq transcript assembly using Trinity or constructed from RNA-Seq alignments to the genome using TopHat and Cufflinks.

TransDecoder identifies likely coding sequences based on the following criteria:

  • an open reading frame (ORF) of at least the minimum length is found in a transcript sequence.
  • a log-likelihood score, similar to that computed by the GeneID software, is greater than 0.
  • the above coding score is greatest when the ORF is scored in the 1st reading frame as compared to scores in the other 5 reading frames.
  • if a candidate ORF is fully encapsulated by the coordinates of another candidate ORF, the longer one is reported. However, a single transcript can report multiple ORFs (allowing for operons, chimeras, etc.).
  • a PSSM is built, trained, and used to refine the start codon prediction.
  • optionally, the putative peptide has a match to a Pfam domain above the noise cutoff score.
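
The encapsulation rule in the list above can be illustrated with a small sketch. This is not TransDecoder's own code: the function name `drop_contained_orfs` and the `(start, end)` interval representation are assumptions made for illustration only. It keeps the longer ORF whenever one candidate lies entirely inside another, while still allowing several partially overlapping or separate ORFs per transcript.

```python
from typing import List, Tuple

# Hypothetical representation: (start, end), 0-based, end-exclusive.
Interval = Tuple[int, int]


def drop_contained_orfs(orfs: List[Interval]) -> List[Interval]:
    """Remove every ORF that lies fully inside another, longer ORF."""
    kept: List[Interval] = []
    # Visit the longest ORFs first, so that any containing ORF is
    # already in `kept` before the ORFs it contains are examined.
    by_length = sorted(set(orfs), key=lambda o: (-(o[1] - o[0]), o[0]))
    for start, end in by_length:
        contained = any(ks <= start and end <= ke for ks, ke in kept)
        if not contained:
            kept.append((start, end))
    return sorted(kept)


if __name__ == "__main__":
    candidates = [(0, 300), (30, 150), (400, 700), (250, 450)]
    # (30, 150) sits inside (0, 300) and is dropped; (250, 450) only
    # overlaps its neighbours, so it is kept.
    print(drop_contained_orfs(candidates))
```

Partially overlapping candidates survive this filter, which matches the note that one transcript may report multiple ORFs.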

Step 1: Extract long open reading frames

By default, TransDecoder.LongOrfs identifies ORFs that are at least 100 amino acids long. You can lower this threshold via the '-m' parameter, but note that the rate of false positive ORF predictions increases drastically as the minimum length is reduced.
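
To make the effect of the minimum length concrete, here is a minimal sketch of a six-frame ORF scan with a length cutoff. It is a simplified illustration, not how TransDecoder.LongOrfs is implemented: it only reports complete ORFs (ATG to stop codon) and ignores partial ORFs, and the names `find_long_orfs` and `min_aa` are hypothetical.

```python
from typing import List, Tuple

STOP_CODONS = {"TAA", "TAG", "TGA"}
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_long_orfs(seq: str, min_aa: int = 100) -> List[Tuple[str, int, int, int]]:
    """Scan all six frames for complete ORFs of at least `min_aa` codons.

    Returns (strand, frame, start, end) tuples. Coordinates are 0-based,
    end-exclusive, relative to the scanned strand, and `end` includes
    the stop codon.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first in-frame start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    n_aa = (i - start) // 3  # codons before the stop
                    if n_aa >= min_aa:
                        orfs.append((strand, frame, start, i + 3))
                    start = None
    return orfs


if __name__ == "__main__":
    toy = "ATG" + "GCC" * 5 + "TAA"  # a 6-codon ORF plus stop
    print(find_long_orfs(toy, min_aa=6))
    print(find_long_orfs(toy, min_aa=7))
```

Lowering `min_aa` in this sketch quickly admits short ATG-to-stop stretches that occur by chance, which is the false-positive effect described above.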

Step 2: (optional and not part of this wrapper)

The "longest ORFs (PEP)" output can be used to identify ORFs with homology to known proteins via BlastP or Pfam searches.
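
As an illustration of how such search results might be summarized outside this wrapper, the sketch below collects the IDs of ORFs that have at least one BlastP hit. It assumes the search was saved in the standard 12-column BLAST tabular format (query ID in column 1, E-value in column 11). The function name `orfs_with_hits` and the default E-value cutoff of 1e-5 are assumptions for this example, not settings taken from this tool.

```python
import csv
from typing import Iterable, Set


def orfs_with_hits(lines: Iterable[str], max_evalue: float = 1e-5) -> Set[str]:
    """Return query IDs with at least one hit at or below `max_evalue`.

    `lines` is any iterable of tab-separated BLAST tabular rows, for
    example an open file handle.
    """
    hits: Set[str] = set()
    for row in csv.reader(lines, delimiter="\t"):
        # Skip blank lines, comment lines, and malformed rows.
        if not row or row[0].startswith("#") or len(row) < 12:
            continue
        if float(row[10]) <= max_evalue:  # column 11 holds the E-value
            hits.add(row[0])
    return hits


if __name__ == "__main__":
    # Made-up rows, for demonstration only.
    demo = [
        "orf1.p1\tsp|P1\t85.0\t120\t18\t0\t1\t120\t5\t124\t1e-50\t200\n",
        "orf2.p1\tsp|P2\t30.0\t40\t28\t0\t1\t40\t1\t40\t0.5\t25\n",
    ]
    print(orfs_with_hits(demo))
```

With a real file, the same function could be called as `orfs_with_hits(open(path))`, where `path` points to the saved tabular output.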

Step 3: Predict the likely coding regions

Optionally, supply the results of the homology searches in this step and re-run the whole analysis.

Input

Output

LongOrfs

Predict

Other

References

More information is available on GitHub.