
What it does

Genomic data analysis often follows a workflow like this:

select some features from a genome
export the sequences associated with those regions
analyse those exports with some tool like Blast

For display, especially in software like JBrowse, it is convenient to know where the analysis results fall in the original genome. For example, if a transmembrane domain is detected at residues 10-20 of an analysed protein, where should it be displayed relative to the parent genome?

This tool fills that gap by rebasing analysis results against the parent features that were originally analysed.

Example Inputs

For a "child" set of annotations like:

#gff-version 3
cds42       blastp  match_part      1       50      1e-40   .       .       ID=m00001;Notes=RNAse A Protein

And a parent set of annotations like:

#gff-version 3
PhageBob    maker   cds     300     600     .       +       .       ID=cds42

One could imagine that during the analysis process, the user had exported the parent annotation into some sequence:

>cds42
M......

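and then analysed that sequence, producing the "child" annotation file. This tool will then localize the results against the parent. Because the exported sequence here is a protein, the 50-residue match covers 150 bases of the parent (300 to 449):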

#gff-version 3
PhageBob    blastp  match_part      300     449     1e-40   +       .       ID=m00001;Notes=RNAse A Protein

which will allow you to display the results in the correct location in visualizations.
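
The coordinate arithmetic behind this example can be sketched as follows. This is a minimal illustration rather than the tool's actual code: the function name `rebase` and its parameters are hypothetical, coordinates are assumed to be 1-based and inclusive as in GFF3, and the child is assumed to be a protein match so that each residue spans three bases.

```python
def rebase(child_start, child_end, parent_start, parent_end, strand="+", protein=True):
    """Map a 1-based inclusive child interval onto the parent's genome coordinates.

    Hypothetical helper for illustration only.
    """
    # Work in 0-based, half-open offsets so that scaling is a plain multiplication.
    offset_start = child_start - 1
    offset_end = child_end
    if protein:
        offset_start *= 3
        offset_end *= 3

    if strand == "-":
        # On the reverse strand, offsets count back from the parent's end.
        new_start = parent_end - offset_end + 1
        new_end = parent_end - offset_start
    else:
        new_start = parent_start + offset_start
        new_end = parent_start - 1 + offset_end
    return new_start, new_end


# Child match 1..50 on cds42, whose parent spans 300..600 on the + strand.
print(rebase(1, 50, 300, 600))  # (300, 449)
```

The reverse-strand branch is an assumption about how a minus-strand parent would be handled; the example above only exercises the forward strand.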

Options

There are two optional flags which can be passed.

The Interpro flag drops features that should not appear in the output (i.e. polypeptide features) and removes the qualifiers that are not useful (status and Target).
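
The effect of this flag can be pictured with a small sketch. It assumes features are plain dictionaries with a `type` and a `qualifiers` mapping, which is a simplification chosen for illustration; `filter_interpro` and the sample records are hypothetical and not taken from the tool.

```python
IGNORED_TYPES = {"polypeptide"}
IGNORED_QUALIFIERS = {"status", "Target"}


def filter_interpro(features):
    """Drop ignored feature types and strip ignored qualifiers (hypothetical helper)."""
    kept = []
    for feature in features:
        if feature["type"] in IGNORED_TYPES:
            continue
        qualifiers = {
            key: value
            for key, value in feature["qualifiers"].items()
            if key not in IGNORED_QUALIFIERS
        }
        kept.append({**feature, "qualifiers": qualifiers})
    return kept


# Made-up records, for illustration only.
features = [
    {"type": "polypeptide", "qualifiers": {"ID": "cds42"}},
    {"type": "protein_match", "qualifiers": {"ID": "m1", "status": "T", "Target": "cds42 1 50"}},
]
print(filter_interpro(features))
# [{'type': 'protein_match', 'qualifiers': {'ID': 'm1'}}]
```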

The "Map Protein..." flag indicates that you translated the sequences during the genomic export process and analysed protein sequences. It tells the software that the result positions are amino acid positions and must be multiplied by three to obtain the correct DNA locations.
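
As a worked example of the multiplication by three, the sketch below converts a residue range into base offsets within the exported coding sequence. The function name `protein_to_dna_offsets` is hypothetical, and 1-based inclusive coordinates are assumed; the tool's own boundary handling may differ.

```python
def protein_to_dna_offsets(aa_start, aa_end):
    """Convert a 1-based inclusive residue range to 1-based inclusive base offsets."""
    # Residue i is encoded by bases 3*(i-1)+1 through 3*i of the coding sequence.
    return 3 * (aa_start - 1) + 1, 3 * aa_end


# A domain at residues 10-20 of the analysed protein.
print(protein_to_dna_offsets(10, 20))  # (28, 60)
```

These offsets are relative to the start of the exported feature; adding the parent feature's start position places them on the genome.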