view annotation_profiler.xml @ 0:3b33da018e74 draft default tip

Imported from capsule None
author devteam
date Mon, 19 May 2014 12:33:42 -0400
parents
children
line wrap: on
line source

<tool id="Annotation_Profiler_0" name="Profile Annotations" version="1.0.0">
  <description>for a set of genomic intervals</description>
  <requirements>
    <requirement type="package" version="0.7.1">bx-python</requirement>
  </requirements>
  <command interpreter="python">annotation_profiler_for_interval.py -i $input1 -c ${input1.metadata.chromCol} -s ${input1.metadata.startCol} -e ${input1.metadata.endCol} -o $out_file1 $keep_empty -p ${GALAXY_DATA_INDEX_DIR}/annotation_profiler/$dbkey $summary -b 3 -t $table_names</command>
  <inputs>
    <param format="interval" name="input1" type="data" label="Choose Intervals">
      <validator type="dataset_metadata_in_file" filename="annotation_profiler_valid_builds.txt" metadata_name="dbkey" metadata_column="0" message="Profiling is not currently available for this species."/>
    </param>
    <param name="keep_empty" type="select" label="Keep Region/Table Pairs with 0 Coverage">
      <option value="-k">Keep</option>
      <option value="" selected="true">Discard</option>
    </param>
    <param name="summary" type="select" label="Output per Region/Summary">
      <option value="-S">Summary</option>
      <option value="" selected="true">Per Region</option>
    </param>
    <param name="table_names" type="drill_down" display="checkbox" hierarchy="recurse" multiple="true" label="Choose Tables to Use" help="Selecting no tables will result in using all tables." from_file="annotation_profiler_options.xml"/>
   </inputs>
   <outputs>
     <data format="input" name="out_file1">
       <change_format>
         <when input="summary" value="-S" format="tabular" />
       </change_format>
     </data>
   </outputs>
   <tests>
     <test>
       <param name="input1" value="4.bed" dbkey="hg18"/>
       <param name="keep_empty" value=""/>
       <param name="summary" value=""/>
       <param name="table_names" value="acembly,affyGnf1h,knownAlt,knownGene,mrna,multiz17way,multiz28way,refGene,snp126"/>
       <output name="out_file1" file="annotation_profiler_1.out" />
     </test>
     <test>
       <param name="input1" value="3.bed" dbkey="hg18"/>
       <param name="keep_empty" value=""/>
       <param name="summary" value="Summary"/>
       <param name="table_names" value="acembly,affyGnf1h,knownAlt,knownGene,mrna,multiz17way,multiz28way,refGene,snp126"/>
       <output name="out_file1" file="annotation_profiler_2.out" />
     </test>
   </tests>
   <help>
**What it does**

Takes an input set of intervals and for each interval determines the base coverage of the interval by a set of features (tables) available from UCSC. Genomic regions from the input feature data have been merged by overlap / direct adjacency (e.g. a table having ranges of: 1-10, 6-12, 12-20 and 25-28 results in two merged ranges of: 1-20 and 25-28).

By default, this tool will check the coverage of your intervals against all available features; you may, however, choose to select only those tables that you want to include. Selecting a section heading will effectively cause all of its children to be selected.

You may alternatively choose to receive a summary across all of the intervals that you provide.

-----

**Example**

Using the interval below and selecting several tables::

 chr1 4558 14764 uc001aab.1 0 -

results in::

 chr1 4558 14764 uc001aab.1 0 - snp126Exceptions 151 142
 chr1 4558 14764 uc001aab.1 0 - genomicSuperDups 10206 1
 chr1 4558 14764 uc001aab.1 0 - chainOryLat1 3718 1
 chr1 4558 14764 uc001aab.1 0 - multiz28way 10206 1
 chr1 4558 14764 uc001aab.1 0 - affyHuEx1 3553 32
 chr1 4558 14764 uc001aab.1 0 - netXenTro2 3050 1
 chr1 4558 14764 uc001aab.1 0 - intronEst 10206 1
 chr1 4558 14764 uc001aab.1 0 - xenoMrna 10203 1
 chr1 4558 14764 uc001aab.1 0 - ctgPos 10206 1
 chr1 4558 14764 uc001aab.1 0 - clonePos 10206 1
 chr1 4558 14764 uc001aab.1 0 - chainStrPur2Link 1323 29
 chr1 4558 14764 uc001aab.1 0 - affyTxnPhase3HeLaNuclear 9011 8
 chr1 4558 14764 uc001aab.1 0 - snp126orthoPanTro2RheMac2 61 58
 chr1 4558 14764 uc001aab.1 0 - snp126 205 192
 chr1 4558 14764 uc001aab.1 0 - chainEquCab1 10206 1
 chr1 4558 14764 uc001aab.1 0 - netGalGal3 3686 1
 chr1 4558 14764 uc001aab.1 0 - phastCons28wayPlacMammal 10172 3

Where::

 The first added column is the table name.
 The second added column is the number of bases covered by the table.
 The third added column is the number of regions from the table that is covered by the interval.

Alternatively, requesting a summary, using the intervals below and selecting several tables::

 chr1 4558 14764 uc001aab.1 0 -
 chr1 4558 19346 uc001aac.1 0 -

results in::

 #tableName tableSize tableRegionCount allIntervalCount allIntervalSize allCoverage allTableRegionsOverlaped allIntervalsOverlapingTable nrIntervalCount nrIntervalSize nrCoverage nrTableRegionsOverlaped nrIntervalsOverlapingTable
 snp126Exceptions 133601 92469 2 24994 388 359 2 1 14788 237 217 1
 genomicSuperDups 12268847 657 2 24994 24994 2 2 1 14788 14788 1 1
 chainOryLat1 70337730 2542 2 24994 7436 2 2 1 14788 3718 1 1
 affyHuEx1 15703901 112274 2 24994 7846 70 2 1 14788 4293 38 1
 netXenTro2 111440392 1877 2 24994 6100 2 2 1 14788 3050 1 1
 snp126orthoPanTro2RheMac2 700436 690674 2 24994 124 118 2 1 14788 63 60 1
 intronEst 135796064 2332 2 24994 24994 2 2 1 14788 14788 1 1
 xenoMrna 129031327 1586 2 24994 20406 2 2 1 14788 10203 1 1
 snp126 956976 838091 2 24994 498 461 2 1 14788 293 269 1
 clonePos 224999719 39 2 24994 24994 2 2 1 14788 14788 1 1
 chainStrPur2Link 7948016 119841 2 24994 2646 58 2 1 14788 1323 29 1
 affyTxnPhase3HeLaNuclear 136797870 140244 2 24994 22601 17 2 1 14788 13590 9 1
 multiz28way 225928588 38 2 24994 24994 2 2 1 14788 14788 1 1
 ctgPos 224999719 39 2 24994 24994 2 2 1 14788 14788 1 1
 chainEquCab1 246306414 141 2 24994 24994 2 2 1 14788 14788 1 1
 netGalGal3 203351973 461 2 24994 7372 2 2 1 14788 3686 1 1
 phastCons28wayPlacMammal 221017670 22803 2 24994 24926 6 2 1 14788 14754 3 1

Where::
 
 tableName is the name of the table
 tableChromosomeCoverage is the number of positions existing in the table for only the chromosomes that were referenced by the interval file
 tableChromosomeCount is the number of regions existing in the table for only the chromosomes that were referenced by the interval file
 tableRegionCoverage is the number of positions existing in the table between the minimal and maximal bounding regions that were referenced by the interval file
 tableRegionCount is the number of regions existing in the table between the minimal and maximal bounding regions that were referenced by the interval file
 
 allIntervalCount is the number of provided intervals
 allIntervalSize is the sum of the lengths of the provided interval file
 allCoverage is the sum of the coverage for each provided interval
 allTableRegionsOverlapped is the sum of the number of regions of the table (non-unique) that were overlapped for each interval
 allIntervalsOverlappingTable is the number of provided intervals which overlap the table
 
 nrIntervalCount is the number of non-redundant intervals
 nrIntervalSize is the sum of the lengths of non-redundant intervals
 nrCoverage is the sum of the coverage of non-redundant intervals
 nrTableRegionsOverlapped is the number of regions of the table (unique) that were overlapped by the non-redundant intervals
 nrIntervalsOverlappingTable is the number of non-redundant intervals which overlap the table
 

.. class:: infomark

**TIP:** non-redundant (nr) refers to the set of intervals that remains after the intervals provided have been merged to resolve overlaps

------

**Citation**

For the underlying data, please see http://genome.ucsc.edu/cite.html for the proper citation.

If you use this tool in Galaxy, please cite Blankenberg D, et al. *In preparation.*

  </help>
</tool>