view tools/new_operations/join.xml @ 0:9071e359b9a3

author xuebing
date Fri, 09 Mar 2012 19:37:19 -0500

<tool id="gops_join_1" name="Join">
  <description>the intervals of two datasets side-by-side</description>
  <command interpreter="python"> $input1 $input2 $output -1 ${input1.metadata.chromCol},${input1.metadata.startCol},${input1.metadata.endCol},${input1.metadata.strandCol} -2 ${input2.metadata.chromCol},${input2.metadata.startCol},${input2.metadata.endCol},${input2.metadata.strandCol} -m $min -f $fill</command>
  <inputs>
    <param format="interval" name="input1" type="data" help="First dataset" />
    <param format="interval" name="input2" type="data" help="Second dataset" />
    <param name="min" size="4" type="integer" value="1" help="(bp)">
      <label>with min overlap</label>
    </param>
    <param name="fill" type="select" label="Return">
      <option value="none">Only records that are joined (INNER JOIN)</option>
      <option value="right">All records of first dataset (fill null with ".")</option>
      <option value="left">All records of second dataset (fill null with ".")</option>
      <option value="both">All records of both datasets (fill nulls with ".")</option>
    </param>
  </inputs>
  <outputs>
    <data format="interval" name="output" metadata_source="input1" />
  </outputs>
  <code file=""/>
  <tests>
    <test>
      <param name="input1" value="1.bed" />
      <param name="input2" value="2.bed" />
      <param name="min" value="1" />
      <param name="fill" value="none" />
      <output name="output" file="gops-join-none.dat" />
    </test>
    <test>
      <param name="input1" value="1.bed" />
      <param name="input2" value="2.bed" />
      <param name="min" value="1" />
      <param name="fill" value="right" />
      <output name="output" file="gops-join-right.dat" />
    </test>
    <test>
      <param name="input1" value="1.bed" />
      <param name="input2" value="2.bed" />
      <param name="min" value="1" />
      <param name="fill" value="left" />
      <output name="output" file="gops-join-left.dat" />
    </test>
    <test>
      <param name="input1" value="1.bed" />
      <param name="input2" value="2.bed" />
      <param name="min" value="1" />
      <param name="fill" value="both" />
      <output name="output" file="gops-join-both.dat" />
    </test>
    <test>
      <param name="input1" value="1.bed" />
      <param name="input2" value="2.bed" />
      <param name="min" value="500" />
      <param name="fill" value="none" />
      <output name="output" file="gops-join-none-500.dat" />
    </test>
    <test>
      <param name="input1" value="1.bed" />
      <param name="input2" value="2.bed" />
      <param name="min" value="100" />
      <param name="fill" value="both" />
      <output name="output" file="gops-join-both-100.dat" />
    </test>
  </tests>

.. class:: infomark

**TIP:** If your dataset does not appear in the pull-down menu, it is not in interval format. Use "edit attributes" to set its chromosome, start, end, and strand columns.
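The chromosome, start, end, and strand settings are column numbers, and the tool's command line passes them for each dataset (the `-1` and `-2` arguments). As a rough sketch of what those settings mean, the hypothetical helper `parse_interval` below pulls one interval out of a row. It assumes tab-separated rows and 1-based column numbers, as displayed by "edit attributes"; it is not Galaxy's own parser.

```python
from typing import NamedTuple, Optional


class Interval(NamedTuple):
    chrom: str
    start: int
    end: int
    strand: Optional[str]  # None when no strand column is set


def parse_interval(line: str, chrom_col: int, start_col: int, end_col: int,
                   strand_col: Optional[int] = None) -> Interval:
    """Pull one interval out of a tab-separated row (hypothetical helper).

    Assumption: column numbers are 1-based, matching the numbers shown in
    "edit attributes". Pass strand_col=None when no strand column is set.
    """
    fields = line.rstrip("\n").split("\t")
    return Interval(
        chrom=fields[chrom_col - 1],
        start=int(fields[start_col - 1]),  # int() raises ValueError on "." or text
        end=int(fields[end_col - 1]),
        strand=fields[strand_col - 1] if strand_col else None,
    )


# First record of the example's first dataset, with chrom=1, start=2, end=3.
print(parse_interval("chr1\t10\t100\tQuery1.1", 1, 2, 3))
# Interval(chrom='chr1', start=10, end=100, strand=None)
```

A dataset whose start or end column holds something other than an integer cannot be read this way, which is why such datasets are not offered as interval inputs.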



See Galaxy Interval Operation Screencasts_ (right click to open this link in another window).

.. _Screencasts:



- **With min overlap** specifies the minimum overlap, in base pairs, that two intervals must share to be joined.
- **Return only records that are joined** returns only records that join: each output line pairs a record from the first dataset with a record from the second dataset that it overlaps.  This is analogous to an INNER JOIN.
- **Return all records of first dataset (fill null with ".")** returns all records of the first dataset; records that do not join any record in the second dataset are padded with periods (.) in place of the second dataset's columns.  This is analogous to a LEFT JOIN.
- **Return all records of second dataset (fill null with ".")** returns all records of the second dataset; records that do not join any record in the first dataset are padded with periods (.) in place of the first dataset's columns.  This is analogous to a RIGHT JOIN.  **Note that this may produce an invalid interval file: the output takes its chrom, start, end, and strand column assignments from the first dataset, and a period (.) is not a valid chrom, start, or end value.**
- **Return all records of both datasets (fill nulls with ".")** returns all records from both datasets, padding unmatched first-dataset records on the right and unmatched second-dataset records on the left.  This is analogous to a FULL OUTER JOIN.  **Note that this may also produce an invalid interval file, for the same reason.**
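A minimal sketch of the **With min overlap** test, assuming 0-based, half-open coordinates (as in BED) and assuming two intervals join when they lie on the same chromosome and share at least the minimum number of bases. The names `overlap_bp` and `joins` are illustrative and do not come from the tool's script.

```python
# Assumption: 0-based, half-open intervals; joined when overlap >= min_overlap.

def overlap_bp(a_chrom: str, a_start: int, a_end: int,
               b_chrom: str, b_start: int, b_end: int) -> int:
    """Number of bases shared by two half-open intervals (0 if disjoint)."""
    if a_chrom != b_chrom:
        return 0
    return max(0, min(a_end, b_end) - max(a_start, b_start))


def joins(a: tuple, b: tuple, min_overlap: int = 1) -> bool:
    """True when (chrom, start, end) tuples a and b overlap by at least min_overlap bases."""
    return overlap_bp(*a, *b) >= min_overlap


query1_1 = ("chr1", 10, 100)
query2_1 = ("chr1", 20, 80)
query2_2 = ("chr1", 2000, 2204)

print(overlap_bp(*query1_1, *query2_1))            # 60
print(joins(query1_1, query2_1, min_overlap=1))    # True
print(joins(query1_1, query2_1, min_overlap=100))  # False
print(joins(query1_1, query2_2))                   # False
```

Under these assumptions, the example's Query1.1 (10-100) and Query2.1 (20-80) share 60 bases, so they join at the default minimum of 1 but not at 100; book-ended intervals such as 10-20 and 20-30 share no bases and never join.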



If **First dataset** is::

   chr1 10   100  Query1.1
   chr1 500  1000 Query1.2
   chr1 1100 1250 Query1.3

and **Second dataset** is::

   chr1 20   80   Query2.1
   chr1 2000 2204 Query2.2
   chr1 2500 3000 Query2.3

The four return options will generate:

- **Return only records that are joined**::

   chr1 10 100 Query1.1 chr1 20 80 Query2.1

- **Return all records of first dataset**::

   chr1 10   100  Query1.1 chr1 20 80 Query2.1
   chr1 500  1000 Query1.2 .    .  .  .
   chr1 1100 1250 Query1.3 .    .  .  .

- **Return all records of second dataset**::

   chr1 10 100 Query1.1 chr1 20   80   Query2.1
   .    .  .   .        chr1 2000 2204 Query2.2
   .    .  .   .        chr1 2500 3000 Query2.3

- **Return all records of both datasets**::

   chr1 10   100  Query1.1 chr1 20   80   Query2.1
   chr1 500  1000 Query1.2 .    .    .    .
   chr1 1100 1250 Query1.3 .    .    .    .
   .    .    .    .        chr1 2000 2204 Query2.2
   .    .    .    .        chr1 2500 3000 Query2.3
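The four return options can be reproduced with a brute-force sketch. It compares every pair of records, joins pairs that share a chromosome and overlap by at least `min_overlap` bases (0-based, half-open coordinates assumed), and pads unmatched records with one period per missing column. The `fill` values mirror the tool's select options (`none`, `right`, `left`, `both`); the function `join` and the row layout (chrom, start, end first) are assumptions for illustration, and the output order simply follows the example above.

```python
from typing import List, Sequence

# A record is a list of column strings with chrom, start, end first
# (assumption for this sketch; the real tool reads column numbers from metadata).
Row = List[str]

FILL_MODES = ("none", "right", "left", "both")


def overlap(a: Row, b: Row) -> int:
    """Bases shared by two half-open intervals (0 if on different chromosomes or disjoint)."""
    if a[0] != b[0]:
        return 0
    return max(0, min(int(a[2]), int(b[2])) - max(int(a[1]), int(b[1])))


def join(first: Sequence[Row], second: Sequence[Row],
         min_overlap: int = 1, fill: str = "none") -> List[Row]:
    """Join two record lists side by side (brute force, for illustration).

    fill mirrors the tool's select values: "none" keeps only joined pairs,
    "right" keeps every first-dataset record (padding on the right),
    "left" keeps every second-dataset record (padding on the left),
    "both" keeps everything.
    """
    if fill not in FILL_MODES:
        raise ValueError(f"unknown fill mode: {fill!r}")
    width1 = len(first[0]) if first else 0
    width2 = len(second[0]) if second else 0
    out: List[Row] = []
    matched_second = set()
    for a in first:
        hits = [j for j, b in enumerate(second) if overlap(a, b) >= min_overlap]
        for j in hits:
            out.append(list(a) + list(second[j]))
            matched_second.add(j)
        if not hits and fill in ("right", "both"):
            out.append(list(a) + ["."] * width2)
    if fill in ("left", "both"):
        for j, b in enumerate(second):
            if j not in matched_second:
                out.append(["."] * width1 + list(b))
    return out


# The example datasets above (whitespace-separated here for brevity).
first = [line.split() for line in (
    "chr1 10 100 Query1.1", "chr1 500 1000 Query1.2", "chr1 1100 1250 Query1.3")]
second = [line.split() for line in (
    "chr1 20 80 Query2.1", "chr1 2000 2204 Query2.2", "chr1 2500 3000 Query2.3")]

for mode in FILL_MODES:
    print(f"fill={mode}")
    for row in join(first, second, fill=mode):
        print("\t".join(row))
```

Running it prints, tab-separated, the same rows as the four example outputs. Comparing every pair is quadratic, which is fine for a small illustration but not for genome-scale datasets.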
