Galaxy |

Convert a BED format file of the proteins from a proteomics search database into a tabular format for the Multiomics Visualization Platform (MVP).

Example input BED dataset:

X       276352  291629  ENST00000430923 20      +       284187  291629  80,80,80        5       42,148,137,129,131      0,7814,12380,14295,15146
X       304749  318819  ENST00000326153 20      -       305073  318787  80,80,80        10      448,153,149,209,159,68,131,71,138,381   0,2610,2982,6669,8016,9400,10140,10479,12164,13689

Output:

name               chrom   start     end       strand  cds_start  cds_end
ENST00000430923    X       284187    284314    +          0        127
ENST00000430923    X       288732    288869    +        127        264
ENST00000430923    X       290647    290776    +        264        393
ENST00000430923    X       291498    291629    +        393        524
ENST00000326153    X       318438    318787    -          0        349
ENST00000326153    X       316913    317051    -        349        487
ENST00000326153    X       315228    315299    -        487        558
ENST00000326153    X       314889    315020    -        558        689
ENST00000326153    X       314149    314217    -        689        757
ENST00000326153    X       312765    312924    -        757        916
ENST00000326153    X       311418    311627    -        916       1125
ENST00000326153    X       307731    307880    -       1125       1274
ENST00000326153    X       307359    307512    -       1274       1427
ENST00000326153    X       305073    305197    -       1427       1551

The tabular output can be converted to a sqlite database using the Query_Tabular tool.

The sqlite table should be named: feature_cds_map The names for the columns should be: name,chrom,start,end,strand,cds_start,cds_end

This SQL query will return the genomic location for a peptide sequence in a protein (multiply the animo acid position by 3 for the cds location):

SELECT distinct chrom, CASE WHEN strand = '+' THEN start + cds_offset - cds_start ELSE end - cds_offset - cds_start END as "pos"
FROM feature_cds_map
WHERE name = acc_name AND cds_offset >= cds_start AND cds_offset < cds_end