
Inputs:

- A tabular file that contains a column with a peptide sequence and a column with an identifier for a reference sequence
- FASTA files for the reference sequences
- GFF or GTF file for mapping the reference sequences to a genome
- reference genome FASTA
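
As a minimal sketch of reading the first input, the snippet below pulls (peptide, reference id) pairs out of tab-separated text. The function name read_peptide_table and the default column positions are assumptions for illustration, and the identifier in the example is made up; the actual tool may select columns differently.

```python
import csv
import io


def read_peptide_table(handle, peptide_col=0, ref_id_col=1):
    """Yield (peptide, reference_id) pairs from a tab-separated file handle.

    Column indices are 0-based; the defaults are hypothetical.
    """
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank lines and comment lines.
        if not row or row[0].startswith("#"):
            continue
        # Skip rows too short to hold both columns.
        if len(row) <= max(peptide_col, ref_id_col):
            continue
        yield row[peptide_col].strip(), row[ref_id_col].strip()


# Hypothetical two-line input: a comment header and one peptide row.
example = io.StringIO("#peptide\ttranscript\nMKTAYIAK\tENST00000000001\n")
pairs = list(read_peptide_table(example))
```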

Ensembl transcript_id   files:  Homo_sapiens.GRCh37.71.gtf, GRCh37.fa
  build transcript from gtf + reference genome
  map peptide to 3-frame translation of transcript
  map to reference genome with Ensembl gtf

ECGene  ec_id           files:  ECgene_hg18_b1_low.fa, GRCh37.fa
  transcript from ecgene.fa
  map peptide to 3-frame translation of transcript
  map transcript to reference genome with blat

Augustus id             files:  ssc10.2.RNA.hints.augustus.fa, ssc10.2.RNA.hints.augustus.gff
  map peptide to augustus protein fasta
  map to reference genome with GFF3

EEJ                     files:  Homo_sapiens.GRCh37.71.gtf, eej_sus_scrofa_core_70_102.fa
  map peptide to eej fasta
  parse id to find exon names and junc_pos
  map to reference genome with exon_id in Ensembl GTF
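
Several of the strategies above rely on exon coordinates from a GTF file. The following is a rough sketch of collecting exons per transcript_id, assuming the usual nine tab-separated GTF columns and attributes written as key "value"; pairs. The function name exons_by_transcript is hypothetical and is not taken from the tool itself.

```python
import re
from collections import defaultdict

# Matches GTF attributes of the form: key "value"
ATTR_RE = re.compile(r'(\S+) "([^"]*)"')


def exons_by_transcript(gtf_lines):
    """Collect (seqid, start, end, strand) exon tuples per transcript_id."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        # Keep only well-formed exon records.
        if len(fields) != 9 or fields[2] != "exon":
            continue
        attrs = dict(ATTR_RE.findall(fields[8]))
        transcript_id = attrs.get("transcript_id")
        if transcript_id is None:
            continue
        # GTF coordinates are 1-based and inclusive.
        exons[transcript_id].append(
            (fields[0], int(fields[3]), int(fields[4]), fields[6])
        )
    for transcript_id in exons:
        # Sort each transcript's exons by genomic start.
        exons[transcript_id].sort(key=lambda exon: exon[1])
    return dict(exons)
```

An EEJ-style lookup by exon_id could follow the same pattern by keying on attrs.get("exon_id") instead, provided the GTF carries that attribute.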


Output:
a GFF3 file that specifies the position of the peptide in a reference genome
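
For orientation, a GFF3 feature line has nine tab-separated columns: seqid, source, type, start, end, score, strand, phase, and attributes. The sketch below formats one such line for a mapped peptide. The function name gff3_line, the source label "peptide_to_gff", and the feature type "region" are assumptions chosen for illustration, not the tool's confirmed output.

```python
def gff3_line(seqid, start, end, strand, peptide_id,
              source="peptide_to_gff", feature="region"):
    """Format one GFF3 feature line (1-based, inclusive coordinates)."""
    if start > end:
        raise ValueError("GFF3 requires start <= end")
    columns = [
        seqid,
        source,                      # hypothetical source label
        feature,                     # hypothetical feature type
        str(start),
        str(end),
        ".",                         # score: not available
        strand,
        ".",                         # phase: not set for this feature type
        "ID={}".format(peptide_id),  # attributes column
    ]
    return "\t".join(columns)
```

A peptide that spans several exons could be written as one line per genomic segment, each carrying the same ID.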


Mapping:
  find transcript in cDNA fasta:
  find transcript in translated fasta:


  peptide to transcript:
   translate transcript to amino acid sequence and search for peptide
   tblastn
   Biopython

  transcript to genome:
    If the fasta id lines contain the genomic mapping, use that
    Map transcript to reference genome with BLAT
    check whether the peptide crosses exon boundaries
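
When exon coordinates are already known (for example from a GTF), the transcript-to-genome step and the exon-boundary check can be sketched as below. This assumes the transcript sequence is the concatenation of its exons in transcript order, that exons are given as 1-based inclusive (start, end) pairs, and that the peptide position is a 0-based half-open transcript interval. The function name transcript_to_genome is hypothetical, and the sketch does not cover the BLAT route.

```python
def transcript_to_genome(exons, strand, t_start, t_end):
    """Map a transcript interval to genomic segments.

    exons: list of (start, end) genomic coordinates, 1-based inclusive.
    t_start, t_end: 0-based half-open interval on the transcript.
    Returns (start, end) genomic segments, 1-based inclusive, in transcript order.
    """
    # Walk exons in transcript order: ascending on '+', descending on '-'.
    ordered = sorted(exons, reverse=(strand == "-"))
    segments = []
    offset = 0  # transcript position where the current exon begins
    for ex_start, ex_end in ordered:
        ex_len = ex_end - ex_start + 1
        lo = max(t_start, offset)
        hi = min(t_end, offset + ex_len)
        if lo < hi:  # the interval overlaps this exon
            if strand == "-":
                segments.append((ex_end - (hi - offset) + 1, ex_end - (lo - offset)))
            else:
                segments.append((ex_start + (lo - offset), ex_start + (hi - offset) - 1))
        offset += ex_len
    return segments


def crosses_exon_boundary(segments):
    """A peptide crosses an exon boundary if it maps to more than one segment."""
    return len(segments) > 1
```

For example, with exons (101, 110) and (201, 210) on the '+' strand, the transcript interval 6..15 splits into two genomic segments, so the peptide spans the junction.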
author	galaxyp
date	Wed, 26 Jun 2013 15:56:16 -0400