view README @ 0:cec60c540546
Uploaded
author | galaxyp |
---|---|
date | Wed, 26 Jun 2013 15:56:16 -0400 |
Inputs:
  - A tabular file that contains a column with a peptide sequence and a column with an identifier for a reference sequence
  - fasta files for the reference sequences
  - gff or gtf for mapping the reference sequences to a genome
  - reference genome fasta

Ensembl transcript_id
  files: Homo_sapiens.GRCh37.71.gtf,GRCh37.fa
  transcript gtf+reference
  map peptide to 3-frame translation of transcript
  map to reference genome with ensembl gtf

ECGene ec_id
  files: ECgene_hg18_b1_low.fa,GRCh37.fa
  transcript from ecgene.fa
  map peptide to 3-frame translation of transcript
  map transcript to reference genome with blat

Augustus id
  files: ssc10.2.RNA.hints.augustus.fa, ssc10.2.RNA.hints.augustus.gff
  map peptide to augustus protein fasta
  map to reference genome with GFF3

EEJ
  files: Homo_sapiens.GRCh37.71.gtf,eej_sus_scrofa_core_70_102.fa
  map peptide to eej fasta
  parse id to find exon names and junc_pos
  map to reference genome with exon_id in ensembl GTF

Output:
  a GFF3 file that specifies the position of the peptide in a reference genome

Mapping:
  find transcript in cDNA fasta:
  find transcript in translated fasta:
  peptide to transcript:
    translate transcript to amino acid sequence and search for peptide
    tblastn
    Biopython
  transcript to genome:
    If the fasta id lines contain the genomic mapping, use that
    Map transcript to reference genome with BLAT
    see if peptide crosses exon boundaries
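
Example (illustrative only):
The "peptide to transcript" and "transcript to genome" steps listed under Mapping could be sketched roughly as below. This is not the tool's actual code. The function names (translate, find_peptide, to_genome), the coordinate conventions, and the forward-strand-only exon handling are assumptions made for illustration. The standard genetic code is built by hand so the sketch needs no dependencies; Biopython's Seq.translate() could be used in place of translate().

```python
from itertools import product

# Standard genetic code in TCAG order; '*' marks stop codons.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def translate(nt):
    """Translate a nucleotide string; unknown codons (e.g. with N) become 'X'."""
    nt = nt.upper().replace("U", "T")
    return "".join(CODON_TABLE.get(nt[i:i + 3], "X")
                   for i in range(0, len(nt) - 2, 3))


def find_peptide(transcript, peptide):
    """Search the 3 forward frames of a transcript for a peptide.

    Returns a list of (frame, nt_start, nt_end) hits in 0-based,
    half-open transcript coordinates.
    """
    hits = []
    for frame in range(3):
        protein = translate(transcript[frame:])
        pos = protein.find(peptide)
        while pos != -1:
            start = frame + 3 * pos
            hits.append((frame, start, start + 3 * len(peptide)))
            pos = protein.find(peptide, pos + 1)
    return hits


def to_genome(start, end, exons):
    """Map a 0-based half-open transcript interval onto genomic exons.

    Assumption: forward strand only. `exons` is a list of
    (genome_start, genome_end) pairs, 1-based inclusive as in GFF/GTF,
    given in transcript order. Returns 1-based inclusive genomic
    segments; more than one segment means the peptide crosses an
    exon boundary.
    """
    segments, offset = [], 0
    for g_start, g_end in exons:
        length = g_end - g_start + 1
        lo, hi = max(start, offset), min(end, offset + length)
        if lo < hi:
            segments.append((g_start + lo - offset, g_start + hi - offset - 1))
        offset += length
    return segments


# Hypothetical transcript: two leading bases, then ATG GCC AAA TTT TAA.
transcript = "GG" + "ATGGCCAAATTT" + "TAA"
hits = find_peptide(transcript, "AKF")        # [(2, 5, 14)]
# Hypothetical exons on the forward strand.
exons = [(101, 110), (201, 220)]
segments = to_genome(hits[0][1], hits[0][2], exons)
# [(106, 110), (201, 204)] -> the peptide spans an exon junction
```

Each genomic segment would then become one feature line in the GFF3 output. Reverse-strand transcripts need the exon order and coordinates flipped, which this sketch leaves out.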