peptide_to_gff (galaxyp) README @ 2:4c87b4cc1176
Add simple label to output files for IGV display application (it does not handle punctuation in URLs)
author | Jim Johnson <jj@umn.edu> |
---|---|
date | Mon, 15 Jun 2015 15:22:59 -0500 |
parents | cec60c540546 |
Inputs:
  - A tabular file that contains a column with a peptide sequence and a column with an identifier for a reference sequence
  - fasta files for the reference sequences
  - gff or gtf for mapping the reference sequences to a genome
  - reference genome fasta

Ensembl transcript_id
  files: Homo_sapiens.GRCh37.71.gtf,GRCh37.fa
  transcript gtf+reference
  map peptide to 3-frame translation of transcript
  map to reference genome with ensembl gtf

ECGene ec_id
  files: ECgene_hg18_b1_low.fa,GRCh37.fa
  transcript from ecgene.fa
  map peptide to 3-frame translation of transcript
  map transcript to reference genome with blat

Augustus id
  files: ssc10.2.RNA.hints.augustus.fa, ssc10.2.RNA.hints.augustus.gff
  map peptide to augustus protein fasta
  map to reference genome with GFF3

EEJ
  files: Homo_sapiens.GRCh37.71.gtf,eej_sus_scrofa_core_70_102.fa
  map peptide to eej fasta
  parse id to find exon names and junc_pos
  map to reference genome with exon_id in ensembl GTF

Output:
  a GFF3 file that specifies the position of the peptide in a reference genome

Mapping:
  find transcript in cDNA fasta:
  find transcript in translated fasta:
  peptide to transcript:
    translate transcript to amino acid sequence and search for peptide
    tblastn
    Biopython
  transcript to genome:
    If the fasta id lines contain the genomic mapping, use that
    Map transcript to reference genome with BLAT
  see if the peptide crosses exon boundaries
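
Example (illustrative only, not the tool's own code):
  The "peptide to transcript" step, translating a transcript in three frames and searching each translation for the peptide, could be sketched in plain Python as below. The function names translate and find_peptide are hypothetical. The sketch assumes the standard genetic code and the forward strand only, and it reports 0-based, end-exclusive nucleotide offsets within the transcript. Mapping those offsets to the genome (GTF/GFF3 or BLAT) is not covered.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq, frame):
    """Translate a nucleotide sequence starting at offset `frame` (0, 1 or 2).

    Codons containing anything other than A, C, G, T become 'X';
    stop codons become '*'. A trailing partial codon is ignored.
    """
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def find_peptide(transcript, peptide):
    """Return (frame, start, end) for every exact match of `peptide`.

    start/end are 0-based, end-exclusive nucleotide offsets in `transcript`.
    """
    hits = []
    for frame in range(3):
        protein = translate(transcript, frame)
        pos = protein.find(peptide)
        while pos != -1:
            start = frame + 3 * pos
            hits.append((frame, start, start + 3 * len(peptide)))
            pos = protein.find(peptide, pos + 1)
    return hits


if __name__ == "__main__":
    # Made-up transcript: two leading bases shift the ORF into frame 2.
    transcript = "GGATGGCCAAGTTTTAA"
    for frame, start, end in find_peptide(transcript, "AKF"):
        print(frame, start, end, transcript[start:end])
```

  An exact string search like this misses peptides with sequence variants or ambiguous residues; tblastn, which the README also lists, is the usual choice when approximate matching is needed.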