Select longest CDS per gene (version 0.0.2)

This tool filters a CDS FASTA file downloaded from Ensembl, retaining only the longest CDS sequence for each gene.

The headers of the input CDS FASTA file are expected to have the following format:

>ENSMUST00000177965.1 cds chromosome:GRCm38:12:113456720:113456736:-1 gene:ENSMUSG00000094057.1 gene_biotype:IG_D_gene transcript_biotype:IG_D_gene gene_symbol:Ighd2-7 description:immunoglobulin heavy diversity 2-7 [Source:MGI Symbol;Acc:MGI:4439866]
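The header layout above is a transcript identifier, the sequence type, a series of key:value fields, and a free-text description at the end. The following is a minimal sketch of how such a header could be split into its fields. The function name parse_cds_header is hypothetical and not part of the tool; it assumes that the description field, when present, is always last and that no other field contains spaces.

```python
# Hypothetical helper, not the tool's own code. Assumes "description:" is the
# last field (it may contain spaces and colons) and other fields have no spaces.
def parse_cds_header(header: str) -> dict:
    header = header.lstrip(">").strip()
    # Split off the free-text description first, since it may contain spaces and colons.
    head, sep, description = header.partition(" description:")
    tokens = head.split()
    fields = {"transcript_id": tokens[0], "seq_type": tokens[1]}
    for token in tokens[2:]:
        # Split on the first colon only: "chromosome:GRCm38:12:..." keeps its full value.
        key, _, value = token.partition(":")
        fields[key] = value
    if sep:
        fields["description"] = description
    if "gene" in fields:
        # Gene identifier without its ".N" version suffix, e.g. ENSMUSG00000094057.
        fields["gene_id"] = fields["gene"].split(".")[0]
    return fields
```

Applied to the example header, this yields gene = ENSMUSG00000094057.1, gene_id = ENSMUSG00000094057 and gene_symbol = Ighd2-7.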

Among the CDS sequences that share the same gene identifier (ENSMUSG00000094057 in the example above, taken from the gene: field without its version suffix), the tool keeps only the one with the longest sequence.
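The selection rule can be illustrated with a short sketch: read the FASTA records, group them by gene identifier, and keep the longest sequence per group. This is not the tool's actual implementation. The function names are hypothetical, and two behaviours are assumptions because the description above does not specify them: records without a gene: field are skipped, and when two CDS of the same gene have equal length the first one encountered is kept.

```python
import re
from typing import Iterable, Iterator

# Captures the gene identifier up to its ".N" version suffix,
# e.g. " gene:ENSMUSG00000094057.1" -> "ENSMUSG00000094057".
GENE_RE = re.compile(r"\sgene:([^\s.]+)")


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    header, chunks = None, []
    for line in lines:
        line = line.rstrip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line, []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def longest_cds_per_gene(lines: Iterable[str]) -> dict[str, tuple[str, str]]:
    """Map each gene identifier to the (header, sequence) of its longest CDS."""
    best: dict[str, tuple[str, str]] = {}
    for header, seq in read_fasta(lines):
        match = GENE_RE.search(header)
        if match is None:
            continue  # assumption: records without a gene: field are skipped
        gene_id = match.group(1)
        # Strict ">" keeps the first record seen when lengths tie (an assumption).
        if gene_id not in best or len(seq) > len(best[gene_id][1]):
            best[gene_id] = (header, seq)
    return best
```

For a file on disk (the name cds.fa is only an example), `longest_cds_per_gene(open("cds.fa"))` returns one entry per gene in order of first appearance; writing each `header` followed by its `seq` produces the filtered FASTA.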