CPAT is a bioinformatics tool that predicts the coding probability of an RNA from characteristics of its sequence. To do this, CPAT scores four linguistic features on a set of known protein-coding genes and on a set of non-coding genes: ORF size, ORF coverage, Fickett TESTCODE and hexamer usage bias. CPAT then builds a logistic regression model using these four features as predictor variables and the protein-coding status as the response variable. After the model's performance has been evaluated and a probability cutoff has been chosen, the model can be used to classify new RNA sequences.
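To make two of the features concrete, the following is a minimal sketch of how ORF size and ORF coverage could be computed, and how a logistic model turns feature scores into a probability. It is not CPAT's own code: the function names `longest_orf`, `orf_features` and `logistic` are hypothetical, the ORF search is simplified (forward strand only, ATG start, an in-frame stop codon required), and any coefficients passed to `logistic` would have to come from a model fitted on real training data.

```python
import math

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf(seq):
    """Length in nucleotides of the longest ATG-to-stop ORF (stop included).

    Simplifying assumptions: forward strand only, three reading frames,
    and ORFs without an in-frame stop codon are ignored.
    """
    seq = seq.upper().replace("U", "T")
    best = 0
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                best = max(best, i + 3 - start)
                start = None  # look for the next ORF in this frame
    return best


def orf_features(seq):
    """Return (ORF size, ORF coverage), coverage = ORF size / transcript length."""
    size = longest_orf(seq)
    coverage = size / len(seq) if seq else 0.0
    return size, coverage


def logistic(features, coefficients, intercept):
    """Logistic model: probability from a linear combination of features."""
    z = intercept + sum(c * x for c, x in zip(coefficients, features))
    return 1.0 / (1.0 + math.exp(-z))


if __name__ == "__main__":
    size, coverage = orf_features("CCATGAAATTTTAGGG")
    print(size, coverage)
```

In a full model the Fickett TESTCODE score and the hexamer usage bias score would be appended to the feature list before calling `logistic`, and the resulting probability would be compared against the chosen cutoff.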
hg clone https://toolshed.g2.bx.psu.edu/repos/bgruening/cpat
| Name | Description | Version | Minimum Galaxy Version |
|---|---|---|---|
| | coding potential assessment | 3.0.5+galaxy1 | 16.01 |