Add Annotations from dbNSFP and similar annotation DBs
snpSift_macros.xml
"$output"
2> tmp.err && grep -v file tmp.err
]]>
=1. Then LRTnew scores were ranked among all LRTnew scores in dbNSFP. The rankscore is the ratio of the rank over the total number of the scores in dbNSFP. The scores range from 0.00166 to 0.85682
LRT_pred
LRT prediction, D(eleterious), N(eutral) or U(nknown), which is not solely determined by the score
LRT_score
The original LRT two-sided p-value (LRTori), ranges from 0 to 1
MutationAssessor_pred
MutationAssessor's functional impact of a variant
MutationAssessor_rankscore
MAori scores were ranked among all MAori scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of MAori scores in dbNSFP. The scores range from 0 to 1
MutationAssessor_score
MutationAssessor functional impact combined score (MAori)
MutationTaster_converted_rankscore
The MTori scores were first converted: if the prediction is "A" or "D" MTnew=MTori; if the prediction is "N" or "P", MTnew=1-MTori. Then MTnew scores were ranked among all MTnew scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of MTnew scores in dbNSFP. The scores range from 0.0931 to 0.80722
MutationTaster_pred
MutationTaster prediction
MutationTaster_score
MutationTaster p-value (MTori), ranges from 0 to 1
phastCons46way_placental
phastCons conservation score based on the multiple alignments of 33 placental mammal genomes (including human). The larger the score, the more conserved the site
phastCons46way_placental_rankscore
phastCons46way_placental scores were ranked among all phastCons46way_placental scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phastCons46way_placental scores in dbNSFP
phastCons46way_primate
phastCons conservation score based on the multiple alignments of 10 primate genomes (including human). The larger the score, the more conserved the site
phastCons46way_primate_rankscore
phastCons46way_primate scores were ranked among all phastCons46way_primate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phastCons46way_primate scores in dbNSFP
phastCons100way_vertebrate
phastCons conservation score based on the multiple alignments of 100 vertebrate genomes (including human). The larger the score, the more conserved the site
phastCons100way_vertebrate_rankscore
phastCons100way_vertebrate scores were ranked among all phastCons100way_vertebrate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phastCons100way_vertebrate scores in dbNSFP
phyloP46way_placental
phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 33 placental mammal genomes (including human). The larger the score, the more conserved the site
phyloP46way_placental_rankscore
phyloP46way_placental scores were ranked among all phyloP46way_placental scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phyloP46way_placental scores in dbNSFP
phyloP46way_primate
phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 10 primate genomes (including human). The larger the score, the more conserved the site
phyloP46way_primate_rankscore
phyloP46way_primate scores were ranked among all phyloP46way_primate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phyloP46way_primate scores in dbNSFP
phyloP100way_vertebrate
phyloP (phylogenetic p-values) conservation score based on the multiple alignments of 100 vertebrate genomes (including human). The larger the score, the more conserved the site
phyloP100way_vertebrate_rankscore
phyloP100way_vertebrate scores were ranked among all phyloP100way_vertebrate scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of phyloP100way_vertebrate scores in dbNSFP
Polyphen2_HDIV_pred
Polyphen2 prediction based on HumDiv
Polyphen2_HDIV_rankscore
Polyphen2 HDIV scores were first ranked among all HDIV scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of the scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The scores range from 0.02656 to 0.89917
Polyphen2_HDIV_score
Polyphen2 score based on HumDiv, i.e. hdiv_prob. The score ranges from 0 to 1. Multiple entries separated by ";"
Polyphen2_HVAR_pred
Polyphen2 prediction based on HumVar
Polyphen2_HVAR_rankscore
Polyphen2 HVAR scores were first ranked among all HVAR scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of the scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The scores range from 0.01281 to 0.9711
Polyphen2_HVAR_score
Polyphen2 score based on HumVar, i.e. hvar_prob. The score ranges from 0 to 1. Multiple entries separated by ";"
pos(1-coor)
Physical position on the chromosome as to hg19 (1-based coordinate)
RadialSVM_pred
Prediction of our SVM based ensemble prediction score, "T(olerated)" or "D(amaging)". The score cutoff between "D" and "T" is 0. The rankscore cutoff between "D" and "T" is 0.83357
RadialSVM_rankscore
RadialSVM scores were ranked among all RadialSVM scores in dbNSFP. The rankscore is the ratio of the rank of the score over the total number of RadialSVM scores in dbNSFP. The scores range from 0 to 1
RadialSVM_score
Our support vector machine (SVM) based ensemble prediction score, which incorporated 10 scores (SIFT, PolyPhen-2 HDIV, PolyPhen-2 HVAR, GERP++, MutationTaster, Mutation Assessor, FATHMM, LRT, SiPhy, PhyloP) and the maximum frequency observed in the 1000 genomes populations. Larger value means the SNV is more likely to be damaging. Scores range from -2 to 3 in dbNSFP
ref
Reference nucleotide allele (as on the + strand)
refcodon
Reference codon
Reliability_index
Number of observed component scores (except the maximum frequency in the 1000 genomes populations) for RadialSVM and LR. Ranges from 1 to 10. As RadialSVM and LR scores are calculated based on imputed data, the fewer missing component scores, the higher the reliability of the scores and predictions
SIFT_converted_rankscore
SIFTori scores were first converted to SIFTnew=1-SIFTori, then ranked among all SIFTnew scores in dbNSFP. The rankscore is the ratio of the rank of the SIFTnew score over the total number of SIFTnew scores in dbNSFP. If there are multiple scores, only the most damaging (largest) rankscore is presented. The rankscores range from 0.02654 to 0.87932
SIFT_pred
If SIFTori is smaller than 0.05 (rankscore>0.55) the corresponding non-synonymous SNP is predicted as "D(amaging)"; otherwise it is predicted as "T(olerated)". Multiple predictions separated by ";"
SIFT_score
SIFT score (SIFTori). Scores range from 0 to 1. The smaller the score the more likely the SNP has damaging effect. Multiple scores separated by ";"
SiPhy_29way_logOdds
SiPhy score based on 29 mammalian genomes. The larger the score, the more conserved the site
SiPhy_29way_pi
The estimated stationary distribution of A, C, G and T at the site, using the SiPhy algorithm based on 29 mammalian genomes
SLR_test_statistic
SLR test statistic for testing natural selection on codons. A negative value indicates negative selection, and a positive value indicates positive selection. Larger magnitude of the value suggests stronger evidence
Uniprot_aapos
Amino acid position as to Uniprot. Multiple entries separated by ";"
Uniprot_acc
Uniprot accession number. Multiple entries separated by ";"
Uniprot_id
Uniprot ID number. Multiple entries separated by ";"
UniSNP_ids
rs numbers from UniSNP, which is a cleaned version of dbSNP build 129, in format: rs number1;rs number2;...
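Several of the descriptions above rely on the same two steps: convert an original score so that larger means more damaging, then divide its rank by the total number of scores. The following is a minimal Python sketch of those steps, not the procedure dbNSFP itself runs. The function names are hypothetical, tied scores are assumed to share the highest rank among them, and missing values are assumed to be written as ".".

```python
from bisect import bisect_right


def parse_multi(field):
    """Split a ';'-separated score field into floats.

    Assumption: missing values are written as '.' and are skipped.
    """
    return [float(x) for x in field.split(";") if x not in ("", ".")]


def sift_converted(sift_ori):
    """SIFTnew = 1 - SIFTori, so that a larger value is more damaging."""
    return 1.0 - sift_ori


def sift_pred(sift_ori):
    """'D' (damaging) if SIFTori is smaller than 0.05, otherwise 'T' (tolerated)."""
    return "D" if sift_ori < 0.05 else "T"


def mutationtaster_converted(mt_ori, pred):
    """MTnew = MTori for 'A' or 'D' predictions, 1 - MTori for 'N' or 'P'."""
    if pred in ("A", "D"):
        return mt_ori
    if pred in ("N", "P"):
        return 1.0 - mt_ori
    raise ValueError(f"unexpected MutationTaster prediction: {pred!r}")


def rankscores(scores):
    """Rank of each score divided by the total number of scores.

    Assumption: tied scores all receive the highest rank among the ties.
    """
    ordered = sorted(scores)
    total = len(ordered)
    return [bisect_right(ordered, s) / total for s in scores]


# Toy example with three hypothetical SIFT_score fields.
fields = ["0", "0.15;0.02", "0.6"]
converted = [sift_converted(s) for f in fields for s in parse_multi(f)]
print(rankscores(converted))
```

With real data the ranking is taken over every score in dbNSFP, so rankscores computed on a subset like this toy list will not match the published values.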
The procedure for preparing the dbNSFP data for use in SnpSift dbnsfp and a couple of prebuilt dbNSFP databases are available at:
http://snpeff.sourceforge.net/SnpSift.html#dbNSFP
**Uploading Your Own Annotations for any Genome**
The website for dbNSFP database releases is:
https://sites.google.com/site/jpopgen/dbNSFP
Annotations are provided only for the human hg18, hg19, and hg38 genome builds.
However, any dbNSFP-like tabular file can be used with SnpSift dbnsfp if it meets these requirements:
- The first line of the file must be column headers that name the annotations.
- The first 4 columns are required and must be:
1. #chr - chromosome
2. pos(1-coor) - position in chromosome
3. ref - reference base
4. alt - alternate base
For example:
::

    #chr  pos(1-coor)  ref  alt  aaref  aaalt  genename  SIFT_score
    4     100239319    T    A    H      L      ADH1B     0
    4     100239319    T    C    H      R      ADH1B     0.15
    4     100239319    T    G    H      P      ADH1B     0
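Before uploading a custom file, it can help to confirm that it satisfies the header requirements listed above. The sketch below is one possible check in Python and is not part of SnpSift or Galaxy. The function name `check_dbnsfp_like` is hypothetical, and the file is assumed to be tab-delimited.

```python
import csv
import io

# The four required leading columns, in order.
REQUIRED_COLUMNS = ["#chr", "pos(1-coor)", "ref", "alt"]


def check_dbnsfp_like(handle):
    """Return the annotation column names, or raise ValueError.

    Assumption: columns are separated by tabs.
    """
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader, None)
    if header is None:
        raise ValueError("file is empty")
    if header[:4] != REQUIRED_COLUMNS:
        raise ValueError(f"first 4 columns must be {REQUIRED_COLUMNS}, got {header[:4]}")
    for line_no, row in enumerate(reader, start=2):
        if len(row) != len(header):
            raise ValueError(f"line {line_no}: expected {len(header)} columns, got {len(row)}")
        if not row[1].isdigit():
            raise ValueError(f"line {line_no}: position {row[1]!r} is not an integer")
    return header[4:]


# The example rows from above, written with tab separators.
example = (
    "#chr\tpos(1-coor)\tref\talt\taaref\taaalt\tgenename\tSIFT_score\n"
    "4\t100239319\tT\tA\tH\tL\tADH1B\t0\n"
    "4\t100239319\tT\tC\tH\tR\tADH1B\t0.15\n"
    "4\t100239319\tT\tG\tH\tP\tADH1B\t0\n"
)
annotations = check_dbnsfp_like(io.StringIO(example))
print(annotations)
```

For a file on disk, pass an open handle instead, for example `check_dbnsfp_like(open("my_annotations.tsv", newline=""))`, where the file name is a placeholder.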
The custom Galaxy datatypes for dbNSFP can automatically convert the specially formatted tabular file for use by SnpSift dbnsfp:
1. Upload the tabular file, set the datatype as: **"dbnsfp.tabular"**
2. Edit the history dataset attributes (pencil icon): Use "Convert Format" to convert the **"dbnsfp.tabular"** to the correct format for SnpSift dbnsfp: **"snpsiftdbnsfp"**.
The procedure for preparing the dbNSFP data for use in SnpSift dbnsfp is in the SnpSift documentation.
@EXTERNAL_DOCUMENTATION@
http://snpeff.sourceforge.net/SnpSift.html#dbNSFP
]]>
DOI: 10.1002/humu.21517
DOI: 10.1002/humu.22376
DOI: 10.1002/humu.22932
DOI: 10.1093/hmg/ddu733
DOI: 10.1093/nar/gku1206
DOI: 10.3389/fgene.2012.00035