comparison kmersvm/train.xml @ 7:fd740d515502 draft default tip
Uploaded revised kmer-SVM to include modules from kmer-visual.
author | cafletezbrant |
---|---|
date | Sun, 16 Jun 2013 18:06:14 -0400 |
parents | 7fe1103032f7 |
children |
6:1aea7c1a9ab1 | 7:fd740d515502 |
---|---|
45 </param> | 45 </param> |
46 <when value="custom"> | 46 <when value="custom"> |
47 <param name="weight" type="float" value="1" label="Input The Value of Positive Set Weight" /> | 47 <param name="weight" type="float" value="1" label="Input The Value of Positive Set Weight" /> |
48 </when> | 48 </when> |
49 </conditional> | 49 </conditional> |
50 <param name="SVMC" type="integer" value="1" label="Regularization Param C" /> | 50 <param name="SVMC" type="float" value="1" label="Regularization Param C" > |
51 <param name="EPS" type="float" value="0.00001" label="Precision Param E" /> | 51 <validator type="in_range" message="SVMC must be in range 1 - 10" min="0.01" max="1" /> |
| | 52 </param> |
| | 53 <param name="EPS" type="float" value="0.00001" label="Precision Param E" > |
| | 54 <validator type="in_range" message="EPS must be in range 1e-1 to 1e-5" min="0.00001" max="0.1" /> |
| | 55 </param> |
52 </inputs> | 56 </inputs> |
53 <outputs> | 57 <outputs> |
54 <data format="tabular" name="SVM_weights" from_work_dir="kmersvm_output_weights.out" label="${tool.name} on ${on_string} : Weights" /> | 58 <data format="tabular" name="SVM_weights" from_work_dir="kmersvm_output_weights.out" label="${tool.name} on ${on_string} : Weights" /> |
55 <data format="tabular" name="CV_predictions" from_work_dir="kmersvm_output_cvpred.out" label="${tool.name} on ${on_string} : Predictions" /> | 59 <data format="tabular" name="CV_predictions" from_work_dir="kmersvm_output_cvpred.out" label="${tool.name} on ${on_string} : Predictions" /> |
56 </outputs> | 60 </outputs> |
77 | 81 |
78 **What it does** | 82 **What it does** |
79 | 83 |
80 Takes as input 2 FASTA files, 1 of positive sequences and 1 of negative sequences. Produces 2 outputs: | 84 Takes as input 2 FASTA files, 1 of positive sequences and 1 of negative sequences. Produces 2 outputs: |
81 | 85 |
82 A) Weights: list of sequences of length K ranked by score and posterior probability for that score. | 86 A) Weights: list of sequences of length K ranked by score. |
83 | 87 |
84 B) Predictions: results of N-fold cross validation | 88 B) Predictions: results of N-fold cross validation. |
85 | 89 |
| | 90 ---- |
| | 91 |
| | 92 **Recommended Settings** |
| | 93 |
| | 94 Kernel: Spectrum |
| | 95 |
| | 96 Kmer length: 6 |
| | 97 |
| | 98 N-Fold Cross-Validation: 5 |
| | 99 |
| | 100 Weight: We recommend letting the Positive Set Weight be selected automatically, unless it has been separately optimized. |
| | 101 |
| | 102 Regularization Parameter C: We recommend values between 0.1 and 1. |
| | 103 |
| | 104 Precision Parameter E: We recommend using the default and staying below 0.1. |
| | 105 |
86 ---- | 106 ---- |
87 | 107 |
88 **Parameters** | 108 **Parameters** |
89 | 109 |
90 Kernel: 2 choices: | 110 Kernel: 2 choices: |
91 | 111 |
92 A) Spectrum Kernel: Analyzes a sequence using strings of length K. | 112 A) Spectrum Kernel: Analyzes a sequence using strings of length K. |
93 | 113 |
94 B) Weighted Spectrum Kernel: Analyzes a sequence using strings of range of lengths K1 - Kn. | 114 B) Weighted Spectrum Kernel: Analyzes a sequence using strings of range of lengths K_min - K_max. |
95 | 115 |
| | 116 |
96 N-Fold Cross Validation: Number of partitions of training data used for cross validation. | 117 N-Fold Cross Validation: Number of partitions of training data used for cross validation. |
97 | 118 |
98 Weight: Increases importance of positive data (increase if positive sets are very trustworthy or for training with very large negative sequence sets). | 119 Weight: Increases importance of positive data (increase if positive sets are very trustworthy or for training with very large negative sequence sets). |
99 | 120 |
100 Regularization Parameter: Penalty for misclassification. Trade-off is overfitting (high parameter) versus high error rate (low parameter). | 121 Regularization Parameter: Penalty for misclassification. Trade-off is overfitting (high parameter) versus high error rate (low parameter). |
101 | 122 |
102 Precision Parameter: Insensitivity zone. Affects precision of SVM by altering number of support vectors used. | 123 Precision Parameter: Insensitivity zone. Affects precision of SVM by altering number of support vectors used. |
103 | 124 |
104 ---- | 125 ---- |
105 | 126 |
106 **Example** | 127 **Example** |
107 | 128 |
108 Weights file:: | 129 Weights file:: |