comparison kmersvm/train.xml @ 7:fd740d515502 draft default tip

Uploaded revised kmer-SVM to include modules from kmer-visual.
author cafletezbrant
date Sun, 16 Jun 2013 18:06:14 -0400
parents 7fe1103032f7
children
6:1aea7c1a9ab1 7:fd740d515502
45 </param> 45 </param>
46 <when value="custom"> 46 <when value="custom">
47 <param name="weight" type="float" value="1" label="Input The Value of Positive Set Weight" /> 47 <param name="weight" type="float" value="1" label="Input The Value of Positive Set Weight" />
48 </when> 48 </when>
49 </conditional> 49 </conditional>
50 <param name="SVMC" type="integer" value="1" label="Regularization Param C" /> 50 <param name="SVMC" type="float" value="1" label="Regularization Param C" >
51 <param name="EPS" type="float" value="0.00001" label="Precision Param E" /> 51 <validator type="in_range" message="SVMC must be in range 0.01 to 1" min="0.01" max="1" />
52 </param>
53 <param name="EPS" type="float" value="0.00001" label="Precision Param E" >
54 <validator type="in_range" message="EPS must be in range 1e-1 to 1e-5" min="0.00001" max="0.1" />
55 </param>
52 </inputs> 56 </inputs>
53 <outputs> 57 <outputs>
54 <data format="tabular" name="SVM_weights" from_work_dir="kmersvm_output_weights.out" label="${tool.name} on ${on_string} : Weights" /> 58 <data format="tabular" name="SVM_weights" from_work_dir="kmersvm_output_weights.out" label="${tool.name} on ${on_string} : Weights" />
55 <data format="tabular" name="CV_predictions" from_work_dir="kmersvm_output_cvpred.out" label="${tool.name} on ${on_string} : Predictions" /> 59 <data format="tabular" name="CV_predictions" from_work_dir="kmersvm_output_cvpred.out" label="${tool.name} on ${on_string} : Predictions" />
56 </outputs> 60 </outputs>
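The `in_range` validators added in this revision bound `SVMC` to 0.01 through 1 and `EPS` to 0.00001 through 0.1. As a rough illustration of which values those bounds accept, here is a minimal Python sketch. The names `BOUNDS`, `in_range`, and `check_params` are hypothetical and exist only for this example; they are not part of Galaxy or kmer-SVM, and inclusive endpoints are an assumption.

```python
# Hypothetical restatement of the validator bounds shown in the diff above.
# Inclusive endpoints are an assumption made for this sketch.
BOUNDS = {
    "SVMC": (0.01, 1.0),
    "EPS": (0.00001, 0.1),
}


def in_range(name, value):
    """Return True if value lies within the bounds listed for the parameter."""
    low, high = BOUNDS[name]
    return low <= value <= high


def check_params(params):
    """Return the sorted names of parameters that fall outside their bounds."""
    return sorted(name for name, value in params.items()
                  if not in_range(name, value))
```

For instance, `check_params({"SVMC": 0.5, "EPS": 0.001})` reports nothing, while a value such as `SVMC = 5` would be flagged under these bounds.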
77 81
78 **What it does** 82 **What it does**
79 83
80 Takes as input 2 FASTA files, 1 of positive sequences and 1 of negative sequences. Produces 2 outputs: 84 Takes as input 2 FASTA files, 1 of positive sequences and 1 of negative sequences. Produces 2 outputs:
81 85
82 A) Weights: list of sequences of length K ranked by score and posterior probability for that score. 86 A) Weights: list of sequences of length K ranked by score.
83 87
84 B) Predictions: results of N-fold cross validation 88 B) Predictions: results of N-fold cross validation.
85 89
90 ----
91
92 **Recommended Settings**
93
94 Kernel: Spectrum
95
96 Kmer length: 6
97
98 N-Fold Cross-Validation: 5
99
100 Weight: We recommend letting the Positive Set Weight be selected automatically, unless it has been separately optimized.
101
102 Regularization Parameter C: We recommend values between 0.1 and 1.
103
104 Precision Parameter E: We recommend using the default and staying below 0.1.
105
86 ---- 106 ----
87 107
88 **Parameters** 108 **Parameters**
89 109
90 Kernel: 2 choices: 110 Kernel: 2 choices:
91 111
92 A) Spectrum Kernel: Analyzes a sequence using strings of length K. 112 A) Spectrum Kernel: Analyzes a sequence using strings of length K.
93 113
94 B) Weighted Spectrum Kernel: Analyzes a sequence using strings over a range of lengths K1 - Kn. 114 B) Weighted Spectrum Kernel: Analyzes a sequence using strings over a range of lengths K_min - K_max.
95 115
116
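To make the two kernel choices above concrete, the following Python sketch computes a plain spectrum kernel as the dot product of k-mer count vectors, and a weighted variant as a sum over lengths K_min through K_max. This is the textbook construction, not the tool's own code: the function names are hypothetical, the uniform default weights are an assumption, and details such as reverse-complement handling or normalization, which the actual tool may apply, are left out.

```python
from collections import Counter


def kmer_counts(seq, k):
    """Count every overlapping substring of length k in seq."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def spectrum_kernel(seq_a, seq_b, k):
    """Dot product of the k-mer count vectors of the two sequences."""
    counts_a = kmer_counts(seq_a, k)
    counts_b = kmer_counts(seq_b, k)
    return sum(n * counts_b[kmer] for kmer, n in counts_a.items())


def weighted_spectrum_kernel(seq_a, seq_b, k_min, k_max, weights=None):
    """Weighted sum of spectrum kernels for each k from k_min to k_max."""
    ks = range(k_min, k_max + 1)
    if weights is None:
        # Assumption for this sketch: every length contributes equally.
        weights = [1.0] * len(ks)
    return sum(w * spectrum_kernel(seq_a, seq_b, k)
               for w, k in zip(weights, ks))
```

Two sequences score highly under either kernel when they share many of the same short substrings, which is why the ranked k-mer weights in the first output can be read as the substrings that best separate the positive set from the negative set.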
96 N-Fold Cross Validation: Number of partitions of training data used for cross validation. 117 N-Fold Cross Validation: Number of partitions of training data used for cross validation.
97 118
98 Weight: Increases importance of positive data (increase if positive sets are very trustworthy or for training with very large negative sequence sets). 119 Weight: Increases importance of positive data (increase if positive sets are very trustworthy or for training with very large negative sequence sets).
99 120
100 Regularization Parameter: Penalty for misclassification. Trade-off is overfitting (high parameter) versus high error rate (low parameter). 121 Regularization Parameter: Penalty for misclassification. Trade-off is overfitting (high parameter) versus high error rate (low parameter).
101 122
102 Precision Parameter: Insensitivity zone. Affects precision of SVM by altering number of support vectors used. 123 Precision Parameter: Insensitivity zone. Affects precision of SVM by altering number of support vectors used.
103 124
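The N-fold cross validation parameter described above controls how many partitions the training data is split into, with each partition held out once for prediction. A minimal Python sketch of that partitioning follows. The function name `n_fold_splits` is hypothetical, and the round-robin assignment is an assumption: the help text does not say how kmer-SVM assigns sequences to folds (for example, whether it shuffles or stratifies).

```python
def n_fold_splits(n_items, n_folds):
    """Yield (train_indices, test_indices) once for each of n_folds partitions."""
    if not 2 <= n_folds <= n_items:
        raise ValueError("n_folds must be between 2 and the number of items")
    indices = list(range(n_items))
    # Deal indices round-robin so that fold sizes differ by at most one.
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for held_out in range(n_folds):
        test = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out
                 for i in fold]
        yield train, test
```

With the recommended setting of 5 folds, each sequence is used for training four times and predicted exactly once, which is what the Predictions output summarizes.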
104 ---- 125 ----
105 126
106 **Example** 127 **Example**
107 128
108 Weights file:: 129 Weights file::