diff kmersvm/train.xml @ 7:fd740d515502 draft default tip

Uploaded revised kmer-SVM to include modules from kmer-visual.
author cafletezbrant
date Sun, 16 Jun 2013 18:06:14 -0400
parents 7fe1103032f7
children
--- a/kmersvm/train.xml	Mon Aug 20 21:42:29 2012 -0400
+++ b/kmersvm/train.xml	Sun Jun 16 18:06:14 2013 -0400
@@ -47,8 +47,12 @@
     		<param name="weight" type="float" value="1" label="Input The Value of Positive Set Weight" />   
 		</when>
     </conditional>
-    <param name="SVMC" type="integer" value="1" label="Regularization Param C" />
-    <param name="EPS" type="float" value="0.00001" label="Precision Param E" />
+    <param name="SVMC" type="float" value="1" label="Regularization Param C" >
+	<validator type="in_range" message="SVMC must be in range 0.01 - 1" min="0.01" max="1" />
+    </param>
+    <param name="EPS" type="float" value="0.00001" label="Precision Param E" >
+	<validator type="in_range" message="EPS must be in range 1e-1 to 1e-5" min="0.00001" max="0.1" />
+    </param>
   </inputs>
   <outputs>
     <data format="tabular" name="SVM_weights" from_work_dir="kmersvm_output_weights.out" label="${tool.name} on ${on_string} : Weights" />
@@ -79,11 +83,27 @@
   
 Takes as input 2 FASTA files, 1 of positive sequences and 1 of negative sequences.  Produces 2 outputs: 
   
-  A) Weights: list of sequences of length K ranked by score and posterior probability for that score.
+  A) Weights: list of sequences of length K ranked by score.
   	
-  B) Predictions: results of N-fold cross validation
+  B) Predictions: results of N-fold cross validation.
   
 ----
+
+**Recommended Settings**
+
+Kernel: Spectrum
+
+Kmer length: 6
+
+N-Fold Cross-Validation: 5
+
+Weight: We recommend letting the Positive Set Weight be selected automatically, unless it has been separately optimized.
+
+Regularization Parameter C: We recommend values between 0.1 and 1.
+
+Precision Parameter E: We recommend using the default and staying below 0.1.
+
+----
   
 **Parameters**
   
@@ -91,8 +111,9 @@
   
   A) Spectrum Kernel: Analyzes a sequence using strings of length K.
   	
-  B) Weighted Spectrum Kernel: Analyzes a sequence using strings of range of lengths K1 - Kn.
-  	
+  B) Weighted Spectrum Kernel: Analyzes a sequence using strings with lengths ranging from K_min to K_max.
+
+	
 N-Fold Cross Validation: Number of partitions of training data used for cross validation.
   
 Weight: Increases importance of positive data (increase if positive sets are very trustworthy or for training with very large negative sequence sets).
@@ -100,7 +121,7 @@
 Regularization Parameter: Penalty for misclassification.  Trade-off is overfitting (high parameter) versus high error rate (low parameter).
   
 Precision Parameter:  Insensitivity zone.  Affects precision of SVM by altering number of support vectors used.
-  
+
 ----
   
 **Example**