diff kmersvm/train.xml @ 7:fd740d515502 draft default tip
Uploaded revised kmer-SVM to include modules from kmer-visual.
author | cafletezbrant |
---|---|
date | Sun, 16 Jun 2013 18:06:14 -0400 |
parents | 7fe1103032f7 |
children |
--- a/kmersvm/train.xml Mon Aug 20 21:42:29 2012 -0400
+++ b/kmersvm/train.xml Sun Jun 16 18:06:14 2013 -0400
@@ -47,8 +47,12 @@
         <param name="weight" type="float" value="1" label="Input The Value of Positive Set Weight" />
       </when>
     </conditional>
-    <param name="SVMC" type="integer" value="1" label="Regularization Param C" />
-    <param name="EPS" type="float" value="0.00001" label="Precision Param E" />
+    <param name="SVMC" type="float" value="1" label="Regularization Param C" >
+      <validator type="in_range" message="SVMC must be in range 1 - 10" min="0.01" max="1" />
+    </param>
+    <param name="EPS" type="float" value="0.00001" label="Precision Param E" >
+      <validator type="in_range" message="EPS must be in range 1e-1 to 1e-5" min="0.00001" max="0.1" />
+    </param>
   </inputs>
   <outputs>
     <data format="tabular" name="SVM_weights" from_work_dir="kmersvm_output_weights.out" label="${tool.name} on ${on_string} : Weights" />
@@ -79,11 +83,27 @@
 Takes as input 2 FASTA files, 1 of positive sequences and 1 of negative sequences.
 
 Produces 2 outputs:
 
-  A) Weights: list of sequences of length K ranked by score and posterior probability for that score.
+  A) Weights: list of sequences of length K ranked by score.
 
-  B) Predictions: results of N-fold cross validation
+  B) Predictions: results of N-fold cross validation.
 
 ----
+
+**Recommended Settings**
+
+Kernel: Spectrum
+
+Kmer length: 6
+
+N-Fold Cross-Validation: 5
+
+Weight: We recommend letting the Positive Set Weight be selected automatically, unless it has been separately optimized.
+
+Regularization Parameter C: We recommend values between 0.1 and 1.
+
+Precision Parameter E: We recommend using the default and staying below 0.1.
+
+----
 
 **Parameters**
 
@@ -91,8 +111,9 @@
 
   A) Spectrum Kernel: Analyzes a sequence using strings of length K.
 
-  B) Weighted Spectrum Kernel: Analyzes a sequence using strings of range of lengths K1 - Kn.
-
+  B) Weighted Spectrum Kernel: Analyzes a sequence using strings of range of lengths K_min - K_max.
+
+
 N-Fold Cross Validation: Number of partitions of training data used for cross validation.
 
 Weight: Increases importance of positive data (increase if positive sets are very trustworthy or for training with very large negative sequence sets).
 
@@ -100,7 +121,7 @@
 Regularization Parameter: Penalty for misclassification. Trade-off is overfitting (high parameter) versus high error rate (low parameter).
 
 Precision Parameter: Insensitivity zone. Affects precision of SVM by altering number of support vectors used.
-
+
 ----
 
 **Example**
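
The two `in_range` validators added in the first hunk amount to a bounds check on `SVMC` and `EPS`. The sketch below illustrates that check in Python. It is not Galaxy's implementation: `in_range` and `check_params` are hypothetical helpers, and inclusive bounds are an assumption. The bounds are copied from the `min`/`max` attributes in the diff. Note that the `SVMC` message text says "1 - 10" while the attributes give 0.01 to 1; the sketch follows the attributes.

```python
# Hypothetical stand-in for the in_range validators in the diff; not Galaxy code.

# Bounds copied from the min/max attributes of the two <validator> elements.
SVMC_BOUNDS = (0.01, 1.0)
EPS_BOUNDS = (0.00001, 0.1)


def in_range(value, minimum, maximum):
    """Return True when minimum <= value <= maximum (inclusive bounds assumed)."""
    return minimum <= value <= maximum


def check_params(svmc, eps):
    """Return a list of error strings; an empty list means both values pass."""
    errors = []
    if not in_range(svmc, *SVMC_BOUNDS):
        errors.append("SVMC out of range")
    if not in_range(eps, *EPS_BOUNDS):
        errors.append("EPS out of range")
    return errors


if __name__ == "__main__":
    print(check_params(1, 0.00001))   # the tool's default values
    print(check_params(10, 0.00001))  # C = 10 exceeds max="1"
```

Under these assumptions the default values (`SVMC` = 1, `EPS` = 0.00001) sit exactly on a bound and pass, while `SVMC` = 10, which the message text suggests is allowed, is rejected.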