Mercurial > repos > luis > ball
diff galaxy_stubs/FingerprintSimilarityClustering.xml @ 2:605370bc1def draft default tip
Uploaded
author | luis |
---|---|
date | Tue, 12 Jul 2016 12:33:33 -0400 |
parents | |
children |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/galaxy_stubs/FingerprintSimilarityClustering.xml Tue Jul 12 12:33:33 2016 -0400
@@ -0,0 +1,117 @@
+<?xml version='1.0' encoding='UTF-8'?>
+<!--This is a configuration file for the integration of a tool into Galaxy (https://galaxyproject.org/). This file was automatically generated using CTD2Galaxy.-->
+<!--Proposed Tool Section: [Chemoinformatics]-->
+<tool id="FingerprintSimilarityClustering" name="FingerprintSimilarityClustering" version="1.1.0">
+  <description>fast clustering of compounds using 2D binary fingerprints</description>
+  <macros>
+    <token name="@EXECUTABLE@">FingerprintSimilarityClustering</token>
+    <import>macros.xml</import>
+  </macros>
+  <expand macro="stdio"/>
+  <expand macro="requirements"/>
+  <command>FingerprintSimilarityClustering
+
+#if $param_t:
+  -t $param_t
+#end if
+#if $param_f:
+  -f $param_f
+#end if
+#if $param_fp_col:
+  -fp_col $param_fp_col
+#end if
+#if $param_id_col:
+  -id_col $param_id_col
+#end if
+#if $param_fp_tag:
+  -fp_tag "$param_fp_tag"
+#end if
+#if $param_id_tag:
+  -id_tag "$param_id_tag"
+#end if
+#if $param_tc:
+  -tc $param_tc
+#end if
+#if $param_cc:
+  -cc $param_cc
+#end if
+#if $param_l:
+  -l $param_l
+#end if
+#if $param_nt:
+  -nt "$param_nt"
+#end if
+#if $param_sdf_out:
+  -sdf_out $param_sdf_out
+#end if
+</command>
+  <inputs>
+    <param name="param_t" type="data" format="smi.gz,csv,sdf.gz,sdf,txt.gz,smi,txt,csv.gz" optional="False" label="Target library input file" help="(-t) "/>
+    <param name="param_f" type="integer" min="1" max="2" optional="False" value="0" label="Fingerprint format [1 = binary bitstring, 2 = comma separated feature list]" help="(-f) "/>
+    <param name="param_fp_col" type="integer" value="-1" label="Column number for comma separated smiles input which contains the fingerprint" help="(-fp_col) "/>
+    <param name="param_id_col" type="integer" value="-1" label="Column number for comma separated smiles input which contains the molecule identifier" help="(-id_col) "/>
+    <param name="param_fp_tag" type="text" size="30" value=" " label="Tag name for SDF input which contains the fingerprint" help="(-fp_tag) ">
+      <sanitizer>
+        <valid initial="string.printable">
+          <remove value="'"/>
+          <remove value="&quot;"/>
+        </valid>
+      </sanitizer>
+    </param>
+    <param name="param_id_tag" type="text" size="30" value=" " label="Tag name for SDF input which contains the molecule identifier" help="(-id_tag) ">
+      <sanitizer>
+        <valid initial="string.printable">
+          <remove value="'"/>
+          <remove value="&quot;"/>
+        </valid>
+      </sanitizer>
+    </param>
+    <param name="param_tc" type="float" value="0.7" label="Tanimoto cutoff [default: 0.7]" help="(-tc) "/>
+    <param name="param_cc" type="integer" value="1000" label="Clustering size cutoff [default: 1000]" help="(-cc) "/>
+    <param name="param_l" type="integer" value="0" label="Number of fingerprints to read" help="(-l) "/>
+    <param name="param_nt" type="text" size="30" value="1" label="Number of parallel threads to use" help="(-nt) To use all possible threads enter &lt;max&gt; [default: 1]">
+      <sanitizer>
+        <valid initial="string.printable">
+          <remove value="'"/>
+          <remove value="&quot;"/>
+        </valid>
+      </sanitizer>
+    </param>
+    <param name="param_sdf_out" type="integer" min="0" max="1" optional="True" value="0" label="If input file has SD format, this flag activates writing of clustering information as new tags in a copy of the input SD file" help="(-sdf_out) "/>
+  </inputs>
+  <expand macro="advanced_options"/>
+  <outputs>
+    <data name="param_stdout" format="text" label="Output from stdout"/>
+  </outputs>
+  <help>This tool performs a fast and deterministic semi-hierarchical clustering of input compounds encoded as 2D binary fingerprints.
+
+The method is a multistep workflow which first reduces the number of input fingerprints by removing duplicates. This unique set is forwarded to connected
+components decomposition by calculating all pairwise Tanimoto similarities and applying a similarity cutoff value. As a third step, all connected components
+which exceed a predefined size are hierarchically clustered using the average linkage clustering criterion. The Kelley method is applied on every hierarchical clustering
+to determine a level for cluster selection. Finally, the fingerprint duplicates are remapped onto the final clusters which contain their representatives.
+
+For every final cluster a medoid is calculated. A single cluster can have multiple medoids because fingerprint duplicates of a medoid are also marked as medoid.
+
+For every compound the output yields a cluster ID, a medoid tag where '1' indicates the cluster medoid(s) and the average similarity of the compound to all other
+cluster members. If the output format is SD, these properties are added as new tags.
+
+======================================================================================================================================================
+
+Examples:
+
+$ FingerprintSimilarityClustering -t target.sdf -fp_tag FPRINT -f 1 -id_tag NAME
+  tries to read fingerprints as binary bitstrings (-f 1) from tag &lt;FPRINT&gt; and compound IDs from tag &lt;NAME&gt; of the target.sdf input file.
+  The clustering workflow described above is executed on the input molecules with default values.
+
+$ FingerprintSimilarityClustering -t target.csv -fp_col 3 -f 2 -id_col 1
+  tries to read fingerprints as a comma separated integer feature list (-f 2) from column 3 and IDs from column 1 out of a space separated CSV file.
+  The clustering workflow described above is executed on the input molecules with default values.
+
+$ FingerprintSimilarityClustering -t target.sdf -fp_tag FPRINT -f 1 -id_tag NAME -nt max
+  Same as the first example but executed in parallel mode using as many threads as available.
+
+$ FingerprintSimilarityClustering -t target.sdf -fp_tag FPRINT -f 1 -id_tag NAME -tc 0.5 -cc 50
+  Same as the first example but using modified parameters for the similarity network generation (-tc 0.5) and the size of connected components to be clustered (-cc 50).
+
+</help>
+</tool>
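The -f option distinguishes two fingerprint encodings, and every later step in the help text compares fingerprints by Tanimoto similarity. The Python sketch below shows one way to bring both encodings into a common form (the set of "on" feature indices) and compare them. It assumes that character i of a binary bitstring is feature i and that two empty fingerprints count as identical; parse_fingerprint and tanimoto are hypothetical helpers, not BALL functions.

```python
def parse_fingerprint(text: str, fmt: int) -> frozenset:
    """Return the set of 'on' feature indices encoded in one fingerprint string.

    fmt == 1: binary bitstring such as "01101" (assumption: character i is bit i).
    fmt == 2: comma separated integer feature list such as "1,2,3".
    """
    text = text.strip()
    if fmt == 1:
        if set(text) - {"0", "1"}:
            raise ValueError(f"not a binary bitstring: {text!r}")
        return frozenset(i for i, ch in enumerate(text) if ch == "1")
    if fmt == 2:
        # Skip empty tokens so a trailing comma does not raise.
        return frozenset(int(tok) for tok in text.split(",") if tok.strip())
    raise ValueError("fmt must be 1 (bitstring) or 2 (feature list)")


def tanimoto(a: frozenset, b: frozenset) -> float:
    """Tanimoto similarity: shared on-bits divided by all on-bits of a and b."""
    union = len(a | b)
    # Assumption: two empty fingerprints are treated as identical.
    return len(a & b) / union if union else 1.0


# Both encodings end up in the same representation and can be compared directly.
a = parse_fingerprint("01101", fmt=1)  # on-bits {1, 2, 4}
b = parse_fingerprint("1,2,3", fmt=2)  # on-bits {1, 2, 3}
print(tanimoto(a, b))  # 2 shared / 4 total on-bits -> 0.5
```

Set operations keep the sketch short; a production implementation would more likely store packed bit vectors and count shared bits with a popcount.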
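The first two stages in the help text (duplicate removal, then connected components of the similarity graph at the -tc cutoff) and the final remapping of duplicates can be sketched with a union-find structure. This is an illustration under stated assumptions, not the tool's code: fingerprints are sets of on-bit indices, two unique fingerprints are linked when their Tanimoto similarity is at least the cutoff (the tool may use a strict comparison), and connected_components is a hypothetical name.

```python
from collections import defaultdict


def tanimoto(a: frozenset, b: frozenset) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def connected_components(fps: dict, cutoff: float = 0.7) -> list:
    """Group compound IDs into connected components of the Tanimoto similarity graph.

    fps maps compound ID -> frozenset of on-bits. Returns sorted lists of IDs.
    """
    # Step 1: collapse identical fingerprints, remembering every ID that shares one.
    members = defaultdict(list)
    for cid, fp in fps.items():
        members[fp].append(cid)
    uniques = list(members)

    # Step 2: union-find over all pairs of unique fingerprints (O(n^2) comparisons).
    parent = list(range(len(uniques)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps the trees shallow
            i = parent[i]
        return i

    for i in range(len(uniques)):
        for j in range(i + 1, len(uniques)):
            if tanimoto(uniques[i], uniques[j]) >= cutoff:  # assumption: >= rather than >
                parent[find(i)] = find(j)

    # Step 3: collect components and remap duplicates onto their representative.
    groups = defaultdict(list)
    for i, fp in enumerate(uniques):
        groups[find(i)].extend(members[fp])
    return sorted(sorted(ids) for ids in groups.values())


fps = {
    "a": frozenset({1, 2, 3}),
    "b": frozenset({1, 2, 3}),     # duplicate of "a"
    "c": frozenset({1, 2, 3, 4}),  # Tanimoto 0.75 to "a"
    "d": frozenset({10, 11}),      # unrelated
}
print(connected_components(fps, cutoff=0.7))  # [['a', 'b', 'c'], ['d']]
```

Comparing only unique fingerprints is what makes the deduplication step pay off: the quadratic pairwise loop shrinks with every duplicate removed. Per the help text, components larger than -cc would then be clustered further, while the remaining components presumably stay as final clusters.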
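The help text names average linkage and "the Kelley method" without further detail. The sketch below assumes the penalty usually attributed to Kelley et al. (1996): at each level of the dendrogram, the average spread of the multi-member clusters is rescaled to the range [1, N-1] and added to the number of clusters, and the level with the smallest penalty is kept. BALL's implementation may differ in details such as normalisation or tie-breaking; average_linkage_kelley and spread are hypothetical names, and the naive merge loop is only suitable for small inputs.

```python
from itertools import combinations
from statistics import mean


def tanimoto(a: frozenset, b: frozenset) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def spread(cluster: list, dist: list) -> float:
    """Mean pairwise distance within one cluster of at least two members."""
    return mean(dist[i][j] for i, j in combinations(cluster, 2))


def average_linkage_kelley(fps: list) -> list:
    """Average-linkage clustering, cut at the level with the lowest Kelley penalty.

    fps is a list of frozensets; returns clusters as sorted lists of indices into fps.
    """
    n = len(fps)
    if n < 2:
        return [list(range(n))] if n else []
    dist = [[1.0 - tanimoto(a, b) for b in fps] for a in fps]  # Tanimoto distance

    clusters = [[i] for i in range(n)]
    levels = []  # one entry per merge: (cluster count, average spread, partition)
    while len(clusters) > 1:
        # Merge the two clusters with the smallest mean inter-cluster distance.
        x, y = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: mean(dist[i][j] for i in clusters[p[0]] for j in clusters[p[1]]),
        )
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
        avg_spread = mean(spread(c, dist) for c in clusters if len(c) > 1)
        levels.append((len(clusters), avg_spread, sorted(sorted(c) for c in clusters)))

    # Kelley penalty (assumed form): spread rescaled to [1, n - 1] plus cluster count.
    lo = min(s for _, s, _ in levels)
    hi = max(s for _, s, _ in levels)

    def penalty(level: tuple) -> float:
        k, s, _ = level
        scaled = 1.0 if hi == lo else (n - 2) * (s - lo) / (hi - lo) + 1.0
        return scaled + k

    return min(levels, key=penalty)[2]  # ties go to the earliest (finest) level


fps = [
    frozenset({1, 2, 3, 4}), frozenset({1, 2, 3, 5}),          # similar pair
    frozenset({10, 11, 12, 13}), frozenset({10, 11, 12, 14}),  # second similar pair
]
print(average_linkage_kelley(fps))  # [[0, 1], [2, 3]]
```

The penalty trades cluster tightness against cluster count: cutting too low leaves many clusters, cutting too high inflates the spread. For real component sizes an existing routine such as scipy.cluster.hierarchy.linkage with method='average' would replace the roughly cubic merge loop.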
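The help text defines the per-compound output: a cluster ID, a medoid tag (1 for the medoid and its fingerprint duplicates) and the average similarity to all other cluster members. The sketch below shows that bookkeeping for one final cluster, assuming the medoid is the member with the highest average Tanimoto similarity (equivalently, the smallest summed Tanimoto distance) computed over all members including duplicates; the real tool may compute it on unique fingerprints only. annotate_cluster is a hypothetical name, and the singleton convention is an assumption.

```python
def tanimoto(a: frozenset, b: frozenset) -> float:
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def annotate_cluster(cluster_id: int, members: dict) -> list:
    """Return (compound ID, cluster ID, medoid flag, average similarity) per member.

    members maps compound ID -> frozenset of on-bits for one final cluster,
    fingerprint duplicates included.
    """
    ids = sorted(members)
    if len(ids) == 1:
        # Assumption: a singleton is its own medoid; 1.0 is an arbitrary placeholder.
        return [(ids[0], cluster_id, 1, 1.0)]

    # Average similarity of each compound to all *other* members of its cluster.
    avg = {}
    for cid in ids:
        sims = [tanimoto(members[cid], members[other]) for other in ids if other != cid]
        avg[cid] = sum(sims) / len(sims)

    # Medoid: highest average similarity (= lowest total Tanimoto distance).
    # Every compound sharing the medoid's fingerprint is flagged as well.
    medoid_fp = members[max(ids, key=avg.__getitem__)]
    return [(cid, cluster_id, int(members[cid] == medoid_fp), avg[cid]) for cid in ids]


cluster = {
    "m1": frozenset({1, 2, 3}),
    "m2": frozenset({1, 2, 3}),  # duplicate of m1
    "x": frozenset({1, 2, 3, 4}),
    "y": frozenset({1, 2}),
}
for row in annotate_cluster(7, cluster):
    print(row)  # m1 and m2 carry medoid flag 1, x and y carry 0
```

Because duplicates share a fingerprint, they always have the same average similarity as the medoid, which is consistent with the help text's note that one cluster can report several medoids.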
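Each #if $param_x: block in the command template adds its flag only when the parameter value is truthy. The Python sketch below imitates that logic to show what the defaults in the inputs section imply. It assumes Cheetah applies plain Python truthiness to the raw values (Galaxy actually passes wrapper objects, so edge cases may differ). Under that assumption, integer parameters left at 0 (-l, -sdf_out) are never passed, while -fp_col -1, -id_col -1 and a tag field left at its single-space default are. build_argv, the input file name and the tag values are hypothetical; -f 1 is taken from the help text's first example.

```python
import shlex

# (Galaxy parameter, command-line flag) in the order the command template emits them.
FLAGS = [
    ("param_t", "-t"), ("param_f", "-f"), ("param_fp_col", "-fp_col"),
    ("param_id_col", "-id_col"), ("param_fp_tag", "-fp_tag"),
    ("param_id_tag", "-id_tag"), ("param_tc", "-tc"), ("param_cc", "-cc"),
    ("param_l", "-l"), ("param_nt", "-nt"), ("param_sdf_out", "-sdf_out"),
]


def build_argv(params: dict) -> list:
    """Mimic the `#if $param_x:` blocks: emit a flag only if its value is truthy."""
    argv = ["FingerprintSimilarityClustering"]
    for name, flag in FLAGS:
        value = params.get(name)
        if value:  # 0, 0.0, "" and None are falsy; -1 and " " are truthy
            argv += [flag, str(value)]
    return argv


# Defaults from the inputs section, plus hypothetical file and SDF tag names.
defaults = {
    "param_t": "target.sdf", "param_f": 1, "param_fp_col": -1, "param_id_col": -1,
    "param_fp_tag": "FPRINT", "param_id_tag": "NAME", "param_tc": 0.7,
    "param_cc": 1000, "param_l": 0, "param_nt": "1", "param_sdf_out": 0,
}
print(shlex.join(build_argv(defaults)))
# FingerprintSimilarityClustering -t target.sdf -f 1 -fp_col -1 -id_col -1 -fp_tag FPRINT -id_tag NAME -tc 0.7 -cc 1000 -nt 1
```

A consequence of this pattern is that a user cannot explicitly pass 0 for -l or -sdf_out, and the -1 column defaults reach the tool even for SD input, so the tool itself must treat -1 as "not set".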