annotate text_to_wordmatrix.xml @ 0:dd696b179eb7 draft

"planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
author dlalgroup
date Thu, 24 Sep 2020 02:58:53 +0000
parents
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
1 <tool id="text_to_wordmatrix" name="text_to_wordmatrix" version="@VERSION@">
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
2 <description>Extract most frequent words from each row in a table and generate binary word matrix</description>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
3 <macros>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
4 <import>macros.xml</import>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
5 </macros>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
6 <expand macro="requirements">
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
7 <requirement type="package" version="2.0.1">r-argparse</requirement>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
8 <requirement type="package" version="0.7.0">r-snowballc</requirement>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
9 <requirement type="package" version="0.3.6">r-pubmedwordcloud</requirement>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
10 <requirement type="package" version="1.2.0">r-semnetcleaner</requirement>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
11 <requirement type="package" version="0.9.3">r-textclean</requirement>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
12 <requirement type="package" version="1.4.3">r-stringi</requirement>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
13 <requirement type="package" version="1.4.0">r-stringr</requirement>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
14 </expand>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
15 <expand macro="stdio"/>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
16 <command><![CDATA[Rscript
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
17 '${__tool_directory__}/text_to_wordmatrix.R'
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
18 --input '$input'
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
19 --output '$output'
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
20 --number '$number'
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
21 $remove_num
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
22 $lower_case
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
23 $remove_stopwords
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
24 $stemDoc
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
25 $plurals
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
26 ]]>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
27 </command>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
28 <inputs>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
29 <param argument="--input" label="Input file" name="input" optional="false" type="data" format="tabular" help="input"/>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
30 <param argument="--number" label="Number of most frequent words that should be extracted per row" name="number" optional="false" type="integer" help="number" value="50" min="1" max="500"/>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
31 <param argument="--remove_num" label="Remove any numbers in text" name="remove_num" type="boolean" truevalue="--remove_num" falsevalue="" help="remove_num" checked="false"/>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
32 <param argument="--lower_case" label="Translate all characters are to lower case." name="lower_case" type="boolean" truevalue="" falsevalue="--lower_case" help="lower_case" checked="true"/>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
33 <param argument="--remove_stopwords" label="Remove english stopwords (e.g., 'the' or 'not')." name="remove_stopwords" type="boolean" truevalue="" falsevalue="--remove_stopwords" help="remove_stopwords" checked="true"/>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
34 <param argument="--stemDoc" label="Apply Porter's stemming algorithm: collapsing words to a common root to aid comparison of vocabulary." name="stemDoc" type="boolean" truevalue="--stemDoc" falsevalue="" help="stemDoc" checked="false"/>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
35 <param argument="--plurals" label="Transform words in plural to their singular form." name="plurals" type="boolean" truevalue="" falsevalue="--plurals" help="plurals" checked="true"/>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
36 </inputs>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
37 <outputs>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
38 <data format="tabular" name="output" />
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
39 </outputs>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
40 <tests>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
41 <test>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
42 <param name="input" value="pubmed_by_queries_output_abstracts" ftype="tabular"/>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
43 <output name="output" value="text_to_wordmatrix_output"/>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
44 </test>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
45 </tests>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
46 <help><![CDATA[
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
47
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
48 The tool extracts for each row the most frequent words from the text in columns starting with "ABSTRACT" or "TEXT. The extracted words from each row are united in one large binary matrix, with 0= word not frequently occurring in text of that row and 1= word frequently present in text of that row.
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
49
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
50 Input:
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
51 The output of "pubmed_by_queries" or "abstracts_by_pmids" tools, or a table with text in columns starting with "ABSTRACT" or "TEXT".
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
52 Output:
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
53 A binary matrix in that each column represents one of the extracted words.
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
54
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
55 ]]></help>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
56 <expand macro="citations"/>
dd696b179eb7 "planemo upload for repository https://github.com/dlal-group/simtext commit fd3f5b7b0506fbc460f2a281f694cb57f1c90a3c-dirty"
dlalgroup
parents:
diff changeset
57 </tool>