view mothur/README @ 35:95d75b35e4d2
Updated tools to use Mothur 1.33. Added some misc. fixes and updates (blast repository, tool fixes)
author | certain cat |
---|---|
date | Fri, 31 Oct 2014 15:09:32 -0400 |
parents | 49058b1f8d3f |
children | 040410b8167e |
Provides Galaxy tools for the Mothur metagenomics package - http://www.mothur.org/wiki/Main_Page

Mothur should be able to be auto-installed as a tool_dependency.
You may want to reorganize the tool panel after installing.
See below: Reorganize integrated_tool_panel.xml

This was based on: http://www.mothur.org/wiki/Mothur_manual

(The environment variable MOTHUR_MAX_PROCESSORS can be used to limit the number of cpu processors used for mothur commands)
This will be set in: tool_dependencies/mothur/1.27/jjohnson/mothur_toolsuite/*/env.sh

Requirements for auto installation:
  - make (sudo apt-get install make)
  - g++ (sudo apt-get install g++)
  - gfortran (sudo apt-get install gfortran)
  - pip (sudo apt-get install python-pip)
  - simplejson (pip install simplejson)

Repository Dependency:
  - BLAST Legacy ver. 2.2.26 (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/LATEST/)
  - The repository name should be package_blast_2_2_26 so it matches the tool dependency.

Manual installation for Mothur:
  Install mothur v.1.33 on your Galaxy system so Galaxy can execute the mothur command
  ( This version of the wrappers is designed for Mothur version 1.33; it may work on later versions )
    http://www.mothur.org/wiki/Download_mothur
    http://www.mothur.org/wiki/Installation
  ( This Galaxy Mothur wrapper will invoke Mothur in command line mode: http://www.mothur.org/wiki/Command_line_mode )

TreeVector is also packaged with this Mothur package to view phylogenetic trees:
  TreeVector is a utility to create and integrate phylogenetic trees as Scalable Vector Graphics (SVG) files.
  TreeVector was written by Ralph Pethica, Department of Computer Science, University of Bristol
  TreeVector: http://supfam.cs.bris.ac.uk/TreeVector/about.html
  Install in Galaxy: tool-data/shared/jars/TreeVector.jar

Install reference data from silva and greengenes

  RDP reference file (modified for mothur): http://www.mothur.org/wiki/RDP_reference_files
    - 16S rRNA reference (RDP): A collection of 9,662 bacterial and 384 archaeal 16S rRNA gene sequences with an improved taxonomy compared to version 6.
      http://www.mothur.org/w/images/2/29/Trainset7_112011.rdp.zip
    - 16S rRNA reference (PDS): The RDP reference with three sequences reversed and 119 mitochondrial 16S rRNA gene sequences added as members of the Rickettsiales
      http://www.mothur.org/w/images/4/4a/Trainset7_112011.pds.zip
    - 28S rRNA reference (RDP): A collection of 8506 reference 28S rRNA gene sequences from the Fungi that were curated by the Kuske lab
      http://www.mothur.org/w/images/3/36/FungiLSU_train_v7.zip

  Silva reference: http://www.mothur.org/wiki/Silva_reference_files
    - Bacterial references (14,956 sequences)
      http://www.mothur.org/w/images/9/98/Silva.bacteria.zip
    - Archaeal references (2,297 sequences)
      http://www.mothur.org/w/images/3/3c/Silva.archaea.zip
    - Eukaryotic references (1,238 sequences)
      http://www.mothur.org/w/images/1/1a/Silva.eukarya.zip
    - Silva-based alignment of template file for chimera.slayer (5,181 sequences)
      http://www.mothur.org/w/images/f/f1/Silva.gold.bacteria.zip

  Alignment database rRNA gene sequences: http://www.mothur.org/wiki/Alignment_database
    - greengenes reference alignment
      http://www.mothur.org/w/images/7/72/Greengenes.alignment.zip
    - SILVA (Silva reference)
      http://www.mothur.org/w/images/f/f1/Silva.gold.bacteria.zip

  Secondary structure mapping files: http://www.mothur.org/wiki/Secondary_structure_map
      http://www.mothur.org/w/images/6/6d/Silva_ss_map.zip
      http://www.mothur.org/w/images/4/4b/Gg_ss_map.zip

  Lane masks: http://www.mothur.org/wiki/Lane_mask
    greengenes-compatible mask:
      - lane1241.gg.filter - A Lane mask that comes with the greengenes arb database
        http://www.mothur.org/w/images/2/2a/Lane1241.gg.filter
      - lane1287.gg.filter - A Lane mask that comes with the greengenes arb database
        http://www.mothur.org/w/images/a/a0/Lane1287.gg.filter
      - lane1349.gg.filter - Pat Schloss's transcription of the mask from the Lane paper
        http://www.mothur.org/w/images/3/3d/Lane1349.gg.filter
    SILVA-compatible mask:
      - lane1349.silva.filter - Pat Schloss's transcription of the mask from the Lane paper
        http://www.mothur.org/w/images/6/6d/Lane1349.silva.filter

  Lookup Files for sff flow analysis using shhh.flows: http://www.mothur.org/wiki/Alignment_database

  Example from UMN installation: (We also made these available in a Galaxy public data library)
    /project/db/galaxy/mothur/Silva.bacteria.zip
    /project/db/galaxy/mothur/silva.eukarya.fasta
    /project/db/galaxy/mothur/Greengenes.alignment.zip
    /project/db/galaxy/mothur/Silva.archaea.zip
    /project/db/galaxy/mothur/Silva_ss_map.zip
    /project/db/galaxy/mothur/silva.eukarya.ncbi.tax
    /project/db/galaxy/mothur/Silva.gold.bacteria.zip
    /project/db/galaxy/mothur/Silva.archaea/silva.archaea.silva.tax
    /project/db/galaxy/mothur/Silva.archaea/silva.archaea.gg.tax
    /project/db/galaxy/mothur/Silva.archaea/silva.archaea.rdp.tax
    /project/db/galaxy/mothur/Silva.archaea/nogap.archaea.fasta
    /project/db/galaxy/mothur/Silva.archaea/silva.archaea.ncbi.tax
    /project/db/galaxy/mothur/Silva.archaea/silva.archaea.fasta
    /project/db/galaxy/mothur/nogap.eukarya.fasta
    /project/db/galaxy/mothur/silva.eukarya.silva.tax
    /project/db/galaxy/mothur/silva.gold.align
    /project/db/galaxy/mothur/silva.ss.map
    /project/db/galaxy/mothur/gg.ss.map
    /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.silva.tax
    /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.rdp6.tax
    /project/db/galaxy/mothur/silva.bacteria/nogap.bacteria.fasta
    /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.gg.tax
    /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.ncbi.tax
    /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.fasta
    /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.rdp.tax
    /project/db/galaxy/mothur/Silva.eukarya.zip
    /project/db/galaxy/mothur/Gg_ss_map.zip
    /project/db/galaxy/mothur/core_set_aligned.imputed.fasta
    /project/db/galaxy/mothur/RDP/FungiLSU_train_1400bp_8506_mod.fasta
    /project/db/galaxy/mothur/RDP/FungiLSU_train_1400bp_8506_mod.tax
    /project/db/galaxy/mothur/RDP/trainset6_032010.rdp.fasta
    /project/db/galaxy/mothur/RDP/trainset6_032010.rdp.tax
    /project/db/galaxy/mothur/RDP/trainset7_112011.pds.fasta
    /project/db/galaxy/mothur/RDP/trainset7_112011.pds.tax
    /project/db/galaxy/mothur/RDP/trainset7_112011.rdp.fasta
    /project/db/galaxy/mothur/RDP/trainset7_112011.rdp.tax

Add tool-data: (contains pointers to silva, greengenes, and RDP reference data)
    tool-data/mothur_aligndb.loc
    tool-data/mothur_map.loc
    tool-data/mothur_taxonomy.loc
    tool-data/shared/jars/TreeVector.jar

################################################################
#### If you are manually adding this to your local galaxy: ####
################################################################

add config files (*.xml) and wrapper code (*.py) from tools/mothur/* to your Galaxy installation

add datatype definition file: lib/galaxy/datatypes/metagenomics.py

add the following import line to: lib/galaxy/datatypes/registry.py
    import metagenomics # added for metagenomics mothur

add datatypes to: datatypes_conf.xml

add mothur tools to: tool_conf.xml

############ DESIGN NOTES #########################################################################################################

Each mothur command has its own tool_config (.xml) file, but all call the same python wrapper code: mothur_wrapper.py

(The environment variable MOTHUR_MAX_PROCESSORS can be used to limit the number of cpu processors used by mothur commands)

  * Every mothur tool will call the mothur_wrapper.py script with a --cmd= parameter that gives the mothur command name.
  * Every tool will produce the logfile of the mothur run as an output.
  * When the outputs of a mothur command can be determined in advance, they are included in the --result= parameter to mothur_wrapper.py
  * When the number of outputs cannot be determined in advance, the name patterns and datatypes of the outputs are included in the --new_datasets parameter to mothur_wrapper.py

Here is an example call to the mothur_wrapper.py script with an explanation before each param:

  mothur_wrapper.py
    # name of a mothur command, this is required
    --cmd='summary.shared'
    # Galaxy output dataset list, these are output files that can be determined before the command is run
    # The items in the list are separated by commas
    # Each item contains a regex to match the output filename and a galaxy dataset filepath in which to copy the data (separated by :)
    --result='^mothur.\S+\.logfile$:'/home/galaxy/data/database/files/002/dataset_2613.dat,'^\S+\.summary$:'/home/galaxy/data/database/files/002/dataset_2614.dat
    # Galaxy output dataset extra_files_path directory in which to put all output files (usually the logfile extra_file path)
    --outputdir='/home/galaxy/data/database/files/002/dataset_2613_files'
    # The id of one of the galaxy outputs (e.g. the mothur logfile) used for dynamic dataset generation (when number of outputs not known in advance)
    # see: http://bitbucket.org/galaxy/galaxy-central/wiki/ToolsMultipleOutput
    --datasetid='2578'
    # The galaxy directory in which to copy all output files for dynamic dataset generation (special galaxy tool param: $__new_file_path__)
    --new_file_path='$__new_file_path__'
    # specifies files to copy to the new_file_path
    # The list is separated by commas
    # Each item contains: a regex pattern for matching filenames and a galaxy datatype (separated by :)
    # The regex match.groups()[0] is used as the id name of the dataset, and must result in a unique name for each output
    --new_datasets='^\S+?\.((\S+)\.(unique|[0-9.]*)\.dist)$:lower.dist'

##
## NOTE: The "read" commands were eliminated with Mothur version 1.18
##
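
The --result and --new_datasets formats described in the design notes can be illustrated with a small Python sketch. This is not the code from mothur_wrapper.py; the function names (parse_spec, route_outputs, name_new_datasets) are hypothetical and exist only for this example. It assumes the shell has already stripped the quotes, that neither a regex nor a path contains a comma, and that the last ':' in each item separates the regex from the path or datatype.

```python
import re


def parse_spec(spec):
    """Split 'regex:value,regex:value' into (compiled_regex, value) pairs.

    Assumption: items contain no commas, and the last ':' in an item
    separates the regex from its value (a file path or a datatype).
    """
    pairs = []
    for item in spec.split(','):
        pattern, _, value = item.rpartition(':')
        pairs.append((re.compile(pattern), value))
    return pairs


def route_outputs(filenames, result_spec):
    """Map each output filename to the dataset path of the first regex it matches."""
    pairs = parse_spec(result_spec)
    routed = {}
    for name in filenames:
        for regex, dest in pairs:
            if regex.match(name):
                routed[name] = dest
                break
    return routed


def name_new_datasets(filenames, new_datasets_spec):
    """Return (dataset_name, datatype) for files matching a --new_datasets item.

    The dataset name is match.groups()[0], as the design notes describe.
    """
    pairs = parse_spec(new_datasets_spec)
    named = []
    for name in filenames:
        for regex, datatype in pairs:
            match = regex.match(name)
            if match:
                named.append((match.groups()[0], datatype))
                break
    return named


if __name__ == '__main__':
    # Hypothetical output filenames, for illustration only.
    result_spec = (r'^mothur.\S+\.logfile$:/home/galaxy/data/database/files/002/dataset_2613.dat,'
                   r'^\S+\.summary$:/home/galaxy/data/database/files/002/dataset_2614.dat')
    print(route_outputs(['mothur.1414782572.logfile', 'final.an.summary'], result_spec))

    new_spec = r'^\S+?\.((\S+)\.(unique|[0-9.]*)\.dist)$:lower.dist'
    print(name_new_datasets(['sample.fn.unique.dist'], new_spec))
```

Splitting on the last ':' keeps the sketch short, but it would break for a value that itself contains a colon; a real wrapper may need stricter parsing.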