view mothur/README @ 35:95d75b35e4d2

Updated tools to use Mothur 1.33. Added some misc. fixes and updates (blast repository, tool fixes)
author certain cat
date Fri, 31 Oct 2014 15:09:32 -0400
parents 49058b1f8d3f
children 040410b8167e
line wrap: on
line source

Provides galaxy tools for the Mothur metagenomics package -  http://www.mothur.org/wiki/Main_Page 


Mothur should be able to be auto-installed as a tool_dependency
  You may want to reorganize the tool panel after installing
  See below: Reorganize integrated_tool_panel.xml
  This was based on: http://www.mothur.org/wiki/Mothur_manual
  (The environment variable MOTHUR_MAX_PROCESSORS can be used to limit the number of cpu processors used for mothur commands)
   This will be set in: tool_dependencies/mothur/1.27/jjohnson/mothur_toolsuite/*/env.sh
	Requirements for auto installation:
	-	make (sudo-apt get install make)
	- g++ (sudo apt-get install g++)
	- gfortran (sudo apt-get install gfortran)
	- pip (sudo apt-get install python-pip)
	- simplejson (pip install simplejson)

	Repository Dependency: 
	- BLAST Legacy ver. 2.2.26 (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/LATEST/)
	- The repository name should be package_blast_2_2_26 so it matches with the tool dependency.
	

Manual installation for Mothur: 
  Install mothur v.1.33 on your galaxy system so galaxy can execute the mothur command
  ( This version of wrappers is designed for Mothur version 1.33- it may work on later versions )
  http://www.mothur.org/wiki/Download_mothur
  http://www.mothur.org/wiki/Installation
  ( This Galaxy Mothur wrapper will invoke Mothur in command line mode: http://www.mothur.org/wiki/Command_line_mode )

TreeVector is also packaged with this Mothur package to view phylogenetic trees:
  TreeVector is a utility to create and integrate phylogenetic trees as Scalable Vector Graphics (SVG) files.
  TreeVector was written by Ralph_Pethica, Department_of_Computer_Science, University_of_Bristol
  TreeVector: http://supfam.cs.bris.ac.uk/TreeVector/about.html
  Install in galaxy:  tool-data/shared/jars/TreeVector.jar

Install reference data from silva and greengenes
 RDP reference file (modified for mothur):
  http://www.mothur.org/wiki/RDP_reference_files
   - 16S rRNA reference (RDP): A collection of 9,662 bacterial and 384 archaeal 16S rRNA gene sequences with an improved taxonomy compared to version 6.
     http://www.mothur.org/w/images/2/29/Trainset7_112011.rdp.zip
   - 16S rRNA reference (PDS): The RDP reference with three sequences reversed and 119 mitochondrial 16S rRNA gene sequences added as members of the Rickettsiales
     http://www.mothur.org/w/images/4/4a/Trainset7_112011.pds.zip
   - 28S rRNA reference (RDP): A collection of 8506 reference 28S rRNA gene sequences from the Fungi that were curated by the Kuske lab
     http://www.mothur.org/w/images/3/36/FungiLSU_train_v7.zip
 Silva reference:
  http://www.mothur.org/wiki/Silva_reference_files
  - Bacterial references (14,956 sequences)
    http://www.mothur.org/w/images/9/98/Silva.bacteria.zip
  - Archaeal references (2,297 sequences)
    http://www.mothur.org/w/images/3/3c/Silva.archaea.zip
  - Eukaryotic references (1,238 sequences)
    http://www.mothur.org/w/images/1/1a/Silva.eukarya.zip
  - Silva-based alignment of template file for chimera.slayer (5,181 sequences)
    http://www.mothur.org/w/images/f/f1/Silva.gold.bacteria.zip
 Alignment database rRNA gene sequences:
  http://www.mothur.org/wiki/Alignment_database
  - greengenes reference alignment
    http://www.mothur.org/w/images/7/72/Greengenes.alignment.zip
  - SILVA (Silva reference)
    http://www.mothur.org/w/images/f/f1/Silva.gold.bacteria.zip
 Secondary structure mapping files:
  http://www.mothur.org/wiki/Secondary_structure_map
    http://www.mothur.org/w/images/6/6d/Silva_ss_map.zip
    http://www.mothur.org/w/images/4/4b/Gg_ss_map.zip
 Lane masks:
  http://www.mothur.org/wiki/Lane_mask
  greengenes-compatible mask:
     - lane1241.gg.filter - A Lane Masks that comes with the greengenes arb database
       http://www.mothur.org/w/images/2/2a/Lane1241.gg.filter
     - lane1287.gg.filter - A Lane Masks that comes with the greengenes arb database
       http://www.mothur.org/w/images/a/a0/Lane1287.gg.filter
     - lane1349.gg.filter - Pat Schloss's transcription of the mask from the Lane paper
       http://www.mothur.org/w/images/3/3d/Lane1349.gg.filter
  SILVA-compatible mask:
     - lane1349.silva.filter - Pat Schloss's transcription of the mask from the Lane paper
       http://www.mothur.org/w/images/6/6d/Lane1349.silva.filter
 Lookup Files for sff flow analysis using shhh.flows:
  http://www.mothur.org/wiki/Alignment_database

 Example from UMN installation: (We also made these available in a Galaxy public data library)
    /project/db/galaxy/mothur/Silva.bacteria.zip
    /project/db/galaxy/mothur/silva.eukarya.fasta
    /project/db/galaxy/mothur/Greengenes.alignment.zip
    /project/db/galaxy/mothur/Silva.archaea.zip
    /project/db/galaxy/mothur/Silva_ss_map.zip
    /project/db/galaxy/mothur/silva.eukarya.ncbi.tax
    /project/db/galaxy/mothur/Silva.gold.bacteria.zip
    /project/db/galaxy/mothur/Silva.archaea/silva.archaea.silva.tax
    /project/db/galaxy/mothur/Silva.archaea/silva.archaea.gg.tax
    /project/db/galaxy/mothur/Silva.archaea/silva.archaea.rdp.tax
    /project/db/galaxy/mothur/Silva.archaea/nogap.archaea.fasta
    /project/db/galaxy/mothur/Silva.archaea/silva.archaea.ncbi.tax
    /project/db/galaxy/mothur/Silva.archaea/silva.archaea.fasta
    /project/db/galaxy/mothur/nogap.eukarya.fasta
    /project/db/galaxy/mothur/silva.eukarya.silva.tax
    /project/db/galaxy/mothur/silva.gold.align
    /project/db/galaxy/mothur/silva.ss.map
    /project/db/galaxy/mothur/gg.ss.map
    /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.silva.tax
    /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.rdp6.tax
    /project/db/galaxy/mothur/silva.bacteria/nogap.bacteria.fasta
    /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.gg.tax
    /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.ncbi.tax
    /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.fasta
    /project/db/galaxy/mothur/silva.bacteria/silva.bacteria.rdp.tax
    /project/db/galaxy/mothur/Silva.eukarya.zip
    /project/db/galaxy/mothur/Gg_ss_map.zip
    /project/db/galaxy/mothur/core_set_aligned.imputed.fasta
    /project/db/galaxy/mothur/RDP/FungiLSU_train_1400bp_8506_mod.fasta
    /project/db/galaxy/mothur/RDP/FungiLSU_train_1400bp_8506_mod.tax
    /project/db/galaxy/mothur/RDP/trainset6_032010.rdp.fasta
    /project/db/galaxy/mothur/RDP/trainset6_032010.rdp.tax
    /project/db/galaxy/mothur/RDP/trainset7_112011.pds.fasta
    /project/db/galaxy/mothur/RDP/trainset7_112011.pds.tax
    /project/db/galaxy/mothur/RDP/trainset7_112011.rdp.fasta
    /project/db/galaxy/mothur/RDP/trainset7_112011.rdp.tax



Add tool-data:  (contains  pointers to silva, greengenes, and RDP reference data)
  tool-data/mothur_aligndb.loc
  tool-data/mothur_map.loc
  tool-data/mothur_taxonomy.loc
  tool-data/shared/jars/TreeVector.jar


################################################################
#### If you are manually adding this to your local galaxy:  ####
################################################################

add config files (*.xml) and wrapper code (*.py) from tools/mothur/*  to your galaxy installation 


add datatype definition file: lib/galaxy/datatypes/metagenomics.py

add the following import line to:  lib/galaxy/datatypes/registry.py
import metagenomics # added for metagenomics mothur



add datatypes to:  datatypes_conf.xml

add mothur tools to:   tool_conf.xml

############ DESIGN NOTES #########################################################################################################
Each mothur command has it's own tool_config (.xml) file, but all call the same python wrapper code: mothur_wrapper.py

  (The environment variable MOTHUR_MAX_PROCESSORS can be used to limit the number of cpu processors used be mothur commands)

* Every mothur tool will call mothur_wrapper.py script with a --cmd= parameter that gives the mothur command name.
* Every tool will produce the logfile of the mothur run as an output.
* When the outputs of a mothur command could be determined in advance, they are included in the --result= parameter to mothur_wrapper.py
* When the number of outputs cannot be determined in advance, the name patterns and datatypes of the ouputs 
  are included in the --new_datasets parameter to mothur_wrapper.py
 
Here is an example call to the mothur_wrapper.py script with an explanation before each param :
 mothur_wrapper.py
 # name of a mothur command, this is required
 --cmd='summary.shared'
 # Galaxy output dataset list, these are output files that can be determined before the command is run
 # The items in the list are separated by commas
 # Each item contains a regex to match the output filename and a galaxy dataset filepath in which to copy the data (separated by :)
 --result='^mothur.\S+\.logfile$:'/home/galaxy/data/database/files/002/dataset_2613.dat,'^\S+\.summary$:'/home/galaxy/data/database/files/002/dataset_2614.dat
 # Galaxy output dataset extra_files_path direcotry in which to put all output files (usually the logfile extra_file path)
 --outputdir='/home/galaxy/data/database/files/002/dataset_2613_files'
 # The id of one of the galaxy outputs (e.g. the mothur logfile) used for dynamic dataset generation (when number of outputs not known in advance)
 #  see: ttp://bitbucket.org/galaxy/galaxy-central/wiki/ToolsMultipleOutput
 --datasetid='2578'
 # The galaxy directory in which to copy all output files for dynamic dataset generation (special galaxy tool param: $__new_file_path__)
 --new_file_path='$__new_file_path__'
 # specifies files to copy to the new_file_path
 # The list is separated by commas
 # Each item  conatins:   a regex pattern for matching filenames and  a galaxy datatype (separated by :)
 # The regex match.groups()[0] is used as the id name of the dataset, and must result in  unique name for each output
 --new_datasets='^\S+?\.((\S+)\.(unique|[0-9.]*)\.dist)$:lower.dist'

 ## 
 ## NOTE:   The "read" commands were eliminated with Mothur version 1.18
 ##