Repository 'amplicon_analysis_pipeline'
hg clone https://toolshed.g2.bx.psu.edu/repos/pjbriggs/amplicon_analysis_pipeline

Changeset 0:47ec9c6f44b8 (2017-11-09)
Next changeset 1:1c1902e12caf (2018-04-25)
Commit message:
planemo upload for repository https://github.com/pjbriggs/Amplicon_analysis-galaxy commit b63924933a03255872077beb4d0fde49d77afa92
added:
README.rst
amplicon_analysis_pipeline.py
amplicon_analysis_pipeline.xml
install_tool_deps.sh
static/images/Pipeline_description_Fig1.png
static/images/Pipeline_description_Fig2.png
static/images/Pipeline_description_Fig3.png
diff -r 000000000000 -r 47ec9c6f44b8 README.rst
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README.rst Thu Nov 09 10:13:29 2017 -0500
@@ -0,0 +1,249 @@
+Amplicon_analysis-galaxy
+========================
+
+A Galaxy tool wrapper for Mauro Tutino's ``Amplicon_analysis`` pipeline
+script at https://github.com/MTutino/Amplicon_analysis
+
+The pipeline can analyse paired-end 16S rRNA data from Illumina MiSeq
+(Casava >= 1.8) and performs the following operations:
+
+ * QC and clean up of input data
+ * Removal of singletons and chimeras and building of OTU table
+   and phylogenetic tree
+ * Beta and alpha diversity analysis
+
+Usage documentation
+===================
+
+Usage of the tool (including required inputs) is documented within
+the ``help`` section of the tool XML.
+
+Installing the tool in a Galaxy instance
+========================================
+
+The following sections describe how to install the tool files,
+dependencies and reference data, and how to configure the Galaxy
+instance to detect the dependencies and reference data correctly
+at run time.
+
+1. Install the dependencies
+---------------------------
+
+The ``install_tool_deps.sh`` script can be used to fetch and install the
+dependencies locally, for example::
+
+    install_tool_deps.sh /path/to/local_tool_dependencies
+
+This can take some time to complete. When finished it should have
+created a set of directories containing the dependencies under the
+specified top-level directory.
+
+2. Install the tool files
+-------------------------
+
+The core tool is hosted on the Galaxy toolshed, so it can be installed
+directly from there (this is the recommended route):
+
+ * https://toolshed.g2.bx.psu.edu/view/pjbriggs/amplicon_analysis_pipeline/
+
+Alternatively it can be installed manually; in this case there are two
+files to install:
+
+ * ``amplicon_analysis_pipeline.xml`` (the Galaxy tool definition)
+ * ``amplicon_analysis_pipeline.py`` (the Python wrapper script)
+
+Put these in a directory that is visible to Galaxy (e.g. a
+``tools/Amplicon_analysis/`` folder), and modify the ``tool_conf.xml``
+file to tell Galaxy to offer the tool by adding a line e.g.::
+
+    <tool file="Amplicon_analysis/amplicon_analysis_pipeline.xml" />
+
+3. Install the reference data
+-----------------------------
+
+The script ``References.sh`` from the pipeline package at
+https://github.com/MTutino/Amplicon_analysis can be run to install
+the reference data, for example::
+
+    cd /path/to/pipeline/data
+    wget https://github.com/MTutino/Amplicon_analysis/raw/master/References.sh
+    /bin/bash ./References.sh
+
+will install the data in ``/path/to/pipeline/data``.
+
+**NB** The final amount of data downloaded and uncompressed will be
+around 6GB.
+
+4. Configure dependencies and reference data in Galaxy
+------------------------------------------------------
+
+The final steps are to make your Galaxy installation aware of the
+tool dependencies and reference data, so it can locate them both when
+the tool is run.
+
+To target the tool dependencies installed previously, add the
+following lines to the ``dependency_resolvers_conf.xml`` file in the
+Galaxy ``config`` directory::
+
+    <dependency_resolvers>
+    ...
+      <galaxy_packages base_path="/path/to/local_tool_dependencies" />
+      <galaxy_packages base_path="/path/to/local_tool_dependencies" versionless="true" />
+      ...
+    </dependency_resolvers>
+
+(NB it is recommended to place these *before* the ``<conda ... />``
+resolvers)
+
+(If you're not familiar with dependency resolvers in Galaxy then
+see the documentation at
+https://docs.galaxyproject.org/en/master/admin/dependency_resolvers.html
+for more details.)
+
+The tool locates the reference data via an environment variable called
+``AMPLICON_ANALYSIS_REF_DATA_PATH``, which needs to be set to the parent
+directory where the reference data has been installed.
+
+There are various ways to do this, depending on how your Galaxy
+installation is configured:
+
+ * **For local instances:** add a line to set it in the
+   ``config/local_env.sh`` file of your Galaxy installation, e.g[...]
+[...]  ``Amplicon_analysis`` (hint: use your browser's 'find-in-page'
+     search function to help locate it) and click on
+     ``Submit new whitelist`` to update the settings.
+
+Additional details
+==================
+
+Some other things to be aware of:
+
+ * Note that using the Silva database requires a minimum of 18GB RAM
+
+Known problems
+==============
+
+ * Only the ``VSEARCH`` pipeline in Mauro's script is currently
+   available via the Galaxy tool; the ``USEARCH`` and ``QIIME``
+   pipelines have yet to be implemented.
+ * The images in the tool help section are not visible if the
+   tool has been installed locally, or if it has been installed in
+   a Galaxy instance which is served from a subdirectory.
+
+   These are both problems with Galaxy and not the tool, see
+   https://github.com/galaxyproject/galaxy/issues/4490 and
+   https://github.com/galaxyproject/galaxy/issues/1676
+
+Appendix: availability of tool dependencies
+===========================================
+
+The tool takes its dependencies from the underlying pipeline script (see
+https://github.com/MTutino/Amplicon_analysis/blob/master/README.md
+for details).
+
+As noted above, currently the ``install_tool_deps.sh`` script can be
+used to manually install the dependencies for a local tool install.
+
+In principle these should also be available if the tool were installed
+from a toolshed. However it would be preferable in this case to get as
+many of the dependencies as possible via the ``conda`` dependency
+resolver.
+
+The following are known to be available via conda, with the required
+version:
+
+ - cutadapt 1.8.1
+ - sickle-trim 1.33
+ - bioawk 1.0
+ - fastqc 0.11.3
+ - R 3.2.0
+
+Some dependencies are available but with the "wrong" versions:
+
+ - spades (need 3.5.0)
+ - qiime (need 1.8.0)
+ - blast (need 2.2.26)
+ - vsearch (need 1.1.3)
+
+The following dependencies are currently unavailable:
+
+ - fasta_number (need 02jun2015)
+ - fasta-splitter (need 0.2.4)
+ - rdp_classifier (need 2.2)
+ - microbiomeutil (need r20110519)
+
+(NB usearch 6.1.544 and 8.0.1623 are special cases which must be
+handled outside of Galaxy's dependency management systems.)
+
+History
+=======
+
+========== ======================================================================
+Version    Changes
+---------- ----------------------------------------------------------------------
+1.1.0      First official version on Galaxy toolshed.
+1.0.6      Expand inline documentation to provide detailed usage guidance.
+1.0.5      Updates including:
+
+           - Capture read counts from quality control as new output dataset
+           - Capture FastQC per-base quality boxplots for each sample as
+             new output dataset
+           - Add support for -l option (sliding window length for trimming)
+           - Default for -L set to "200"
+1.0.4      Various updates:
+
+           - Additional outputs are captured when a "Categories" file is
+             supplied (alpha diversity rarefaction curves and boxplots)
+           - Sample names derived from Fastqs in a collection of pairs
+             are trimmed to SAMPLE_S* (for Illumina-style Fastq filenames)
+           - Input Fastqs can now be of more general ``fastq`` type
+           - Log file outputs are captured in new output dataset
+           - User can specify a "title" for the job which is copied into
+             the dataset names (to distinguish outputs from different runs)
+           - Improved detection and reporting of problems with input
+             Metatable
+1.0.3      Take the sample names from the collection dataset names when
+           using collection as input (this is now the default input mode);
+           collect additional output dataset; disable ``usearch``-based
+           pipelines (i.e. ``UPARSE`` and ``QIIME``).
+1.0.2      Enable support for FASTQs supplied via dataset collections and
+           fix some broken output datasets.
+1.0.1      Initial version
+========== ======================================================================
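The elided part of the README above covers setting ``AMPLICON_ANALYSIS_REF_DATA_PATH`` for different Galaxy configurations; as a minimal sketch for the "local instances" case it introduces (the path is a placeholder, not a real location), the ``config/local_env.sh`` entry would look like::

    # config/local_env.sh -- sourced by Galaxy at startup (if present)
    # Point the tool at the parent directory created by References.sh;
    # the path below is a placeholder for illustration only
    export AMPLICON_ANALYSIS_REF_DATA_PATH=/path/to/pipeline/data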
diff -r 000000000000 -r 47ec9c6f44b8 amplicon_analysis_pipeline.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/amplicon_analysis_pipeline.py Thu Nov 09 10:13:29 2017 -0500
@@ -0,0 +1,329 @@
+#!/usr/bin/env python
+#
+# Wrapper script to run Amplicon_analysis_pipeline.sh
+# from Galaxy tool
+
+import sys
+import os
+import argparse
+import subprocess
+import glob
+
+class PipelineCmd(object):
+    def __init__(self,cmd):
+        self.cmd = [str(cmd)]
+    def add_args(self,*args):
+        for arg in args:
+            self.cmd.append(str(arg))
+    def __repr__(self):
+        return ' '.join([str(arg) for arg in self.cmd])
+
+def ahref(target,name=None,type=None):
+    if name is None:
+        name = os.path.basename(target)
+    ahref = "<a href='%s'" % target
+    if type is not None:
+        ahref += " type='%s'" % type
+    ahref += ">%s</a>" % name
+    return ahref
+
+def check_errors():
+    # Errors in Amplicon_analysis_pipeline.log
+    with open('Amplicon_analysis_pipeline.log','r') as pipeline_log:
+        log = pipeline_log.read()
+        if "Names in the first column of Metatable.txt and in the second column of Final_name.txt do not match" in log:
+            print_error("""*** Sample IDs don't match dataset names ***
+
+The sample IDs (first column of the Metatable file) don't match the
+supplied sample names for the input Fastq pairs.
+""")
+    # Errors in pipeline output
+    with open('pipeline.log','r') as pipeline_log:
+        log = pipeline_log.read()
+        if "Errors and/or warnings detected in mapping file" in log:
+            with open("Metatable_log/Metatable.log","r") as metatable_log:
+                # Echo the Metatable log file to the tool log
+                print_error("""*** Error in Metatable mapping file ***
+
+%s""" % metatable_log.read())
+        elif "No header line was found in mapping file" in log:
+            # Report error to the tool log
+            print_error("""*** No header in Metatable mapping file ***
+
+Check you've specified the correct file as the input Metatable""")
+
+def print_error(message):
+    width = max([len(line) for line in message.split('\n')]) + 4
+    sys.stderr.write("\n%s\n" % ('*'*width))
+    for line in message.split('\n'):
+        sys.stderr.write("* %s%s *\n" % (line,' '*(width-len(line)-4)))
+    sys.stderr.write("%s\n\n" % ('*'*width))
+
+def clean_up_name(sample):
+    # Remove trailing "_L[0-9]+_001" from Fastq
+    # pair names
+    split_name = sample.split('_')
+    if split_name[-1] == "001":
+        split_name = split_name[:-1]
+    if split_name[-1].startswith('L'):
+        try:
+            int(split_name[-1][1:])
+            split_name = split_name[:-1]
+        except ValueError:
+            pass
+    return '_'.join(split_name)
+
+def list_outputs(filen=None):
+    # List the output directory contents
+    # If filen is specified then will be the filename to
+    # write to, otherwise write to stdout
+    if filen is not None:
+        fp = open(filen,'w')
+    else:
+        fp = sys.stdout
+    results_dir = os.path.abspath("RESULTS")
+    fp.write("Listing contents of output dir %s:\n" % results_dir)
+    ix = 0
+    for d,dirs,files in os.walk(results_dir):
+        ix += 1
+        fp.write("-- %d: %s\n" % (ix,
+                                  os.path.relpath(d,results_dir)))
+        for f in files:
+            ix += 1
+            fp.write("---- %d: %s\n" % (ix,
+                                        os.path.relpath(f,results_dir)))
+    # Close output file
+    if filen is not None:
+        fp.close()
+
+if __name__ == "__main__":
+    # Command line
+    print "Amplicon analysis: starting"
+    p = argparse.ArgumentParser()
+    p.add_argument("metatable",
+                   metavar="METATABLE_FILE",
+                   help="Metatable.txt file")
+    p.add_argument("fastq_pairs",
+                   metavar="SAMPLE_NAME FQ_R1 FQ_R2",
+                   nargs="+",
+                   default=list(),
+                   help="Triplets of SAMPLE_NAME followed by "
+                   "a R1/R2 FASTQ file pair")
+    p.add_argument("-g",dest="forward_pcr_primer")
+    p.add_a[...]
+[...] log:
+                sys.stderr.write("%s" % log.read())
+            # Write log file contents to tool log
+            print "\nAmplicon_analysis_pipeline.log:"
+            with open(log_file,'r') as log:
+                print "%s" % log.read()
+    else:
+        sys.stderr.write("ERROR missing log file \"%s\"\n" %
+                         log_file)
+
+    # Handle FastQC boxplots
+    print "Amplicon analysis: collating per base quality boxplots"
+    with open("fastqc_quality_boxplots.html","w") as quality_boxplots:
+        # PHRED value for trimming
+        phred_score = 20
+        if args.trimming_threshold is not None:
+            phred_score = args.trimming_threshold
+        # Write header for HTML output file
+        quality_boxplots.write("""<html>
+<head>
+<title>Amplicon analysis pipeline: Per-base Quality Boxplots (FastQC)</title>
+</head>
+<body>
+<h1>Amplicon analysis pipeline: Per-base Quality Boxplots (FastQC)</h1>
+""")
+        # Look for raw and trimmed FastQC output for each sample
+        for sample_name in sample_names:
+            fastqc_dir = os.path.join(sample_name,"FastQC")
+            quality_boxplots.write("<h2>%s</h2>" % sample_name)
+            for d in ("Raw","cutdapt_sickle/Q%s" % phred_score):
+                quality_boxplots.write("<h3>%s</h3>" % d)
+                fastqc_html_files = glob.glob(
+                    os.path.join(fastqc_dir,d,"*_fastqc.html"))
+                if not fastqc_html_files:
+                    quality_boxplots.write("<p>No FastQC outputs found</p>")
+                    continue
+                # Pull out the per-base quality boxplots
+                for f in fastqc_html_files:
+                    boxplot = None
+                    with open(f) as fp:
+                        for line in fp.read().split(">"):
+                            try:
+                                line.index("alt=\"Per base quality graph\"")
+                                boxplot = line + ">"
+                                break
+                            except ValueError:
+                                pass
+                    if boxplot is None:
+                        boxplot = "Missing plot"
+                    quality_boxplots.write("<h4>%s</h4><p>%s</p>" %
+                                           (os.path.basename(f),
+                                            boxplot))
+        # Close the HTML document once all samples are done
+        quality_boxplots.write("""</body>
+</html>
+""")
+
+    # Handle additional output when categories file was supplied
+    if args.categories_file is not None:
+        # Alpha diversity boxplots
+        print "Amplicon analysis: indexing alpha diversity boxplots"
+        boxplots_dir = os.path.abspath(
+            os.path.join("RESULTS",
+                         "%s_%s" % (args.pipeline.title(),
+                                    ("gg" if not args.use_silva
+                                     else "silva")),
+                         "Alpha_diversity",
+                         "Alpha_diversity_boxplot",
+                         "Categories_shannon"))
+        print "Amplicon analysis: gathering PDFs from %s" % boxplots_dir
+        boxplot_pdfs = [os.path.basename(pdf)
+                        for pdf in
+                        sorted(glob.glob(
+                            os.path.join(boxplots_dir,"*.pdf")))]
+        with open("alpha_diversity_boxplots.html","w") as boxplots_out:
+            boxplots_out.write("""<html>
+<head>
+<title>Amplicon analysis pipeline: Alpha Diversity Boxplots (Shannon)</title>
+</head>
+<body>
+<h1>Amplicon analysis pipeline: Alpha Diversity Boxplots (Shannon)</h1>
+""")
+            boxplots_out.write("<ul>\n")
+            for pdf in boxplot_pdfs:
+                boxplots_out.write("<li>%s</li>\n" % ahref(pdf))
+            boxplots_out.write("</ul>\n")
+            boxplots_out.write("""</body>
+</html>
+""")
+
+    # Finish
+    print "Amplicon analysis: finishing, exit code: %s" % exit_code
+    sys.exit(exit_code)
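For orientation, the wrapper's argparse definition above (a Metatable file followed by SAMPLE_NAME/R1/R2 triplets, plus the ``-g``/``-G``/``-q``/``-l``/``-O``/``-L``/``-P``/``-r``/``-c`` options) implies a call of the following shape. This is a hypothetical invocation: sample names, file names and option values are placeholders, and ``-P Vsearch`` mirrors how the tool XML below passes the pipeline name::

    # Sketch of a direct wrapper invocation outside Galaxy, assuming the
    # options defined in the argparse section above; all names are
    # placeholders for illustration
    python amplicon_analysis_pipeline.py \
        -P Vsearch \
        -q 20 \
        -L 200 \
        -r "$AMPLICON_ANALYSIS_REF_DATA_PATH" \
        Metatable.txt \
        SAMPLE1 SAMPLE1_S1_L001_R1_001.fastq SAMPLE1_S1_L001_R2_001.fastq \
        SAMPLE2 SAMPLE2_S2_L001_R1_001.fastq SAMPLE2_S2_L001_R2_001.fastq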
diff -r 000000000000 -r 47ec9c6f44b8 amplicon_analysis_pipeline.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/amplicon_analysis_pipeline.xml Thu Nov 09 10:13:29 2017 -0500
@@ -0,0 +1,484 @@
+<tool id="amplicon_analysis_pipeline" name="Amplicon Analysis Pipeline" version="1.0.6">
+  <description>analyse 16S rRNA data from Illumina MiSeq paired-end reads</description>
+  <requirements>
+    <requirement type="package" version="1.1">amplicon_analysis_pipeline</requirement>
+    <requirement type="package" version="1.11">cutadapt</requirement>
+    <requirement type="package" version="1.33">sickle</requirement>
+    <requirement type="package" version="27-08-2013">bioawk</requirement>
+    <requirement type="package" version="2.8.1">pandaseq</requirement>
+    <requirement type="package" version="3.5.0">spades</requirement>
+    <requirement type="package" version="0.11.3">fastqc</requirement>
+    <requirement type="package" version="1.8.0">qiime</requirement>
+    <requirement type="package" version="2.2.26">blast</requirement>
+    <requirement type="package" version="0.2.4">fasta-splitter</requirement>
+    <requirement type="package" version="2.2">rdp-classifier</requirement>
+    <requirement type="package" version="3.2.0">R</requirement>
+    <requirement type="package" version="1.1.3">vsearch</requirement>
+    <requirement type="package" version="2010-04-29">microbiomeutil</requirement>
+    <requirement type="package">fasta_number</requirement>
+  </requirements>
+  <stdio>
+    <exit_code range="1:" />
+  </stdio>
+  <command><![CDATA[
+  ## Set the reference database name
+  #if $reference_database == ""
+    #set reference_database_name = "gg"
+  #else
+    #set reference_database_name = "silva"
+  #end if
+
+  ## Run the amplicon analysis pipeline wrapper
+  python $__tool_directory__/amplicon_analysis_pipeline.py
+  ## Set options
+  #if str( $forward_pcr_primer ) != ""
+  -g "$forward_pcr_primer"
+  #end if
+  #if str( $reverse_pcr_primer ) != ""
+  -G "$reverse_pcr_primer"
+  #end if
+  #if str( $trimming_threshold ) != ""
+  -q $trimming_threshold
+  #end if
+  #if str( $sliding_window_length ) != ""
+  -l $sliding_window_length
+  #end if
+  #if str( $minimum_overlap ) != ""
+  -O $minimum_overlap
+  #end if
+  #if str( $minimum_length ) != ""
+  -L $minimum_length
+  #end if
+  -P $pipeline
+  -r \$AMPLICON_ANALYSIS_REF_DATA_PATH
+  #if str( $reference_database ) != ""
+    "${reference_database}"
+  #end if
+  #if str($categories_file_in) != 'None'
+    -c "${categories_file_in}"
+  #end if
+  ## Input files
+  "${metatable_file_in}"
+  ## FASTQ pairs
+  #if str($input_type.pairs_or_collection) == "collection"
+    #set fastq_pairs = $input_type.fastq_collection
+  #else
+    #set fastq_pairs = $input_type.fastq_pairs
+  #end if
+  #for $fq_pair in $fastq_pairs
+    "${fq_pair.name}" "${fq_pair.forward}" "${fq_pair.reverse}"
+  #end for
+  &&
+
+  ## Collect outputs
+  cp Metatable_log/Metatable_mod.txt "${metatable_mod}" &&
+  cp ${pipeline}_OTU_tables/multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_tax_OTU_table.biom "${tax_otu_table_biom_file}" &&
+  cp ${pipeline}_OTU_tables/otus.tre "${otus_tre_file}" &&
+  cp RESULTS/${pipeline}_${reference_database_name}/OTUs_count.txt "${otus_count_file}" &&
+  cp RESULTS/${pipeline}_${reference_database_name}/table_summary.txt "${table_summary_file}" &&
+  cp Multiplexed_files/${pipeline}_pipeline/multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_OTUs.fasta "${dereplicated_nonchimera_otus_fasta}" &&
+  cp QUALITY_CONTROL/Reads_count.txt "$read_counts_out" &&
+  cp fastqc_quality_boxplots.html "${fastqc_quality_boxplots_html}" &&
+
+  ## HTML outputs
+
+  ## OTU table
+  mkdir $heatmap_otu_table_html.files_path &&
+  cp -r RESULTS/${pipeline}_${reference_database_name}/Heatmap/js $heatmap_otu_table_html.files_path &&
+  cp RESULTS/${pipeline}_${reference_database_name}/Heatmap/otu_table.html "${heatmap_otu_table_html}" &&
+
+  ## Phylum genus barcharts
+  mkdir $phylum_genus_dist_barcharts_html.files_path &&
+  cp -r RESULTS/${pipeline}_${reference_database_name}/phylum_genus_charts/charts $phylum_genus_dist_barchart[...]
+[...]. Insert the PCR primer sequence
+   in the corresponding field. DO NOT include any barcode or adapter
+   sequence. If the PCR primers have already been trimmed by the MiSeq
+   and you include the sequence in this field, this will lead to an
+   error. Only include the sequences if they are still present in the
+   fastq files.
+
+ * **Threshold quality below which reads will be trimmed** Choose the
+   Phred score used by Sickle to trim the reads at the 3’ end.
+
+ * **Minimum length to retain a read after trimming** If the read length
+   after trimming is shorter than a user-defined length, the read, along
+   with the corresponding read pair, will be discarded.
+
+ * **Minimum overlap in bp between forward and reverse reads** Choose the
+   minimum basepair overlap used by Pandaseq to assemble the reads.
+   Default is 10.
+
+ * **Minimum length in bp to keep a sequence after overlapping** Choose the
+   minimum sequence length used by Pandaseq to keep a sequence after the
+   overlapping. This depends on the expected amplicon length. Default is
+   380 (used for V3-V4 16S sequencing; expected length ~440bp).
+
+ * **Pipeline to use for analysis** Choose the pipeline to use for OTU
+   clustering and chimera removal. The Galaxy tool currently supports
+   ``Vsearch`` only. ``Uparse`` and ``QIIME`` are planned to be added
+   shortly (the tools are already available for the stand-alone pipeline).
+
+ * **Reference database** Choose between the ``GreenGenes`` and ``Silva``
+   databases for taxa assignment.
+
+Click on **Execute** to start the analysis.
+
+5. Results
+**********
+
+Results are entirely generated using QIIME scripts. The results will
+appear in the History panel when the analysis is completed.
+
+ * **Vsearch_tax_OTU_table (biom format)** The OTU table in BIOM format
+   (http://biom-format.org/)
+
+ * **Vsearch_OTUs.tree** Phylogenetic tree constructed using the
+   ``make_phylogeny.py`` (fasttree) QIIME script
+   (http://qiime.org/scripts/make_phylogeny.html)
+
+ * **Vsearch_phylum_genus_dist_barcharts_HTML** HTML file with bar
+   charts at Phylum, Genus and Species level
+   (http://qiime.org/scripts/summarize_taxa.html and
+   http://qiime.org/scripts/plot_taxa_summary.html)
+
+ * **Vsearch_OTUs_count_file** Summary of OTU counts per sample
+   (http://biom-format.org/documentation/summarizing_biom_tables.html)
+
+ * **Vsearch_table_summary_file** Summary of sequence counts per sample
+   (http://biom-format.org/documentation/summarizing_biom_tables.html)
+
+ * **Vsearch_multiplexed_linearized_dereplicated_mc2_repset_nonchimeras_OTUs.fasta**
+   Fasta file with OTU sequences
+
+ * **Vsearch_heatmap_OTU_table_HTML** Interactive OTU heatmap
+   (http://qiime.org/1.8.0/scripts/make_otu_heatmap_html.html)
+
+ * **Vsearch_beta_diversity_weighted_2D_plots_HTML** PCoA plots in HTML
+   format using the weighted UniFrac distance measure. Samples are grouped
+   by the column names present in the Metatable file. The samples are
+   first rarefied to the minimum sequencing depth
+   (http://qiime.org/scripts/beta_diversity_through_plots.html)
+
+ * **Vsearch_beta_diversity_unweighted_2D_plots_HTML** PCoA plots in HTML
+   format using the unweighted UniFrac distance measure. Samples are grouped
+   by the column names present in the Metatable file. The samples are
+   first rarefied to the minimum sequencing depth
+   (http://qiime.org/scripts/beta_diversity_through_plots.html)
+
+Code availability
+-----------------
+
+**Code is available at** https://github.com/MTutino/Amplicon_analysis
+
+Credits
+-------
+
+Pipeline author: Mauro Tutino
+
+Galaxy tool: Peter Briggs
+
+  ]]></help>
+  <citations>
+    <citation type="bibtex">
+      @misc{githubAmplicon_analysis,
+      author = {Tutino, Mauro},
+      year = {2017},
+      title = {Amplicon Analysis Pipeline},
+      publisher = {GitHub},
+      journal = {GitHub repository},
+      url = {https://github.com/MTutino/Amplicon_analysis},
+}</citation>
+  </citations>
+</tool>
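Each ``<requirement>`` above is satisfied at run time by the ``galaxy_packages`` dependency resolver configured in the README, which looks for ``<base_path>/<package>/<version>/env.sh`` under the directory populated by ``install_tool_deps.sh`` and sources it before the job script runs. A manual spot-check of one requirement might look like the following sketch (paths are placeholders)::

    # Simulate what the galaxy_packages resolver does for
    # <requirement type="package" version="1.11">cutadapt</requirement>
    . /path/to/local_tool_dependencies/cutadapt/1.11/env.sh
    cutadapt --version    # expected to report 1.11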
diff -r 000000000000 -r 47ec9c6f44b8 install_tool_deps.sh
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/install_tool_deps.sh Thu Nov 09 10:13:29 2017 -0500
@@ -0,0 +1,706 @@
+#!/bin/bash -e
+#
+# Install the tool dependencies for Amplicon_analysis_pipeline.sh for
+# testing from command line
+#
+function install_python_package() {
+    echo Installing $2 $3 from $4 under $1
+    local install_dir=$1
+    local install_dirs="$install_dir $install_dir/bin $install_dir/lib/python2.7/site-packages"
+    for d in $install_dirs ; do
+        if [ ! -d $d ] ; then
+            mkdir -p $d
+        fi
+    done
+    wd=$(mktemp -d)
+    echo Moving to $wd
+    pushd $wd
+    wget -q $4
+    if [ ! -f "$(basename $4)" ] ; then
+        echo "No archive $(basename $4)"
+        exit 1
+    fi
+    tar xzf $(basename $4)
+    if [ ! -d "$5" ] ; then
+        echo "No directory $5"
+        exit 1
+    fi
+    cd $5
+    /bin/bash <<EOF
+export PYTHONPATH=$install_dir:$PYTHONPATH && \
+export PYTHONPATH=$install_dir/lib/python2.7/site-packages:$PYTHONPATH && \
+python setup.py install --prefix=$install_dir --install-scripts=$install_dir/bin --install-lib=$install_dir/lib/python2.7/site-packages >>$INSTALL_DIR/INSTALLATION.log 2>&1
+EOF
+    popd
+    rm -rf $wd/*
+    rmdir $wd
+}
+function install_amplicon_analysis_pipeline_1_1() {
+    install_amplicon_analysis_pipeline $1 1.1
+}
+function install_amplicon_analysis_pipeline_1_0() {
+    install_amplicon_analysis_pipeline $1 1.0
+}
+function install_amplicon_analysis_pipeline() {
+    version=$2
+    echo Installing Amplicon_analysis $version
+    install_dir=$1/amplicon_analysis_pipeline/$version
+    if [ -f $install_dir/env.sh ] ; then
+        return
+    fi
+    mkdir -p $install_dir
+    echo Moving to $install_dir
+    pushd $install_dir
+    wget -q https://github.com/MTutino/Amplicon_analysis/archive/v${version}.tar.gz
+    tar zxf v${version}.tar.gz
+    mv Amplicon_analysis-${version} Amplicon_analysis
+    rm -rf v${version}.tar.gz
+    popd
+    # Make setup file
+    cat > $install_dir/env.sh <<EOF
+#!/bin/sh
+# Source this to setup Amplicon_analysis/$version
+echo Setting up Amplicon analysis pipeline $version
+export PATH=$install_dir/Amplicon_analysis:\$PATH
+## AMPLICON_ANALYSIS_REF_DATA_PATH should be set in
+## config/local_env.sh or in the job_conf.xml file
+## - see the README
+##export AMPLICON_ANALYSIS_REF_DATA_PATH=
+#
+EOF
+}
+function install_amplicon_analysis_pipeline_1_0_patched() {
+    version="1.0-patched"
+    echo Installing Amplicon_analysis $version
+    install_dir=$1/amplicon_analysis_pipeline/$version
+    if [ -f $install_dir/env.sh ] ; then
+        return
+    fi
+    mkdir -p $install_dir
+    echo Moving to $install_dir
+    pushd $install_dir
+    # Clone and patch analysis pipeline scripts
+    git clone https://github.com/pjbriggs/Amplicon_analysis.git
+    cd Amplicon_analysis
+    git checkout -b $version
+    branches=
+    if [ ! -z "$branches" ] ; then
+        for branch in $branches ; do
+            git checkout -b $branch origin/$branch
+            git checkout $version
+            git merge -m "Merge $branch into $version" $branch
+        done
+    fi
+    cd ..
+    popd
+    # Make setup file
+    cat > $install_dir/env.sh <<EOF
+#!/bin/sh
+# Source this to setup Amplicon_analysis/$version
+echo Setting up Amplicon analysis pipeline $version
+export PATH=$install_dir/Amplicon_analysis:\$PATH
+## AMPLICON_ANALYSIS_REF_DATA_PATH should be set in
+## config/local_env.sh or in the job_conf.xml file
+## - see the README
+##export AMPLICON_ANALYSIS_REF_DATA_PATH=
+#
+EOF
+}
+function install_cutadapt_1_11() {
+    echo Installing cutadapt 1.11
+    INSTALL_DIR=$1/cutadapt/1.11
+    if [ -f $INSTALL_DIR/env.sh ] ; then
+        return
+    fi
+    mkdir -p $INSTALL_DIR
+    install_python_package $INSTALL_DIR cutadapt 1.11 \
+        https://pypi.python.org/packages/47/bf/9045e90dac084a90aa2bb72c7d5aadefaea96a5776f445f5b5d9a7a2c78b/cutadapt-1.11.tar.gz \
+        cutadapt-1.11
+    # Make setup file
+    cat > $INSTALL_DIR/env.sh <<EOF
+#!/bin/sh
+# Source this to setup cutadapt/1.11
+echo Setting up cutadapt 1.11
+#if [ -f $1/python/2.7.10/env.sh ] ; then
+#   . $1/python/2.7.10/env.sh
+#fi
+export PA[...]
+[...]tter.pl
+    mv fasta-splitter.pl $install_dir/bin
+    popd
+    # Clean up
+    rm -rf $wd/*
+    rmdir $wd
+    # Make setup file
+cat > $install_dir/env.sh <<EOF
+#!/bin/sh
+# Source this to setup fasta-splitter/0.2.4
+echo Setting up fasta-splitter 0.2.4
+export PATH=$install_dir/bin:\$PATH
+export PERL5LIB=$install_dir/lib/perl5:\$PERL5LIB
+#
+EOF
+}
+function install_rdp_classifier_2_2() {
+    echo Installing rdp-classifier 2.2
+    local install_dir=$1/rdp-classifier/2.2
+    if [ -f $install_dir/env.sh ] ; then
+        return
+    fi
+    mkdir -p $install_dir
+    local wd=$(mktemp -d)
+    echo Moving to $wd
+    pushd $wd
+    wget -q https://sourceforge.net/projects/rdp-classifier/files/rdp-classifier/rdp_classifier_2.2.zip
+    unzip -qq rdp_classifier_2.2.zip
+    cd rdp_classifier_2.2
+    mv * $install_dir
+    popd
+    # Clean up
+    rm -rf $wd/*
+    rmdir $wd
+    # Make setup file
+cat > $install_dir/env.sh <<EOF
+#!/bin/sh
+# Source this to setup rdp-classifier/2.2
+echo Setting up RDP classifier 2.2
+export RDP_JAR_PATH=$install_dir/rdp_classifier-2.2.jar
+#
+EOF
+}
+function install_R_3_2_0() {
+    # Adapted from https://github.com/fls-bioinformatics-core/galaxy-tools/blob/master/local_dependency_installers/R.sh
+    echo Installing R 3.2.0
+    local install_dir=$1/R/3.2.0
+    if [ -f $install_dir/env.sh ] ; then
+        return
+    fi
+    mkdir -p $install_dir
+    local wd=$(mktemp -d)
+    echo Moving to $wd
+    pushd $wd
+    wget -q http://cran.r-project.org/src/base/R-3/R-3.2.0.tar.gz
+    tar xzf R-3.2.0.tar.gz
+    cd R-3.2.0
+    ./configure --prefix=$install_dir
+    make
+    make install
+    popd
+    # Clean up
+    rm -rf $wd/*
+    rmdir $wd
+    # Make setup file
+cat > $install_dir/env.sh <<EOF
+#!/bin/sh
+# Source this to setup R/3.2.0
+echo Setting up R 3.2.0
+export PATH=$install_dir/bin:\$PATH
+export TCL_LIBRARY=$install_dir/lib/libtcl8.4.so
+export TK_LIBRARY=$install_dir/lib/libtk8.4.so
+#
+EOF
+}
+function install_uc2otutab() {
+    # See http://drive5.com/python/uc2otutab_py.html
+    echo Installing uc2otutab
+    # Install to "default" version i.e. essentially a versionless
+    # installation (see Galaxy dependency resolver docs)
+    local install_dir=$1/uc2otutab/default
+    if [ -f $install_dir/env.sh ] ; then
+        return
+    fi
+    mkdir -p $install_dir/bin
+    local wd=$(mktemp -d)
+    echo Moving to $wd
+    pushd $wd
+    wget -q http://drive5.com/python/python_scripts.tar.gz
+    tar zxf python_scripts.tar.gz
+    mv die.py fasta.py progress.py uc.py $install_dir/bin
+    echo "#!/usr/bin/env python" >$install_dir/bin/uc2otutab.py
+    cat uc2otutab.py >>$install_dir/bin/uc2otutab.py
+    chmod +x $install_dir/bin/uc2otutab.py
+    popd
+    # Clean up
+    rm -rf $wd/*
+    rmdir $wd
+    # Make setup file
+cat > $install_dir/env.sh <<EOF
+#!/bin/sh
+# Source this to setup uc2otutab/default
+echo Setting up uc2otutab \(default\)
+export PATH=$install_dir/bin:\$PATH
+#
+EOF
+}
+##########################################################
+# Main script starts here
+##########################################################
+# Fetch top-level installation directory from command line
+TOP_DIR=$1
+if [ -z "$TOP_DIR" ] ; then
+    echo Usage: $(basename $0) DIR
+    exit
+fi
+if [ -z "$(echo $TOP_DIR | grep ^/)" ] ; then
+    TOP_DIR=$(pwd)/$TOP_DIR
+fi
+if [ ! -d "$TOP_DIR" ] ; then
+    mkdir -p $TOP_DIR
+fi
+# Install dependencies
+install_amplicon_analysis_pipeline_1_1 $TOP_DIR
+install_cutadapt_1_11 $TOP_DIR
+install_sickle_1_33 $TOP_DIR
+install_bioawk_27_08_2013 $TOP_DIR
+install_pandaseq_2_8_1 $TOP_DIR
+install_spades_3_5_0 $TOP_DIR
+install_fastqc_0_11_3 $TOP_DIR
+install_qiime_1_8_0 $TOP_DIR
+install_vsearch_1_1_3 $TOP_DIR
+install_microbiomeutil_2010_04_29 $TOP_DIR
+install_blast_2_2_26 $TOP_DIR
+install_fasta_number $TOP_DIR
+install_fasta_splitter_0_2_4 $TOP_DIR
+install_rdp_classifier_2_2 $TOP_DIR
+install_R_3_2_0 $TOP_DIR
+install_uc2otutab $TOP_DIR
+##
+#
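A typical bootstrap run of the script above, with a spot-check of one installed dependency, might look like the following sketch (the top-level directory is a placeholder; note that each ``install_*`` function returns early if its ``env.sh`` already exists, so the script can be re-run safely after a partial failure)::

    # Fetch and build everything under one top-level directory
    # (path is a placeholder)
    ./install_tool_deps.sh /path/to/local_tool_dependencies

    # Spot-check one dependency outside Galaxy by sourcing its env.sh,
    # mirroring what the dependency resolver does at job run time
    . /path/to/local_tool_dependencies/fastqc/0.11.3/env.sh
    fastqc --version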
diff -r 000000000000 -r 47ec9c6f44b8 static/images/Pipeline_description_Fig1.png
Binary file static/images/Pipeline_description_Fig1.png has changed
diff -r 000000000000 -r 47ec9c6f44b8 static/images/Pipeline_description_Fig2.png
Binary file static/images/Pipeline_description_Fig2.png has changed
diff -r 000000000000 -r 47ec9c6f44b8 static/images/Pipeline_description_Fig3.png
Binary file static/images/Pipeline_description_Fig3.png has changed