# HG changeset patch # User Vimalkumar Velayudhan # Date 1437486519 -3600 # Node ID c34c364ce75db4c65870f03395e4a905ce83f79a First commit diff -r 000000000000 -r c34c364ce75d .hgignore --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/.hgignore Tue Jul 21 14:48:39 2015 +0100 @@ -0,0 +1,3 @@ +syntax: glob + +*.pyc diff -r 000000000000 -r c34c364ce75d LICENSE --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/LICENSE Tue Jul 21 14:48:39 2015 +0100 @@ -0,0 +1,340 @@ + GNU GENERAL PUBLIC LICENSE + Version 2, June 1991 + + Copyright (C) 1989, 1991 Free Software Foundation, Inc., + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + Everyone is permitted to copy and distribute verbatim copies + of this license document, but changing it is not allowed. + + Preamble + + The licenses for most software are designed to take away your +freedom to share and change it. By contrast, the GNU General Public +License is intended to guarantee your freedom to share and change free +software--to make sure the software is free for all its users. This +General Public License applies to most of the Free Software +Foundation's software and to any other program whose authors commit to +using it. (Some other Free Software Foundation software is covered by +the GNU Lesser General Public License instead.) You can apply it to +your programs, too. + + When we speak of free software, we are referring to freedom, not +price. Our General Public Licenses are designed to make sure that you +have the freedom to distribute copies of free software (and charge for +this service if you wish), that you receive source code or can get it +if you want it, that you can change the software or use pieces of it +in new free programs; and that you know you can do these things. + + To protect your rights, we need to make restrictions that forbid +anyone to deny you these rights or to ask you to surrender the rights. +These restrictions translate to certain responsibilities for you if you +distribute copies of the software, or if you modify it. + + For example, if you distribute copies of such a program, whether +gratis or for a fee, you must give the recipients all the rights that +you have. You must make sure that they, too, receive or can get the +source code. And you must show them these terms so they know their +rights. + + We protect your rights with two steps: (1) copyright the software, and +(2) offer you this license which gives you legal permission to copy, +distribute and/or modify the software. + + Also, for each author's protection and ours, we want to make certain +that everyone understands that there is no warranty for this free +software. If the software is modified by someone else and passed on, we +want its recipients to know that what they have is not the original, so +that any problems introduced by others will not reflect on the original +authors' reputations. + + Finally, any free program is threatened constantly by software +patents. We wish to avoid the danger that redistributors of a free +program will individually obtain patent licenses, in effect making the +program proprietary. To prevent this, we have made it clear that any +patent must be licensed for everyone's free use or not licensed at all. + + The precise terms and conditions for copying, distribution and +modification follow. + + GNU GENERAL PUBLIC LICENSE + TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION + + 0. This License applies to any program or other work which contains +a notice placed by the copyright holder saying it may be distributed +under the terms of this General Public License. The "Program", below, +refers to any such program or work, and a "work based on the Program" +means either the Program or any derivative work under copyright law: +that is to say, a work containing the Program or a portion of it, +either verbatim or with modifications and/or translated into another +language. (Hereinafter, translation is included without limitation in +the term "modification".) Each licensee is addressed as "you". + +Activities other than copying, distribution and modification are not +covered by this License; they are outside its scope. The act of +running the Program is not restricted, and the output from the Program +is covered only if its contents constitute a work based on the +Program (independent of having been made by running the Program). +Whether that is true depends on what the Program does. + + 1. You may copy and distribute verbatim copies of the Program's +source code as you receive it, in any medium, provided that you +conspicuously and appropriately publish on each copy an appropriate +copyright notice and disclaimer of warranty; keep intact all the +notices that refer to this License and to the absence of any warranty; +and give any other recipients of the Program a copy of this License +along with the Program. + +You may charge a fee for the physical act of transferring a copy, and +you may at your option offer warranty protection in exchange for a fee. + + 2. You may modify your copy or copies of the Program or any portion +of it, thus forming a work based on the Program, and copy and +distribute such modifications or work under the terms of Section 1 +above, provided that you also meet all of these conditions: + + a) You must cause the modified files to carry prominent notices + stating that you changed the files and the date of any change. + + b) You must cause any work that you distribute or publish, that in + whole or in part contains or is derived from the Program or any + part thereof, to be licensed as a whole at no charge to all third + parties under the terms of this License. + + c) If the modified program normally reads commands interactively + when run, you must cause it, when started running for such + interactive use in the most ordinary way, to print or display an + announcement including an appropriate copyright notice and a + notice that there is no warranty (or else, saying that you provide + a warranty) and that users may redistribute the program under + these conditions, and telling the user how to view a copy of this + License. (Exception: if the Program itself is interactive but + does not normally print such an announcement, your work based on + the Program is not required to print an announcement.) + +These requirements apply to the modified work as a whole. If +identifiable sections of that work are not derived from the Program, +and can be reasonably considered independent and separate works in +themselves, then this License, and its terms, do not apply to those +sections when you distribute them as separate works. But when you +distribute the same sections as part of a whole which is a work based +on the Program, the distribution of the whole must be on the terms of +this License, whose permissions for other licensees extend to the +entire whole, and thus to each and every part regardless of who wrote it. + +Thus, it is not the intent of this section to claim rights or contest +your rights to work written entirely by you; rather, the intent is to +exercise the right to control the distribution of derivative or +collective works based on the Program. + +In addition, mere aggregation of another work not based on the Program +with the Program (or with a work based on the Program) on a volume of +a storage or distribution medium does not bring the other work under +the scope of this License. + + 3. You may copy and distribute the Program (or a work based on it, +under Section 2) in object code or executable form under the terms of +Sections 1 and 2 above provided that you also do one of the following: + + a) Accompany it with the complete corresponding machine-readable + source code, which must be distributed under the terms of Sections + 1 and 2 above on a medium customarily used for software interchange; or, + + b) Accompany it with a written offer, valid for at least three + years, to give any third party, for a charge no more than your + cost of physically performing source distribution, a complete + machine-readable copy of the corresponding source code, to be + distributed under the terms of Sections 1 and 2 above on a medium + customarily used for software interchange; or, + + c) Accompany it with the information you received as to the offer + to distribute corresponding source code. (This alternative is + allowed only for noncommercial distribution and only if you + received the program in object code or executable form with such + an offer, in accord with Subsection b above.) + +The source code for a work means the preferred form of the work for +making modifications to it. For an executable work, complete source +code means all the source code for all modules it contains, plus any +associated interface definition files, plus the scripts used to +control compilation and installation of the executable. However, as a +special exception, the source code distributed need not include +anything that is normally distributed (in either source or binary +form) with the major components (compiler, kernel, and so on) of the +operating system on which the executable runs, unless that component +itself accompanies the executable. + +If distribution of executable or object code is made by offering +access to copy from a designated place, then offering equivalent +access to copy the source code from the same place counts as +distribution of the source code, even though third parties are not +compelled to copy the source along with the object code. + + 4. You may not copy, modify, sublicense, or distribute the Program +except as expressly provided under this License. Any attempt +otherwise to copy, modify, sublicense or distribute the Program is +void, and will automatically terminate your rights under this License. +However, parties who have received copies, or rights, from you under +this License will not have their licenses terminated so long as such +parties remain in full compliance. + + 5. You are not required to accept this License, since you have not +signed it. However, nothing else grants you permission to modify or +distribute the Program or its derivative works. These actions are +prohibited by law if you do not accept this License. Therefore, by +modifying or distributing the Program (or any work based on the +Program), you indicate your acceptance of this License to do so, and +all its terms and conditions for copying, distributing or modifying +the Program or works based on it. + + 6. Each time you redistribute the Program (or any work based on the +Program), the recipient automatically receives a license from the +original licensor to copy, distribute or modify the Program subject to +these terms and conditions. You may not impose any further +restrictions on the recipients' exercise of the rights granted herein. +You are not responsible for enforcing compliance by third parties to +this License. + + 7. If, as a consequence of a court judgment or allegation of patent +infringement or for any other reason (not limited to patent issues), +conditions are imposed on you (whether by court order, agreement or +otherwise) that contradict the conditions of this License, they do not +excuse you from the conditions of this License. If you cannot +distribute so as to satisfy simultaneously your obligations under this +License and any other pertinent obligations, then as a consequence you +may not distribute the Program at all. For example, if a patent +license would not permit royalty-free redistribution of the Program by +all those who receive copies directly or indirectly through you, then +the only way you could satisfy both it and this License would be to +refrain entirely from distribution of the Program. + +If any portion of this section is held invalid or unenforceable under +any particular circumstance, the balance of the section is intended to +apply and the section as a whole is intended to apply in other +circumstances. + +It is not the purpose of this section to induce you to infringe any +patents or other property right claims or to contest validity of any +such claims; this section has the sole purpose of protecting the +integrity of the free software distribution system, which is +implemented by public license practices. Many people have made +generous contributions to the wide range of software distributed +through that system in reliance on consistent application of that +system; it is up to the author/donor to decide if he or she is willing +to distribute software through any other system and a licensee cannot +impose that choice. + +This section is intended to make thoroughly clear what is believed to +be a consequence of the rest of this License. + + 8. If the distribution and/or use of the Program is restricted in +certain countries either by patents or by copyrighted interfaces, the +original copyright holder who places the Program under this License +may add an explicit geographical distribution limitation excluding +those countries, so that distribution is permitted only in or among +countries not thus excluded. In such case, this License incorporates +the limitation as if written in the body of this License. + + 9. The Free Software Foundation may publish revised and/or new versions +of the General Public License from time to time. Such new versions will +be similar in spirit to the present version, but may differ in detail to +address new problems or concerns. + +Each version is given a distinguishing version number. If the Program +specifies a version number of this License which applies to it and "any +later version", you have the option of following the terms and conditions +either of that version or of any later version published by the Free +Software Foundation. If the Program does not specify a version number of +this License, you may choose any version ever published by the Free Software +Foundation. + + 10. If you wish to incorporate parts of the Program into other free +programs whose distribution conditions are different, write to the author +to ask for permission. For software which is copyrighted by the Free +Software Foundation, write to the Free Software Foundation; we sometimes +make exceptions for this. Our decision will be guided by the two goals +of preserving the free status of all derivatives of our free software and +of promoting the sharing and reuse of software generally. + + NO WARRANTY + + 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY +FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN +OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES +PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED +OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF +MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS +TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE +PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, +REPAIR OR CORRECTION. + + 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING +WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR +REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, +INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING +OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED +TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY +YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER +PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE +POSSIBILITY OF SUCH DAMAGES. + + END OF TERMS AND CONDITIONS + + How to Apply These Terms to Your New Programs + + If you develop a new program, and you want it to be of the greatest +possible use to the public, the best way to achieve this is to make it +free software which everyone can redistribute and change under these terms. + + To do so, attach the following notices to the program. It is safest +to attach them to the start of each source file to most effectively +convey the exclusion of warranty; and each file should have at least +the "copyright" line and a pointer to where the full notice is found. + + {description} + Copyright (C) {year} {fullname} + + This program is free software; you can redistribute it and/or modify + it under the terms of the GNU General Public License as published by + the Free Software Foundation; either version 2 of the License, or + (at your option) any later version. + + This program is distributed in the hope that it will be useful, + but WITHOUT ANY WARRANTY; without even the implied warranty of + MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the + GNU General Public License for more details. + + You should have received a copy of the GNU General Public License along + with this program; if not, write to the Free Software Foundation, Inc., + 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. + +Also add information on how to contact you by electronic and paper mail. + +If the program is interactive, make it output a short notice like this +when it starts in an interactive mode: + + Gnomovision version 69, Copyright (C) year name of author + Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. + This is free software, and you are welcome to redistribute it + under certain conditions; type `show c' for details. + +The hypothetical commands `show w' and `show c' should show the appropriate +parts of the General Public License. Of course, the commands you use may +be called something other than `show w' and `show c'; they could even be +mouse-clicks or menu items--whatever suits your program. + +You should also get your employer (if you work as a programmer) or your +school, if any, to sign a "copyright disclaimer" for the program, if +necessary. Here is a sample; alter the names: + + Yoyodyne, Inc., hereby disclaims all copyright interest in the program + `Gnomovision' (which makes passes at compilers) written by James Hacker. + + {signature of Ty Coon}, 1 April 1989 + Ty Coon, President of Vice + +This General Public License does not permit incorporating your program into +proprietary programs. If your program is a subroutine library, you may +consider it more useful to permit linking proprietary applications with the +library. If this is what you want to do, use the GNU Lesser General +Public License instead of this License. + diff -r 000000000000 -r c34c364ce75d README.rst --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/README.rst Tue Jul 21 14:48:39 2015 +0100 @@ -0,0 +1,36 @@ +riboseqr_wrapper +---------------- +`riboSeqR `_ +integration for `Galaxy `_ and +`RiboGalaxy `_. + +Included tools +.............. +In the order they are run + +1. Prepare riboSeqR Input - Prepare riboSeqR format input files from SAM format alignment files. + The SAM format files should be obtained by aligning Ribo-Seq and RNA-Seq data to the transcriptome. + (RNA-Seq data is optional but required for Differential translation analysis). + +2. Triplet Periodicity - Plot triplet periodicity for different read lengths. + +3. Metagene Analysis + +4. Plot Ribosome Profile + + [OR] + + Differential Translation Analysis - Get Ribo and RNA-Seq counts with riboSeqR. Perform differential + translation analysis with baySeq. + +Dependencies +............ +Tested on Ubuntu Linux 14.04 LTS, 64-bit. Dependencies should install automatically on Linux x86_64. + +**R** ``3.1.2`` + +**riboSeqR** ``1.0.5`` + +**baySeq** ``2.0.50`` + +**rpy2** ``2.3.10`` diff -r 000000000000 -r c34c364ce75d difftrans.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/difftrans.xml Tue Jul 21 14:48:39 2015 +0100 @@ -0,0 +1,149 @@ + + + (Step 4) Get Ribo and RNA-Seq counts with riboSeqR. Perform differential + translation analysis with baySeq. + + + R + readline + rpy2 + riboseqr_wrapper_deps + + + + + + riboseqr/difftrans.py + --rdata_load "$rdata_load" + --slice_lengths "$slice_lengths" + --frames "$frames" + --group1 "$group1" + --group2 "$group2" + --num_counts "$num_counts" + --normalize "$normalize" + --html_file "$html_file" + --output_path "$html_file.files_path" + + + + value.name == "Metagene analysis (R data file)" + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Differential Translation Analysis +--------------------------------- +riboSeqR version: ``1.0.5``, baySeq version: ``2.0.50``. + +This tool can be used for obtaining the Ribo and RNA-Seq counts and +finding the +top candidates for differential translation using baySeq. The input is +the +R data file from the Metagene analysis step. + +How to use? +----------- +Inputs +...... +Select *Metagene analysis (R data file)*, enter lengths of ribosome +footprints +and the frames. For baySeq analysis, enter the group model and the +number of +results to return. Please refer to the documentation for explanation of +the +options. + +Outputs +....... +The following files will be generated on completion: + +*Differential Translation Analysis (HTML report)* + +A HTML file with results, links to other output files and the R script +used for +the session. + +riboSeqR functions used +....................... +``sliceCounts``, ``rnaCounts``. + +baySeq functions used +..................... +``getLibsizes``, ``getPriors``, ``getLikelihoods``, ``topCounts``. + +For detailed description of the functions and the options used, please +consult +the riboSeqR and baySeq documentation. + +Links +..... +* `riboSeqR <http://bioconductor.org/packages/3.0/bioc/html/riboSeqR.html>`_ +* `baySeq <http://bioconductor.org/packages/3.0/bioc/html/baySeq.html>`_ + + + + @Manual{, + title = {riboSeqR: Analysis of sequencing data from ribosome + profiling experiments.}, + author = {Thomas J. Hardcastle}, + year = {2014}, + note = {R package version 1.0.5}, + } + + + @Manual{, + title = {baySeq: Empirical Bayesian analysis of patterns of + differential + expression in count data}, + author = {Thomas J. Hardcastle}, + year = {2012}, + note = {R package version 2.0.50}, + } + + 10.1186/1471-2105-11-422 + + diff -r 000000000000 -r c34c364ce75d metagene.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/metagene.xml Tue Jul 21 14:48:39 2015 +0100 @@ -0,0 +1,202 @@ + + + (Step 3) Metagene analysis using riboSeqR. + + + R + readline + rpy2 + riboseqr_wrapper_deps + + riboseqr/metagene.py + --rdata_load "$rdata_load" + --selected_lengths "$selected_lengths" + --selected_frames "$selected_frames" + --hit_mean "$hit_mean" + --unique_hit_mean "$unique_hit_mean" + --ratio_check "$ratio_check" + --plot_lengths "$plot_lengths" + --min5p "$min5p" + --max5p "$max5p" + --min3p "$min3p" + --max3p "$max3p" + --cap "$cap" + --plot_title "$plot_title" + --rdata_save "$rdata_save" + --html_file "$html_file" + --output_path "$html_file.files_path" + + + + value.name == "Triplet periodicity (R data file)" + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Metagene Analysis +----------------- +riboSeqR version: ``1.0.5``. + +The input is the R data file from the previous step - Triplet periodicity. + +How to use? +----------- +Inputs +...... +#. Select *Triplet periodicity (R data file)* from the previous step. + +#. Specify length of ribosome footprint reads to be used in filtering +(lengths). Only these reads **will** be used in the analysis. + +#. Specify frames to consider. This information can be obtained +from the *Triplet periodicity (HTML report)*. + +.. class:: warningmark + +Please note that the frames specified should correspond to the +lengths of the reads. + +#. Under *plotCDS parameters*, input length of footprints to be considered for +generating the plot. + +#. Review/change other options if necessary and execute program. + +Outputs +....... +The following files will be generated on completion: + +#. Metagene analysis (HTML report) + +A HTML file with results and links to other output files - plots for +specified lengths (PDF) and R script used for the session. + +#. Metagene analysis (R data file) + +Used as input for the next step - *Plot Rribosome Profile*. + +riboSeqR functions used +....................... +``filterHits``. + +For detailed description of the functions and the options used, please consult +the riboSeqR documentation. + +Links +..... +`riboSeqR <http://bioconductor.org/packages/3.0/bioc/html/riboSeqR.html>`_. + + + + + @Manual{, + title = {riboSeqR: Analysis of sequencing data from ribosome + profiling experiments.}, + author = {Thomas J. Hardcastle}, + year = {2014}, + note = {R package version 1.0.5}, + } + + + diff -r 000000000000 -r c34c364ce75d prepare.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/prepare.xml Tue Jul 21 14:48:39 2015 +0100 @@ -0,0 +1,175 @@ + + + (Step 1) Prepare alignment file (SAM format, Ribo-Seq or RNA-Seq alignments) + for riboSeqR analysis. + + + R + readline + rpy2 + riboseqr_wrapper_deps + + + + + riboseqr/prepare.py + #set $files_ribo = [] + #set $files_rna = [] + #set $replicates = [] + + #for $i, $s in enumerate($rnaseq.datasets) + #silent $replicates.append($s.replicate_name.value) + #silent $files_ribo.append($s.ribo_files.file_name) + #try: + #silent $s.rna_files + #except: + #pass + #else: + #silent $files_rna.append($s.rna_files.file_name) + #end try + #end for + + #set $ribofiles = ", ".join($files_ribo) + #set $rnafiles = ", ".join($files_rna) + #set $replicate_names = ", ".join($replicates) + + --ribo_files "$ribofiles" + --rna_files "$rnafiles" + --replicate_names "$replicate_names" + --seqnames "$seqnames" + --rdata_save "$rdata_save" + --sam_format + --html_file "$html_file" + --output_path "$html_file.files_path" + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Prepare riboSeqR Input +---------------------- +riboSeqR version: ``1.0.5``. + +This tool can be used to prepare input data for riboSeqR from SAM +format alignments of Ribo or RNA-Seq data to a reference transcriptome. You can +do this alignment manually using bowtie or the +"Transcriptome Mapping" -> "Align to transcriptome using Bowtie" +tool on `RiboGalaxy <http://ribogalaxy.ucc.ie>`_. + +The required input format for riboSeqR is mentioned in the +*"Getting Data"* section of the documentation. + +How to use? +----------- +Inputs +...... +Select SAM format Ribo-Seq alignment files in the input section. + +If you have RNA-Seq data, these can be included if the *"Have RNA-Seq data"* +option is checked. + +Outputs +....... +The following files will be generated on completion: + +#. Prepare riboSeqR input (HTML report) - A HTML file with links to all other +output files. + +* Generated riboSeqR format input files of RiboSeq and RNASeq data(if provided). +These files are plain text and lines have the following information - +strand, transcript name, alignment position, sequence. + +Please note the alignments are made *0-indexed*. + +* R script used in this session. + +#. Prepare riboSeqR input (R data file) - used as input for the next step - +*Triplet Periodicity*. + +How are the SAM alignments processed? +..................................... +#. Lines starting with ``@`` are ignored. +#. Lines having a ``FLAG=0`` are considered as successful alignments. These are +considered for the next step. +#. Alignment start is located on ``column 4``. These are decremented by +1 as SAM alignments are 1-indexed. +#. riboSeqR input file is written with the strand (``+``), transcript name, +alignment start and the aligned sequence. + +riboSeqR functions used +....................... +``readRibodata``. + +For detailed description of the functions and the options used, please consult +the riboSeqR documentation. + +Links +..... +`riboSeqR <http://bioconductor.org/packages/3.0/bioc/html/riboSeqR.html>`_ + + + + + @Manual{, + title = {riboSeqR: Analysis of sequencing data from ribosome + profiling experiments.}, + author = {Thomas J. Hardcastle}, + year = {2014}, + note = {R package version 1.0.5}, + } + + + diff -r 000000000000 -r c34c364ce75d riboseqr/__init__.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/riboseqr/__init__.py Tue Jul 21 14:48:39 2015 +0100 @@ -0,0 +1,1 @@ +# -*- coding: utf-8 -*- \ No newline at end of file diff -r 000000000000 -r c34c364ce75d riboseqr/difftrans.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/riboseqr/difftrans.py Tue Jul 21 14:48:39 2015 +0100 @@ -0,0 +1,150 @@ +# -*- coding: utf-8 -*- +import os +import sys +import argparse +import logging +import rpy2.robjects as robjects +import utils + +rscript = '' +R = robjects.r + + +def run_rscript(command=None): + """Run R command, log it, append to rscript""" + global rscript + if not command: + return + logging.debug(command) + rscript += '{}\n'.format(command) + output = R(command) + return output + + +def get_counts(rdata_load='Metagene.rda', slice_lengths='27', + frames='', group1=None, group2=None, num_counts=10, + normalize='FALSE', html_file='Counts.html', + output_path='counts'): + options = {'slice_lengths': utils.process_args( + slice_lengths, ret_type='int', ret_mode='charvector')} + + if not num_counts: + num_counts = 10 + + options['frames'] = utils.process_args( + frames, ret_type='int', ret_mode='listvector') + + run_rscript('suppressMessages(library(riboSeqR))') + run_rscript('load("{}")'.format(rdata_load)) + + cmd_args = 'ffCs, lengths={slice_lengths}'.format(**options) + if frames: + cmd_args += ', frames={frames}'.format(**options) + + run_rscript("""riboCounts <- sliceCounts({}) + annotation <- as.data.frame(ffCs@CDS) + rownames(riboCounts) <- annotation$seqnames + colnames(riboCounts) <- names(riboDat@riboGR)""".format(cmd_args)) + + run_rscript("""mrnaCounts <- rnaCounts(riboDat, ffCs@CDS) + rownames(mrnaCounts) <- annotation$seqnames""") + + if not os.path.exists(output_path): + os.mkdir(output_path) + + html = '

Differential Translation Analysis


' + for count_name, file_name, legend in ( + ('riboCounts', 'RiboCounts.csv', 'Ribo-Seq counts'), + ('mrnaCounts', 'RNACounts.csv', 'RNA-Seq counts'), + ('tC', 'TopCounts.csv', 'baySeq topCounts')): + if count_name == 'tC' and R['riboCounts'] and R['mrnaCounts']: + run_rscript('suppressMessages(library(baySeq))') + cmd = """pD <- new("countData", replicates=ffCs@replicates, \ + data=list(riboCounts, mrnaCounts), groups=list(NDT={0}, DT={1}), \ + annotation=as.data.frame(ffCs@CDS), \ + densityFunction=bbDensity)""".format( + utils.process_args( + group1, ret_type='int', ret_mode='charvector'), + utils.process_args( + group2, ret_type='str', ret_mode='charvector')) + run_rscript(cmd) + + run_rscript('libsizes(pD) <- getLibsizes(pD)') + run_rscript('pD <- getPriors(pD, cl=NULL)') + run_rscript('pD <- getLikelihoods(pD, cl=NULL)') + run_rscript('tC <- topCounts(pD, "DT", normaliseData={}, ' + 'number={})'.format(normalize, num_counts)) + + if R[count_name]: + html += '

{}

'.format(legend) + output_file = os.path.join(output_path, file_name) + run_rscript('write.csv({}, file="{}")'.format( + count_name, output_file)) + with open(output_file) as g: + html += '' + header = g.readline() + html += '' + for item in header.strip().split(","): + html += ''.format(item.strip('"')) + html += '' + for line in g: + html += '' + data = line.strip().split(",") + # html += ''.format(data[0].strip('"')) + for item in data: + html += (''.format(item.strip('"'))) + html += '' + html += ('
{}
{}{}' + '

Download: ' + '{0}


'.format(file_name)) + + with open(os.path.join(output_path, 'counts.R'), 'w') as r: + r.write(rscript) + + html += ('

R script for this session

' + '

Download: counts.R

') + + with open(html_file, 'w') as f: + f.write(html) + + +if __name__ == '__main__': + parser = argparse.ArgumentParser(description='Get Ribo and RNA-Seq counts') + + # required arguments + flags = parser.add_argument_group('required arguments') + flags.add_argument('--rdata_load', required=True, + help='Saved riboSeqR data from Step 2') + flags.add_argument('--slice_lengths', + help='Lengths of ribosome footprints to inform count ' + 'data', default='27,28', required=True) + flags.add_argument('--group1', help='Group model for baySeq', required=True) + flags.add_argument('--group2', help='Group model for baySeq', required=True) + flags.add_argument('--frames', + help='Frames of ribosome footprints (relative to ' + 'coding start site). If omitted, all frames ' + 'are used.', required=True) + + # optional arguments + parser.add_argument( + '--num_counts', help='How many results to return? (topCounts)') + parser.add_argument('--normalize', help='Normalize data?', + choices=['TRUE', 'FALSE'], default='FALSE') + parser.add_argument('--html_file', help='HTML file with reports') + parser.add_argument('--output_path', help='Directory to save output files') + parser.add_argument( + '--debug', help='Produce debug output', action='store_true') + + args = parser.parse_args() + if args.debug: + logging.basicConfig(format='%(levelname)s - %(message)s', + level=logging.DEBUG, stream=sys.stdout) + logging.debug('Supplied Arguments\n{}\n'.format(vars(args))) + + if not os.path.exists(args.output_path): + os.mkdir(args.output_path) + + get_counts(rdata_load=args.rdata_load, slice_lengths=args.slice_lengths, + frames=args.frames, group1=args.group1, group2=args.group2, + num_counts=args.num_counts, normalize=args.normalize, + html_file=args.html_file, output_path=args.output_path) diff -r 000000000000 -r c34c364ce75d riboseqr/metagene.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/riboseqr/metagene.py Tue Jul 21 14:48:39 2015 +0100 @@ -0,0 +1,209 @@ +#!/usr/bin/env python +import os +import sys +import glob +import argparse +import logging +import rpy2.robjects as robjects + +import utils + +rscript = '' +R = robjects.r + + +def run_rscript(command=None): + """Run R command, log it, append to rscript""" + global rscript + if not command: + return + logging.debug(command) + rscript += '{}\n'.format(command) + output = R(command) + return output + + +def do_analysis( + rdata_load='Periodicity.rda', selected_lengths='27', + selected_frames='', hit_mean='10', unique_hit_mean='1', + ratio_check='TRUE', min5p='-20', max5p='200', min3p='-200', max3p='20', + cap='', plot_title='', plot_lengths='27', rdata_save='Metagene.rda', + html_file='Metagene-report.html', output_path=os.getcwd()): + """Metagene analysis from saved periodicity R data file. """ + run_rscript('suppressMessages(library(riboSeqR))') + run_rscript('load("{}")'.format(rdata_load)) + + logging.debug('fS\n{}\nfCs\n{}\n'.format(R['fS'], R['fCs'])) + options = {} + for key, value, rtype, rmode in ( + ('lengths', selected_lengths, 'int', 'charvector'), + ('frames', selected_frames, 'int', 'listvector'), + ('hit_mean', hit_mean, 'int', None), + ('unique_hit_mean', unique_hit_mean, 'int', None), + ('ratio_check', ratio_check, 'bool', None), + ('min5p', min5p, 'int', None), ('max5p', max5p, 'int', None), + ('min3p', min3p, 'int', None), ('max3p', max3p, 'int', None), + ('cap', cap, 'int', None), + ('plot_title', plot_title, 'str', 'charvector'), + ('plot_lengths', plot_lengths, 'int', 'list')): + options[key] = utils.process_args( + value, ret_type=rtype, ret_mode=rmode) + + cmd_args = """fCs, lengths={lengths}, + frames={frames}, hitMean={hit_mean}, + unqhitMean={unique_hit_mean}, fS=fS""".format(**options) + + if ratio_check == 'TRUE': + cmd_args += ', ratioCheck = TRUE' + + run_rscript('ffCs <- filterHits({})'.format(cmd_args)) + logging.debug("ffCs\n{}\n".format(R['ffCs'])) + + cds_args = ('coordinates=ffCs@CDS, riboDat=riboDat, min5p={min5p}, ' + 'max5p={max5p}, min3p={min3p}, max3p={max3p}'.format(**options)) + + if options['cap']: + cds_args += ', cap={cap}'.format(**options) + + if options['plot_title']: + cds_args += ', main={plot_title}'.format(**options) + + html = """ + + + Metagene Analysis - Report + + + """ + html += '

Metagene analysis - results

\n
\n' + html += ('

\nLengths of footprints used in analysis - ' + '{0}
\nLengths of footprints ' + 'selected for the plot - {1}' + '\n

\n'.format(selected_lengths, plot_lengths)) + for count, length in enumerate(options['plot_lengths']): + count += 1 + html += '

Length: {0}

\n'.format(length) + plot_file = os.path.join(output_path, + 'Metagene-analysis-plot{0}'.format(count)) + for fmat in ('pdf', 'png'): + if fmat == 'png': + cmd = 'png(file="{0}_%1d.png", type="cairo")' + else: + cmd = 'pdf(file="{0}.pdf")' + run_rscript(cmd.format(plot_file)) + run_rscript('plotCDS({0},{1})'.format( + cds_args, 'lengths={}'.format(length))) + run_rscript('dev.off()') + for image in sorted( + glob.glob('{}*.png'.format(plot_file))): + html += '

{0}

\n'.format( + os.path.basename(image)) + html += '

PDF version

\n'.format( + os.path.basename(plot_file)) + run_rscript('save("ffCs", "riboDat", "fastaCDS", file="{}", ' + 'compress=FALSE)'.format(rdata_save)) + + logging.debug('\n{:#^80}\n{}\n{:#^80}\n'.format( + ' R script for this session ', rscript, ' End R script ')) + + with open(os.path.join(output_path, 'metagene.R'), 'w') as r: + r.write(rscript) + + html += ('

R script for this session

\n' + '

metagene.R

\n' + '

Next step: Plot Ribosome profile

\n' + '\n\n') + + with open(html_file, 'w') as f: + f.write(html) + +if __name__ == '__main__': + + parser = argparse.ArgumentParser(description='Metagene analysis') + + # required arguments + flags = parser.add_argument_group('required arguments') + flags.add_argument('--rdata_load', required=True, + help='Saved riboSeqR data from Periodicity step.') + flags.add_argument('--selected_lengths', required=True, + help='Select frame lengths to filter. Comma-separated', + default='27') + flags.add_argument( + '--selected_frames', required=True, + help='Select frames corresponding to frame lengths. Comma-separated') + + flags.add_argument( + '--hit_mean', required=True, + help='Mean number of hits within the replicate group for filtering', + default='10') + + flags.add_argument( + '--unique_hit_mean', required=True, + help='Mean number of unique sequences within the replicate group ' + 'for filtering', default='1') + + parser.add_argument( + '--rdata_save', help='File to write R data to (default: %(default)s)', + default='Metagene.rda') + + parser.add_argument( + '--ratio_check', + help='Check the ratios of the expected phase to maximal phase ' + 'within the putative coding sequence (default: %(default)s)', + choices=['TRUE', 'FALSE'], default='TRUE') + + parser.add_argument( + '--plot_lengths', + help='Length of footprints to be plotted. Multiple values should be ' + 'comma-separated. In that case, multiple plots will be produced' + '(default: %(default)s)', default='27') + + parser.add_argument( + '--min5p', + help='The distance upstream of the translation start to be plotted ' + '(default: %(default)s)', default='-20') + + parser.add_argument( + '--max5p', + help='The distance downstream of the translation start to be plotted ' + '(default: %(default)s)', default='200') + + parser.add_argument( + '--min3p', + help='The distance upstream of the translation end to be plotted ' + '(default: %(default)s)', default='-200') + + parser.add_argument( + '--max3p', + help='The distance downtream of the translation end to be plotted ' + '(default: %(default)s)', default='20') + + parser.add_argument( + '--cap', help='If given, caps the height of plotted values ' + '(default: %(default)s)') + + parser.add_argument('--plot_title', help='Title of the plot', default='') + parser.add_argument('--html_file', help='HTML file with reports') + parser.add_argument('--output_path', help='Directory to save output files') + parser.add_argument( + '--debug', help='Produce debug output', action='store_true') + + args = parser.parse_args() + if args.debug: + logging.basicConfig(format='%(module)s: %(levelname)s - %(message)s', + level=logging.DEBUG, stream=sys.stdout) + logging.debug('Supplied Arguments\n{}\n'.format(vars(args))) + + if not os.path.exists(args.output_path): + os.mkdir(args.output_path) + + do_analysis( + rdata_load=args.rdata_load, selected_lengths=args.selected_lengths, + selected_frames=args.selected_frames, hit_mean=args.hit_mean, + unique_hit_mean=args.unique_hit_mean, ratio_check=args.ratio_check, + min5p=args.min5p, max5p=args.max5p, min3p=args.min3p, max3p=args.max3p, + cap=args.cap, plot_title=args.plot_title, + plot_lengths=args.plot_lengths, rdata_save=args.rdata_save, + html_file=args.html_file, output_path=args.output_path) + + logging.debug('Done!') diff -r 000000000000 -r c34c364ce75d riboseqr/prepare.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/riboseqr/prepare.py Tue Jul 21 14:48:39 2015 +0100 @@ -0,0 +1,214 @@ +#!/usr/bin/env python +import os +import sys +import argparse +import logging +import rpy2.robjects as robjects + +import utils + +rscript = '' +R = robjects.r + + +def run_rscript(command=None): + """Run R command, log it, append to rscript""" + global rscript + if not command: + return + logging.debug(command) + rscript += '{}\n'.format(command) + output = R(command) + return output + + +def prep_riboseqr_input(sam_file, output_file): + """Generate input file for riboSeqR from SAM format file.""" + with open(output_file, 'w') as f: + for line in open(sam_file): + if line.startswith('@'): + continue + line = line.split() + flag = line[1] + + if flag != '0': + continue + # make start 0-indexed, sam alignments are 1-indexed + start = int(line[3]) - 1 + (name, sequence) = (line[2], line[9]) + f.write('"+"\t"{0}"\t{1}\t"{2}"\n'.format(name, start, sequence)) + + +def batch_process(sam_files, seq_type, output_path): + """Batch process the conversion of SAM format files -> riboSeqR format + input files. + + Files are saved with file names corresponding to their sequence type. + + """ + outputs = [] + prefix = '{}' + if seq_type == 'riboseq': + prefix = 'RiboSeq file {}' + elif seq_type == 'rnaseq': + prefix = 'RNASeq file {}' + + for count, fname in enumerate(sam_files): + count += 1 + out_file = os.path.join(output_path, prefix.format(count)) + logging.debug('Processing: {}'.format(fname)) + logging.debug('Writing output to: {}'.format(out_file)) + prep_riboseqr_input(fname, out_file) + outputs.append(out_file) + return outputs + + +def generate_ribodata(ribo_files='', rna_files='', replicate_names='', + seqnames='', rdata_save='Prepare.rda', sam_format=True, + html_file='Prepare-report.html', output_path=os.getcwd()): + """Prepares Ribo and RNA seq data in the format required for riboSeqR. Calls + the readRibodata function of riboSeqR and saves the result objects in an + R data file which can be used as input for the next step. + + """ + input_ribo_files = utils.process_args(ribo_files, ret_mode='list') + logging.debug('Found {} Ribo-Seq files'.format(len(input_ribo_files))) + logging.debug(input_ribo_files) + + input_rna_files = [] + if rna_files: + input_rna_files = utils.process_args(rna_files, ret_mode='list') + logging.debug('Found {} RNA-Seq files'.format(len(input_rna_files))) + logging.debug(input_rna_files) + + replicates = utils.process_args(replicate_names, ret_mode='charvector') + logging.debug('Replicates: {}\n'.format(replicates)) + + if sam_format: + ribo_seq_files = batch_process(input_ribo_files, 'riboseq', output_path) + else: + ribo_seq_files = input_ribo_files + + html = '

Prepare riboSeqR input - results


' + if len(ribo_seq_files): + html += '

Generated riboSeqR format input files ' \ + '(RiboSeq)

' + for fname in ribo_seq_files: + html += '{0}
'.format( + os.path.basename(fname)) + html += '

' + + rna_seq_files = [] + if len(input_rna_files): + if sam_format: + rna_seq_files = batch_process( + input_rna_files, 'rnaseq', output_path) + else: + rna_seq_files = input_rna_files + + if len(rna_seq_files): + html += ('

Generated riboSeqR format input files ' + '(RNASeq)

') + for fname in rna_seq_files: + html += '{0}
'.format( + os.path.basename(fname)) + html += '

' + + input_seqnames = utils.process_args(seqnames, ret_mode='charvector') + options = {'ribo_seq_files': 'c({})'.format(str(ribo_seq_files)[1:-1]), + 'rna_seq_files': 'c({})'.format(str(rna_seq_files)[1:-1]), + 'input_replicates': replicates, + 'input_seqnames': input_seqnames} + + logging.debug('{}'.format(R('sessionInfo()'))) + + script = '' + cmd = 'suppressMessages(library(riboSeqR))' + run_rscript(cmd) + script += '{}\n'.format(cmd) + + if len(rna_seq_files): + cmd_args = ('riboFiles={ribo_seq_files}, ' + 'rnaFiles={rna_seq_files}'.format(**options)) + else: + cmd_args = 'riboFiles={ribo_seq_files}'.format(**options) + + if input_seqnames: + cmd_args += ', seqnames={input_seqnames}'.format(**options) + if replicates: + cmd_args += ', replicates={input_replicates}'.format(**options) + else: + cmd_args += ', replicates=c("")' + cmd = 'riboDat <- readRibodata({0})'.format(cmd_args) + run_rscript(cmd) + script += '{}\n'.format(cmd) + + ribo_data = R['riboDat'] + logging.debug('riboDat \n{}\n'.format(ribo_data)) + cmd = 'save("riboDat", file="{}", compress=FALSE)'.format(rdata_save) + run_rscript(cmd) + script += '{}\n'.format(cmd) + + msg = '\n{:#^80}\n{}\n{:#^80}\n'.format( + ' R script for this session ', script, ' End R script ') + logging.debug(msg) + + with open(os.path.join(output_path, 'prepare.R'), 'w') as r: + r.write(script) + + html += ('

R script for this session

' + '

prepare.R

' + '

Next step: Triplet periodicity

') + + with open(html_file, 'w') as f: + f.write(html) + + return ribo_data + + +if __name__ == '__main__': + + description = ( + 'Prepare riboSeqR input file from SAM format RNA/Ribo-Seq alignment.') + parser = argparse.ArgumentParser(description=description) + + # required arguments + flags = parser.add_argument_group('required arguments') + flags.add_argument('--ribo_files', required=True, + help='List of Ribo-Seq files, comma-separated') + # optional arguments + parser.add_argument('--rna_files', + help='List of RNA-Seq files. Comma-separated') + parser.add_argument('--replicate_names', + help='Replicate names, comma-separated') + parser.add_argument('--seqnames', + help='Transcript (seqname) names to be read') + parser.add_argument('--rdata_save', + help='File to write RData to (default: %(default)s)', + default='Prepare.rda') + parser.add_argument( + '--sam_format', + help='Flag. Input is in SAM format', action='store_true') + parser.add_argument('--html_file', help='Output file for results (HTML)') + parser.add_argument('--output_path', + help='Files are saved in this directory') + parser.add_argument('--debug', help='Flag. Produce debug output', + action='store_true') + args = parser.parse_args() + + if args.debug: + logging.basicConfig(format='%(module)s: %(levelname)s - %(message)s', + level=logging.DEBUG, stream=sys.stdout) + logging.debug('Supplied Arguments: {}'.format(vars(args))) + + if not os.path.exists(args.output_path): + os.mkdir(args.output_path) + + generate_ribodata( + ribo_files=args.ribo_files, rna_files=args.rna_files, + replicate_names=args.replicate_names, seqnames=args.seqnames, + rdata_save=args.rdata_save, + sam_format=args.sam_format, html_file=args.html_file, + output_path=args.output_path + ) + logging.debug('Done') diff -r 000000000000 -r c34c364ce75d riboseqr/ribosome_profile.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/riboseqr/ribosome_profile.py Tue Jul 21 14:48:39 2015 +0100 @@ -0,0 +1,127 @@ +#!/usr/bin/env python +import os +import sys +import glob +import argparse +import logging +import rpy2.robjects as robjects +import utils + +rscript = '' +R = robjects.r + + +def run_rscript(command=None): + """Run R command, log it, append to rscript""" + global rscript + if not command: + return + logging.debug(command) + rscript += '{}\n'.format(command) + msg = R(command) + + +def plot_transcript(rdata_load='Metagene.rda', transcript_name='', + transcript_length='27', transcript_cap='', + html_file='Plot-ribosome-profile.html', + output_path=os.getcwd()): + """Plot ribosome profile for a given transcript. """ + options = {} + for key, value, rtype, rmode in ( + ('transcript_name', transcript_name, 'str', None), + ('transcript_length', transcript_length, 'int', 'charvector'), + ('transcript_cap', transcript_cap, 'int', None)): + options[key] = utils.process_args(value, ret_type=rtype, ret_mode=rmode) + + run_rscript('suppressMessages(library(riboSeqR))') + run_rscript('load("{}")'.format(rdata_load)) + + html = """ + + + Ribosome Profile Plot - Report + + + """ + html += '

Plot ribosome profile - results

\n
\n' + if len(transcript_name): + cmd_args = ( + '"{transcript_name}", main="{transcript_name}",' + 'coordinates=ffCs@CDS, riboData=riboDat,' + 'length={transcript_length}'.format(**options)) + if transcript_cap: + cmd_args += ', cap={transcript_cap}'.format(**options) + plot_file = os.path.join(output_path, 'Ribosome-profile-plot') + for fmat in ('pdf', 'png'): + if fmat == 'png': + cmd = 'png(file="{}_%1d.png", type="cairo")'.format(plot_file) + else: + cmd = 'pdf(file="{}.pdf")'.format(plot_file) + run_rscript(cmd) + cmd = 'plotTranscript({})'.format(cmd_args) + run_rscript(cmd) + run_rscript('dev.off()') + + html += ('

Selected ribosome footprint length: ' + '{0}\n'.format(transcript_length)) + + for image in sorted(glob.glob('{}_*.png'.format(plot_file))): + html += '

{0}

\n'.format( + os.path.basename(image)) + html += '

PDF version

\n' + else: + msg = 'No transcript name was provided. Did not generate plot.' + html += '

{}

'.format(msg) + logging.debug(msg) + + logging.debug('\n{:#^80}\n{}\n{:#^80}\n'.format( + ' R script for this session ', rscript, ' End R script ')) + + with open(os.path.join(output_path, 'ribosome-profile.R'), 'w') as r: + r.write(rscript) + + html += ('

R script for this session

\n' + '

ribosome-profile.R

\n' + '\n\n') + + with open(html_file, 'w') as f: + f.write(html) + + +if __name__ == '__main__': + parser = argparse.ArgumentParser(description='Plot Ribosome profile') + + # required arguments + flags = parser.add_argument_group('required arguments') + flags.add_argument('--rdata_load', required=True, + help='Saved riboSeqR data from Step 2') + flags.add_argument('--transcript_name', required=True, + help='Name of the transcript to be plotted') + flags.add_argument( + '--transcript_length', required=True, + help='Size class of ribosome footprint data to be plotted', + default='27') + flags.add_argument( + '--transcript_cap', required=True, + help=('Cap on the largest value that will be plotted as an abundance ' + 'of the ribosome footprint data')) + parser.add_argument('--html_file', help='HTML file with reports') + parser.add_argument('--output_path', help='Directory to save output files') + parser.add_argument('--debug', help='Produce debug output', + action='store_true') + + args = parser.parse_args() + if args.debug: + logging.basicConfig(format='%(levelname)s - %(message)s', + level=logging.DEBUG, stream=sys.stdout) + logging.debug('Supplied Arguments\n{}\n'.format(vars(args))) + + if not os.path.exists(args.output_path): + os.mkdir(args.output_path) + + plot_transcript(rdata_load=args.rdata_load, + transcript_name=args.transcript_name, + transcript_length=args.transcript_length, + transcript_cap=args.transcript_cap, + html_file=args.html_file, output_path=args.output_path) + logging.debug('Done!') diff -r 000000000000 -r c34c364ce75d riboseqr/triplet.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/riboseqr/triplet.py Tue Jul 21 14:48:39 2015 +0100 @@ -0,0 +1,158 @@ +#!/usr/bin/env python +import os +import sys +import argparse +import logging +import rpy2.robjects as robjects + +import utils + +rscript = '' +R = robjects.r + + +def run_rscript(command=None): + """Run R command, log it, append to rscript""" + global rscript + if not command: + return + logging.debug(command) + rscript += '{}\n'.format(command) + output = R(command) + return output + + +def find_periodicity( + rdata_load='Prepare.rda', start_codons='ATG', stop_codons='TAG,TAA,TGA', + fasta_file=None, include_lengths='25:30', analyze_plot_lengths='26:30', + text_legend='Frame 0, Frame 1, Frame 2', rdata_save='Periodicity.rda', + html_file='Periodicity-report.html', output_path=os.getcwd()): + """Plot triplet periodicity from prepared R data file. """ + logging.debug('{}'.format(R('sessionInfo()'))) + cmd = 'suppressMessages(library(riboSeqR))' + run_rscript(cmd) + + logging.debug('Loading saved R data file') + cmd = 'load("{}")'.format(rdata_load) + run_rscript(cmd) + + # R("""options(showTailLines=Inf)""") + starts, stops = (utils.process_args(start_codons, ret_mode='charvector'), + utils.process_args(stop_codons, ret_mode='charvector')) + + cmd = ('fastaCDS <- findCDS(fastaFile={0!r}, startCodon={1}, ' + 'stopCodon={2})'.format(fasta_file, starts, stops)) + run_rscript(cmd) + + logging.debug('Potential coding sequences using start codon (ATG) and ' + 'stop codons TAG, TAA, TGA') + logging.debug('{}\n'.format(R['fastaCDS'])) + + cmd = """fCs <- frameCounting(riboDat, fastaCDS, lengths={0}) + fS <- readingFrame(rC=fCs, lengths={1}); fS""".\ + format(include_lengths, analyze_plot_lengths) + run_rscript(cmd) + + logging.debug('riboDat \n{}\n'.format(R['riboDat'])) + logging.debug('fCs\n{0}\n'.format(R['fCs'])) + logging.debug('Reading frames for each n-mer\n{}'.format(R['fS'])) + + legend = utils.process_args(text_legend, ret_mode='charvector') + + for fmat in ('pdf', 'png'): + if fmat == 'png': + cmd = '{0}(file="{1}", type="cairo")' + else: + cmd = '{0}(file="{1}")' + run_rscript(cmd.format(fmat, os.path.join( + output_path, '{0}.{1}'.format('Periodicity-plot', fmat)))) + run_rscript('plotFS(fS, legend.text = {0})'.format(legend)) + run_rscript('dev.off()') + + run_rscript('save("fCs", "fS", "riboDat", "fastaCDS", ' + 'file="{}", compress=FALSE)'.format(rdata_save)) + + html = '

Triplet periodicity - results


' + html += ('

Results of reading frame analysis

' + '
{}

'.format(R['fS'])) + html += ('

Lengths used for reading frame analysis - {0}' + '
Lengths selected for the plot - {1}' + '

'.format(include_lengths, analyze_plot_lengths)) + html += ('

' + '
PDF version

') + + logging.debug('\n{:#^80}\n{}\n{:#^80}\n'.format( + ' R script for this session ', rscript, ' End R script ')) + + with open(os.path.join(output_path, 'periodicity.R'), 'w') as r: + r.write(rscript) + + html += ('

R script for this session

' + '

periodicity.R

' + '

Next step: Metagene analysis

') + + with open(html_file, 'w') as f: + f.write(html) + + +if __name__ == '__main__': + + parser = argparse.ArgumentParser( + description='Plot triplet periodicity for different read lengths.') + + # required arguments + flags = parser.add_argument_group('required arguments') + flags.add_argument( + '--rdata_load', required=True, + help='riboSeqR data from input preparation step (Prepare.rda)') + flags.add_argument( + '--fasta_file', required=True, + help='FASTA file of the reference transcriptome') + + # optional arguments + parser.add_argument( + '--start_codons', + help='Start codon(s) to use. (default: %(default)s)', default='ATG') + parser.add_argument( + '--stop_codons', help='Stop codon(s) to use. (default: %(default)s)', + default='TAG, TAA, TGA') + parser.add_argument( + '--include_lengths', + help='Lengths of ribosome footprints to be included ' + '(default: %(default)s)', default='25:30') + parser.add_argument( + '--analyze_plot_lengths', + help='Lenghts of reads to be analyzed for frame-shift and plotting ' + '(default: %(default)s)', default='26:30') + parser.add_argument( + '--text_legend', + help='Text for legend used in the plot (default: %(default)s)', + default='Frame 0, Frame 1, Frame 2') + parser.add_argument( + '--rdata_save', help='File to write RData to (default: %(default)s)', + default='Periodicity.rda') + parser.add_argument('--html_file', help='Output file for results (HTML)') + parser.add_argument('--output_path', + help='Files are saved in this directory') + parser.add_argument( + '--debug', help='Produce debug output', action='store_true') + + args = parser.parse_args() + if args.debug: + logging.basicConfig(format='%(module)s: %(levelname)s - %(message)s', + level=logging.DEBUG, stream=sys.stdout) + logging.debug('Supplied Arguments\n{}\n'.format(vars(args))) + + if not os.path.exists(args.output_path): + os.mkdir(args.output_path) + + find_periodicity( + rdata_load=args.rdata_load, start_codons=args.start_codons, + stop_codons=args.stop_codons, fasta_file=args.fasta_file, + include_lengths=args.include_lengths, + analyze_plot_lengths=args.analyze_plot_lengths, + text_legend=args.text_legend, + rdata_save=args.rdata_save, html_file=args.html_file, + output_path=args.output_path) +logging.debug("Done!") diff -r 000000000000 -r c34c364ce75d riboseqr/utils.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/riboseqr/utils.py Tue Jul 21 14:48:39 2015 +0100 @@ -0,0 +1,93 @@ +"""Common functions""" + + +def process_args(args, ret_type='str', ret_mode=None): + """Split arguments (only strings) on comma, return in requested + + ret_type + (str, int or bool) + + as + + ret_mode + char vector - c(1,2,3) + list vector - list(1,2,3) + list - as list [1,2,3] + None - same as input + + """ + if args: + all_args = [item.strip() for item in args.split(',')] + else: + all_args = [] + num_options = len(all_args) + if num_options == 1 and len(all_args[0]): + if ret_type == 'int': + option = int(all_args[0]) + if ret_mode == 'charvector': + # if option: + return 'c({})'.format(option) + # else: + # return 'c()' + elif ret_mode == 'listvector': + # if option: + return 'list({})'.format(option) + # else: + # return 'list()' + elif ret_mode == 'list': + # if option: + return [option] + # else: + # return [] + else: + return option + else: + # str, bool + option = all_args[0] + if ret_mode == 'charvector': + # if len(option): + return 'c("{}")'.format(option) + # else: + # return 'c("")' + elif ret_mode == 'listvector': + # if option: + return 'list("{}")'.format(option) + # else: + # return 'list("")' + elif ret_mode == 'list': + # if option: + return [option] + # else: + # return [] + else: + return option + elif num_options > 1: + if ret_type == 'int': + options = tuple([int(item) for item in all_args]) + if ret_mode == 'charvector': + # if len(options): + return 'c{}'.format(options) + # else: + # return 'c()' + elif ret_mode == 'listvector': + # if len(options): + return 'list{}'.format(options) + # else: + # return 'list()' + elif ret_mode == 'list': + return list(options) + else: + options = tuple(all_args) + if ret_mode == 'charvector': + # if len(options): + return 'c{}'.format(options) + elif ret_mode == 'listvector': + return 'list{}'.format(options) + elif ret_mode == 'list': + # if len(all_args): + return all_args + # else: + # return [] + else: + # as original with spaces stripped + return ','.join(all_args) diff -r 000000000000 -r c34c364ce75d ribosome_profile.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/ribosome_profile.xml Tue Jul 21 14:48:39 2015 +0100 @@ -0,0 +1,128 @@ + + + (Step 4) Plot Ribosome profile using riboSeqR. + + + R + readline + rpy2 + riboseqr_wrapper_deps + + + + + riboseqr/ribosome_profile.py + --rdata_load "$rdata_load" + --transcript_name "$transcript_name" + --transcript_length "$transcript_length" + --transcript_cap "$transcript_cap" + --html_file "$html_file" + --output_path "$html_file.files_path" + + + + value.name == "Metagene analysis (R data file)" + + + + len(value.split(',')) == 1 + + + + + + + + + + len(value.split(',')) == 1 + len(value.split(' ')) == 1 + + + + + + + + + + + + + + + + + + + + + + + + +Plot Ribosome Profile +--------------------- +riboSeqR version: ``1.0.5``. + +This tool can be used for generating a ribosome profile plot for a given +transcript. The input is the R data file from the previous +step - Metagene analysis. + +How to use? +----------- +Inputs +...... +Select *Metagene analysis (R data file)* from the previous step, enter name +of the transcript to plot, review/change parameters and execute program. + +.. class:: warningmark + + The transcript name should correspond to names used in SAM alignments + and FASTA file of the transcriptome. + +Outputs +....... +The following files will be generated on completion: + +Plot ribosome profile (HTML report) + +A HTML file with results and links to other output files - plot for the +specified transcript (PDF) and R script used for the session. + +riboSeqR functions used +....................... +``plotTranscript``. + +For detailed description of the functions and the options used, please consult +the riboSeqR documentation. + +Links +..... +`riboSeqR <http://bioconductor.org/packages/3.0/bioc/html/riboSeqR.html>`_. + + + + + @Manual{, + title = {riboSeqR: Analysis of sequencing data from ribosome + profiling experiments.}, + author = {Thomas J. Hardcastle}, + year = {2014}, + note = {R package version 1.0.5}, + } + + + diff -r 000000000000 -r c34c364ce75d tests/__init__.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tests/__init__.py Tue Jul 21 14:48:39 2015 +0100 @@ -0,0 +1,1 @@ +# -*- coding: utf-8 -*- \ No newline at end of file diff -r 000000000000 -r c34c364ce75d tests/test_riboseqr.py --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tests/test_riboseqr.py Tue Jul 21 14:48:39 2015 +0100 @@ -0,0 +1,58 @@ +"""riboSeqR Galaxy unit tests""" +import unittest +from riboseqr import utils + + +class PrepareTestCase(unittest.TestCase): + + def test_process_args(self): + """Test processing arguments. """ + # "ATG" -> c("ATG") + rs = utils.process_args('ATG', ret_mode='charvector') + self.assertEqual(rs, 'c("ATG")','Return string as a character vector.') + + # stop codons "TAG,TAA,TGA" -> c("TAG", "TAA", "TGA"). Also + # replicate names, seqnames. + rs = utils.process_args('TAG,TAA,TGA', ret_mode='charvector') + self.assertEqual( + rs, "c('TAG', 'TAA', 'TGA')", + 'Return comma separated strings as a character vector.') + + # "" -> None + rs = utils.process_args('') + self.assertIsNone(rs, 'Return empty string as None.') + + # "27,28" -> c(27, 28) + rs = utils.process_args("27,28", ret_type='int', ret_mode='charvector') + self.assertEqual( + rs, "c(27, 28)", 'Return number strings as a character vector.') + + # "27,28" -> [27, 28] + rs = utils.process_args("27,28", ret_type='int', ret_mode='list') + self.assertEqual(rs, [27, 28], 'Return number strings as a list.') + + # "0,2" -> list(0,2) + rs = utils.process_args("0,2", ret_type='int', ret_mode='listvector') + self.assertEqual( + rs, "list(0, 2)", 'Return number strings as a list vector.') + + # "50" -> 50 + rs = utils.process_args("50", ret_type='int') + self.assertEqual(rs, 50, 'Return number string as a number.') + + # "-200" -> -200 + rs = utils.process_args("-200", ret_type='int') + self.assertEqual(rs, -200, 'Return number string as a number.') + + # "TRUE" -> TRUE + rs = utils.process_args("TRUE", ret_type='bool') + self.assertEqual(rs, 'TRUE', 'Return bool string as bool.') + + # 'chlamy17,chlamy3' -> 'chlamy17,chlamy3' for ribo and rna names + rs = utils.process_args('chlamy17,chlamy3') + self.assertEqual(rs, 'chlamy17,chlamy3', 'Return csv string as string.') + + # 'chlamy17.idx, chlamy3.idx' -> ['chlamy17.idx', 'chlamy3.idx'] + rs = utils.process_args('chlamy17.idx, chlamy3.idx', ret_mode='list') + self.assertEqual(rs, ['chlamy17.idx', 'chlamy3.idx'], + 'Return files as a list.') diff -r 000000000000 -r c34c364ce75d tool_dependencies.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/tool_dependencies.xml Tue Jul 21 14:48:39 2015 +0100 @@ -0,0 +1,100 @@ + + + + + + + + + + + + + https://pypi.python.org/packages/source/r/rpy2/rpy2-2.3.10.tar.gz + + + + + + + + + + + export PYTHONPATH=$PYTHONPATH:$INSTALL_DIR/lib/python && + export LDFLAGS="-L$READLINE_LIB_PATH -lreadline" && + export CPPFLAGS="-I$READLINE_INCLUDE_PATH" && + python setup.py build --r-home $R_HOME --r-home-lib $R_HOME/lib install --install-lib $INSTALL_DIR/lib/python + + + + $INSTALL_DIR/lib/python + + + $ENV[READLINE_LIB_PATH] + + + $ENV[RHOME]/lib + + + $ENV[R_ROOT_DIR]/lib + + + + + rpy2 is a redesign and rewrite of rpy. It is providing a low-level interface to R, + a proposed high-level interface, including wrappers to graphical libraries, + as well as R-like structures and functions. + + http://rpy.sourceforge.net/rpy2.html + + + + + + + + + + + https://github.com/vimalkumarvelayudhan/riboseqr_wrapper_deps/raw/master/0.4.0/BiocGenerics_0.12.1.tar.gz?raw=true + + + https://github.com/vimalkumarvelayudhan/riboseqr_wrapper_deps/raw/master/0.4.0/S4Vectors_0.4.0.tar.gz?raw=true + + + https://github.com/vimalkumarvelayudhan/riboseqr_wrapper_deps/raw/master/0.4.0/IRanges_2.0.1.tar.gz?raw=true + + + https://github.com/vimalkumarvelayudhan/riboseqr_wrapper_deps/raw/master/0.4.0/GenomeInfoDb_1.2.5.tar.gz?raw=true + + + https://github.com/vimalkumarvelayudhan/riboseqr_wrapper_deps/raw/master/0.4.0/XVector_0.6.0.tar.gz?raw=true + + + https://github.com/vimalkumarvelayudhan/riboseqr_wrapper_deps/raw/master/0.4.0/GenomicRanges_1.18.4.tar.gz?raw=true + + + https://github.com/vimalkumarvelayudhan/riboseqr_wrapper_deps/raw/master/0.4.0/abind_1.4-3.tar.gz?raw=true + + + https://github.com/vimalkumarvelayudhan/riboseqr_wrapper_deps/raw/master/0.4.0/baySeq_2.0.50.tar.gz?raw=true + + + https://github.com/vimalkumarvelayudhan/riboseqr_wrapper_deps/raw/master/0.4.0/riboSeqR_1.0.5.tar.gz?raw=true + + + + + + Dependency installation for riboseqr_wrapper + + + diff -r 000000000000 -r c34c364ce75d triplet.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/triplet.xml Tue Jul 21 14:48:39 2015 +0100 @@ -0,0 +1,161 @@ + + + (Step 2) Plot triplet periodicity for different read lengths. + + + R + readline + rpy2 + riboseqr_wrapper_deps + + + + + riboseqr/triplet.py + --rdata_load "$rdata_load" + --fasta_file "$fasta_file" + --start_codons "$start_codons" + --stop_codons "$stop_codons" + --include_lengths "$include_lengths" + --analyze_plot_lengths "$analyze_plot_lengths" + --text_legend "$text_legend" + --rdata_save "$rdata_save" + --html_file "$html_file" + --output_path "$html_file.files_path" + + + + value.name == "Prepare riboSeqR input (R data file)" + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +Triplet Periodicity +------------------- +riboSeqR version: ``1.0.5``. + +This tool can be used to plot triplet periodicity for different read lengths. +The inputs for this program are: + +#. Prepare riboSeqR input (R data file) from the previous step. +#. FASTA format file of the reference transcriptome. + +.. class:: infomark + +Please note that the FASTA header of the transcriptome sequences must +match the sequence names from the SAM file. + +How to use? +----------- +Inputs +...... +Select prepared R data file from the previous step. + +Select FASTA format file of the reference transcriptome, review/change other +options if necessary and execute program. + +Outputs +....... +The following files will be generated on completion: + +#. Triplet periodicity (HTML report) + +A HTML file with results and links to other output files - triplet +periodicity plot (PDF) and R script used for the session. + +#. Triplet periodicity (R data file) + +Used as input for the next step - *Metagene analysis*. + +riboSeqR functions used +....................... +``findCDS``, ``frameCounting``, ``readingFrame`` and ``plotFS``. + +For detailed description of the functions and the options used, please consult +the riboSeqR documentation. + +Links +..... +`riboSeqR <http://bioconductor.org/packages/3.0/bioc/html/riboSeqR.html>`_. + + + + + @Manual{, + title = {riboSeqR: Analysis of sequencing data from ribosome + profiling experiments.}, + author = {Thomas J. Hardcastle}, + year = {2014}, + note = {R package version 1.0.5}, + } + + +