# HG changeset patch
# User vmarcon
# Date 1486406269 18000
# Node ID b126ea31824f4da7b77acd79b7815ff7f3d04f15
1st Uploaded

diff -r 000000000000 -r b126ea31824f README.txt
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/README.txt Mon Feb 06 13:37:49 2017 -0500
@@ -0,0 +1,63 @@
+REPET dependency
+
+For REPET version 2.5.1 (REPET with some patches, please contact urgi-contact@versailles.inra.fr to get it)
+
+The REPET dependency handles the creation of essential environment variables.
+
+!!!!WARNING : REPET has to be installed!!!!
+
+
+After you have installed your tool with this dependency, please check that the defined environment variables suit your configuration.
+Check this file:
+***tool_dependency_dir***/repet/2.5/vmarcon/package_repet_2_5/***revision***/env.sh
+Please replace with the correct values:
+    * tool_dependency_dir (value in galaxy.ini file)
+    * revision
+
+Then, in this file (env.sh), modify the variable values to adjust them to your
+system:
+    - Database connection (REPET_HOST, REPET_USER, REPET_PW, REPET_DB, REPET_PORT)
+    - Job manager (REPET_JOB_MANAGER, REPET_QUEUE)
+    - Working environment (REPET_PATH, REPET_NUCL_BANK, REPET_PROT_BANK, REPET_HMM_PROFILES, REPET_RDNA_BANK)
+
+
+If you want REPET to work in a specific temporary directory, set the variable REPET_TMP_DIR.
+
+If you don't want to use one or several databanks, remove the corresponding variable.
+
+
+
+-------------
+ Galaxy Page
+-------------
+
+!!!! To get the content of the page and the example datasets, please ask for them by sending an e-mail to urgi-contact@versailles.inra.fr. !!!!
+
+
+
+To explain in detail to your users how TEannot_lite works, please make a Galaxy Page:
+- Log in to your Galaxy instance.
+- Go to the "Saved Pages" (User > Saved Pages).
+- Create a new page ("Add new Page" button in the top right corner) named 'teannot'.
+- Click on "teannot" and "Edit content".
+- Paste the content of the URGI TEannot page.
+- Save.
+
+Now you have to create the two embedded Galaxy objects (see the example in the blue and green areas).
+- On the URGI TEannot page, download "DmelChr4.fa", "TElib_DmelChr4.fa" and "TElib_DmelCrh4.classif" by clicking on "Save dataset" (green area).
+- Upload these files to the Galaxy instance where REPET Lite is installed, IN A NEW HISTORY.
+- Rename these datasets as in the example: "DmelChr4.fa", "TElib_DmelChr4.fa" and "TElib_DmelCrh4.classif".
+- Launch "Repet Lite - TEannot" with "DmelChr4.fa" as Fasta alignment input, "TElib_DmelChr4.fa" as Fasta TE library and "TElib_DmelCrh4.classif" as Classification file.
+- Go back to the page in edit mode.
+- Add a green area - Embed Datasets - DmelChr4.fa
+- Add a green area - Embed Datasets - TElib_DmelChr4.fa
+- Add a green area - Embed Datasets - TElib_DmelChr4.classif
+- Add the blue area - Embed Histories - your history with the TEannot result.
+- Save again.
+
+To publish your page:
+- Go to the list of "Saved Pages"
+- Click on "teannot" and "Share or Publish"
+- And "Make Page Accessible and Publish"
+
+Now this page is accessible at the URL shown and under Shared Data > Published Pages
diff -r 000000000000 -r b126ea31824f TEannot.sh
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/TEannot.sh Mon Feb 06 13:37:49 2017 -0500
@@ -0,0 +1,50 @@
+#!/bin/bash
+set -e
+
+fasta=$1
+library=$2
+outputfile=$3
+outputmaskedfile=$4
+outputlog=$5
+outputconfig=$6
+outputStats=$7
+classif=$8
+outputmasked_SSRmaskfile=$9
+
+projectname=$(date "+%Y%m%d")
+
+add=''
+
+if [ -f $classif ]
+then
+    add='-c '$classif
+fi
+
+if [ -f $outputStats ]
+then
+    add=$add' -s'
+fi
+
+`dirname $0`'/'TEannot_lite.py -i $fasta -l $library -o $outputfile $add > $outputlog
+projectname_complete=$(ls $(pwd)|grep $projectname)
+working_dir=$(pwd)/$projectname_complete
+sed -i 's@'"$working_dir"'@'$projectname'@g' $outputlog
+mv $outputfile-$projectname.gff3 $outputfile
+mv $outputfile-$projectname.mask $outputmaskedfile
+mv $outputfile-$projectname.mask_SSRmask.fa
$outputmasked_SSRmaskfile +if [ -f $outputStats ] +then + mv $outputfile-$projectname-TEstats.txt $outputStats +fi + +workingconfigfile=$working_dir/TEannot_Galaxy_config_$projectname_complete +sed -i 's|repet_host:.*|repet_host:|g' $workingconfigfile +sed -i 's|repet_user:.*|repet_user:|g' $workingconfigfile +sed -i 's|repet_pw:.*|repet_pw:|g' $workingconfigfile +sed -i 's|repet_db:.*|repet_db:|g' $workingconfigfile +sed -i 's|repet_port:.*|repet_port:|g' $workingconfigfile +sed -i 's|repet_job_manager:.*|repet_job_manager:|g' $workingconfigfile +sed -i 's|project_name:.*|project_name: '$projectname'|g' $workingconfigfile +sed -i 's|project_dir:.*|project_dir:|g' $workingconfigfile +sed -i 's|tmpDir:.*|tmpDir:|g' $workingconfigfile +mv $workingconfigfile $outputconfig diff -r 000000000000 -r b126ea31824f TEannot.xml --- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/TEannot.xml Mon Feb 06 13:37:49 2017 -0500 @@ -0,0 +1,327 @@ + + + + Genome annotation for masking transposable elements + + + + python + repet + + + + + + + + + + + + + + + + + + + TEannot.sh $fasta $library $outputfile $outputmaskedfile $outputlog $outputconfig + #if str( $withStats ) == "yes": + $outputstatsfile + #else : + $withStats + #end if + $classif + $outputmasked_SSRmaskfile + + + + + + + + + + + + + + + + + + + + + + (withStats == 'yes') + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + Published Pages" + +----------------- +Workflow position +----------------- + +**Upstream tools** + +=========== ========================== ======= +Name output file(s) format +=========== ========================== ======= +TEdenovo Fasta file with TE library fasta +=========== ========================== ======= + + +---------- +Input file +---------- + +Fasta file + Genome file at fasta format + +Library file + Fasta file with a library of transposable elements from TEdenovo. 
+ +---------- +Parameters +---------- + +Masked file + To get an additionnal output file : Masked fasta file + + +------------ +Output files +------------ + +Output_gff3 + GFF3 file with transposable elements +Output_masked_fasta + Input fasta file masked with TE infos +Output_config + File to show which params have been used +Output_stats + File with statistics on TE library + +------------ +Dependencies +------------ + + +--------------------------------------------------- + +--------------- +Working example +--------------- + +Input files +=========== + +Fasta file +---------- + +:: + + >dmel_chr4 + GAGAACCGTCCTGTAAGTACTCTTGCTTTAAATACGAAAGTAATACTAATCCATGACGCTTAAGTCGAAGAGAGAATAAGTCAATATTTAATTGGACTCATCGCTTATGTTCATCATGAATCTATAGTTAACTTGATGTTGTGCTCCATGTACGATATAAAAAGTTAGATA + + +Fasta Library +------------- + +:: + + >DTX-incomp_20150325110123-B-G1-Map3 + ATACAGCTGCGGTTAAAATAATAGCACTACTGCAGGTGGAAAGTTGATTTCCTAAAAAAA + ATTATTAAATGTTTATATTTTTTTAAGTCAGATTGCATGAATAATAAGTACCATATGTTG + GCTCTCTGAGCAAGAAATTTTTAGTCTCT + >DTX-incomp_20150325110123-B-P1.0-Map3 + CTTGTGTCCGCACTTCGTGCCTCAAGATATGAACAAAGCAAAGACACTAGAATAATTCTA + GTGTATTACTTTGATATTACTTTTGCAATAAACAGTTATCATATTTTTA + + +Output files +============ + +GFF3 output : +------------- + +:: + + ##gff-version 3 + dmel_chr4 test_REPET_TEs match 971161 971469 0.0 - . ID=ms1_dmel_chr4_DTX-incomp_DmelChr4-B-G1-Map3;Target=DTX-incomp_DmelChr4-B-G1-Map3 45 542 + dmel_chr4 test_REPET_TEs match_part 971161 971271 0.0 - . 
ID=mp1-1_dmel_chr4_DTX-incomp_DmelChr4-B-G1-Map3;Parent=ms1_dmel_chr4_DTX-incomp_DmelChr4-B-G1-Map3;Target=DTX-incomp_DmelChr4-B-G1-Map3 435 542;Identity=94.4 + +Masked fasta output : +--------------------- + +:: + + >dmel_chr4 + GAGAACCGTCCTGTAAGTACTCTTGCTTTAAATACGXXXXXXXXXXXXXXXXXXXXACGCTTAAGTCGAAGAGAGAATAAGTCAATATTTAATTGGACTCATCGCTTATGTTCATCATGAATCTATAGTTAACTTGATGTTGTGCTCCATGTACGATATAAAAAGTTAGATA + +Config file : +------------- + +:: + + [repet_env] + repet_version: 2.4 + repet_host: ****** + repet_user: ****** + +Statistics file : +----------------- + +:: + + nb of sequences: 8 + nb of matched sequences: 8 + cumulative coverage: 133656 bp + +]]> + + + + De Novo Annotation Approaches}, + year = {2011}, + month = {01}, + volume = {6}, + url = {http://dx.doi.org/10.1371%2Fjournal.pone.0016526}, + pages = {e16526}, + abstract = { +

Transposable elements (TEs) are mobile, repetitive DNA sequences that are almost ubiquitous in prokaryotic and eukaryotic genomes. They have a large impact on genome structure, function and evolution. With the recent development of high-throughput sequencing methods, many genome sequences have become available, making possible comparative studies of TE dynamics at an unprecedented scale. Several methods have been proposed for the de novo identification of TEs in sequenced genomes. Most begin with the detection of genomic repeats, but the subsequent steps for defining TE families differ. High-quality TE annotations are available for the Drosophila melanogaster and Arabidopsis thaliana genome sequences, providing a solid basis for the benchmarking of such methods. We compared the performance of specific algorithms for the clustering of interspersed repeats and found that only a particular combination of algorithms detected TE families with good recovery of the reference sequences. We then applied a new procedure for reconciling the different clustering results and classifying TE sequences. The whole approach was implemented in a pipeline using the REPET package. Finally, we show that our combined approach highlights the dynamics of well defined TE families by making it possible to identify structural variations among their copies. This approach makes it possible to annotate TE families and to study their diversification in a single analysis, improving our understanding of TE dynamics at the whole-genome scale and for diverse species.

+ }, + number = {1}, + doi = {10.1371/journal.pone.0016526} + }]]>
+ Arabidopsis thaliana Junk DNA Reveals a Continuum between Repetitive Elements and Genomic Dark Matter}, + year = {2014}, + month = {04}, + volume = {9}, + url = {http://dx.doi.org/10.1371%2Fjournal.pone.0094101}, + pages = {e94101}, + abstract = {

Eukaryotic genomes contain highly variable amounts of DNA with no apparent function. This so-called junk DNA is composed of two components: repeated and repeat-derived sequences (together referred to as the repeatome), and non-annotated sequences also known as genomic dark matter. Because of their high duplication rates as compared to other genomic features, transposable elements are predominant contributors to the repeatome and the products of their decay is thought to be a major source of genomic dark matter. Determining the origin and composition of junk DNA is thus important to help understanding genome evolution as well as host biology. In this study, we have used a combination of tools enabling to show that the repeatome from the small and reducing A. thaliana genome is significantly larger than previously thought. Furthermore, we present the concepts and results from a series of innovative approaches suggesting that a significant amount of the A. thaliana dark matter is of repetitive origin. As a tentative standard for the community, we propose a deep compendium annotation of the A. thaliana repeatome that may help addressing farther genome evolution as well as transcriptional and epigenetic regulation in this model plant.

}, + number = {4}, + doi = {10.1371/journal.pone.0094101} + }]]>
+
+ +
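The masked FASTA output described in the tool help replaces annotated regions with X characters. As a rough illustration only, the idea can be sketched in Python 3. This is not how REPET's MaskSeqFromCoord.py is implemented; the function name `mask_intervals` is hypothetical, and the 1-based inclusive coordinate convention is an assumption borrowed from GFF3.

```python
def mask_intervals(sequence, intervals, mask_char="X"):
    """Return `sequence` with every (start, end) interval replaced by mask_char.

    Assumption: coordinates are 1-based and inclusive, as in GFF3.
    Reversed intervals (start > end) are accepted and swapped.
    """
    masked = list(sequence)
    for start, end in intervals:
        if start > end:
            start, end = end, start
        # Clamp to the sequence bounds so out-of-range coordinates are harmless.
        start = max(start, 1)
        end = min(end, len(masked))
        for index in range(start - 1, end):
            masked[index] = mask_char
    return "".join(masked)


if __name__ == "__main__":
    print(mask_intervals("ACGTACGT", [(3, 5)]))
```

Overlapping intervals need no special handling here, because masking a position twice leaves it masked.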
diff -r 000000000000 -r b126ea31824f TEannot_lite.py
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/TEannot_lite.py Mon Feb 06 13:37:49 2017 -0500
@@ -0,0 +1,245 @@
+#!/usr/bin/env python
+
+import os
+import sys
+import time
+import glob
+import shutil
+import ConfigParser
+import re
+
+if not "REPET_PATH" in os.environ.keys():
+    print("ERROR: no environment variable REPET_PATH")
+    sys.exit(1)
+
+if (not "REPET_DB" in os.environ.keys()) or (not "REPET_HOST" in os.environ.keys()) or (not "REPET_PORT" in os.environ.keys()) or (not "REPET_USER" in os.environ.keys()) or (not "REPET_PW" in os.environ.keys()):
+    print "ERROR: there is at least one environment database variable missing : REPET_DB, REPET_PORT, REPET_HOST, REPET_USER or REPET_PW"
+    sys.exit(1)
+
+if not "REPET_JOB_MANAGER" in os.environ.keys():
+    print "ERROR: no environment variable REPET_JOB_MANAGER"
+    sys.exit(1)
+
+
+if not "%s/bin" % os.environ["REPET_PATH"] in os.environ["PATH"]:
+    os.environ["PATH"] = "%s/bin:%s" % (os.environ["REPET_PATH"], os.environ["PATH"])
+
+sys.path.append(os.environ["REPET_PATH"])
+if not "PYTHONPATH" in os.environ.keys():
+    os.environ["PYTHONPATH"] = os.environ["REPET_PATH"]
+else:
+    os.environ["PYTHONPATH"] = "%s:%s" % (os.environ["REPET_PATH"], os.environ["PYTHONPATH"])
+
+
+from commons.core.LoggerFactory import LoggerFactory
+from commons.core.checker.RepetException import RepetException
+from commons.core.utils.FileUtils import FileUtils
+from commons.core.utils.RepetOptionParser import RepetOptionParser
+from commons.core.seq.FastaUtils import * #FastaUtils
+from commons.core.sql.DbFactory import DbFactory
+
+LOG_DEPTH = "TEannot.pipeline"
+
+class TEannot_lite(object):
+
+    def __init__(self, configFileName = "", fastaFileName = "", libraryFileName = "", verbosity = 0):
+        self._configFileName = configFileName
+        self._fastaFileName = os.path.abspath(fastaFileName)
+        self._libraryFileName = os.path.abspath(libraryFileName)
+        self._projectName = time.strftime("%Y%m%d%H%M%S")
+        self._outputGff = ""
+        self._classif = ""
+        #self._maskedThreshold = 80
+        self._statsFile = ""
+        self._outputMasked = ""
+        if "REPET_TMP_DIR" in os.environ.keys():
+            self._tmp_dir = os.environ["REPET_TMP_DIR"]
+        else :
+            self._tmp_dir = ""
+        self._verbosity = verbosity
+        self._log = LoggerFactory.createLogger("%s.%s" % (LOG_DEPTH, self.__class__.__name__), self._verbosity)
+
+    def setAttributesFromCommandLine(self):
+        description = "This script is a light version of TEannot. It writes the configuration file and launches TEannot."
+        epilog = "Example: TEannot_lite.py -i fastaFileName -l fastaLibraryFileName \n"
+        version = "1.1"
+        parser = RepetOptionParser(description = description, epilog = epilog, version = version)
+        parser.add_option("-i", "--fasta", dest = "fastaFileName" , action = "store" , type = "string", help ="Input fasta file name ", default = "")
+        parser.add_option("-l", "--lib", dest = "libraryFileName" , action = "store" , type = "string", help ="Input fasta library file name ", default = "")
+        parser.add_option("-c", "--withClassif", dest = "withClassif" , action = "store" , type = "string" , metavar="CLASSIFFILE" , help ="[optional] To add classification information in GFF3 file, please put classif file from TEdenovo step. ", default = "")
+        #parser.add_option("-t", "--maskedThreshold", dest = "maskedThreshold" , action = "store", type = "int", metavar="80", help ="[optional] [default: 80] To choose the threshold of the identity percent for the masked fasta file. ", default = 80)
+        parser.add_option("-s", "--stats", dest="withStats", action="store_true",help = " Get statistical file in output.", default = False)
+        parser.add_option("-o", "--output", dest="outputLabel" , action = "store", type = "string", help = " [optional] Label for GFF3 output file", default = "")
+        parser.add_option("-v", "--verbosity", dest = "verbosity", action = "store", type = "int", metavar="2", help = "Verbosity [optional] [default: 2]", default = 2)
+        options = parser.parse_args()[0]
+        self._setAttributesFromOptions(options)
+
+    def _setAttributesFromOptions(self, options):
+        self.setConfigFileName("")
+        if options.fastaFileName=="":
+            print "ERROR : You have to enter an input fasta file"
+            print "Example: TEannot_lite.py -i fastaFileName -l fastaLibraryFileName \n"
+            print "More option : TEannot_lite.py --help "
+            exit(1)
+        else :
+            self._fastaFileName = os.path.abspath(options.fastaFileName)
+        if options.libraryFileName=="":
+            print "ERROR : You have to enter an input library fasta file"
+            print "Example: TEannot_lite.py -i fastaFileName -l fastaLibraryFileName \n"
+            print "More option : TEannot_lite.py --help "
+            exit(1)
+        else :
+            self._libraryFileName = os.path.abspath(options.libraryFileName)
+        if options.outputLabel=="":
+            fastaBaseName=os.path.abspath(re.search(r'([^\/\\]*)\.[fa|fasta|fsa|fas]',options.fastaFileName).groups()[0])
+            options.outputLabel = fastaBaseName
+        self._outputGff = os.path.abspath(options.outputLabel+'-%s.gff3'%self._projectName[:8])
+
+        if options.withClassif!='':
+            self._classif = os.path.abspath(options.withClassif)
+
+        self._outputMasked = os.path.abspath(options.outputLabel+'-%s.mask'%self._projectName[:8])
+        #if options.maskedThreshold :
+        #    self._maskedThreshold = options.maskedThreshold
+        if options.withStats :
+            self._statsFile = os.path.abspath(options.outputLabel+'-%s-TEstats.txt'%self._projectName[:8])
+        self._verbosity = options.verbosity
+
+    def setConfigFileName(self, configFileName):
+        self._configFileName = configFileName
+        if not self._configFileName:
+            self._configFileName = "TEannot_Galaxy_config_%s" % self._projectName
+
+    def setAttributesFromConfigFile(self, configFileName):
+        config = ConfigParser.ConfigParser()
+        config.readfp( open(configFileName) )
+
+    def _writeConfigFile(self):
+        if FileUtils.isRessourceExists(self._configFileName):
+            self._logAndRaise("Configuration file '%s' already exists. Won't be overwritten." % self._configFileName)
+
+        shutil.copy("%s/config/TEannot.cfg" % os.environ.get("REPET_PATH"), self._configFileName)
+        self.setAttributesFromConfigFile(self._configFileName)
+
+        os.system("sed -i 's|repet_host: |repet_host: %s|' %s" % (os.environ["REPET_HOST"], self._configFileName))
+        os.system("sed -i 's|repet_user: |repet_user: %s|' %s" % (os.environ["REPET_USER"], self._configFileName))
+        os.system("sed -i 's|repet_pw: |repet_pw: %s|' %s" % (os.environ["REPET_PW"], self._configFileName))
+        os.system("sed -i 's|repet_db: |repet_db: %s|' %s" % (os.environ["REPET_DB"], self._configFileName))
+        os.system("sed -i 's|repet_port: 3306|repet_port: %s|' %s" % (os.environ["REPET_PORT"], self._configFileName))
+        os.system("sed -i 's|repet_job_manager: SGE|repet_job_manager: %s|' %s" % (os.environ["REPET_JOB_MANAGER"], self._configFileName))
+        os.system("sed -i 's|project_name: |project_name: %s|' %s" % (self._projectName, self._configFileName))
+        os.system("sed -i 's|project_dir: |project_dir: %s|' %s" % (os.getcwd().replace("/", "\/"), self._configFileName))
+        os.system("sed -i 's|do_join: yes|do_join: no|' %s" % ( self._configFileName))
+        os.system("sed -i 's|add_SSRs: no|add_SSRs: yes|' %s" % ( self._configFileName))
+        os.system("sed -i 's|gff3_compulsory_match_part: no|gff3_compulsory_match_part: yes|' %s" % ( self._configFileName))
+        os.system("sed -i 's|BLR_sensitivity: 3|BLR_sensitivity: 2|' %s" % ( self._configFileName))
+        os.system("sed -i 's|tmpDir:|tmpDir: %s|g' %s" % (self._tmp_dir,self._configFileName))
+        if self._classif!="" :
+            os.system("sed -i 's|gff3_with_classif_info: no|gff3_with_classif_info: yes|' %s" % ( self._configFileName))
+            os.system("sed -i 's|classif_table_name: |classif_table_name: %s_consensus_classif|' %s" % ( self._projectName,self._configFileName))
+
+    def _mergeOutputGff(self):
+        file_out=open(self._outputGff,'w')
+        file_out.write('##gff-version 3\n')
+        file_out.close()
+        directory="%s_GFF3chr/"%self._projectName
+        outGffs = glob.glob("%s*.gff3"%directory)
+        for outGff in outGffs :
+            os.system("grep -v '#' %s >> %s"%(outGff,self._outputGff))
+        os.system("sed -i 's|%s_REPET_TEs|REPET_TEs|g' %s" % (self._projectName,self._outputGff))
+
+    def _launchTEannot(self):
+        print "START time: %s" % time.strftime("%Y-%m-%d %H:%M:%S")
+        lCmds = []
+        lCmds.append( "TEannot.py -P %s -C %s -S 1 -v %i" % (self._projectName, self._configFileName, self._verbosity) )
+        lCmds.append( "TEannot.py -P %s -C %s -S 2 -a BLR -v %i" % (self._projectName, self._configFileName, self._verbosity) )
+        lCmds.append( "TEannot.py -P %s -C %s -S 2 -a RM -v %i" % (self._projectName, self._configFileName, self._verbosity) )
+        lCmds.append( "TEannot.py -P %s -C %s -S 2 -a CEN -v %i" % (self._projectName, self._configFileName, self._verbosity) )
+        lCmds.append( "TEannot.py -P %s -C %s -S 2 -a BLR -r -v %i" % (self._projectName, self._configFileName, self._verbosity) ) #
+        lCmds.append( "TEannot.py -P %s -C %s -S 2 -a RM -r -v %i" % (self._projectName, self._configFileName, self._verbosity) ) #
+        lCmds.append( "TEannot.py -P %s -C %s -S 2 -a CEN -r -v %i" % (self._projectName, self._configFileName, self._verbosity) ) #
+        lCmds.append( "TEannot.py -P %s -C %s -S 4 -s TRF -v %i" % (self._projectName, self._configFileName, self._verbosity) )
+        lCmds.append( "TEannot.py -P %s -C %s -S 4 -s RMSSR -v %i" % (self._projectName, self._configFileName, self._verbosity) )
+        lCmds.append( "TEannot.py -P %s -C %s -S 4 -s Mreps -v %i" % (self._projectName, self._configFileName, self._verbosity) )
+        lCmds.append( "TEannot.py -P %s -C %s -S 5 -v %i" % (self._projectName, self._configFileName, self._verbosity) )
+        lCmds.append( "TEannot.py -P %s -C %s -S 3 -c BLR+RM+CEN -v %i" % (self._projectName, self._configFileName, self._verbosity) )
+        lCmds.append( "TEannot.py -P %s -C %s -S 7 -v %i" % (self._projectName, self._configFileName, self._verbosity) )
+        lCmds.append( "TEannot.py -P %s -C %s -S 8 -v %i -o GFF3" % (self._projectName, self._configFileName, self._verbosity) )
+
+        if self._classif!='':
+            self._setClassifTable()
+
+        for cmd in lCmds:
+            returnValue = os.system(cmd)
+            if returnValue != 0:
+                print "ERROR: command '%s' returned %i" % (cmd, returnValue)
+                self._cleanTables()
+                sys.exit(1)
+
+        print "END time: %s" % time.strftime("%Y-%m-%d %H:%M:%S")
+
+
+    def _maskFasta(self):
+        pathFile = self._outputMasked+"_tmp.path"
+        setFile = self._outputMasked+"_tmp.set"
+        lCmds = []
+        lCmds.append("srptExportTable.py -i %s_chr_allTEs_nr_noSSR_path -C %s -o %s -v %s" % (self._projectName,self._configFileName,pathFile,self._verbosity))
+        lCmds.append("MaskSeqFromCoord.py -i %s -m %s -f path -X -o %s -v %s" % (self._fastaFileName,pathFile,self._outputMasked,self._verbosity))
+        lCmds.append("srptExportTable.py -i %s_chr_allSSRs_set -C %s -o %s -v %s " % (self._projectName,self._configFileName, setFile,self._verbosity))
+        lCmds.append("MaskSeqFromCoord.py -i %s -m %s -f set -X -o %s_SSRmask.fa -v %s" % (self._outputMasked, setFile, self._outputMasked, self._verbosity))
+
+        for cmd in lCmds:
+            returnValue = os.system(cmd)
+            if returnValue != 0:
+                print "ERROR: command '%s' returned %i" % (cmd, returnValue)
+                self._cleanTables()
+                sys.exit(1)
+
+        #os.system("rm -f %s"%pathFile)
+
+    def _createStatsFile(self):
+        fastaFile=open(self._fastaFileName)
+        fastaLength=FastaUtils.dbCumLength( fastaFile )
+        cmd = "PostAnalyzeTELib.py -a 3 -g {0} -p {1}_chr_allTEs_nr_noSSR_path -s {1}_refTEs_seq".format(fastaLength,self._projectName)
+        os.system(cmd)
+        cmd = "mv %s_chr_allTEs_nr_noSSR_path.globalAnnotStatsPerTE.txt %s"%(self._projectName,self._statsFile)
+        os.system(cmd)
+
+    def _setClassifTable(self):
+        iDb = DbFactory.createInstance()
+        iDb.createTable("%s_consensus_classif" % self._projectName, "classif", self._classif, True)
+        iDb.close()
+
+    def _launchListAndDropTables(self):
+        cmd = "ListAndDropTables.py"
+        cmd += " -C %s" % self._configFileName
+        cmd += " -d '%s'" % self._projectName
+        os.system(cmd)
+
+    def _cleanJobsTable(self):
+        db = DbFactory.createInstance( configFileName = self._configFileName )
+        sql_cmd="DELETE FROM jobs WHERE groupid like '%s%%';"%self._projectName
+        db.execute( sql_cmd )
+        db.close()
+
+    def _cleanTables(self):
+        self._launchListAndDropTables()
+        self._cleanJobsTable()
+
+    def run(self):
+        os.mkdir(self._projectName)
+        os.chdir(self._projectName)
+        self._writeConfigFile()
+        os.symlink(self._fastaFileName,"%s/%s.fa" %(os.getcwd(),self._projectName)) # create project directory
+        os.symlink(self._libraryFileName,"%s/%s_refTEs.fa" %(os.getcwd(),self._projectName))
+        self._launchTEannot()
+        self._mergeOutputGff()
+        self._maskFasta()
+        if self._statsFile :
+            self._createStatsFile()
+        self._cleanTables()
+
+if __name__ == '__main__':
+    iTEannot= TEannot_lite()
+    iTEannot.setAttributesFromCommandLine()
+    iTEannot.run()
diff -r 000000000000 -r b126ea31824f tool_dependencies.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/tool_dependencies.xml Mon Feb 06 13:37:49 2017 -0500
@@ -0,0 +1,6 @@
+
+
+
+
+
+
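Since this dependency only defines environment variables, checking them before launching the tool can save a failed run. The following Python 3 sketch mirrors the startup checks in TEannot_lite.py, using the variable names listed in README.txt. The helper name `missing_repet_variables` is hypothetical and not part of REPET, and optional variables such as REPET_TMP_DIR and the databank paths are deliberately left out.

```python
import os

# Names taken from README.txt and from the checks at the top of TEannot_lite.py.
REQUIRED_VARIABLES = (
    "REPET_PATH",
    "REPET_HOST",
    "REPET_USER",
    "REPET_PW",
    "REPET_DB",
    "REPET_PORT",
    "REPET_JOB_MANAGER",
)


def missing_repet_variables(environ=None):
    """Return the required REPET variables that are absent from `environ`.

    `environ` defaults to os.environ; passing a plain dict makes the
    function easy to test without touching the real environment.
    """
    if environ is None:
        environ = os.environ
    return [name for name in REQUIRED_VARIABLES if name not in environ]


if __name__ == "__main__":
    missing = missing_repet_variables()
    if missing:
        print("ERROR: missing environment variables: " + ", ".join(missing))
    else:
        print("All required REPET variables are set.")
```

Running the script in a shell where env.sh has been sourced is one way to confirm that the edited values are visible to the tool.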