annotate smart_toolShed/commons/core/parsing/README_MultiFasta2SNPFile @ 0:e0f8dcca02ed

Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
author yufei-luo
date Thu, 17 Jan 2013 10:52:14 -0500
parents
children
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
1 *** DESCRIPTION: ***
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
2 This program takes as input a multifasta file (with sequences already aligned together formated in fasta in the same file), considers the first sequence as the reference sequence, infers polymorphims and generates output files in GnpSNP exchange format.
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
3
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
4
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
5 *** INSTALLATION: ***
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
6 Dependancies:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
7 - First you need Python installed in your system.
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
8 - Repet libraries are also required.
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
9
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
10 *** OPTIONS OF THE LAUNCHER: ***
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
11
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
12 -h: this help
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
13
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
14 Mandatory options:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
15 -b: Name of the batch of submitted sequences
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
16 -g: Name of the gene
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
17 -t: Scientific name of the taxon concerned
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
18
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
19 Exclusive options (use either the first or the second)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
20 -f: Name of the multifasta input file (for one input file)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
21 -d: Name of the directory containing multifasta input file(s) (for several input files)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
22
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
23
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
24
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
25 *** COMMAND LINE EXAMPLE (for package use): ***
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
26 - First, you need to set up the environment variable PYTHONPATH (lo link with the dependancies).
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
27
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
28 - Then for one input file (here our example), run:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
29
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
30 python multifastaParserLauncher.py -b Batch_test -g GeneX -t "Arabidopsis thaliana" -f Exemple_multifasta_input.fasta
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
31
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
32
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
33 - For several input files, create a directory in the root of the uncompressed package and put your input files in it. Then use this type of command line:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
34
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
35 python multifastaParserLauncher.py -b Batch_test -g GeneX -t "Arabidopsis thaliana" -d <Name_of_the_directory>
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
36
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
37 Each one of the input files will generate a directory with his set of output files.
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
38
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
39
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
40 *** SIMPLE USE (for package use): ***
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
41 Two executables (one for windows, the other for linux/unix) are in the package.
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
42 They show the command lines to use in order to set up environment variables and then to run the parser on our sample input file (Example_multifasta_input.fasta).
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
43 You can edit the executable and custom the command line to use it with your own input file.
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
44
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
45
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
46 *** BACKLOG (next version) ***
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
47 When the launcher is called for several input files (with -d option), the parser should be able to generate only one set of files describing all the batches (one batch per input file).
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
48 So below are listed the tasks of the backlog dedicated to this feature:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
49
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
50 - in Multifasta2SNPFile class:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
51 # CONSTRUCTOR: Modify the constructor to add a "several batches" mode called without BatchName and GeneName
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
52 # RUNNING METHOD: Add the run_several_batches(directory) method that will browse the input files and iterate over them to run each of them successively (see runSeveralInputFile() method of the launcher)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
53 => 2 days
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
54
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
55 # BATCH MANAGEMENT: Modify createBatchDict() to create one batch per file in the dictionary and add a class variable to point toward the current batch (ex: self._iCurrentLineNumber)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
56 # BATCH-LINE MANAGEMENT: Modify _completeBatchLineListWithCurrentIndividual method to allow several batch and link lines to batches (for the moment hard coded batch no1)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
57 # SUBSNP MANAGEMENT: check that all elements (dSUbSNP) added in SubSNP list (lSubSNPFileResults) is linked to the current batch (for the moment hard coded batch no1)
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
58 Impacted methods: manageSNPs(), createSubSNPFromAMissingPolym(), addMissingAllelesAndSubSNPsForOnePolym(), mergeAllelesAndSubSNPsFromOverlappingIndels()
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
59 => + 2 days
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
60
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
61 - in Multifasta2SNPFileWriter class:
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
62 # Modify all the method _write<X>File (ex: _writeSubSNPFile) to write in append mode and externalize all open and close file
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
63 # Create one method to open all the output files and call it in Multifasta2SNPFile run_several_batches method
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
64 # Create one method to close all the output files and call it in Multifasta2SNPFile run_several_batches method
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
65
e0f8dcca02ed Uploaded S-MART tool. A toolbox manages RNA-Seq and ChIP-Seq data.
yufei-luo
parents:
diff changeset
66 => + 2 days