comparison commons/core/parsing/README_MultiFasta2SNPFile @ 36:44d5973c188c

Uploaded
author m-zytnicki
date Tue, 30 Apr 2013 15:02:29 -0400
parents 769e306b7933
*** DESCRIPTION: ***
This program takes as input a multifasta file (sequences already aligned together and stored in FASTA format in the same file), considers the first sequence as the reference sequence, infers polymorphisms and generates output files in GnpSNP exchange format.
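The core idea is to use the first aligned sequence as the reference and report every alignment column where another sequence differs from it. It can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the Multifasta2SNPFile implementation: the function names are hypothetical, gaps ('-') are compared like ordinary bases, and nothing is written in GnpSNP format.

```python
# Minimal sketch (hypothetical names): reference-based SNP inference
# from an aligned multifasta. Not the Multifasta2SNPFile code.
from typing import Dict, List, Optional, Tuple


def read_aligned_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, keeping file order."""
    records: List[Tuple[str, str]] = []
    header: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header = line[1:].strip()
            chunks = []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def infer_snps(records: List[Tuple[str, str]]) -> Dict[int, Dict[str, str]]:
    """Compare every sequence with the first one (the reference).

    Returns {1-based alignment position: {individual: allele}} for each
    position where at least one individual differs from the reference.
    Assumption: gaps and missing data get no special treatment here.
    """
    if not records:
        raise ValueError("no sequences found")
    ref_name, ref_seq = records[0]
    for name, seq in records[1:]:
        if len(seq) != len(ref_seq):
            raise ValueError(f"{name} is not aligned with {ref_name}")
    snps: Dict[int, Dict[str, str]] = {}
    for i, ref_base in enumerate(ref_seq):
        for name, seq in records[1:]:
            if seq[i] != ref_base:
                snps.setdefault(i + 1, {})[name] = seq[i]
    return snps


if __name__ == "__main__":
    example = ">ref\nACGTAC\n>ind1\nACGTTC\n>ind2\nATGTAC\n"
    print(infer_snps(read_aligned_fasta(example)))
    # {2: {'ind2': 'T'}, 5: {'ind1': 'T'}}
```

Requiring equal lengths up front mirrors the README's precondition that the sequences are already aligned; the sketch refuses unaligned input rather than guessing.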


*** INSTALLATION: ***
Dependencies:
- First, you need Python installed on your system.
- The REPET libraries are also required.

*** OPTIONS OF THE LAUNCHER: ***

-h: display this help

Mandatory options:
-b: Name of the batch of submitted sequences
-g: Name of the gene
-t: Scientific name of the taxon concerned

Exclusive options (use either the first or the second):
-f: Name of the multifasta input file (for one input file)
-d: Name of the directory containing the multifasta input file(s) (for several input files)
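The option rules above (three mandatory options, plus exactly one of -f or -d) can be expressed with Python's standard argparse module. This is a sketch only: the real multifastaParserLauncher.py may parse its options differently, and the destination names (batch_name, gene_name, taxon, input_file, input_dir) are hypothetical.

```python
# Sketch of the launcher's option contract with argparse.
# Assumption: only the flags come from this README; names are hypothetical.
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    # argparse provides -h/--help automatically.
    parser = argparse.ArgumentParser(
        description="Infer polymorphisms from aligned multifasta files.")
    parser.add_argument("-b", dest="batch_name", required=True,
                        help="name of the batch of submitted sequences")
    parser.add_argument("-g", dest="gene_name", required=True,
                        help="name of the gene")
    parser.add_argument("-t", dest="taxon", required=True,
                        help="scientific name of the taxon concerned")
    # -f and -d are exclusive: exactly one of them must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-f", dest="input_file",
                        help="multifasta input file (one input file)")
    source.add_argument("-d", dest="input_dir",
                        help="directory of multifasta input files")
    return parser


def parse_options(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (defaults to sys.argv[1:]); exits with code 2 on bad usage."""
    return build_parser().parse_args(argv)
```

A required mutually exclusive group makes argparse reject both "neither -f nor -d" and "both -f and -d" with a usage error, so the launcher body never has to check that combination itself.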


*** COMMAND LINE EXAMPLE (for package use): ***
- First, set the PYTHONPATH environment variable so that Python can find the dependencies.

- Then, for one input file (here, our example), run:

python multifastaParserLauncher.py -b Batch_test -g GeneX -t "Arabidopsis thaliana" -f Exemple_multifasta_input.fasta


- For several input files, create a directory at the root of the uncompressed package and put your input files in it. Then use a command line of this form:

python multifastaParserLauncher.py -b Batch_test -g GeneX -t "Arabidopsis thaliana" -d <Name_of_the_directory>

Each input file generates its own directory containing its set of output files.
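The -d behaviour described above (each input file processed independently, each with its own output directory) can be sketched as a simple loop. Assumptions: the output directory is named after the input file's stem and created under a hypothetical output_root, and process_one_file is a hypothetical stand-in for the real per-file parser call; neither detail comes from the package.

```python
# Sketch of the -d mode: one output directory per input file.
# Assumptions: directory naming and process_one_file are hypothetical.
from pathlib import Path
from typing import Callable, List


def run_several_input_files(
        input_dir: str,
        output_root: str,
        process_one_file: Callable[[Path, Path], None]) -> List[Path]:
    """Run the parser once per file found in input_dir."""
    created: List[Path] = []
    # Sorted so that runs are reproducible regardless of filesystem order.
    for fasta in sorted(Path(input_dir).iterdir()):
        if not fasta.is_file():
            continue  # ignore sub-directories
        out_dir = Path(output_root) / fasta.stem
        out_dir.mkdir(parents=True, exist_ok=True)
        process_one_file(fasta, out_dir)  # each file is an independent run
        created.append(out_dir)
    return created
```

Because each file is handled as a separate run, this matches the current behaviour (one set of output files per input file), not the single merged set described in the backlog.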


*** SIMPLE USE (for package use): ***
Two executable scripts (one for Windows, the other for Linux/Unix) are included in the package.
They show the command lines used to set up the environment variables and then run the parser on our sample input file (Example_multifasta_input.fasta).
You can edit a script and customize its command line to use it with your own input file.


*** BACKLOG (next version) ***
When the launcher is called for several input files (with the -d option), the parser should be able to generate a single set of files describing all the batches (one batch per input file).
The backlog tasks dedicated to this feature are listed below:

- in Multifasta2SNPFile class:
# CONSTRUCTOR: Modify the constructor to add a "several batches" mode, called without BatchName and GeneName.
# RUNNING METHOD: Add a run_several_batches(directory) method that browses the input files and runs each of them in turn (see the runSeveralInputFile() method of the launcher).
=> 2 days

# BATCH MANAGEMENT: Modify createBatchDict() to create one batch per file in the dictionary, and add a class variable pointing to the current batch (e.g. self._iCurrentLineNumber).
# BATCH-LINE MANAGEMENT: Modify the _completeBatchLineListWithCurrentIndividual method to allow several batches and to link lines to batches (for the moment, batch no1 is hard-coded).
# SUBSNP MANAGEMENT: Check that every element (dSUbSNP) added to the SubSNP list (lSubSNPFileResults) is linked to the current batch (for the moment, batch no1 is hard-coded).
Impacted methods: manageSNPs(), createSubSNPFromAMissingPolym(), addMissingAllelesAndSubSNPsForOnePolym(), mergeAllelesAndSubSNPsFromOverlappingIndels()
=> + 2 days
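The batch-management tasks in the list above amount to "one batch per input file, plus a pointer to the current batch that every new line and SubSNP is tagged with". A minimal sketch of that bookkeeping follows. Everything in it is hypothetical: BatchRegistry, its methods and the dictionary keys illustrate the intent of createBatchDict() and the current-batch variable; they are not code from Multifasta2SNPFile.

```python
# Sketch of "several batches" bookkeeping (all names hypothetical).
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class BatchRegistry:
    """One batch per input file, plus a pointer to the current batch."""
    batches: Dict[int, str] = field(default_factory=dict)  # number -> name
    current_batch: int = 0  # 0 means no batch has been opened yet
    batch_lines: List[Dict[str, object]] = field(default_factory=list)
    sub_snps: List[Dict[str, object]] = field(default_factory=list)

    def open_batch(self, batch_name: str) -> int:
        """Register a new batch (one per input file) and make it current."""
        number = len(self.batches) + 1
        self.batches[number] = batch_name
        self.current_batch = number
        return number

    def _require_batch(self) -> int:
        if self.current_batch == 0:
            raise RuntimeError("open_batch() must be called first")
        return self.current_batch

    def add_batch_line(self, individual: str) -> None:
        # Link the line to the current batch instead of a hard-coded batch 1.
        self.batch_lines.append({"BatchNumber": self._require_batch(),
                                 "IndividualName": individual})

    def add_sub_snp(self, sub_snp: Dict[str, object]) -> None:
        # Tag every SubSNP with the current batch before storing it.
        self.sub_snps.append({**sub_snp, "BatchNumber": self._require_batch()})
```

In a run_several_batches loop, open_batch() would be called once at the start of each input file, so all lines and SubSNPs produced while that file is processed share its batch number.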

- in Multifasta2SNPFileWriter class:
# Modify all the _write<X>File methods (e.g. _writeSubSNPFile) to write in append mode, and move all file opening and closing out of them.
# Create one method that opens all the output files and call it from the Multifasta2SNPFile run_several_batches method.
# Create one method that closes all the output files and call it from the Multifasta2SNPFile run_several_batches method.

=> + 2 days
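The writer tasks above describe a common pattern: open every output file once, let the per-table write methods only write, and close everything at the end. A minimal sketch of that pattern is shown below. The class name, the method names and the output file names are hypothetical placeholders, not the real Multifasta2SNPFileWriter API or the actual GnpSNP file names.

```python
# Sketch of the planned writer refactoring (all names hypothetical).
import os
from typing import IO, Dict, List

OUTPUT_FILES = ["SubSNP.csv", "Batch.csv", "BatchLine.csv"]  # assumed names


class SeveralBatchesWriter:
    def __init__(self, output_dir: str) -> None:
        self._output_dir = output_dir
        self._handles: Dict[str, IO[str]] = {}

    def open_all_files(self) -> None:
        """Open every output file once, in append mode."""
        os.makedirs(self._output_dir, exist_ok=True)
        for name in OUTPUT_FILES:
            path = os.path.join(self._output_dir, name)
            self._handles[name] = open(path, "a", encoding="utf-8")

    def write_sub_snp_lines(self, lines: List[str]) -> None:
        # Counterpart of a _write<X>File method: it only writes,
        # opening and closing are handled elsewhere.
        handle = self._handles["SubSNP.csv"]
        for line in lines:
            handle.write(line + "\n")

    def close_all_files(self) -> None:
        """Close every handle opened by open_all_files()."""
        for handle in self._handles.values():
            handle.close()
        self._handles.clear()
```

Used from a run_several_batches-style loop, open_all_files() would be called once before the first batch and close_all_files() once after the last (ideally in a finally clause), so every batch appends to the same set of files.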