annotate commons/core/parsing/README_MultiFasta2SNPFile @ 11:2da30502c2f1

Updated CompareOverlappingSmallQuery.xml
author m-zytnicki
date Thu, 14 Mar 2013 05:37:08 -0400
parents 769e306b7933
children
769e306b7933 Change the repository level.
yufei-luo
*** DESCRIPTION: ***
This program takes as input a multifasta file (whose sequences are already aligned with one another and stored in FASTA format in the same file), treats the first sequence as the reference sequence, infers polymorphisms, and generates output files in the GnpSNP exchange format.
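The comparison against the first sequence can be illustrated with a minimal sketch. It assumes a simple column-by-column comparison of equal-length aligned sequences, where a gap ('-') on either side is reported as an indel and any other difference as a SNP. `read_fasta` and `find_polymorphisms` are hypothetical helpers, not functions of this package, and the sketch does not build the alleles, SubSNPs or GnpSNP files that the real parser produces.

```python
from typing import Dict, List, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (header, sequence) pairs, keeping file order."""
    records: List[Tuple[str, str]] = []
    header = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        else:
            chunks.append(line.upper())
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def find_polymorphisms(records: List[Tuple[str, str]]) -> Dict[int, Dict[str, Tuple[str, str, str]]]:
    """Compare every sequence with the first one (the reference), column by column.

    Returns {1-based alignment position: {individual: (ref_base, base, kind)}},
    where kind is "indel" when a gap ('-') is involved and "SNP" otherwise.
    """
    ref_name, ref_seq = records[0]
    polymorphisms: Dict[int, Dict[str, Tuple[str, str, str]]] = {}
    for name, seq in records[1:]:
        if len(seq) != len(ref_seq):
            raise ValueError(f"{name} is not aligned with {ref_name}: lengths differ")
        for pos, (ref_base, base) in enumerate(zip(ref_seq, seq), start=1):
            if base != ref_base:
                kind = "indel" if "-" in (ref_base, base) else "SNP"
                polymorphisms.setdefault(pos, {})[name] = (ref_base, base, kind)
    return polymorphisms


alignment = """>ref
ACGT-A
>ind1
ACTT-A
>ind2
ACGTTA
"""
print(find_polymorphisms(read_fasta(alignment)))
# {3: {'ind1': ('G', 'T', 'SNP')}, 5: {'ind2': ('-', 'T', 'indel')}}
```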


*** INSTALLATION: ***
Dependencies:
- Python must be installed on your system.
- The Repet libraries are also required.

*** OPTIONS OF THE LAUNCHER: ***

-h: display this help

Mandatory options:
-b: Name of the batch of submitted sequences
-g: Name of the gene
-t: Scientific name of the taxon concerned

Mutually exclusive options (use either the first or the second, not both):
-f: Name of the multifasta input file (for a single input file)
-d: Name of the directory containing the multifasta input file(s) (for several input files)
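The option rules above (three mandatory options, plus exactly one of -f or -d) can be expressed with Python's standard argparse module. This is only a sketch: the actual launcher may parse its options differently, and the `dest` names are hypothetical. argparse provides -h automatically.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Sketch of the launcher's options; argparse adds -h/--help automatically."""
    parser = argparse.ArgumentParser(
        prog="multifastaParserLauncher.py",
        description="Infer polymorphisms from aligned multifasta files.")
    # Mandatory options
    parser.add_argument("-b", dest="batch_name", required=True,
                        help="Name of the batch of submitted sequences")
    parser.add_argument("-g", dest="gene_name", required=True,
                        help="Name of the gene")
    parser.add_argument("-t", dest="taxon", required=True,
                        help="Scientific name of the taxon concerned")
    # Mutually exclusive options: exactly one of -f / -d must be given
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("-f", dest="input_file",
                        help="Name of the multifasta input file (one input file)")
    source.add_argument("-d", dest="input_dir",
                        help="Directory containing multifasta input file(s)")
    return parser


args = build_parser().parse_args(
    ["-b", "Batch_test", "-g", "GeneX", "-t", "Arabidopsis thaliana",
     "-f", "Exemple_multifasta_input.fasta"])
print(args.input_file, args.input_dir)
# Exemple_multifasta_input.fasta None
```

Giving both -f and -d, or neither, makes argparse print a usage error and exit with status 2.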


*** COMMAND LINE EXAMPLE (for package use): ***
- First, set the PYTHONPATH environment variable so that Python can find the dependencies.

- Then, for a single input file (here, our example), run:

python multifastaParserLauncher.py -b Batch_test -g GeneX -t "Arabidopsis thaliana" -f Exemple_multifasta_input.fasta


- For several input files, create a directory at the root of the uncompressed package and put your input files in it. Then use a command line of this form:

python multifastaParserLauncher.py -b Batch_test -g GeneX -t "Arabidopsis thaliana" -d <Name_of_the_directory>

Each input file generates its own directory containing its set of output files.
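The two steps above (setting PYTHONPATH, then calling the launcher) can also be scripted from Python, for example to process several datasets. This sketch assumes it runs from the directory containing multifastaParserLauncher.py; `REPET_PATH`, `build_launch` and `run_launcher` are hypothetical names, and the path must point to your own Repet installation.

```python
import os
import subprocess
import sys
from typing import Dict, List, Sequence, Tuple

# Hypothetical location of the Repet libraries: replace with your own path.
REPET_PATH = "/path/to/repet"


def build_launch(input_args: Sequence[str], repet_path: str = REPET_PATH) -> Tuple[List[str], Dict[str, str]]:
    """Return the launcher command line and an environment with PYTHONPATH set."""
    env = dict(os.environ)
    existing = env.get("PYTHONPATH")
    # Put the dependency path first, keeping any PYTHONPATH entries already set.
    env["PYTHONPATH"] = repet_path + (os.pathsep + existing if existing else "")
    cmd = [sys.executable, "multifastaParserLauncher.py",
           "-b", "Batch_test", "-g", "GeneX", "-t", "Arabidopsis thaliana",
           *input_args]
    return cmd, env


def run_launcher(input_args: Sequence[str], repet_path: str = REPET_PATH) -> None:
    """Run the launcher; raises subprocess.CalledProcessError if it fails."""
    cmd, env = build_launch(input_args, repet_path)
    subprocess.run(cmd, env=env, check=True)

# Usage: run_launcher(["-f", "Exemple_multifasta_input.fasta"])
```

Passing "Arabidopsis thaliana" as one list element replaces the shell quotes of the command line; for several input files, call run_launcher(["-d", "<Name_of_the_directory>"]).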


*** SIMPLE USE (for package use): ***
The package contains two executables, one for Windows and one for Linux/Unix.
They show the command lines needed to set up the environment variables and then run the parser on our sample input file (Example_multifasta_input.fasta).
You can edit these executables and customize the command line to run the parser on your own input file.


*** BACKLOG (next version) ***
When the launcher is called on several input files (with the -d option), the parser should generate a single set of output files describing all the batches (one batch per input file).
The backlog tasks dedicated to this feature are listed below:

- In the Multifasta2SNPFile class:
# CONSTRUCTOR: Modify the constructor to add a "several batches" mode that is called without BatchName and GeneName.
# RUNNING METHOD: Add a run_several_batches(directory) method that browses the input files and runs each of them in turn (see the runSeveralInputFile() method of the launcher).
=> 2 days

# BATCH MANAGEMENT: Modify createBatchDict() to create one batch per input file in the dictionary, and add an instance variable pointing to the current batch (e.g. self._iCurrentLineNumber).
# BATCH-LINE MANAGEMENT: Modify the _completeBatchLineListWithCurrentIndividual method to support several batches and to link each line to its batch (currently hard-coded to batch no. 1).
# SUBSNP MANAGEMENT: Check that every element (dSUbSNP) added to the SubSNP list (lSubSNPFileResults) is linked to the current batch (currently hard-coded to batch no. 1).
Impacted methods: manageSNPs(), createSubSNPFromAMissingPolym(), addMissingAllelesAndSubSNPsForOnePolym(), mergeAllelesAndSubSNPsFromOverlappingIndels()
=> + 2 days
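The Multifasta2SNPFile tasks above can be pictured with a minimal sketch, assuming the existing per-file processing can be wrapped in a callable that returns the SubSNPs of one file. `SeveralBatchesSketch`, `parse_file` and the `.fasta`/`.fa` extension filter are hypothetical; only the method names createBatchDict and run_several_batches come from the backlog, and the current-batch pointer is named `_iCurrentBatchNumber` here rather than the backlog's example `self._iCurrentLineNumber`.

```python
import os
from typing import Callable, Dict, List, Optional


class SeveralBatchesSketch:
    """Hypothetical illustration of the planned "several batches" mode.

    This is not the real Multifasta2SNPFile class: it only shows one batch per
    input file, a pointer to the current batch, and SubSNPs tied to that batch.
    """

    def __init__(self, taxon: str, parse_file: Callable[[str, int], List[dict]]):
        # Called without BatchName and GeneName, as the backlog describes.
        self._taxon = taxon
        self._parse_file = parse_file  # stand-in for the existing per-file processing
        self._dBatches: Dict[int, dict] = {}
        self._iCurrentBatchNumber: Optional[int] = None
        self.lSubSNPFileResults: List[dict] = []

    def createBatchDict(self, lInputFiles: List[str]) -> None:
        """Create one batch per input file, numbered from 1 in file order."""
        self._dBatches = {
            iNum: {"BatchNumber": iNum,
                   "BatchName": os.path.splitext(os.path.basename(path))[0],
                   "InputFile": path}
            for iNum, path in enumerate(lInputFiles, start=1)
        }

    def run_several_batches(self, directory: str) -> None:
        """Browse the multifasta files of `directory` and process them one after another."""
        lInputFiles = sorted(
            os.path.join(directory, name) for name in os.listdir(directory)
            if name.endswith((".fasta", ".fa")))
        self.createBatchDict(lInputFiles)
        for iNum, dBatch in self._dBatches.items():
            self._iCurrentBatchNumber = iNum  # point to the batch being processed
            for dSubSNP in self._parse_file(dBatch["InputFile"], iNum):
                # Every SubSNP must belong to the current batch, not to batch no. 1.
                if dSubSNP.get("batchNumber") != self._iCurrentBatchNumber:
                    raise ValueError(f"SubSNP not linked to batch {iNum}: {dSubSNP}")
                self.lSubSNPFileResults.append(dSubSNP)
```

Raising an error rather than silently re-labelling a SubSNP keeps the "linked to the current batch" check from the backlog visible during development.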

- In the Multifasta2SNPFileWriter class:
# Modify all the _write<X>File methods (e.g. _writeSubSNPFile) so that they write in append mode, and move all file opening and closing out of them.
# Create one method that opens all the output files, and call it from the run_several_batches method of Multifasta2SNPFile.
# Create one method that closes all the output files, and call it from the run_several_batches method of Multifasta2SNPFile.

=> + 2 days
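For the Multifasta2SNPFileWriter tasks, a minimal sketch of the open-once, write-per-batch, close-once pattern follows. `WriterSketch`, the file names and the semicolon-separated columns are hypothetical, not the GnpSNP layout. The files are opened once in "w" mode and kept open, so each batch's lines follow the previous ones; opening with "a" instead would also append, but would keep lines left over from earlier runs.

```python
import os
from typing import IO, Dict, List


class WriterSketch:
    """Hypothetical illustration of the backlog for Multifasta2SNPFileWriter.

    Output files are opened once for the whole run, every _write<X>File call
    adds lines to the already-open handle, and all files are closed at the end.
    """

    # Hypothetical subset of output files.
    FILE_NAMES = ("Batch.csv", "SubSNP.csv")

    def __init__(self, output_dir: str):
        self._output_dir = output_dir
        self._dHandles: Dict[str, IO[str]] = {}

    def openAllFiles(self) -> None:
        os.makedirs(self._output_dir, exist_ok=True)
        for name in self.FILE_NAMES:
            # Truncate once at the start of the run; later writes append to the handle.
            self._dHandles[name] = open(os.path.join(self._output_dir, name), "w")

    def _writeBatchFile(self, dBatch: dict) -> None:
        self._dHandles["Batch.csv"].write(f"{dBatch['BatchNumber']};{dBatch['BatchName']}\n")

    def _writeSubSNPFile(self, lSubSNPs: List[dict]) -> None:
        handle = self._dHandles["SubSNP.csv"]
        for dSubSNP in lSubSNPs:
            handle.write(f"{dSubSNP['batchNumber']};{dSubSNP['position']}\n")

    def closeAllFiles(self) -> None:
        for handle in self._dHandles.values():
            handle.close()
        self._dHandles.clear()
```

In run_several_batches, openAllFiles() would be called before the loop over batches and closeAllFiles() in a finally clause, so the files are closed even if one batch fails.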