diff resfinder/db_pointfinder/README.md @ 0:55051a9bc58d draft default tip

Uploaded
author dcouvin
date Mon, 10 Jan 2022 20:06:07 +0000
parents
children
line wrap: on
line diff
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/resfinder/db_pointfinder/README.md	Mon Jan 10 20:06:07 2022 +0000
@@ -0,0 +1,185 @@
+PointFinder Database documentation
+=============
+
+The PointFinder database is a curated database of resistance causing
+chromosomal point mutations.
+
+## Content of the repository
+_TODO_
+
+## Installation
+Clone the database
+```bash
+git clone https://git@bitbucket.org/genomicepidemiology/pointfinder_db.git
+```
+The database can be used with BLAST as-is.
+
+If you want to use the database with the stand-alone ResFinder tool, and wishes
+to use the mapping based method (available from ResFinder version 4.0.0), the
+database needs to be indexed.
+
+### Installing KMA (optional):
+
+If you are running the stand-alone ResFinder in docker, you may be able to skip
+installing KMA, and just rely on the temporary KMA installation done by the
+INSTALL script (or if you are just lazy and don't want to type the path to kma).
+
+If you are not running ResFinder stand-alone in docker, you will need to
+install KMA, if the mapping based method is needed (recommended).
+
+#### Download and install KMA
+```bash
+# Go to the directory in which you want KMA installed
+cd /some/path
+# Clone KMA
+git clone https://bitbucket.org/genomicepidemiology/kma.git
+# Go to kma directory and compile code
+cd kma && make
+```
+
+### Indexing with *INSTALL.py*
+If you have KMA installed you either need to have the kma_index in your PATH or
+you need to provide the path to kma_index to INSTALL.py
+
+#### a) Run INSTALL.py in interactive mode
+```bash
+# Go to the database directory
+cd path/to/resfinder_db
+python3 INSTALL.py
+```
+If kma_index was found in your path a lot of indexing information will be
+printed to your terminal, and will end with the word "done".
+
+If kma_index wasn't found you will recieve the following output:
+```bash
+KMA index program, kma_index, does not exist or is not executable
+Please input path to executable kma_index program or choose one of the options below:
+	1. Install KMA using make, index db, then remove KMA.
+	2. Exit
+```
+You can now write the path to kma_index and finish with <enter> or you can
+enter "1" or "2" and finish with <enter>.
+
+If "1" is chosen, the script will attempt to install kma in your systems
+default temporary location. If the installation is successful it will proceed
+to index your database, when finished it will delete the kma installation again.
+
+#### b) Run INSTALL.py in non_interactive mode
+```bash
+# Go to the database directory
+cd path/to/pointfinder_db
+python3 INSTALL.py /path/to/kma_index non_interactive
+```
+The path to kma_index can be omitted if it exists in PATH or if the script
+should attempt to do an automatic temporary installation of KMA.
+
+#### c) Index database manually (not recommended)
+It is possible to index the databases manually, but is generally not recommended
+as it is more prone to error. If you choose to do so, be aware of the naming of
+the indexed files.
+
+This is an example of how to index the PointFinder database files:
+```bash
+# Go to the database directory
+cd path/to/pointfinder_db
+# create indexing directory
+mkdir kma_indexing
+# Index files using kma_index
+kma_index -i db_pointfinder/campylobacter/*.fsa -o db_pointfinder/campylobacter/campylobacter
+kma_index -i db_pointfinder/escherichia_coli/*.fsa -o db_pointfinder/escherichia_coli/escherichia_coli
+kma_index -i db_pointfinder/enterococcus_faecalis/*.fsa -o db_pointfinder/enterococcus_faecalis/enterococcus_faecalis
+kma_index -i db_pointfinder/enterococcus_faecium/*.fsa -o db_pointfinder/enterococcus_faecium/enterococcus_faecium
+kma_index -i db_pointfinder/neisseria_gonorrhoeae/*.fsa -o db_pointfinder/neisseria_gonorrhoeae/neisseria_gonorrhoeae
+kma_index -i db_pointfinder/salmonella/*.fsa -o db_pointfinder/salmonella/salmonella
+kma_index -i db_pointfinder/mycobacterium_tuberculosis/*.fsa -o db_pointfinder/mycobacterium_tuberculosis/mycobacterium_tuberculosis
+```
+
+## PointFinder database format
+
+Each species that the PointFinder database covers has its own folder in which the database for the corresponding species resides.
+Four types of files exists within a PointFinder database:
+
+1. One or more FASTA files ending with the extension ".fsa". These FASTA files contains the reference (wild type) sequence of a specific region, mutations are found with respect to these sequences. The fasta header in these files must contain just the gene name as given in the resistens-overview.txt file.
+2. A file called "genes.txt". It is used to describe which of the FASTA sequences that should be employed when using the database.
+3. One or no file called "RNA_genes.txt". Defines which of the FASTA files are RNA genes.
+4. A file called "resistens-overview.txt". Defines the resistance causing mutations and phenotypes. The File is described in details below.
+
+### resistens-overview.txt
+
+The file is a text file in tab separated format. The first line starts with a #, followed by the headers for the table. Indels are always described at the end. If any indels are described, a line consisting of only "# Indels", should precede the first indel entry (row).
+
+|     Header   | Explanation                                                                                                       |
+| -------------|-------------------------------------------------------------------------------------------------------------------|
+| Gene_ID      | Gene ID as written in the genes.txt file                                                                          |
+| Gene_name    | Name of gene or region                                                                                            |
+| Codon_pos    | Nucleotide position in the FASTA file where the mutation starts                                                   |
+| Ref_nuc      | The reference sequence at the mutation, "-" if mutation is an indel                                               |
+| Ref_codon    | One letter aa or nucleotide describing the reference sequence. Can also be "del" or "ins" if mutation is an indel |
+| Res_codon    | Comma separated list of nucleotides or amino acids (1-letter code)                                                |
+| Resistance   | Comma separated list of antibiotics                                                                               |
+| PMID         | Comma separated list of pubmed IDs describing the mutation                                                        |
+| Mechanism    | Description of the resistance mechanism                                                                           |
+| Notes        | Text with other information                                                                                       |
+| Required_mut | Other mutations needed in order to gain resistance (see below for more details)                                   |
+
+### Required_mut format
+
+There are several layers that needs to be addressed in this field. Starting at the bottom, if a mutation is only dependent on a single other mutation it is written like so:
+```
+<Gene_ID>_<Ref_codon><Codon_pos><muts>
+```
+Where Gene_ID, Ref_codon, and Codon_pos are described in the above table. "muts" are either a single mutation written in 1-letter code or a number of possible mutations separated by a dot. Like so:
+```
+<Gene_ID>_<Ref_codon><Codon_pos><mut1>.<mut2>.<mut3>...<mutN>
+```
+Example of a required mutation in gyrA at position 83, changing an S to either L, W, A, or V:
+```
+gyrA_S83L.W.A.V
+```
+If there are several required mutations, that all need to be present, they are separated by commas, like so:
+```
+M = <Gene_ID>_<Ref_codon><Codon_pos><muts>
+M,M,M
+```
+Example:
+```
+pmrA_S39I,pmrA_R81S
+```
+If there are several groups of mutations that each can confer resistance with the mutation in question, but is independent of each other, the groups are separated by semicolons, like so:
+```
+M1,M1,M1;M2,M2,M2
+```
+Note that a group can consist of just one required mutation.
+Example of a required mutation in gyrA either at position 83 or 87:
+```
+gyrA_S83Y.F.A;gyrA_D87N.G.Y.K
+```
+
+## Documentation
+
+The documentation available as of the date of this release can be found at
+https://bitbucket.org/genomicepidemiology/pointfinder_db/overview.
+
+
+Citation
+=======
+
+When using the method please cite:
+
+Not yet published
+
+
+License
+=======
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.