diff resfinder/db_pointfinder/README.md @ 0:55051a9bc58d draft default tip
Uploaded
author | dcouvin |
---|---|
date | Mon, 10 Jan 2022 20:06:07 +0000 |
PointFinder Database documentation
=============

The PointFinder database is a curated database of resistance-causing
chromosomal point mutations.

## Content of the repository
_TODO_

## Installation
Clone the database:
```bash
git clone https://git@bitbucket.org/genomicepidemiology/pointfinder_db.git
```
The database can be used with BLAST as-is.

If you want to use the database with the stand-alone ResFinder tool and wish
to use the mapping-based method (available from ResFinder version 4.0.0), the
database needs to be indexed.

### Installing KMA (optional)

If you are running the stand-alone ResFinder in Docker, or simply don't want to
type the path to KMA, you may be able to skip installing KMA and rely on the
temporary KMA installation done by the INSTALL script.

If you are not running the stand-alone ResFinder in Docker, you will need to
install KMA to use the mapping-based method (recommended).

#### Download and install KMA
```bash
# Go to the directory in which you want KMA installed
cd /some/path
# Clone KMA
git clone https://bitbucket.org/genomicepidemiology/kma.git
# Go to the kma directory and compile the code
cd kma && make
```

### Indexing with *INSTALL.py*
If you have KMA installed, either kma_index must be in your PATH or you must
provide the path to kma_index to INSTALL.py.

#### a) Run INSTALL.py in interactive mode
```bash
# Go to the database directory
cd path/to/pointfinder_db
python3 INSTALL.py
```
If kma_index is found in your PATH, a lot of indexing information will be
printed to your terminal, ending with the word "done".
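
You can check whether kma_index will be found on your PATH before running INSTALL.py. The following is a minimal bash sketch and is not part of ResFinder or KMA. `KMA_DIR` and `check_kma_index` are hypothetical names. The sketch assumes KMA was compiled in `/some/path/kma` as in the installation step above, so the `kma_index` binary sits in the cloned `kma` directory.

```bash
#!/usr/bin/env bash
# Hypothetical location of the KMA build (assumption); adjust to where you ran `make`.
KMA_DIR="/some/path/kma"

# Prepend the KMA build directory to PATH for this shell session only.
export PATH="${KMA_DIR}:${PATH}"

# Print where kma_index is and return 0 if it is on PATH;
# otherwise print a hint to stderr and return 1.
check_kma_index() {
    local kma_path
    if kma_path="$(command -v kma_index)"; then
        echo "kma_index found at ${kma_path}"
    else
        echo "kma_index not on PATH; INSTALL.py will ask for its location" >&2
        return 1
    fi
}

check_kma_index || true
```

The `export` only lasts for the current shell. To keep the change, the same line could go into your shell startup file (for example `~/.bashrc`).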

If kma_index isn't found, you will receive the following output:
```
KMA index program, kma_index, does not exist or is not executable
Please input path to executable kma_index program or choose one of the options below:
 1. Install KMA using make, index db, then remove KMA.
 2. Exit
```
You can now type the path to kma_index and press <enter>, or enter "1" or "2"
and press <enter>.

If "1" is chosen, the script will attempt to install KMA in your system's
default temporary location. If the installation succeeds, it will index your
database and, when finished, delete the KMA installation again.

#### b) Run INSTALL.py in non-interactive mode
```bash
# Go to the database directory
cd path/to/pointfinder_db
python3 INSTALL.py /path/to/kma_index non_interactive
```
The path to kma_index can be omitted if kma_index is in your PATH, or if you
want the script to attempt an automatic temporary installation of KMA.

#### c) Index database manually (not recommended)
It is possible to index the databases manually, but this is generally not
recommended as it is more error-prone. If you choose to do so, make sure the
indexed files follow the naming used in the example below.
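
The manual example below uses a fixed per-species naming scheme: each index is written into its species folder, with the folder name as the file prefix. A loop can produce the same names instead of one command per species. This is a hedged bash sketch, not an official script, and `index_species_folders`, `DB_DIR`, and `DRY_RUN` are hypothetical names. It assumes the species folders sit directly under `db_pointfinder/`, as in the example, and that kma_index is on your PATH. By default it only prints the commands.

```bash
#!/usr/bin/env bash
# Hypothetical settings: the folder holding one subfolder per species, and
# whether to only print the kma_index commands (1) or run them (0).
DB_DIR="${DB_DIR:-db_pointfinder}"
DRY_RUN="${DRY_RUN:-1}"

# For each species folder containing .fsa files, index them into
# <DB_DIR>/<species>/<species>, mirroring the manual commands.
index_species_folders() {
    local dir species old_nullglob
    local -a fsa_files
    old_nullglob="$(shopt -p nullglob)" || true  # remember the caller's setting
    shopt -s nullglob                            # unmatched globs expand to nothing
    for dir in "${DB_DIR}"/*/; do
        dir="${dir%/}"                # drop the trailing slash from the glob
        species="${dir##*/}"          # folder name, e.g. "salmonella"
        fsa_files=("${dir}"/*.fsa)
        if [ "${#fsa_files[@]}" -eq 0 ]; then
            continue                  # no reference sequences in this folder
        fi
        if [ "${DRY_RUN}" = "1" ]; then
            echo kma_index -i "${fsa_files[@]}" -o "${dir}/${species}"
        elif ! kma_index -i "${fsa_files[@]}" -o "${dir}/${species}"; then
            eval "${old_nullglob}"
            return 1
        fi
    done
    eval "${old_nullglob}"
}

index_species_folders
```

For example, if the sketch is saved as `index_db.sh` (a hypothetical file name), `DRY_RUN=0 DB_DIR=. bash index_db.sh` would run the commands from a directory whose species folders sit at the top level. Any other subfolder that contains `.fsa` files would also be indexed, so review the dry-run output first.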

This is an example of how to index the PointFinder database files:
```bash
# Go to the database directory
cd path/to/pointfinder_db
# Create indexing directory
mkdir kma_indexing
# Index files using kma_index
kma_index -i db_pointfinder/campylobacter/*.fsa -o db_pointfinder/campylobacter/campylobacter
kma_index -i db_pointfinder/escherichia_coli/*.fsa -o db_pointfinder/escherichia_coli/escherichia_coli
kma_index -i db_pointfinder/enterococcus_faecalis/*.fsa -o db_pointfinder/enterococcus_faecalis/enterococcus_faecalis
kma_index -i db_pointfinder/enterococcus_faecium/*.fsa -o db_pointfinder/enterococcus_faecium/enterococcus_faecium
kma_index -i db_pointfinder/neisseria_gonorrhoeae/*.fsa -o db_pointfinder/neisseria_gonorrhoeae/neisseria_gonorrhoeae
kma_index -i db_pointfinder/salmonella/*.fsa -o db_pointfinder/salmonella/salmonella
kma_index -i db_pointfinder/mycobacterium_tuberculosis/*.fsa -o db_pointfinder/mycobacterium_tuberculosis/mycobacterium_tuberculosis
```

## PointFinder database format

Each species covered by the PointFinder database has its own folder, which holds the database for that species.
Four types of files exist within a PointFinder database:

1. One or more FASTA files with the extension ".fsa". These files contain the reference (wild-type) sequences of specific regions; mutations are found with respect to these sequences. The FASTA header in these files must contain only the gene name as given in the resistens-overview.txt file.
2. A file called "genes.txt", which specifies which of the FASTA sequences are used by the database.
3. Optionally, a file called "RNA_genes.txt", which defines which of the FASTA files are RNA genes.
4. A file called "resistens-overview.txt", which defines the resistance-causing mutations and phenotypes. This file is described in detail below.

### resistens-overview.txt

The file is a tab-separated text file.
The first line starts with a # followed by the table headers. Indels are always described at the end of the file. If any indels are described, a line consisting of only "# Indels" should precede the first indel entry (row).

| Header       | Explanation |
|--------------|-------------|
| Gene_ID      | Gene ID as written in the genes.txt file |
| Gene_name    | Name of gene or region |
| Codon_pos    | Position in the FASTA sequence where the mutation starts (for amino acid changes, the codon number, e.g. 83 in gyrA_S83L) |
| Ref_nuc      | The reference sequence at the mutation; "-" if the mutation is an indel |
| Ref_codon    | One-letter amino acid or nucleotide describing the reference sequence; can also be "del" or "ins" if the mutation is an indel |
| Res_codon    | Comma-separated list of nucleotides or amino acids (1-letter code) |
| Resistance   | Comma-separated list of antibiotics |
| PMID         | Comma-separated list of PubMed IDs describing the mutation |
| Mechanism    | Description of the resistance mechanism |
| Notes        | Text with other information |
| Required_mut | Other mutations needed in order to gain resistance (see below for details) |

### Required_mut format

This field is built up in several layers. Starting at the bottom: if a mutation depends on only a single other mutation, that mutation is written like so:
```
<Gene_ID>_<Ref_codon><Codon_pos><muts>
```
Here Gene_ID, Ref_codon, and Codon_pos are as described in the table above. "muts" is either a single mutation written in 1-letter code or several possible mutations separated by dots.
Like so:
```
<Gene_ID>_<Ref_codon><Codon_pos><mut1>.<mut2>.<mut3>...<mutN>
```
Example of a required mutation in gyrA at position 83, changing an S to either L, W, A, or V:
```
gyrA_S83L.W.A.V
```
If several required mutations must all be present, they are separated by commas, like so:
```
M = <Gene_ID>_<Ref_codon><Codon_pos><muts>
M,M,M
```
Example:
```
pmrA_S39I,pmrA_R81S
```
If there are several groups of mutations, each of which can independently confer resistance together with the mutation in question, the groups are separated by semicolons, like so:
```
M1,M1,M1;M2,M2,M2
```
Note that a group can consist of just one required mutation.
Example of a required mutation in gyrA at either position 83 or position 87:
```
gyrA_S83Y.F.A;gyrA_D87N.G.Y.K
```

## Documentation

The documentation available as of the date of this release can be found at
https://bitbucket.org/genomicepidemiology/pointfinder_db/overview.


Citation
=======

When using the method, please cite:

Not yet published


License
=======

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.