author | dcouvin |
---|---|
date | Mon, 10 Jan 2022 20:06:07 +0000 |

PointFinder Database documentation
=============

The PointFinder database is a curated database of resistance-causing chromosomal point mutations.

## Content of the repository
_TODO_

## Installation
Clone the database:
```bash
git clone https://git@bitbucket.org/genomicepidemiology/pointfinder_db.git
```

The database can be used with BLAST as-is. If you want to use the database with the stand-alone ResFinder tool and wish to use the mapping-based method (available from ResFinder version 4.0.0), the database needs to be indexed.

### Installing KMA (optional):
If you are running the stand-alone ResFinder in Docker, you may be able to skip installing KMA and rely on the temporary KMA installation done by the INSTALL script (this also works if you simply don't want to type the path to kma). If you are not running ResFinder stand-alone in Docker, you will need to install KMA if the mapping-based method is needed (recommended).

#### Download and install KMA
```bash
# Go to the directory in which you want KMA installed
cd /some/path
# Clone KMA
git clone https://bitbucket.org/genomicepidemiology/kma.git
# Go to kma directory and compile code
cd kma && make
```

### Indexing with *INSTALL.py*
If you have KMA installed, you either need to have kma_index in your PATH or you need to provide the path to kma_index to INSTALL.py.

#### a) Run INSTALL.py in interactive mode
```bash
# Go to the database directory
cd path/to/pointfinder_db
python3 INSTALL.py
```
If kma_index was found in your PATH, a lot of indexing information will be printed to your terminal, ending with the word "done".

If kma_index wasn't found, you will receive the following output:
```bash
KMA index program, kma_index, does not exist or is not executable
Please input path to executable kma_index program or choose one of the options below:

	1. Install KMA using make, index db, then remove KMA.
	2. Exit
```
You can now type the path to kma_index and press Enter, or you can type "1" or "2" and press Enter.

If "1" is chosen, the script will attempt to install KMA in your system's default temporary location. If the installation is successful, it will proceed to index your database and, when finished, delete the KMA installation again.

#### b) Run INSTALL.py in non_interactive mode
```bash
# Go to the database directory
cd path/to/pointfinder_db
python3 INSTALL.py /path/to/kma_index non_interactive
```
The path to kma_index can be omitted if it exists in PATH or if the script should attempt an automatic temporary installation of KMA.

#### c) Index database manually (not recommended)
It is possible to index the databases manually, but this is generally not recommended because it is more error-prone. If you choose to do so, be aware of the naming of the indexed files.

This is an example of how to index the PointFinder database files:
```bash
# Go to the database directory
cd path/to/pointfinder_db
# create indexing directory
mkdir kma_indexing
# Index files using kma_index
kma_index -i db_pointfinder/campylobacter/*.fsa -o db_pointfinder/campylobacter/campylobacter
kma_index -i db_pointfinder/escherichia_coli/*.fsa -o db_pointfinder/escherichia_coli/escherichia_coli
kma_index -i db_pointfinder/enterococcus_faecalis/*.fsa -o db_pointfinder/enterococcus_faecalis/enterococcus_faecalis
kma_index -i db_pointfinder/enterococcus_faecium/*.fsa -o db_pointfinder/enterococcus_faecium/enterococcus_faecium
kma_index -i db_pointfinder/neisseria_gonorrhoeae/*.fsa -o db_pointfinder/neisseria_gonorrhoeae/neisseria_gonorrhoeae
kma_index -i db_pointfinder/salmonella/*.fsa -o db_pointfinder/salmonella/salmonella
kma_index -i db_pointfinder/mycobacterium_tuberculosis/*.fsa -o db_pointfinder/mycobacterium_tuberculosis/mycobacterium_tuberculosis
```

## PointFinder database format
Each species that the PointFinder database covers has its own folder in which the database
for the corresponding species resides. Four types of files exist within a PointFinder database:

1. One or more FASTA files ending with the extension ".fsa". These FASTA files contain the reference (wild type) sequence of a specific region; mutations are found with respect to these sequences. The FASTA header in these files must contain just the gene name as given in the resistens-overview.txt file.
2. A file called "genes.txt". It describes which of the FASTA sequences should be employed when using the database.
3. One or no file called "RNA_genes.txt". It defines which of the FASTA files are RNA genes.
4. A file called "resistens-overview.txt". It defines the resistance-causing mutations and phenotypes. The file is described in detail below.

### resistens-overview.txt
The file is a text file in tab-separated format. The first line starts with a #, followed by the headers for the table. Indels are always described at the end. If any indels are described, a line consisting of only "# Indels" should precede the first indel entry (row).

| Header       | Explanation                                                                                                          |
| -------------|----------------------------------------------------------------------------------------------------------------------|
| Gene_ID      | Gene ID as written in the genes.txt file                                                                             |
| Gene_name    | Name of gene or region                                                                                               |
| Codon_pos    | Nucleotide position in the FASTA file where the mutation starts                                                      |
| Ref_nuc      | The reference sequence at the mutation, "-" if mutation is an indel                                                  |
| Ref_codon    | One-letter amino acid or nucleotide describing the reference sequence. Can also be "del" or "ins" if mutation is an indel |
| Res_codon    | Comma-separated list of nucleotides or amino acids (1-letter code)                                                   |
| Resistance   | Comma-separated list of antibiotics                                                                                  |
| PMID         | Comma-separated list of PubMed IDs describing the mutation                                                           |
| Mechanism    | Description of the resistance mechanism                                                                              |
| Notes        | Text with other information                                                                                          |
| Required_mut | Other mutations needed in order to gain resistance (see below for more details)                                      |

### Required_mut format
There are several layers that need to be addressed in this field. Starting at the bottom, if a mutation depends on only a single other mutation, it is written like so:
```
<Gene_ID>_<Ref_codon><Codon_pos><muts>
```
Here Gene_ID, Ref_codon, and Codon_pos are as described in the table above. "muts" is either a single mutation written in 1-letter code or a number of possible mutations separated by dots, like so:
```
<Gene_ID>_<Ref_codon><Codon_pos><mut1>.<mut2>.<mut3>...<mutN>
```
Example of a required mutation in gyrA at position 83, changing an S to either L, W, A, or V:
```
gyrA_S83L.W.A.V
```

If there are several required mutations that all need to be present, they are separated by commas, like so:
```
M = <Gene_ID>_<Ref_codon><Codon_pos><muts>
M,M,M
```
Example:
```
pmrA_S39I,pmrA_R81S
```

If there are several groups of mutations that can each confer resistance together with the mutation in question, but are independent of each other, the groups are separated by semicolons, like so:
```
M1,M1,M1;M2,M2,M2
```
Note that a group can consist of just one required mutation.
Example of a required mutation in gyrA at either position 83 or 87:
```
gyrA_S83Y.F.A;gyrA_D87N.G.Y.K
```

## Documentation
The documentation available as of the date of this release can be found at https://bitbucket.org/genomicepidemiology/pointfinder_db/overview.
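
To make the layering of the Required_mut field concrete, here is a minimal Python sketch of how such a string could be parsed and evaluated. It is not code from ResFinder or PointFinder: the names `RequiredMutation`, `parse_required_mut` and `is_satisfied` are hypothetical, and the regular expression assumes that Ref_codon consists of letters only and Codon_pos is a plain positive integer.

```python
import re
from typing import List, NamedTuple, Set, Tuple


class RequiredMutation(NamedTuple):
    """One required mutation (hypothetical helper type)."""
    gene_id: str
    ref: str
    pos: int
    alts: List[str]  # alternatives that were separated by dots


# Assumed shape: <Gene_ID>_<Ref_codon><Codon_pos><mut1>.<mut2>...
_MUT_RE = re.compile(
    r"(?P<gene>.+)_(?P<ref>[A-Za-z]+)(?P<pos>\d+)(?P<alts>[A-Za-z*.]+)"
)


def parse_required_mut(field: str) -> List[List[RequiredMutation]]:
    """Split on ';' into groups, then on ',' into mutations per group."""
    if not field.strip():
        return []
    groups = []
    for group in field.strip().split(";"):
        muts = []
        for token in group.split(","):
            m = _MUT_RE.fullmatch(token.strip())
            if m is None:
                raise ValueError(f"Unrecognised required mutation: {token!r}")
            muts.append(
                RequiredMutation(
                    m["gene"], m["ref"], int(m["pos"]), m["alts"].split(".")
                )
            )
        groups.append(muts)
    return groups


def is_satisfied(
    groups: List[List[RequiredMutation]],
    observed: Set[Tuple[str, int, str]],
) -> bool:
    """True if, for at least one group, every mutation in it is observed.

    `observed` holds (gene_id, position, mutated_residue) tuples.
    An empty `groups` list returns False; callers should treat an empty
    Required_mut field as "no requirement" themselves.
    """
    return any(
        all(
            any((mut.gene_id, mut.pos, alt) in observed for alt in mut.alts)
            for mut in group
        )
        for group in groups
    )


if __name__ == "__main__":
    groups = parse_required_mut("gyrA_S83Y.F.A;gyrA_D87N.G.Y.K")
    print(groups)
    print(is_satisfied(groups, {("gyrA", 87, "G")}))
```

In this sketch, semicolon-separated groups act as alternatives (OR), comma-separated mutations within a group must all be present (AND), and dot-separated residues are alternatives at a single position.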

Citation
=======

When using the method please cite:

Not yet published

License
=======

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

   http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.