comparison README.md @ 8:9de392f2fc02 draft

"planemo upload commit d6433b48c9bae079edb06364147f19500501c986"
author petr-novak
date Tue, 28 Jun 2022 12:33:22 +0000
parents 7b0bbe7477c4
children ff01d4263391
7:c33d6583e548 8:9de392f2fc02
1 # dante_ltr 1 # DANTE_LTR
2 2
3 Tool for identification of complete LTR retrotransposons based on analysis of protein 3 Tool for identifying complete LTR retrotransposons based on analysis of protein domains identified with the [DANTE tool](https://github.com/kavonrtep/dante). Both DANTE and DANTE_LTR are available on the [Galaxy server](https://repeatexplorer-elixir.cerit-sc.cz/).
4 domains identified by DANTE tool. 4
5 ## Principle of DANTE_LTR
6 Complete retrotransposons are identified as clusters of protein domains recognized by the DANTE tool. The domains in the clusters must be assigned to a single retrotransposon lineage by DANTE. In addition, the orientation and order of the protein domains, as well as the distances between them, must conform to the characteristics of elements from the REXdb database [Neumann et al. (2019)](https://mobilednajournal.biomedcentral.com/articles/10.1186/s13100-018-0144-1).
7 In the next step, the 5' and 3' regions of the putative retrotransposon are examined for the presence of 5' and 3' long terminal repeats. If 5' and 3' long terminal repeats are detected, detection of target site duplication (TSD) and primer binding site (PBS) is performed. The detected LTR retrotransposons are classified into five categories:
8 - Elements with protein domains, 5'LTR, 3'LTR, TSD and PBS - rank **DLTP**.
9 - Elements with protein domains, 5'LTR, 3'LTR, and PBS (TSD was not found) - rank **DLP**
10 - Elements with protein domains, 5'LTR, 3'LTR, TSD (PBS was not found) - rank **DLT**
11 - Elements with protein domains, 5'LTR and 3'LTR (PBS and TSD were not found) - rank **DL**
12 - Elements as clusters of protein domains with the same classification, no LTRs - rank **D**.
13
14 ![dante_ltr_workflow.png](dante_ltr_workflow.png)
15
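The five ranks listed above can be summarised as a small decision rule. The following is only an illustrative sketch: `rank_element` is a hypothetical helper, not part of DANTE_LTR, and it assumes the detection results are already available as `yes`/`no` flags.

```shell
# Illustrative only: map detected features to the rank labels described above.
# Arguments are "yes"/"no" flags for LTRs, TSD and PBS (hypothetical interface).
# Protein domains are assumed to be present in every candidate element.
rank_element() {
  ltr="$1"; tsd="$2"; pbs="$3"
  if [ "$ltr" != "yes" ]; then
    echo "D"      # cluster of domains only, no LTRs
  elif [ "$tsd" = "yes" ] && [ "$pbs" = "yes" ]; then
    echo "DLTP"   # LTRs, TSD and PBS
  elif [ "$pbs" = "yes" ]; then
    echo "DLP"    # LTRs and PBS, TSD not found
  elif [ "$tsd" = "yes" ]; then
    echo "DLT"    # LTRs and TSD, PBS not found
  else
    echo "DL"     # LTRs only
  fi
}

rank_element yes no yes
```

TSD and PBS are only consulted when LTRs were found, which mirrors the order of steps described in the text.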
5 16
6 ## Installation: 17 ## Installation:
7 18
8 ```shell 19 ```shell
9 conda create -n dante_ltr -c bioconda -c conda-forge -c petrnovak dante_ltr 20 conda create -n dante_ltr -c bioconda -c conda-forge -c petrnovak dante_ltr
10 ``` 21 ```
22
23 ## Input data
24 One input is a reference sequence in FASTA format. The second input is an annotation of the reference genome using the tool DANTE in GFF3 format. For better results, use the unfiltered full output of the DANTE pipeline.
25
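Because the two inputs must describe the same reference, it may be worth checking that every sequence ID used in the GFF3 also occurs in the FASTA file before running the tool. A minimal sketch follows; `missing_seqids` is a hypothetical helper, not part of DANTE_LTR, and it assumes a tab-separated GFF3 with nine columns and FASTA IDs taken as the header text up to the first whitespace.

```shell
# Hypothetical helper: print GFF3 sequence IDs that are absent from the FASTA.
# Usage: missing_seqids annotation.gff3 genome.fasta
missing_seqids() {
  gff3="$1"; fasta="$2"
  ids_fasta=$(mktemp)
  # FASTA IDs: first word of each header line, without the leading ">"
  awk '/^>/ {sub(/^>/, "", $1); print $1}' "$fasta" | sort -u > "$ids_fasta"
  # GFF3 seqids: column 1 of non-comment feature lines;
  # comm -23 keeps the IDs that appear only in the GFF3 list
  awk -F'\t' '!/^#/ && NF >= 9 {print $1}' "$gff3" | sort -u | comm -23 - "$ids_fasta"
  rm -f "$ids_fasta"
}
```

An empty result means all annotated sequences were found, for example with `missing_seqids test_data/sample_DANTE.gff3 test_data/sample_genome.fasta`.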
26
11 ## Usage 27 ## Usage
28
29 ### Detection of complete LTR retrotransposons
12 30
13 ```shell 31 ```shell
14 Usage: ./extract_putative_ltr.R COMMAND [OPTIONS] 32 Usage: ./extract_putative_ltr.R COMMAND [OPTIONS]
15 33
16 34
25 output file path and prefix 43 output file path and prefix
26 44
27 -c NUMBER, --cpu=NUMBER 45 -c NUMBER, --cpu=NUMBER
28 Number of cpu to use [default 5] 46 Number of cpu to use [default 5]
29 47
48 -M NUMBER, --max_missing_domains=NUMBER
49 Maximum number of missing domains is retrotransposon [default 0]
50
51 -L NUMBER, --min_relative_length=NUMBER
52 Minimum relative length of protein domain to be considered for retrostransposon detection [default 0.6]
30 -h, --help 53 -h, --help
31 Show this help message and exit 54 Show this help message and exit
55
32 ``` 56 ```
33 57
34 ## Example 58 #### Example:
59
35 ```shell 60 ```shell
36 mkdir -p tmp 61 mkdir -p tmp
37 ./extract_putative_ltr.R -g test_data/sample_DANTE.gff3 -s test_data/sample_genome.fasta -o tmp/ltr_annotation 62 ./extract_putative_ltr.R -g test_data/sample_DANTE.gff3 -s test_data/sample_genome.fasta -o tmp/ltr_annotation
38 ``` 63 ```
39 64
40 ## Output files 65 #### Files in the output of `extract_putative_ltr.R`:
41
42
43 ### Output of script `extract_putative_ltr.R`:
44
45 66
46 - `prefix.gff3` - annotation of all identified elements 67 - `prefix.gff3` - annotation of all identified elements
68 - `prefix_D.fasta` - partial elements with protein **d**omains
47 - `prefix_DL.fasta` - elements with protein **d**omains and **L**TR 69 - `prefix_DL.fasta` - elements with protein **d**omains and **L**TR
48 - `prefix_DLTP.fasta` - elements with **d**omains, **L**TR, **T**SD and **P**BS 70 - `prefix_DLTP.fasta` - elements with **d**omains, **L**TR, **T**SD and **P**BS
49 - `prefix_DLP.fasta` - elements with **d**omains, **L**TR and **P**BS 71 - `prefix_DLP.fasta` - elements with **d**omains, **L**TR and **P**BS
50 - `prefix_DLT.fasta` - elements with **d**omains, **L**TR, **T**SD 72 - `prefix_DLT.fasta` - elements with **d**omains, **L**TR, **T**SD
51 - `prefix_statistics.csv` - number of elements in individual categories 73 - `prefix_statistics.csv` - number of elements in individual categories
74
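As a quick cross-check of the per-rank FASTA files listed above, the records in each file can be counted directly. This is a sketch only: `count_rank_fasta` is a hypothetical helper, not part of DANTE_LTR, and it assumes the output files follow the `prefix_RANK.fasta` naming shown in the list.

```shell
# Hypothetical helper: count FASTA records in each per-rank output file.
# Usage: count_rank_fasta tmp/ltr_annotation
count_rank_fasta() {
  prefix="$1"
  for rank in D DL DLT DLP DLTP; do
    f="${prefix}_${rank}.fasta"
    if [ -f "$f" ]; then
      # grep -c prints 0 but exits non-zero when nothing matches, hence "|| true"
      n=$(grep -c '^>' "$f" || true)
    else
      n=0   # a missing file is reported as zero elements
    fi
    printf '%s\t%s\n' "$rank" "$n"
  done
}
```

The resulting counts can then be compared by eye with `prefix_statistics.csv`.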
75
76
77 ### Validation of LTR retrotransposons detected in the previous step:
78
79 ```shell
80 ./clean_ltr.R --help
81 Usage: ./clean_ltr.R COMMAND [OPTIONS]
82
83
84 Options:
85 -g GFF3, --gff3=GFF3
86 gff3 with LTR Transposable elements
87
88 -s REFERENCE_SEQUENCE, --reference_sequence=REFERENCE_SEQUENCE
89 reference sequence as fasta
90
91 -o OUTPUT, --output=OUTPUT
92 output file prefix
93
94 -c NUMBER, --cpu=NUMBER
95 Number of cpu to use [default 5]
96
97 -h, --help
98 Show this help message and exit
99 ```
100
101 This script checks for potentially chimeric elements and removes them from the GFF3 file.
102
103 #### Example
104 ```shell
105 ./clean_ltr.R -g test_data/sample_DANTE_LTR_annotation.gff3 -s test_data/sample_genome.fasta -o tmp/ltr_annotation_clean
106 ```
107