comparison README.md @ 12:ff01d4263391 draft
"planemo upload commit 414119ad7c44562d2e956b765e97ca113bc35b2b-dirty"
author | petr-novak |
---|---|
date | Thu, 21 Jul 2022 08:23:15 +0000 |
parents | 9de392f2fc02 |
children |
11:54bd36973253 | 12:ff01d4263391 |
---|---|
27 ## Usage | 27 ## Usage |
28 | 28 |
29 ### Detection of complete LTR retrotransposons | 29 ### Detection of complete LTR retrotransposons |
30 | 30 |
31 ```shell | 31 ```shell |
32 Usage: ./extract_putative_ltr.R COMMAND [OPTIONS] | 32 Usage: ./detect_putative_ltr.R COMMAND [OPTIONS] |
33 | 33 |
34 | 34 |
35 Options: | 35 Options: |
36 -g GFF3, --gff3=GFF3 | 36 -g GFF3, --gff3=GFF3 |
37 gff3 with dante results | 37 gff3 with dante results |
57 | 57 |
58 #### Example: | 58 #### Example: |
59 | 59 |
60 ```shell | 60 ```shell |
61 mkdir -p tmp | 61 mkdir -p tmp |
62 ./extract_putative_ltr.R -g test_data/sample_DANTE.gff3 -s test_data/sample_genome.fasta -o tmp/ltr_annotation | 62 ./detect_putative_ltr.R -g test_data/sample_DANTE.gff3 -s test_data/sample_genome.fasta -o tmp/ltr_annotation |
63 ``` | 63 ``` |
64 | 64 |
65 #### Files in the output of `extract_putative_ltr.R`: | 65 #### Files in the output of `extract_putative_ltr.R`: |
66 | 66 |
67 - `prefix.gff3` - annotation of all identified elements | 67 - `prefix.gff3` - annotation of all identified elements |
70 - `prefix_DLTP.fasta` - elements with **d**omains, **L**TR, **T**SD and **P**BS | 70 - `prefix_DLTP.fasta` - elements with **d**omains, **L**TR, **T**SD and **P**BS |
71 - `prefix_DLP.fasta` - elements with **d**omains, **L**TR and **P**BS | 71 - `prefix_DLP.fasta` - elements with **d**omains, **L**TR and **P**BS |
72 - `prefix_DLT.fasta` - elements with **d**omains, **L**TR, **T**SD | 72 - `prefix_DLT.fasta` - elements with **d**omains, **L**TR, **T**SD |
73 - `prefix_statistics.csv` - number of elements in individual categories | 73 - `prefix_statistics.csv` - number of elements in individual categories |
74 | 74 |
75 For large genomes, you can use `detect_putative_ltr_wrapper.py`. This script splits the input FASTA into smaller chunks and runs `detect_putative_ltr.R` on each chunk to limit memory usage. The output is merged after all chunks are processed. | |
75 | 76 |
77 ```shell | |
78 usage: detect_putative_ltr_wrapper.py [-h] -g GFF3 -s REFERENCE_SEQUENCE -o | |
79 OUTPUT [-c CPU] [-M MAX_MISSING_DOMAINS] | |
80 [-L MIN_RELATIVE_LENGTH] | |
81 [-S MAX_CHUNK_SIZE] | |
82 | |
83 detect_putative_ltr_wrapper.py is a wrapper for | |
84 detect_putative_ltr.R | |
85 | |
86 optional arguments: | |
87 -h, --help show this help message and exit | |
88 -g GFF3, --gff3 GFF3 gff3 file | |
89 -s REFERENCE_SEQUENCE, --reference_sequence REFERENCE_SEQUENCE | |
90 reference sequence as fasta file | |
91 -o OUTPUT, --output OUTPUT | |
92 output file path and prefix | |
93 -c CPU, --cpu CPU number of CPUs | |
94 -M MAX_MISSING_DOMAINS, --max_missing_domains MAX_MISSING_DOMAINS | |
95 -L MIN_RELATIVE_LENGTH, --min_relative_length MIN_RELATIVE_LENGTH | |
96 Minimum relative length of protein domain to be considered | |
97 for retrostransposon detection | |
98 -S MAX_CHUNK_SIZE, --max_chunk_size MAX_CHUNK_SIZE | |
99 If size of reference sequence is greater than this value, | |
100 reference is analyzed in chunks of this size. This is | |
101 just approximate value - sequences which are longer | |
102 are are not split, default is 100000000 | |
103 ``` | |
76 | 104 |
77 ### Validation of LTR retrotransposons detected in the previous step: | 105 ### Validation of LTR retrotransposons detected in the previous step: |
78 | 106 |
79 ```shell | 107 ```shell |
80 ./clean_ltr.R --help | 108 ./clean_ltr.R --help |