Mercurial > repos > petr-novak > dante_ltr
diff README.md @ 12:ff01d4263391 draft
"planemo upload commit 414119ad7c44562d2e956b765e97ca113bc35b2b-dirty"
author | petr-novak |
---|---|
date | Thu, 21 Jul 2022 08:23:15 +0000 |
parents | 9de392f2fc02 |
children |
--- a/README.md	Wed Jul 13 11:02:55 2022 +0000
+++ b/README.md	Thu Jul 21 08:23:15 2022 +0000
@@ -29,7 +29,7 @@
 ### Detection of complete LTR retrotransposons
 
 ```shell
-Usage: ./extract_putative_ltr.R COMMAND [OPTIONS]
+Usage: ./detect_putative_ltr.R COMMAND [OPTIONS]
 
 
 Options:
@@ -59,7 +59,7 @@
 
 ```shell
 mkdir -p tmp
-./extract_putative_ltr.R -g test_data/sample_DANTE.gff3 -s test_data/sample_genome.fasta -o tmp/ltr_annotation
+./detect_putative_ltr.R -g test_data/sample_DANTE.gff3 -s test_data/sample_genome.fasta -o tmp/ltr_annotation
 ```
 
 #### Files in the output of `extract_putative_ltr.R`:
@@ -72,7 +72,35 @@
 - `prefix_DLT.fasta` - elements with **d**omains, **L**TR, **T**SD
 - `prefix_statistics.csv` - number of elements in individual categories
 
+For large genomes, you can your `detect_putative_ltr_wrapper.py`. This script will split input fasta to smaller chunks and run `detect_putative_ltr.R` on each chunk to limit memory usage. Output will be merged after all chunks are processed.
 
+```shell
+usage: detect_putative_ltr_wrapper.py [-h] -g GFF3 -s REFERENCE_SEQUENCE -o
+                                      OUTPUT [-c CPU] [-M MAX_MISSING_DOMAINS]
+                                      [-L MIN_RELATIVE_LENGTH]
+                                      [-S MAX_CHUNK_SIZE]
+
+detect_putative_ltr_wrapper.py is a wrapper for
+    detect_putative_ltr.R
+
+optional arguments:
+  -h, --help            show this help message and exit
+  -g GFF3, --gff3 GFF3  gff3 file
+  -s REFERENCE_SEQUENCE, --reference_sequence REFERENCE_SEQUENCE
+                        reference sequence as fasta file
+  -o OUTPUT, --output OUTPUT
+                        output file path and prefix
+  -c CPU, --cpu CPU     number of CPUs
+  -M MAX_MISSING_DOMAINS, --max_missing_domains MAX_MISSING_DOMAINS
+  -L MIN_RELATIVE_LENGTH, --min_relative_length MIN_RELATIVE_LENGTH
+                        Minimum relative length of protein domain to be considered
+                        for retrostransposon detection
+  -S MAX_CHUNK_SIZE, --max_chunk_size MAX_CHUNK_SIZE
+                        If size of reference sequence is greater than this value,
+                        reference is analyzed in chunks of this size. This is
+                        just approximate value - sequences which are longer
+                        are are not split, default is 100000000
+```
 
 ### Validation of LTR retrotransposons detected un previous step:
 
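
The chunking behaviour described in the added README text (split the reference into chunks of roughly `MAX_CHUNK_SIZE`, never splitting a single sequence) can be sketched as below. This is only an illustration under assumptions: the function names `read_fasta` and `chunk_records` are hypothetical and are not taken from `detect_putative_ltr_wrapper.py`, and the actual wrapper may group sequences differently.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str]  # (header without ">", sequence)


def read_fasta(lines: Iterable[str]) -> Iterator[Record]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    parts: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(parts)
            header, parts = line[1:], []
        elif header is not None:
            parts.append(line)
    if header is not None:
        yield header, "".join(parts)


def chunk_records(records: Iterable[Record],
                  max_chunk_size: int) -> Iterator[List[Record]]:
    """Group whole sequences into chunks of at most max_chunk_size bases.

    Assumption (hypothetical logic): a sequence is never split, so a
    sequence longer than max_chunk_size ends up alone in an oversized
    chunk. This is why the limit is only approximate.
    """
    chunk: List[Record] = []
    size = 0
    for header, seq in records:
        # Close the current chunk if adding this sequence would exceed the limit.
        if chunk and size + len(seq) > max_chunk_size:
            yield chunk
            chunk, size = [], 0
        chunk.append((header, seq))
        size += len(seq)
    if chunk:
        yield chunk


if __name__ == "__main__":
    fasta = [">a", "A" * 40, ">b", "C" * 50, ">c", "G" * 200,
             ">d", "T" * 10, ">e", "A" * 20]
    for i, chunk in enumerate(chunk_records(read_fasta(fasta), 100)):
        print(i, [(h, len(s)) for h, s in chunk])
```

In this sketch each chunk would then be written to its own FASTA file and passed to `detect_putative_ltr.R`, with the per-chunk outputs merged at the end, as the README text describes.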