annotate README.org @ 0:cf3cea0a3039 draft

Uploaded
author petr-novak
date Thu, 07 Oct 2021 06:07:34 +0000
parents
children e955b40ad3a4
#+TITLE: RepeatExplorer based Assembly Annotation Pipeline

* Tools in repository
** Extract Repeat Library from RepeatExplorer Archive
(=extract_re_contigs.xml=)

This tool extracts a library of repeats based on a RepeatExplorer2 analysis. The library is provided as a FASTA file.

** Format repeat library
(=format_repeat_library.xml=)

This tool appends the classification of repeats to the repeat library. The repeat type then becomes part of the sequence name, in the format:

~>sequence_id#classification_level1/classification_level2/...~

This format makes it possible to specify the classification hierarchy. The classification of sequences in the library is provided by =CLUSTER_TABLE.csv= (part of the RE2 output).

The resulting file can then be used to annotate repeats in your assembly with the Repeat Annotation tool.
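
As a rough illustration of the naming convention described for the formatted library, the following Python sketch appends a classification path to FASTA headers. It is not the code behind =format_repeat_library.xml=. The function name ~add_classification~, the in-memory ~classification~ mapping, and the assumption that sequence IDs start with a cluster label such as ~CL1~ are all hypothetical, and parsing of =CLUSTER_TABLE.csv= is left out.

```python
import re


def add_classification(fasta_lines, classification):
    """Append '#level1/level2/...' to FASTA headers.

    fasta_lines: iterable of lines without trailing newlines.
    classification: dict mapping a cluster label (hypothetical, e.g. 'CL1')
        to a list of hierarchy levels, most general level first.
    Headers with no known classification are returned unchanged.
    """
    out = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                seq_id = fields[0]
                # Assumption: the ID begins with a cluster label like 'CL12'.
                match = re.match(r"CL\d+", seq_id)
                levels = classification.get(match.group(0)) if match else None
                if levels:
                    line = ">" + seq_id + "#" + "/".join(levels)
        out.append(line)
    return out


if __name__ == "__main__":
    example = [">CL1Contig1", "ACGT", ">CL2Contig3", "TTGA"]
    mapping = {"CL1": ["Class_I", "LTR", "Ty1_copia"]}
    for row in add_classification(example, mapping):
        print(row)
```

In this sketch only the first header is renamed, because the example mapping has no entry for ~CL2~. A real workflow would build the mapping from the RE2 cluster table instead of writing it by hand.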
** Repeat Annotation
(=repeat_annotate_custom.xml=)

Internally, annotation is performed using a RepeatMasker search. The RepeatMasker output is parsed to remove duplicated and overlapping annotations. Conflicts between annotations are resolved using the hierarchical classification of repeats provided in the custom database.
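
The README does not describe how conflicts are resolved in detail. One plausible reading, sketched below in Python, is that overlapping annotations are merged and labelled with the deepest classification level they share. This is an assumption made for illustration, not the documented behaviour of =repeat_annotate_custom.xml=. The function names and the ~(start, end, classification)~ tuple layout are hypothetical.

```python
def common_classification(a, b):
    """Return the shared leading levels of two '/'-separated classifications."""
    levels = []
    for x, y in zip(a.split("/"), b.split("/")):
        if x != y:
            break
        levels.append(x)
    return "/".join(levels)


def resolve_overlaps(hits):
    """Merge overlapping hits on one sequence.

    hits: list of (start, end, classification) with inclusive coordinates.
    Overlapping hits are merged into one interval labelled with their
    common classification prefix (an empty string if nothing is shared).
    """
    merged = []
    for start, end, cls in sorted(hits):
        if merged and start <= merged[-1][1]:
            prev_start, prev_end, prev_cls = merged[-1]
            merged[-1] = (
                prev_start,
                max(prev_end, end),
                common_classification(prev_cls, cls),
            )
        else:
            merged.append((start, end, cls))
    return merged


if __name__ == "__main__":
    example = [
        (1, 100, "Class_I/LTR/Ty1_copia"),
        (50, 150, "Class_I/LTR/Ty3_gypsy"),
        (300, 400, "Satellite"),
    ]
    print(resolve_overlaps(example))
```

Here the two overlapping LTR hits collapse into a single interval labelled ~Class_I/LTR~, while the non-overlapping satellite hit is kept as is.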

* Test data

- ~test_assembly_1.fasta~ with ~test_db_1_satellites.fasta~ (includes CLASS followed by a double underscore, syntax 1)
- ~test_assembly_2.fasta~ with ~test_db_2_RE_repeats.fasta~ (includes the full hierarchical classification)

#+begin_comment
create tarball for toolshed:
tar -czvf ../repeat_annotation_pipeline.tar.gz --exclude test_data --exclude .git .

#+end_comment