annotate test-data/output_detailed_summary.txt @ 0:ce13b4c42256 draft

planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/omark/ commit 8ff9ada22d22cb94ddfff51bcdd3ab7d30104f1a
author iuc
date Wed, 21 Feb 2024 19:27:16 +0000
parents
children
rev   line source
0
ce13b4c42256 planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/omark/ commit 8ff9ada22d22cb94ddfff51bcdd3ab7d30104f1a
iuc
parents:
diff changeset
1 COMPLETENESS ASSESSMENT
2 ------------
3 #This benchmark gives an estimate of the completeness of the gene set based on the presence or absence of conserved genes of the target lineage.
4 #Conserved genes are defined using Hierarchical Orthologous Groups (HOGs) defined at a certain taxonomic clade, which is a proxy for the ancestral gene repertoire of this clade. HOGs are considered conserved if they have at least one gene in >80% of the extant species.
5 #Because representatives of these groups are expected to be present in the target species repertoire, the proportion of missing HOGs proxies the proportion of missing genes in the total gene repertoire of the target proteome.
6 #Ancestral genes used for this benchmark were in single copy in the selected ancestral lineage, but no assumption is made regarding their propensity to duplicate - they are not universal single copy genes. This benchmark reports the proportion of those genes that are found in multiple copies in target proteomes, and whether it corresponds to a known duplication event in descendants of this gene family (Expected) or not (Unexpected).
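
The conservation rule stated above (a HOG counts as conserved when it has at least one gene in more than 80% of the extant species) can be sketched as follows. This is only an illustration of that threshold, not OMArk's actual code: the function name `is_conserved` and its parameters are hypothetical.

```python
def is_conserved(species_with_gene: int, total_species: int,
                 threshold: float = 0.80) -> bool:
    """Return True when strictly more than `threshold` of the extant
    species in the clade have at least one gene in the HOG.

    Hypothetical helper: the name and signature are assumptions made
    for illustration only.
    """
    if total_species <= 0:
        raise ValueError("total_species must be positive")
    return species_with_gene / total_species > threshold


# 9 of 10 species carry a gene: above the 80% threshold.
print(is_conserved(9, 10))
# Exactly 80% is not enough, because the rule is a strict ">80%".
print(is_conserved(8, 10))
```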
7
8 The clade used was: Hominidae
9 Number of conserved HOGs: 17786
10
11 #Results on conserved HOGs:
12 Single: 39 (0.22%)
13 Duplicated: 0 (0.00%)
14 Duplicated, Unexpected: 0 (0.00%)
15 Duplicated, Expected: 0 (0.00%)
16 Missing: 17747 (99.78%)
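
As a sanity check on the completeness figures above, the percentages can be recomputed from the raw counts. The sketch below assumes that each percentage is the count divided by the number of conserved HOGs (17786), formatted to two decimals; the helper name `percent` is hypothetical.

```python
# Counts copied from the completeness results above.
total_conserved_hogs = 17786
counts = {"Single": 39, "Duplicated": 0, "Missing": 17747}


def percent(count: int, total: int) -> str:
    """Format count/total as a percentage with two decimals (assumed convention)."""
    return f"{100 * count / total:.2f}"


# The three top-level categories should partition the conserved HOGs.
categories_sum = sum(counts.values())

percentages = {label: percent(n, total_conserved_hogs) for label, n in counts.items()}
for label, value in percentages.items():
    print(f"{label}: {counts[label]} ({value}%)")
```

The three categories add up to 17786, and the recomputed values match the reported 0.22%, 0.00% and 99.78%.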
17
18
19 CONSISTENCY ASSESSMENT
20 -------------------------
21 #This benchmark gives the proportion of annotated protein-coding genes in the query proteome that likely correspond to an actual protein-coding gene by comparing to the known gene families of the selected ancestral lineage.
22
23 ##High-level categories
24 #Genes in the "Consistent" category correspond to a gene family known to exist in the selected lineage. Genes in the "Inconsistent" or "Contaminants" categories correspond to known gene families from different lineages. Such genes are deemed Contaminants if more genes than expected by chance correspond to the same species. They are deemed Inconsistent if they correspond to other species seemingly at random. Genes are classified in the "Unknown" category if they do not share enough similarity with known gene families: they may be orphan genes or erroneous protein sequences.
25
26 ##Subcategories
27 #Partial hit proteins are those that share similarity with proteins in known gene families on only part of their sequence: they can indicate poorly defined gene models, structurally divergent genes, or erroneous annotation.
28 #Fragmented proteins are those that are shorter than the proteins from the gene families they share similarity with (<50% median length): they are likely fragmented sequences or erroneous annotations.
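
The fragmentation criterion described above (a protein shorter than 50% of the median length of its matching gene family) might look like the sketch below. It only illustrates the stated threshold and makes no claim about how OMArk implements it; `is_fragmented` and its parameters are hypothetical names.

```python
from statistics import median


def is_fragmented(query_length: int, family_lengths: list[int],
                  fraction: float = 0.5) -> bool:
    """Return True when the query protein is shorter than `fraction`
    of the median length of the gene family it matches.

    Hypothetical helper for illustration; lengths are in amino acids.
    """
    if not family_lengths:
        raise ValueError("family_lengths must not be empty")
    return query_length < fraction * median(family_lengths)


# Illustrative family with a median length of 100 residues.
family = [90, 100, 120]
print(is_fragmented(40, family))  # 40 < 50, so flagged as fragmented
print(is_fragmented(60, family))  # 60 >= 50, so not flagged
```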
29
30 Number of proteins in the whole proteome: 44
31
32 #Consistent lineage placements
33 Total Consistent: 43 (97.73%)
34 Consistent, partial hits: 0 (0.00%)
35 Consistent, fragmented: 0 (0.00%)
36
37 #Inconsistent lineage placements
38 Total Inconsistent: 1 (2.27%)
39 Inconsistent, partial hits: 0 (0.00%)
40 Inconsistent, fragmented: 0 (0.00%)
41
42 #Contaminants
43 Total Contaminants: 0 (0.00%)
44 Contaminants, partial hits: 0 (0.00%)
45 Contaminants, fragmented: 0 (0.00%)
46
47 #Unknown
48 Total Unknown: 0 (0.00%)
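
The "Label: count (percent%)" result lines above lend themselves to simple parsing. The sketch below assumes the raw file lines, that is, without the annotate line-number column shown in this view; the regular expression and the `parse_summary` helper are hypothetical and not part of OMArk.

```python
import re

# Matches result lines such as "Total Consistent: 43 (97.73%)".
# Comment lines (starting with "#") and lines without a percentage are skipped.
LINE_RE = re.compile(
    r"^(?P<label>[^:#]+):\s*(?P<count>\d+)\s*\((?P<pct>\d+(?:\.\d+)?)%\)$"
)


def parse_summary(lines):
    """Map each result label to a (count, percentage) tuple."""
    parsed = {}
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            parsed[match["label"].strip()] = (int(match["count"]), float(match["pct"]))
    return parsed


# Top-level consistency lines copied from the results above.
sample = [
    "Number of proteins in the whole proteome: 44",
    "Total Consistent: 43 (97.73%)",
    "Total Inconsistent: 1 (2.27%)",
    "Total Contaminants: 0 (0.00%)",
    "Total Unknown: 0 (0.00%)",
]
parsed = parse_summary(sample)

# The four top-level categories should account for every protein.
category_sum = sum(count for count, _ in parsed.values())
print(parsed)
print(category_sum)
```

Here the four category counts add up to 44, the number of proteins reported for the whole proteome.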
49
50
51 SPECIES COMPOSITION
52 -------------------
53 #This benchmark gives an estimate of the species composition of the dataset, according to HOG placement. It reports the clades most consistent with the taxonomic distribution of the gene families in which coding genes of the query proteome were placed. The species with which most of the proteins in the query proteome are consistent is called the "Main species". The others are potential contaminants.
54 #This section also lists the number of proteins that can be associated with each of these clades, based on the taxonomic placement of the gene families they share similarity with.
55
56
57 ##Detected species
58
59 #Main species
60 Clade: Homo sapiens
61 Number of associated query proteins: 44 (100.00%)
62
63