Mercurial > repos > iuc > omark

diff test-data/output_detailed_summary.txt @ 0:ce13b4c42256 draft
planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/omark/ commit 8ff9ada22d22cb94ddfff51bcdd3ab7d30104f1a
author: iuc
date: Wed, 21 Feb 2024 19:27:16 +0000
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/output_detailed_summary.txt	Wed Feb 21 19:27:16 2024 +0000
@@ -0,0 +1,63 @@
+COMPLETENESS ASSESSMENT
+------------
+#This benchmark gives an estimate of the completeness of the gene set based on the presence or not of conserved genes of the target lineage.
+#Conserved genes are defined using Hierarchical Orthologous Groups (HOGs) defined at a certain taxonomic clade, which is a proxy for the ancestral gene repertoire of this clade. HOGs are considered conserved if they have at least one gene in >80% of the extant species. 
+#Because representatives of these groups are expected to be present in the target species repertoire, the proportion of missing HOGs proxies the proportion of missing genes in the total gene repertoire of the target proteome.
+#Ancestral genes used for this benchmark were in single copy in the selected ancestral lineage, but no assumption is made regarding their propensity to duplicate - they are not universal single copy genes. This benchmark reports the proportion of those genes that are found in multiple copies in target proteomes, and whether it corresponds to a known duplication event in descendants of this gene family (Expected) or not (Unexpected).
+
+The clade used was: Hominidae
+Number of conserved HOGs: 17786
+
+#Results on conserved HOGs:
+Single: 39 (0.22%)
+Duplicated: 0 (0.00%)
+Duplicated, Unexpected: 0 (0.00%)
+Duplicated, Expected: 0 (0.00%)
+Missing: 17747 (99.78%)
+
+
+CONSISTENCY ASSESSMENT
+-------------------------
+#This benchmark gives the proportion of annotated protein-coding genes in the query proteome that likely correspond to an actual protein-coding gene by comparing to the known gene families of the selected ancestral lineage.
+
+##High-level categories
+#Genes in the "Consistent" category correspond to a gene family known to exist in the selected lineage. Genes in the "Inconsistent" or “Contaminants'' categories correspond to known gene families from different lineages. Such genes are deemed contaminants if more genes than expected by chance correspond to the same species. They are deemed Inconsistent if they correspond to other species seemingly at random. Genes are classified in the “Unknown” category if they do not share enough similarity with known gene families: they may be orphan genes or erroneous protein sequences.
+
+##Subcategories
+#Partial hit proteins are those that share similarity with proteins in known gene families on only part of their sequence: they can indicate poorly defined gene models, structurally divergent genes, or erroneous annotation. 
+#Fragmented proteins are those whose length is smaller than the proteins from the gene families they share similarity with (<50% median length): they are likely fragmentend sequences or erroneous annotations.
+
+Number of proteins in the whole proteome: 44
+
+#Consistent lineage placements
+Total Consistent: 43 (97.73%)
+Consistent, partial hits: 0 (0.00%)
+Consistent, fragmented: 0 (0.00%)
+
+#Inconsistent lineage placements
+Total Inconsistent: 1 (2.27%)
+Inconsistent, partial hits: 0 (0.00%)
+Inconsistent, fragmented: 0 (0.00%)
+
+#Contaminants
+Total Contaminants: 0 (0.00%)
+Contaminants, partial hits: 0 (0.00%)
+Contaminants, fragmented: 0 (0.00%)
+
+#Unknown
+Total Unknown: 0 (0.00%)
+
+
+SPECIES COMPOSITION
+-------------------
+#This benchmark gives an estimate of the species composition of the dataset, according to HOGs placement. It reports the clades most consistent with the taxonomic distribution of gene families where coding-genes for the query proteomes were placed. The species to which most of the proteins in the query proteome are consistent with is called "Main species." The others are potential contaminants.
+#This section also lists the numbers of proteins that can be associated to each of these clades, based on the taxonomic placement of the gene families they share similarity with.
+
+
+##Detected species
+
+#Main species
+Clade: Homo sapiens
+Number of associated query proteins: 44 (100.00%)
+
+