comparison hd.xml @ 30:46bfbec0f9e6 draft

planemo upload for repository https://github.com/monikaheinzl/duplexanalysis_galaxy/tree/master/tools/hd commit 033dd7b750f68e8aa68f327d7d72bd311ddbee4e-dirty
author mheinzl
date Wed, 07 Aug 2019 04:01:32 -0400
parents 6b15b3b6405c
children 8beced3064e3
29:6b15b3b6405c 30:46bfbec0f9e6
1 <?xml version="1.0" encoding="UTF-8"?> 1 <?xml version="1.0" encoding="UTF-8"?>
2 <tool id="hd" name="HD:" version="1.0.1"> 2 <tool id="hd" name="HD:" version="1.0.2">
3 <description>hamming distance analysis of duplex tags</description> 3 <description>hamming distance analysis of duplex tags</description>
4 <requirements> 4 <requirements>
5 <requirement type="package" version="2.7">python</requirement> 5 <requirement type="package" version="2.7">python</requirement>
6 <requirement type="package" version="1.4.0">matplotlib</requirement> 6 <requirement type="package" version="1.4.0">matplotlib</requirement>
7 </requirements> 7 </requirements>
85 The output is one PDF file with the plots of the Hamming distance, a tabular file with the data of the plot for each dataset and a tabular file with the chimeric tags. The PDF file contains several panels: 85 The output is one PDF file with the plots of the Hamming distance, a tabular file with the data of the plot for each dataset and a tabular file with the chimeric tags. The PDF file contains several panels:
86 86
87 1. The first page contains a graph representing the Hamming distances stratified by family size. 87 1. The first page contains a graph representing the Hamming distances stratified by family size.
88 2. The second page contains the same information as the first page, but plotted the other way around: a family size distribution stratified by the Hamming distance. 88 2. The second page contains the same information as the first page, but plotted the other way around: a family size distribution stratified by the Hamming distance.
89 3. The third page contains the **first step** of the **chimera analysis**: HDs of the individual parts of the tags and their sums. First, the tags are split into two halves (denoted a and b in the graph) and the minimum HD for part a (=HD a) is calculated. In the next step, the data is subset by selecting only those tags that showed the minimum HD in half a. The HD of the second half is then calculated by comparing the b halves of the sample to this subset and taking the maximum HD (=HD b'). Finally, the same approach is repeated, this time starting with the calculation of the minimum HD of part b (=HD b) followed by the calculation of the maximum HD of part a (=HD a'), to identify all possible chimeras in the dataset. 89 3. The third page contains the **first step** of the **chimera analysis**: HDs of the individual parts of the tags and their sums. First, the tags are split into two halves (denoted a and b in the graph) and the minimum HD for part a (=HD a) is calculated. In the next step, the data is subset by selecting only those tags that showed the minimum HD in half a. The HD of the second half is then calculated by comparing the b halves of the sample to this subset and taking the maximum HD (=HD b'). Finally, the same approach is repeated, this time starting with the calculation of the minimum HD of part b (=HD b) followed by the calculation of the maximum HD of part a (=HD a'), to identify all possible chimeras in the dataset.
90 4. The fourth page contains the **second step** of the **chimera analysis**: the absolute difference between the partial HDs (=delta HD). The HD of a chimeric read is normally very different between its halves and therefore, the difference (=absolute delta) between those HDs should be very large, which would make it possible to distinguish chimeras from true molecules. To get a more accurate number of chimeric tags in the later steps, the maximum difference will be selected since the calculation of the HDs of the parts was performed twice for each tag in the third step. 90 4. The fourth page contains the **second step** of the **chimera analysis**: the absolute difference between the partial HDs (=delta HD). The HD of a chimeric read is normally very different between its halves and therefore, the difference (=absolute delta) between those HDs should be very large, which would make it possible to distinguish chimeras from true molecules. To get a more accurate number of chimeric tags, the absolute difference that contributed to the maximum relative difference will be chosen since the calculation of the HDs of the parts was performed twice for each tag in the third step.
91 5. The fifth page contains the **third step** of the **chimera analysis**: the relative differences of the partial HDs (=relative delta HD). Since it is not known whether the absolute difference originates from a low and a very large HD in both halves or from one half being completely identical (HD=0) to a second molecule, the relative difference is calculated by dividing the absolute difference by the HD of the whole tag (=sum of the partial HDs). The plot can be interpreted as follows: 91 5. The fifth page contains the **third step** of the **chimera analysis**: the relative differences of the partial HDs (=relative delta HD). Since it is not known whether the absolute difference originates from a low and a very large HD in both halves or from one half being completely identical (HD=0) to a second molecule, the relative difference is calculated by dividing the absolute difference by the HD of the whole tag (=sum of the partial HDs). To get a more accurate number of chimeric tags, the maximum value will be chosen since the calculation of the HDs of the parts was performed twice for each tag in the third step. The plot can be interpreted as follows:
92 92
93 - Low relative differences indicate that the total HD is almost equally split into partial HDs. This case would be expected if all tags originate from different molecules. 93 - Low relative differences indicate that the total HD is almost equally split into partial HDs. This case would be expected if all tags originate from different molecules.
94 - Higher relative differences occur due to low total HDs and/or larger absolute differences, both of which indicate that 2 tags were originally the same tag. 94 - Higher relative differences occur due to low total HDs and/or larger absolute differences, both of which indicate that 2 tags were originally the same tag.
95 - A relative difference of 1 means that one part of the tags is identical. Since it is very unlikely that by chance two different tags have an HD of 0 between one of their parts, the HDs in the other part are probably artificially introduced (chimeric reads). 95 - A relative difference of 1 means that one part of the tags is identical. Since it is very unlikely that by chance two different tags have an HD of 0 between one of their parts, the HDs in the other part are probably artificially introduced (chimeric reads).
96 96
97 6. The last page contains a graph representing the **HD of the chimeric tags** which is at the same time the HD of the non-identical halves of the chimeric tags with a relative difference of 1 from the previous page. 97 6. The sixth page contains a graph representing the **HD of the chimeric tags** which is at the same time the HD of the non-identical halves of the chimeric tags with a relative difference of 1 from the previous page.
98 98
99 7. The last page is only generated when the parameter "only DCS in the analysis?" is **False**. The graph represents the **HD of the chimeric tags** which is at the same time the HD of the non-identical halves of the chimeric tags and indicates if they can form a DCS or not. 99 7. The last page is only generated when the parameter "only DCS in the analysis?" is **False**. The graph represents the **HD of the chimeric tags** which is at the same time the HD of the non-identical halves of the chimeric tags and indicates if they can form a DCS or not.
100 100
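Reviewer note: the three chimera-analysis steps described in items 3 to 5 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the tool's actual implementation: the function names `hamming`, `partial_hds` and `delta_hds` are hypothetical, `others` is assumed to hold all tags except the query tag, and returning a relative difference of 0.0 when the total HD is 0 is an assumption made only to avoid a division by zero.

```python
def hamming(s, t):
    """Number of positions at which two equal-length strings differ."""
    if len(s) != len(t):
        raise ValueError("strings must have equal length")
    return sum(c1 != c2 for c1, c2 in zip(s, t))


def partial_hds(tag, others):
    """First step: partial HDs for both orderings of the halves.

    Returns [(HD a, HD b'), (HD b, HD a')].
    """
    mid = len(tag) // 2
    half_a, half_b = slice(0, mid), slice(mid, None)
    results = []
    for first, second in ((half_a, half_b), (half_b, half_a)):
        # Minimum HD of the first half against all other tags.
        dists = [hamming(tag[first], o[first]) for o in others]
        hd_min = min(dists)
        # Keep only the tags that reach this minimum ...
        subset = [o for o, d in zip(others, dists) if d == hd_min]
        # ... and take the maximum HD of the second half within that subset.
        hd_max = max(hamming(tag[second], o[second]) for o in subset)
        results.append((hd_min, hd_max))
    return results


def delta_hds(tag, others):
    """Second and third step: (absolute delta, relative delta, total HD).

    The pair with the maximum relative delta is kept, together with the
    absolute delta that contributed to it.
    """
    best = None
    for hd_min, hd_max in partial_hds(tag, others):
        total = hd_min + hd_max
        absolute = abs(hd_min - hd_max)
        # Assumption: a total HD of 0 is reported as relative delta 0.0.
        relative = float(absolute) / total if total else 0.0
        if best is None or relative > best[1]:
            best = (absolute, relative, total)
    return best


example = delta_hds("AAAATTTT", ["AAAACCCC", "GGGGTTTA"])
print(example)
```

In this toy input, the a half of the query tag is identical to the a half of `AAAACCCC` while the b halves differ at every position, so the relative difference is 1, which is the case the help text describes as a probable chimera.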
101 .. class:: infomark 101 .. class:: infomark
102 102