view test-data/td_output.tab @ 0:3e56058d9552 draft default tip

planemo upload for repository https://github.com/monikaheinzl/duplexanalysis_galaxy/tree/master/tools/hd commit 9bae9043a53f1e07b502acd1082450adcb6d9e31-dirty
author mheinzl
date Wed, 16 Oct 2019 04:17:59 -0400

td_data.tab
nr of tags	20
sample size	20

Tag distance separated by family size
	FS=1	FS=2	FS=3	FS=4	FS=5-10	FS>10	sum	
TD=1	5	1	1	1	1	0	9	
TD=6	3	0	0	0	0	0	3	
TD=7	4	0	0	0	1	0	5	
TD=8	2	0	0	1	0	0	3	
sum	14	1	1	2	2	0	20	

Family size distribution separated by Tag distance
	TD=1	TD=2	TD=3	TD=4	TD=5-8	TD>8	sum	
FS=1	5	0	0	0	9	0	14	
FS=2	1	0	0	0	0	0	1	
FS=3	1	0	0	0	0	0	1	
FS=4	1	0	0	0	1	0	2	
FS=6	1	0	0	0	0	0	1	
FS=7	0	0	0	0	1	0	1	
sum	9	0	0	0	11	0	20	


max. family size in sample:	7
absolute frequency:	1
relative frequency:	0.05
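
As a minimal sketch (not the tool's own code), the three values above can be reproduced in Python from a list of family sizes. The function name and the input list are illustrative assumptions; the list is rebuilt from the FS row sums of the preceding table, and the relative frequency is assumed to be the share of all tags in the sample.

```python
from collections import Counter

def max_family_size_stats(family_sizes):
    """Return (max family size, absolute frequency, relative frequency)."""
    counts = Counter(family_sizes)
    largest = max(counts)                       # largest family size observed
    absolute = counts[largest]                  # number of tags with that size
    relative = absolute / len(family_sizes)     # share of all tags in the sample
    return largest, absolute, relative

# Family sizes reconstructed from the table: 14x FS=1, 1x FS=2, 1x FS=3,
# 2x FS=4, 1x FS=6, 1x FS=7 (20 tags in total).
sample = [1] * 14 + [2, 3, 4, 4, 6, 7]
print(max_family_size_stats(sample))  # (7, 1, 0.05)
```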

Chimera Analysis:
The tags are split into two halves (part a and part b), and the tag distance (TD) is calculated separately for each half.
The tag distance of the first half (part a) is calculated by comparing part a of each tag in the sample against all a parts in the dataset and selecting the minimum value (TD a.min).
In the next step, we select the tags that showed this minimum TD and calculate the TD of the second half (part b) by comparing part b against the previously selected subset.
The maximum value then represents TD b.max. Finally, this process is repeated starting with part b instead, which yields TD b.min and TD a.max.
Next, the absolute differences between TD a.min and TD b.max and between TD b.min and TD a.max are calculated (delta TD).
These are then divided by the sum of both parts (TD a.min + TD b.max or TD b.min + TD a.max, respectively), which gives the relative differences between the partial TDs (rel. delta TD).
For simplicity, we use the maximum of the two relative differences and the corresponding delta TD.
Note that when only tags that can form a DCS are included in the analysis, the family sizes for both directions (ab and ba) of the strand are included in the plots.
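
The procedure described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the tool's implementation: the distance is taken to be a plain Hamming distance, a tag is not compared against itself, a zero denominator yields a relative difference of 0, and ties between the two directions are broken by the larger absolute difference. All function names and the example tags are hypothetical.

```python
def hamming(s, t):
    """Number of mismatching positions between two equal-length strings."""
    return sum(c1 != c2 for c1, c2 in zip(s, t))

def half_distances(tag, others, first, second):
    """Minimum TD of the `first` half against `others`, then the maximum TD
    of the `second` half among the tags that reached that minimum."""
    dists = [hamming(first(tag), first(o)) for o in others]
    td_min = min(dists)
    subset = [o for o, d in zip(others, dists) if d == td_min]
    td_max = max(hamming(second(tag), second(o)) for o in subset)
    return td_min, td_max

def chimera_stats(tag, dataset):
    """Return (relative difference, absolute difference) for one tag."""
    half = len(tag) // 2
    part_a = lambda t: t[:half]
    part_b = lambda t: t[half:]
    # Assumption: the tag itself is excluded from the comparison.
    others = [t for t in dataset if t != tag]
    results = []
    # First direction starts with part a, second direction with part b.
    for first, second in ((part_a, part_b), (part_b, part_a)):
        td_min, td_max = half_distances(tag, others, first, second)
        delta = abs(td_min - td_max)
        total = td_min + td_max
        rel = delta / total if total else 0.0  # assumption: 0 if both are 0
        results.append((rel, delta))
    # Keep the direction with the larger relative difference.
    return max(results)

# Hypothetical 8-nt tags (the report itself uses halves of length 12).
tags = ["AAAACCCC", "AAAAGGGG", "TTTTCCCA"]
print(chimera_stats("AAAACCCC", tags))  # (1.0, 4)
```

In this toy dataset the first tag shares its a half with the second tag (minimum 0) while the b halves differ at all four positions, so the relative difference is 1.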

length of one half of the tag	12

Tag distance of each half in the tag
	TD a.min	TD b.max	TD b.min	TD a.max	TD a.min + b.max, TD a.max + b.min	sum	
TD=0	20	0	8	1	0	29	
TD=1	0	0	1	19	8	28	
TD=2	0	0	0	0	1	1	
TD=5	0	0	3	0	0	3	
TD=6	0	0	2	0	3	5	
TD=7	0	1	6	0	4	11	
TD=8	0	2	0	0	7	9	
TD=9	0	1	0	0	1	2	
TD=10	0	2	0	0	2	4	
TD=11	0	7	0	0	7	14	
TD=12	0	7	0	0	7	14	
sum	20	20	20	20	40	120	

Absolute delta Tag distance within the tag
	FS=1	FS=2	FS=3	FS=4	FS=5-10	FS>10	sum	
diff=7	1	0	0	0	0	0	1	
diff=8	1	0	0	0	1	0	2	
diff=9	1	0	0	0	0	0	1	
diff=10	2	0	0	0	0	0	2	
diff=11	4	0	1	1	1	0	7	
diff=12	5	1	0	1	0	0	7	
sum	14	1	1	2	2	0	20	

Chimera analysis: relative delta Tag distance
	FS=1	FS=2	FS=3	FS=4	FS=5-10	FS>10	sum	
diff=1.0	14	1	1	2	2	0	20	
sum	14	1	1	2	2	0	20	

All tags are filtered, and only those tags in which one half is identical (TD=0), and which therefore have a relative delta TD of 1, are kept.
These tags are considered chimeras.
Tag distance of chimeric families separated by FS
	FS=1	FS=2	FS=3	FS=4	FS=5-10	FS>10	sum	
TD=7	1	0	0	0	0	0	1	
TD=8	1	0	0	0	1	0	2	
TD=9	1	0	0	0	0	0	1	
TD=10	2	0	0	0	0	0	2	
TD=11	4	0	1	1	1	0	7	
TD=12	5	1	0	1	0	0	7	
sum	14	1	1	2	2	0	20	
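
The filter that precedes this table can be illustrated with a short Python sketch. It shows why an identical half (TD=0) paired with a differing half always gives a relative difference of exactly 1. The function names are hypothetical, and returning 0 for a zero denominator is an assumption rather than documented tool behaviour.

```python
def relative_delta(td_first, td_second):
    """Absolute difference of two partial TDs divided by their sum."""
    total = td_first + td_second
    return abs(td_first - td_second) / total if total else 0.0

def is_chimera_candidate(td_a_min, td_b_max, td_b_min, td_a_max):
    """True if the larger of the two relative differences equals 1,
    i.e. one half is identical (TD=0) while the other half differs."""
    rel = max(relative_delta(td_a_min, td_b_max),
              relative_delta(td_b_min, td_a_max))
    return rel == 1.0

print(is_chimera_candidate(0, 11, 7, 1))  # True: part a is identical
print(is_chimera_candidate(2, 4, 3, 5))   # False: neither half is identical
```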

Tag distance of chimeric families separated by DCS and single SSCS (ab, ba)
	DCS	SSCS ab	SSCS ba	sum	
TD=7.0	0	0	1	1	
TD=8.0	0	1	1	2	
TD=9.0	0	1	0	1	
TD=10.0	0	1	1	2	
TD=11.0	0	3	4	7	
TD=12.0	0	2	5	7	
sum	0	8	12	20