view test-data/hd_output.tab @ 31:8beced3064e3 draft

planemo upload for repository https://github.com/monikaheinzl/duplexanalysis_galaxy/tree/master/tools/hd commit 033dd7b750f68e8aa68f327d7d72bd311ddbee4e-dirty
author mheinzl
date Wed, 14 Aug 2019 03:49:32 -0400
parents 6b15b3b6405c

hd_data.tab
nr of tags	20
sample size	20

Hamming distance separated by family size
	FS=1	FS=2	FS=3	FS=4	FS=5-10	FS>10	sum	
HD=1	5	1	1	1	1	0	9	
HD=6	3	0	0	0	0	0	3	
HD=7	4	0	0	0	1	0	5	
HD=8	2	0	0	1	0	0	3	
sum	14	1	1	2	2	0	20	
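
The table above can be reproduced with a small cross-tabulation. This is only a sketch: the names `FS_BINS`, `fs_bin` and `crosstab` are hypothetical and not taken from the tool, and the (HD, family size) pairs are reconstructed from the two tables in this file.

```python
from collections import Counter

# Column labels as used in the table header.
FS_BINS = ["FS=1", "FS=2", "FS=3", "FS=4", "FS=5-10", "FS>10"]


def fs_bin(family_size):
    """Map a family size (>= 1) to its column label."""
    if family_size <= 4:
        return "FS=%d" % family_size
    return "FS=5-10" if family_size <= 10 else "FS>10"


def crosstab(pairs):
    """Count tags per (HD, family-size bin); each row ends with its sum."""
    pairs = list(pairs)
    counts = Counter((hd, fs_bin(fs)) for hd, fs in pairs)
    table = {}
    for hd in sorted(set(hd for hd, _ in pairs)):
        row = [counts[(hd, label)] for label in FS_BINS]
        table[hd] = row + [sum(row)]
    return table


# (HD, family size) pairs reconstructed from the two tables of this file.
pairs = ([(1, 1)] * 5 + [(1, 2), (1, 3), (1, 4), (1, 6)]
         + [(6, 1)] * 3
         + [(7, 1)] * 4 + [(7, 7)]
         + [(8, 1)] * 2 + [(8, 4)])

for hd, row in sorted(crosstab(pairs).items()):
    print("HD=%d\t%s" % (hd, "\t".join(str(n) for n in row)))
```

Swapping the roles of HD and family size (with HD bins 1, 2, 3, 4, 5-8 and >8) would give the second table in the same way.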

Family size distribution separated by Hamming distance
	HD=1	HD=2	HD=3	HD=4	HD=5-8	HD>8	sum	
FS=1	5	0	0	0	9	0	14	
FS=2	1	0	0	0	0	0	1	
FS=3	1	0	0	0	0	0	1	
FS=4	1	0	0	0	1	0	2	
FS=6	1	0	0	0	0	0	1	
FS=7	0	0	0	0	1	0	1	
sum	9	0	0	0	11	0	20	


max. family size in sample:	7
absolute frequency:	1
relative frequency:	0.05
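
As a rough illustration of the three values above, the following sketch assumes that the relative frequency is the absolute frequency divided by the sample size (1/20 here). The function name `max_family_stats` is hypothetical, and the family sizes are taken from the row sums of the family size table.

```python
from collections import Counter


def max_family_stats(family_sizes):
    """Return (max family size, absolute frequency, relative frequency)."""
    counts = Counter(family_sizes)
    largest = max(counts)
    absolute = counts[largest]
    # Assumption: relative frequency = absolute frequency / sample size.
    relative = float(absolute) / len(family_sizes)
    return largest, absolute, relative


# Family sizes from the row sums above: 14x FS=1, 1x FS=2, 1x FS=3,
# 2x FS=4, 1x FS=6, 1x FS=7.
sizes = [1] * 14 + [2, 3, 4, 4, 6, 7]
print(max_family_stats(sizes))  # (7, 1, 0.05)
```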

The Hamming distances were calculated by comparing the first half of each tag against the first halves of all other tags and selecting the minimum value (HD a).
The second half of the tag was then compared against the second halves of all tags that yielded the minimum HD in the previous step, and the maximum value was selected (HD b').
From these two values, the absolute and relative differences between the HDs were calculated (absolute and relative delta HD).
These calculations were repeated, starting with the second half in the first step, to find all possible chimeras in the data (HD b and HD a'). For simplicity, we used the maximum of the delta values in the end.
When only tags that can form DCS are allowed in the analysis, family sizes for the forward and reverse (ab and ba) are included in the plots.
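
The procedure described above could be sketched as follows. This is a minimal illustration, not the tool's code: the function names are hypothetical, all tags are assumed to have the same length, and the relative delta is assumed to be the absolute delta divided by the sum of the two HDs (which gives 1 whenever exactly one half is identical).

```python
def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def half_distances(tag, others, half_len):
    """Return (HD a, HD b', absolute delta, relative delta) for one tag."""
    a, b = tag[:half_len], tag[half_len:]
    # Step 1: minimum HD of the first half against all other first halves.
    dist_a = [hamming(a, o[:half_len]) for o in others]
    hd_a = min(dist_a)
    # Step 2: among the tags reaching that minimum, maximum HD of the
    # second half.
    best = [o for o, d in zip(others, dist_a) if d == hd_a]
    hd_b_prime = max(hamming(b, o[half_len:]) for o in best)
    abs_delta = abs(hd_a - hd_b_prime)
    total = hd_a + hd_b_prime
    # Assumption: relative delta = absolute delta / (HD a + HD b').
    rel_delta = float(abs_delta) / total if total else 0.0
    return hd_a, hd_b_prime, abs_delta, rel_delta


def max_deltas(tag, others, half_len):
    """Repeat the search starting with the second half; keep the larger deltas."""
    def swap(t):
        return t[half_len:] + t[:half_len]

    fwd = half_distances(tag, others, half_len)
    rev = half_distances(swap(tag), [swap(o) for o in others],
                         len(tag) - half_len)
    return max(fwd[2], rev[2]), max(fwd[3], rev[3])


# Made-up 8-nt tags (half length 4), for illustration only.
print(half_distances("AAAATTTT", ["AAAACCCC", "GGGGTTTT"], 4))  # (0, 4, 4, 1.0)
print(max_deltas("AAAATTTT", ["AAAACCCC", "GGGGTTTT"], 4))      # (4, 1.0)
```

In the toy example the first half matches another tag exactly while the second half differs at every position, which is the pattern the analysis treats as a candidate chimera.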

length of one half of the tag	12

Hamming distance of each half in the tag
	HD DCS	HD b'	HD b	HD a'	HD a+b', a'+b	sum	
HD=0	20	0	8	1	0	29	
HD=1	0	0	1	19	8	28	
HD=2	0	0	0	0	1	1	
HD=5	0	0	3	0	0	3	
HD=6	0	0	2	0	3	5	
HD=7	0	1	6	0	4	11	
HD=8	0	2	0	0	7	9	
HD=9	0	1	0	0	1	2	
HD=10	0	2	0	0	2	4	
HD=11	0	7	0	0	7	14	
HD=12	0	7	0	0	7	14	
sum	20	20	20	20	40	120	

Absolute delta Hamming distance within the tag
	FS=1	FS=2	FS=3	FS=4	FS=5-10	FS>10	sum	
diff=7	1	0	0	0	0	0	1	
diff=8	1	0	0	0	1	0	2	
diff=9	1	0	0	0	0	0	1	
diff=10	2	0	0	0	0	0	2	
diff=11	4	0	1	1	1	0	7	
diff=12	5	1	0	1	0	0	7	
sum	14	1	1	2	2	0	20	

Chimera analysis: relative delta Hamming distance
	FS=1	FS=2	FS=3	FS=4	FS=5-10	FS>10	sum	
diff=1.0	14	1	1	2	2	0	20	
sum	14	1	1	2	2	0	20	

All tags were filtered: only those tags in which at least one half was identical (HD=0), and which therefore had a relative delta of 1, were kept. These tags are considered chimeric.
The Hamming distances of the chimeric tags are shown below.
Hamming distance of chimeric families separated by FS
	FS=1	FS=2	FS=3	FS=4	FS=5-10	FS>10	sum	
HD=7	1	0	0	0	0	0	1	
HD=8	1	0	0	0	1	0	2	
HD=9	1	0	0	0	0	0	1	
HD=10	2	0	0	0	0	0	2	
HD=11	4	0	1	1	1	0	7	
HD=12	5	1	0	1	0	0	7	
sum	14	1	1	2	2	0	20	
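
The filter behind this table might look like the sketch below. It assumes the same relative delta definition as the text (absolute delta divided by the sum of both half HDs) and that the reported HD is the one of the non-identical half; the function name `chimeric_hds` and the input values are hypothetical.

```python
def chimeric_hds(half_hds):
    """Keep tags with a relative delta of 1 and return the HD of the
    differing half.

    half_hds: list of (HD of first half, HD of second half) per tag.
    """
    kept = []
    for h1, h2 in half_hds:
        total = h1 + h2
        # A relative delta of 1 means one half has HD=0 and the other does not.
        if total and abs(h1 - h2) / float(total) == 1.0:
            kept.append(max(h1, h2))
    return kept


# Made-up values: only the first and the last tag pass the filter.
print(chimeric_hds([(0, 7), (3, 4), (0, 0), (12, 0)]))  # [7, 12]
```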

Hamming distance of chimeric families separated by DCS and single SSCS
	DCS	SSCS ab	SSCS ba	sum	
HD=7.0	0	0	1	1	
HD=8.0	0	1	1	2	
HD=9.0	0	1	0	1	
HD=10.0	0	1	1	2	
HD=11.0	0	3	4	7	
HD=12.0	0	2	5	7	
sum	0	8	12	20