What it does
TETyper is designed for typing a specific transposable element (TE) of interest from paired-end sequencing data. It determines single nucleotide variants (SNVs) and deletions within the TE, as well as flanking sequences surrounding the TE.
Input
SNP Profiles: A tab-delimited file with the following columns: profile name, homozygous SNPs, heterozygous SNPs. A column with no SNPs contains "none".
SNPs are represented in the format [REF][POSITION][ALT] and separated by pipe (|) characters. SNPs should be ordered by position. Valid alt-bases for heterozygous SNPs are the IUPAC ambiguity codes M, R, W, S, Y and K.
For example:
    1     none                    none
    2     C8015T                  none
    3     C8015T|T9621C           none
    4     T7199A|C8015T|T9621C    none
    6     C7509G|T7917G           none
    N2    none                    C8015Y
    N4    none                    A5178R
    N5    none                    C8015Y|T9663Y
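The SNP notation above can be checked mechanically before a profile file is passed to the tool. The following is a minimal sketch, not part of TETyper itself: the function names `parse_snps` and `parse_snp_profile_line` are hypothetical, and the three-column layout (profile name, homozygous SNPs, heterozygous SNPs) is an assumption inferred from the example rows.

```python
import re

# One SNP token: reference base, position, alternative base.
# The alt base may be A/C/G/T or one of the heterozygous codes M, R, W, S, Y, K.
SNP_PATTERN = re.compile(r"^([ACGT])(\d+)([ACGTMRWSYK])$")


def parse_snps(field):
    """Parse a pipe-separated SNP field into (ref, position, alt) tuples."""
    if field == "none":
        return []
    snps = []
    for token in field.split("|"):
        match = SNP_PATTERN.match(token)
        if match is None:
            raise ValueError(f"Malformed SNP: {token!r}")
        ref, pos, alt = match.groups()
        snps.append((ref, int(pos), alt))
    return snps


def parse_snp_profile_line(line):
    """Split one profile line into (name, homozygous SNPs, heterozygous SNPs).

    Assumes exactly three tab-separated columns (hypothetical layout).
    """
    name, hom, het = line.rstrip("\n").split("\t")
    return name, parse_snps(hom), parse_snps(het)
```

For instance, `parse_snp_profile_line("N5\tnone\tC8015Y|T9663Y")` yields the profile name `"N5"`, an empty homozygous list, and two heterozygous SNPs.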
Structural Variant Profiles: A tab-delimited file with the following columns: profile name, deleted regions. A profile with no deletions contains "none" in the second column.
Structural Variants are represented in the format [START-POSITION]-[END-POSITION], and separated by pipe (|) characters.
For example:
    Tn4401b          none
    Tn4401a          7020-7118
    Tn4401h          6919-7106
    Tn4401_truncC    1-7127|9198-10006
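A structural variant profile file in this format can be loaded into a simple lookup table. The sketch below is illustrative only: `parse_ranges` and `load_struct_profiles` are hypothetical helper names, and the two-column layout (profile name, deleted ranges) is an assumption inferred from the example.

```python
def parse_ranges(field):
    """Parse 'start-end|start-end' into a list of (start, end) integer tuples."""
    if field == "none":
        return []
    ranges = []
    for token in field.split("|"):
        start, end = token.split("-")
        ranges.append((int(start), int(end)))
    return ranges


def load_struct_profiles(lines):
    """Map each profile name to its list of deleted ranges.

    `lines` is any iterable of tab-delimited lines, e.g. an open file.
    """
    profiles = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        name, field = line.split("\t")
        profiles[name] = parse_ranges(field)
    return profiles
```

Passing an open file handle, as in `load_struct_profiles(open(path))` with a path of your choosing, returns a dictionary such as `{"Tn4401a": [(7020, 7118)], ...}`.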
Output
TETyper will produce a tab-separated output file with the following columns:
Column | Description |
---|---|
Deletions | A list of sequence ranges corresponding to regions of the reference classified as deletions for this sample, or "none" for no deletions. |
Structural_variant | If --struct_profiles is specified and the pattern of deletions above corresponds to one of these profiles, then the profile name is given, otherwise "unknown". |
SNPs_homozygous | A list of homozygous SNPs identified, or "none". |
SNPs_heterozygous | A list of heterozygous SNPs identified, or "none". |
Heterozygous_SNP_counts | For each heterozygous SNP, the number of reads supporting the reference and alternative calls, or "none" if there are no heterozygous SNPs. |
SNP_variant | If --snp_profiles is specified and the pattern of homozygous and heterozygous SNPs corresponds to one of these profiles, then the profile name is given. Otherwise "unknown". |
Combined_variant | Single name combining Structural_variant and SNP_variant, separated by "-". |
Left_flanks | A list of distinct sequences passing quality filters that flank the start position of the reference. |
Right_flanks | A list of distinct sequences passing quality filters that flank the end position of the reference. |
Left_flank_counts | The number of high quality reads supporting each of the left flanking sequences. |
Right_flank_counts | The number of high quality reads supporting each of the right flanking sequences. |
X_Y_presence | If --show_region is specified as --show_region X-Y, this column shows 1 if the entirety of that region is classified as present (i.e. no overlap with deleted regions), or 0 otherwise. If --show_region is unspecified, this column is omitted. |
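The rule behind the X_Y_presence column (1 only when the region overlaps no deleted range) can be sketched as follows. This is an illustration of the described logic, not TETyper's own code: the function name `region_present` is hypothetical, and it assumes that the Deletions column uses the same pipe-separated "start-end" notation as the structural variant profiles, with inclusive coordinates.

```python
def region_present(deletions_field, region):
    """Return 1 if region 'X-Y' overlaps no deleted range, otherwise 0.

    Assumes inclusive coordinates and 'start-end|start-end' deletions,
    or the literal string 'none' when there are no deletions.
    """
    x, y = (int(v) for v in region.split("-"))
    if deletions_field == "none":
        return 1
    for token in deletions_field.split("|"):
        start, end = (int(v) for v in token.split("-"))
        # Two inclusive intervals overlap when each starts before the other ends.
        if start <= y and end >= x:
            return 0
    return 1
```

With a hypothetical region of 7202-8083, a sample whose Deletions value is 7020-7118 would be reported as present, while a sample with 1-7127|9198-10006 would be reported as present only for regions lying entirely between 7128 and 9197.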