
TETyper (version 1.1+galaxy1)

Collection or paired reads:

Forward strand:

FASTQ dataset

Reverse strand:

FASTQ dataset

Select a SNP profile from your history or use one from a tool data table?:

Select a structural variant profile from your history or use one from a tool data table?:

Transposable Element Reference:

Flank Length:

Length of flanking region to extract.

Minimum Reads:

Minimum read number for including a specific flanking sequence.

Minimum Reads (each strand):

Minimum read number for each strand for including a specific flanking sequence.

Minimum Mapped Length:

Minimum length of mapping for a read to be used in determining flanking sequences. Higher values are more robust to spurious mapping. Lower values will recover more reads.

Minimum quality:

Minimum quality value across extracted flanking sequence.

Include log in output:

What it does

TETyper is designed for typing a specific transposable element (TE) of interest from paired-end sequencing data. It determines single nucleotide variants (SNVs) and deletions within the TE, as well as flanking sequences surrounding the TE.

Input

SNP Profiles: A tab-delimited file with the following columns:

Profile ID
Homozygous SNPs
Heterozygous SNPs

SNPs are represented in the format [REF][POSITION][ALT] and are separated by pipe (|) characters. SNPs should be ordered by position. Valid alt bases for heterozygous SNPs are the two-base IUPAC ambiguity codes: M, R, W, S, Y, K. Use "none" when a profile has no SNPs of a given type.

For example:

1     none    none
2     C8015T  none
3     C8015T|T9621C   none
4     T7199A|C8015T|T9621C    none
6     C7509G|T7917G   none
N2    none    C8015Y
N4    none    A5178R
N5    none    C8015Y|T9663Y
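
To check a SNP profile file before uploading it, the format described above can be parsed with a few lines of Python. This is only a sketch and not part of TETyper: the names `parse_snps`, `parse_snp_profile_line` and `IUPAC_PAIRS` are hypothetical, and the code assumes that fields are separated by a single tab and that reference bases are limited to A, C, G and T.

```python
import re

# [REF][POSITION][ALT]; ALT is a base or one of the heterozygous codes.
# Assumption: REF is always one of A, C, G, T.
SNP_PATTERN = re.compile(r"([ACGT])(\d+)([ACGTMRWSYK])")

# Standard IUPAC two-base ambiguity codes allowed for heterozygous SNPs.
IUPAC_PAIRS = {"M": "AC", "R": "AG", "W": "AT", "S": "CG", "Y": "CT", "K": "GT"}


def parse_snps(field):
    """Parse a pipe-separated SNP field; the literal 'none' gives an empty list."""
    if field == "none":
        return []
    snps = []
    for token in field.split("|"):
        match = SNP_PATTERN.fullmatch(token)
        if match is None:
            raise ValueError(f"malformed SNP: {token!r}")
        ref, position, alt = match.groups()
        snps.append((ref, int(position), alt))
    return snps


def parse_snp_profile_line(line):
    """Split one tab-delimited profile line into (id, homozygous, heterozygous)."""
    profile_id, homozygous, heterozygous = line.rstrip("\n").split("\t")
    return profile_id, parse_snps(homozygous), parse_snps(heterozygous)


if __name__ == "__main__":
    print(parse_snp_profile_line("N5\tnone\tC8015Y|T9663Y"))
```

A malformed entry raises `ValueError`, so a typo is caught before the tool runs.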

Structural Variant Profiles: A tab-delimited file with the following columns:

Profile ID
Structural Variants

Structural Variants are represented in the format [START-POSITION]-[END-POSITION], and separated by pipe (|) characters.

For example:

Tn4401b       none
Tn4401a       7020-7118
Tn4401h       6919-7106
Tn4401_truncC 1-7127|9198-10006
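
The structural variant profile format can be read in a similar way. The sketch below is illustrative and is not TETyper's own code: `parse_ranges`, `load_struct_profiles` and `name_structural_variant` are hypothetical helpers, fields are assumed to be tab-separated, and a set of deletions is assumed to match a profile only when its ranges are exactly identical.

```python
def parse_ranges(field):
    """Parse '1-7127|9198-10006' into ((1, 7127), (9198, 10006)); 'none' gives ()."""
    if field == "none":
        return ()
    ranges = []
    for token in field.split("|"):
        start, end = token.split("-")
        ranges.append((int(start), int(end)))
    return tuple(ranges)


def load_struct_profiles(lines):
    """Map each tuple of deleted ranges to its profile ID."""
    profiles = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        profile_id, variants = line.split("\t")
        profiles[parse_ranges(variants)] = profile_id
    return profiles


def name_structural_variant(deletions, profiles):
    """Return the matching profile ID, or 'unknown' if no profile matches."""
    # Assumption: matching is exact, with no tolerance on coordinates.
    return profiles.get(parse_ranges(deletions), "unknown")


if __name__ == "__main__":
    example = ["Tn4401b\tnone", "Tn4401a\t7020-7118", "Tn4401h\t6919-7106"]
    print(name_structural_variant("7020-7118", load_struct_profiles(example)))
```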

Output

TETyper will produce a tab-separated output file with the following columns:

Column	Description
Deletions	A list of sequence ranges corresponding to regions of the reference classified as deletions for this sample, or "none" for no deletions.
Structural_variant	If --struct_profiles is specified and the pattern of deletions above corresponds to one of these profiles, then the profile name is given. Otherwise "unknown".
SNPs_homozygous	A list of homozygous SNPs identified, or "none".
SNPs_heterozygous	A list of heterozygous SNPs identified, or "none".
Heterozygous_SNP_counts	For each heterozygous SNP, the number of reads supporting the reference and alternative calls, or "none" if there are no heterozygous SNPs.
SNP_variant	If --snp_profiles is specified and the pattern of homozygous and heterozygous SNPs corresponds to one of these profiles, then the profile name is given. Otherwise "unknown".
Combined_variant	Single name combining Structural_variant and SNP_variant, separated by "-".
Left_flanks	A list of distinct sequences passing quality filters that flank the start position of the reference.
Right_flanks	A list of distinct sequences passing quality filters that flank the end position of the reference.
Left_flank_counts	The number of high quality reads supporting each of the left flanking sequences.
Right_flank_counts	The number of high quality reads supporting each of the right flanking sequences.
X_Y_presence	If --show_region is specified as --show_region X-Y, this column shows 1 if the entirety of that region is classified as present (i.e. no overlap with deleted regions), or 0 otherwise. If --show_region is unspecified, this column is omitted.
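
The output file can be loaded with Python's standard `csv` module for downstream processing. The sketch below also restates the X_Y_presence rule as a small function. It rests on two assumptions that this page does not confirm: that the Deletions column uses the same pipe-separated START-END notation as the structural variant profiles, and that range coordinates are inclusive. The helper names `read_tetyper_summary` and `region_present` are hypothetical, and the sample row is invented for illustration rather than taken from real output.

```python
import csv
import io


def read_tetyper_summary(handle):
    """Return each data row of the tab-separated output as a dict keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


def region_present(deletions, start, end):
    """Return 1 if the region start-end overlaps no deleted range, otherwise 0."""
    if deletions == "none":
        return 1
    for token in deletions.split("|"):
        del_start, del_end = (int(value) for value in token.split("-"))
        # Assumption: both ranges are inclusive, so touching endpoints overlap.
        if del_start <= end and start <= del_end:
            return 0
    return 1


if __name__ == "__main__":
    # Invented two-column excerpt; a real file contains all columns listed above.
    sample = io.StringIO("Deletions\tStructural_variant\n7020-7118\tTn4401a\n")
    for row in read_tetyper_summary(sample):
        print(row["Structural_variant"], region_present(row["Deletions"], 7200, 7300))
```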