TETyper (version 1.1+galaxy1)
FASTQ dataset
FASTQ dataset
Length of flanking region to extract.
Minimum read number for including a specific flanking sequence.
Minimum read number for each strand for including a specific flanking sequence.
Minimum length of mapping for a read to be used in determining flanking sequences. Higher values are more robust to spurious mapping. Lower values will recover more reads.
Minimum quality value across extracted flanking sequence.

What it does

TETyper is designed for typing a specific transposable element (TE) of interest from paired-end sequencing data. It determines single nucleotide variants (SNVs) and deletions within the TE, as well as flanking sequences surrounding the TE.

Input

SNP Profiles: A tab-delimited file with the following columns:

  1. Profile ID
  2. Homozygous SNPs
  3. Heterozygous SNPs

SNPs are represented in the format [REF][POSITION][ALT], and separated by pipe (|) characters. SNPs should be ordered by position. Valid alt-bases for heterozygous SNPs are the IUPAC two-base ambiguity codes M, R, W, S, Y and K.

For example:

1     none    none
2     C8015T  none
3     C8015T|T9621C   none
4     T7199A|C8015T|T9621C    none
6     C7509G|T7917G   none
N2    none    C8015Y
N4    none    A5178R
N5    none    C8015Y|T9663Y
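
The SNP notation above can be checked before a profile file is handed to the tool. The following is a minimal sketch, not part of TETyper itself: the function names `parse_snps` and `parse_snp_profile_line` are hypothetical, and it assumes that reference bases and homozygous alt-bases are limited to A, C, G and T, and that "none" marks an empty column as in the example.

```python
import re

# Two-base IUPAC ambiguity codes listed as valid heterozygous alt-bases.
HET_CODES = set("MRWSYK")
# [REF][POSITION][ALT]; restricting REF to A/C/G/T is an assumption.
SNP_RE = re.compile(r"^([ACGT])(\d+)([ACGTMRWSYK])$")


def parse_snps(field, heterozygous=False):
    """Parse a pipe-separated SNP column into (ref, position, alt) tuples."""
    if field == "none":
        return []
    snps = []
    for token in field.split("|"):
        match = SNP_RE.match(token)
        if match is None:
            raise ValueError(f"malformed SNP: {token!r}")
        ref, pos, alt = match.group(1), int(match.group(2)), match.group(3)
        # Heterozygous columns need an ambiguity code; homozygous ones must not use one.
        if heterozygous != (alt in HET_CODES):
            raise ValueError(f"unexpected alt-base in {token!r}")
        snps.append((ref, pos, alt))
    positions = [pos for _, pos, _ in snps]
    if positions != sorted(positions):
        raise ValueError("SNPs must be ordered by position")
    return snps


def parse_snp_profile_line(line):
    """Split one tab-delimited profile line into (id, homozygous, heterozygous)."""
    profile_id, hom, het = line.rstrip("\n").split("\t")
    return profile_id, parse_snps(hom), parse_snps(het, heterozygous=True)
```

For instance, `parse_snp_profile_line("N5\tnone\tC8015Y|T9663Y")` yields the profile ID, an empty homozygous list, and two heterozygous SNPs.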

Structural Variant Profiles: A tab-delimited file with the following columns:

  1. Profile ID
  2. Structural Variants

Structural Variants are represented in the format [START-POSITION]-[END-POSITION], and separated by pipe (|) characters.

For example:

Tn4401b       none
Tn4401a       7020-7118
Tn4401h       6919-7106
Tn4401_truncC 1-7127|9198-10006
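
A structural variant profile file in this layout can be read into a simple lookup table. This is an illustrative sketch only: `parse_struct_variants` and `load_struct_profiles` are hypothetical helper names, and the check that each start position does not exceed its end position is an added assumption rather than a documented rule.

```python
def parse_struct_variants(field):
    """Parse a pipe-separated list of START-END ranges into (start, end) tuples."""
    if field == "none":
        return []
    ranges = []
    for token in field.split("|"):
        start_text, end_text = token.split("-")
        start, end = int(start_text), int(end_text)
        if start > end:  # assumed sanity check, not stated in the tool help
            raise ValueError(f"start exceeds end in {token!r}")
        ranges.append((start, end))
    return ranges


def load_struct_profiles(lines):
    """Map each profile ID to its list of ranges, skipping blank lines."""
    profiles = {}
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        profile_id, field = line.split("\t")
        profiles[profile_id] = parse_struct_variants(field)
    return profiles
```

Passing an open file handle as `lines` works, since iterating over a text file yields one line at a time.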

Output

TETyper will produce a tab-separated output file with the following columns:

Column                   Description
Deletions                A list of sequence ranges corresponding to regions of the reference classified as deletions for this sample, or "none" for no deletions.
Structural_variant       If --struct_profiles is specified and the pattern of deletions above corresponds to one of these profiles, then the profile name is given, otherwise "unknown".
SNPs_homozygous          A list of homozygous SNPs identified, or "none".
SNPs_heterozygous        A list of heterozygous SNPs identified, or "none".
Heterozygous_SNP_counts  For each heterozygous SNP, the number of reads supporting the reference and alternative calls, or "none" if there are no heterozygous SNPs.
SNP_variant              If --snp_profiles is specified and the pattern of homozygous and heterozygous SNPs corresponds to one of these profiles, then the profile name is given. Otherwise "unknown".
Combined_variant         Single name combining Structural_variant and SNP_variant, separated by "-".
Left_flanks              A list of distinct sequences passing quality filters that flank the start position of the reference.
Right_flanks             A list of distinct sequences passing quality filters that flank the end position of the reference.
Left_flank_counts        The number of high quality reads supporting each of the left flanking sequences.
Right_flank_counts       The number of high quality reads supporting each of the right flanking sequences.
X_Y_presence             If --show_region is specified as --show_region X-Y, this column shows 1 if the entirety of that region is classified as present (i.e. no overlap with deleted regions), or 0 otherwise. If --show_region is unspecified, this column is omitted.
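
The rules for Combined_variant and X_Y_presence can be illustrated with a short sketch. It is not TETyper's own code: `combined_variant` and `region_present` are hypothetical names, and treating both region and deletion coordinates as inclusive is an assumption.

```python
def combined_variant(structural_variant, snp_variant):
    """Join the two profile names with "-", as described for Combined_variant."""
    return f"{structural_variant}-{snp_variant}"


def region_present(region, deletions):
    """Return 1 if the region overlaps no deletion, otherwise 0.

    `region` is a (start, end) tuple and `deletions` a list of such tuples;
    all coordinates are assumed to be inclusive.
    """
    start, end = region
    for del_start, del_end in deletions:
        # Two inclusive intervals overlap when each starts before the other ends.
        if del_start <= end and start <= del_end:
            return 0
    return 1
```

With the deletion 7020-7118 from the Tn4401a profile, a region of 7000-7200 would be reported as 0 and a region of 1-100 as 1 under these assumptions.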