
MiRDeep2 Mapper (version 2.0.0.8.1)

Pool multiple read sets:

Deep sequencing reads:

Reads in FASTQ or FASTA format

Remove reads with non-standard nucleotides:

Remove all entries that have a sequence that contains letters other than a,c,g,t,u,n,A,C,G,T,U,N. (-j)

Convert RNA to DNA alphabet (to map against genome):

(-i)

Clip 3' Adapter Sequence:

(-k)

Discard reads shorter than this length:

Set to 0 to keep all reads. (-l)
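
The three filtering options above (-j, -i, -l) can be pictured with a small Python sketch. It only illustrates the behaviour as described on this page and is not the mapper's own code; the function name `preprocess`, its parameters, and the order in which the steps run are assumptions made for the example.

```python
import re

# Letters accepted by the -j filter, as listed in the option description.
STANDARD = re.compile(r"[acgtunACGTUN]+")


def preprocess(reads, min_length=0, to_dna=True):
    """Yield (read_id, sequence) pairs that survive the filters.

    reads      -- iterable of (read_id, sequence) tuples (hypothetical input shape)
    min_length -- mirrors -l; 0 keeps all reads
    to_dna     -- mirrors -i; rewrites u/U as t/T
    """
    for read_id, seq in reads:
        # -j: drop entries containing any letter outside a,c,g,t,u,n (either case)
        if not STANDARD.fullmatch(seq):
            continue
        # -i: convert the RNA alphabet to the DNA alphabet
        if to_dna:
            seq = seq.replace("u", "t").replace("U", "T")
        # -l: discard reads shorter than the threshold (0 disables the check)
        if min_length and len(seq) < min_length:
            continue
        yield read_id, seq
```

For example, with `min_length=3` a read `ACGU` would be kept as `ACGT`, while `ACGX` and `AC` would be dropped.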

Collapse reads and/or Map:

(-m) and/or (-p)

Will you select a reference genome from your history or use a built-in index?:

Map to genome. (-p)

Select a reference genome:

If your genome of interest is not listed, contact your Galaxy admin.

Map with one mismatch in the seed (mapping takes longer):

(-q)

A read is allowed to map up to this number of positions in the genome:

Map threshold. (-r)
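
As a rough guide to how the form fields above relate to command-line flags, the following Python sketch assembles an option list from the flags named on this page. The function `mapper_options` and its parameter names are hypothetical. The sketch assumes that -k, -l, -p and -r each take a value and that the remaining flags are switches. Flags for the input format and the output files are not listed on this page, so they are left out.

```python
def mapper_options(remove_nonstandard=False, rna_to_dna=False, adapter=None,
                   min_length=None, collapse=False, genome_index=None,
                   seed_mismatch=False, max_positions=None):
    """Return a list of command-line options matching the form fields."""
    opts = []
    if remove_nonstandard:
        opts.append("-j")                      # remove non-standard nucleotides
    if rna_to_dna:
        opts.append("-i")                      # convert RNA to DNA alphabet
    if adapter:
        opts += ["-k", adapter]                # clip 3' adapter sequence
    if min_length is not None:
        opts += ["-l", str(min_length)]        # discard shorter reads
    if collapse:
        opts.append("-m")                      # collapse reads
    if genome_index:
        opts += ["-p", genome_index]           # map to genome
        # The two settings below only make sense when mapping is requested.
        if seed_mismatch:
            opts.append("-q")                  # one mismatch in the seed
        if max_positions is not None:
            opts += ["-r", str(max_positions)]  # map threshold
    return opts
```

Calling `mapper_options(collapse=True, genome_index="genome_idx")` (where `genome_idx` is a placeholder name) gives `["-m", "-p", "genome_idx"]`, which corresponds to selecting both collapsing and mapping in the form.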

What it does

The MiRDeep2 Mapper module processes deep sequencing reads and/or maps them to a reference genome. It works in sequence space and can process or map data in FASTA format. Several of its functions are implemented specifically with Solexa/Illumina data in mind.

Input

By default, the input is a file in FASTA, seq.txt, or qseq.txt format. Additional input can be supplied depending on the options used.

Output

The output depends on the options used: a FASTA file with processed reads, an arf file with mapped reads, or both.

Arf format: a proprietary file format generated and processed by miRDeep2. It contains information on reads mapped to a reference genome. Each line of such a file contains 13 columns:

read identifier
length of read sequence
start position in read sequence that is mapped
end position in read sequence that is mapped
read sequence
identifier of the genome part to which a read is mapped; this is either a scaffold id or a chromosome name
length of the genome sequence a read is mapped to
start position in the genome where a read is mapped
end position in the genome where a read is mapped
genome sequence to which a read is mapped
genome strand information; plus means the read is aligned to the sense strand of the genome, minus means it is aligned to the antisense strand
number of mismatches in the read mapping
edit string that indicates matches by lowercase 'm' and mismatches by uppercase 'M'
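
The 13 columns listed above can be read into a named record with a short Python sketch. This is an illustration, not part of miRDeep2. It assumes the columns are tab-separated, which this page does not state, and the names `ArfRecord` and `parse_arf_line` are invented for the example.

```python
from typing import NamedTuple


class ArfRecord(NamedTuple):
    """One line of an arf file, in the column order listed above."""
    read_id: str
    read_length: int
    read_start: int
    read_end: int
    read_seq: str
    genome_id: str
    genome_length: int
    genome_start: int
    genome_end: int
    genome_seq: str
    strand: str
    mismatches: int
    edit_string: str


# Zero-based indices of the columns that hold integers.
INT_FIELDS = {1, 2, 3, 6, 7, 8, 11}


def parse_arf_line(line):
    """Split one arf line into an ArfRecord (assumes tab-separated columns)."""
    cols = line.rstrip("\n").split("\t")
    if len(cols) != 13:
        raise ValueError(f"expected 13 columns, found {len(cols)}")
    values = [int(c) if i in INT_FIELDS else c for i, c in enumerate(cols)]
    return ArfRecord(*values)
```

A simple consistency check on a parsed record is that `record.edit_string.count("M")` equals `record.mismatches`, since uppercase 'M' marks each mismatch.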