Galaxy | Tool Preview

MiRDeep2 (version 2.0.1.2+galaxy0)
Reads in fasta format. The identifier should contain a prefix, a running number and '_x' followed by the number of reads that have this sequence. There should be no redundancy in the sequences.
Genome contigs in fasta format. The identifiers should be unique.
Reads mapped against genome. Mappings should be in ARF format.
miRBase miRNA sequences in fasta format. These should be the known mature sequences for the species being analyzed.
miRBase miRNA sequences in fasta format. These should be the pooled known mature sequences for 1-5 species closely related to the species being analyzed.
miRBase miRNA precursor sequences in fasta format. These should be the known precursor sequences for the species being analyzed.
If you are not searching in a specific species, all species in your files will be analyzed. (-t)
From miRBase in fasta format (optional) (-s)
Minimum read stack height that triggers analysis. Using this option disables automatic estimation of the optimal value, and all detected precursors are analyzed. (-a)
Maximum number of precursors to analyze when automatic excision gearing is used. If set to -1, all precursors will be analyzed. (-g)
Minimum score cut-off for predicted novel miRNAs to be displayed in the overview table. (-b)
(-c)
Output fasta files of precursors, mature and star strand for both novel and known miRNAs
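The form fields above map onto options of the miRDeep2.pl script. The sketch below assembles an argument list from them, assuming the positional order reads, genome, mappings, mature miRNAs of the species, mature miRNAs of related species, precursors, and assuming that an omitted miRBase file is passed as the placeholder "none". Only the flags named in this form (-t, -a, -g, -b) are used; the function name `build_mirdeep2_command` is hypothetical.

```python
from typing import List, Optional


def build_mirdeep2_command(
    reads: str,
    genome: str,
    mappings: str,
    mature_ref: Optional[str] = None,
    mature_other: Optional[str] = None,
    precursors: Optional[str] = None,
    species: Optional[str] = None,
    stack_height: Optional[int] = None,
    max_precursors: Optional[int] = None,
    score_cutoff: Optional[float] = None,
) -> List[str]:
    """Assemble an argument list for miRDeep2.pl (hypothetical helper, sketch only)."""
    # Six positional inputs; an omitted miRBase file becomes "none" (assumed convention)
    cmd = [
        "miRDeep2.pl", reads, genome, mappings,
        mature_ref or "none", mature_other or "none", precursors or "none",
    ]
    # Optional flags, named exactly as in the Galaxy form
    if species is not None:
        cmd += ["-t", species]  # analyze only this species
    if stack_height is not None:
        cmd += ["-a", str(stack_height)]  # fixed stack height, disables automatic estimation
    if max_precursors is not None:
        if max_precursors < -1:
            raise ValueError("-g must be -1 (all precursors) or a non-negative count")
        cmd += ["-g", str(max_precursors)]
    if score_cutoff is not None:
        cmd += ["-b", str(score_cutoff)]  # minimum score shown in the overview table
    return cmd
```

Returning a list rather than a single string lets the arguments be passed to `subprocess.run` without any shell quoting.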

What it does

MiRDeep2 is a software package for identification of novel and known miRNAs in deep sequencing data. Furthermore, it can be used for miRNA expression profiling across samples.

Input

A FASTA file with deep sequencing reads, a FASTA file of the corresponding genome, a file of reads mapped to the genome in miRDeep2 ARF format, an optional FASTA file with known miRNAs of the species being analyzed and an optional FASTA file of known miRNAs of related species.
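The reads file must be non-redundant, with each identifier carrying a prefix, a running number and '_x' followed by the read count. A minimal sketch of collapsing raw reads into that shape, and of parsing such an identifier back, is shown below. The default prefix "seq" and the function names are hypothetical, and miRDeep2's own preprocessing may number or order reads differently.

```python
import re
from collections import Counter
from typing import Iterable, List, Tuple

# Identifier such as "seq_12_x340": prefix, running number, read count
READ_ID = re.compile(r"^(?P<prefix>.+)_(?P<number>\d+)_x(?P<count>\d+)$")


def collapse_reads(reads: Iterable[str], prefix: str = "seq") -> List[Tuple[str, str]]:
    """Collapse raw sequences into unique (identifier, sequence) pairs."""
    counts = Counter(r.strip().upper() for r in reads if r.strip())
    # most_common() sorts by count; ties keep first-seen order
    return [
        (f"{prefix}_{i}_x{n}", seq)
        for i, (seq, n) in enumerate(counts.most_common(), start=1)
    ]


def parse_read_id(identifier: str) -> Tuple[str, int, int]:
    """Split an identifier into (prefix, running number, read count)."""
    m = READ_ID.match(identifier)
    if m is None:
        raise ValueError(f"not a collapsed read identifier: {identifier!r}")
    return m["prefix"], int(m["number"]), int(m["count"])


def to_fasta(records: List[Tuple[str, str]]) -> str:
    """Render (identifier, sequence) pairs as FASTA text."""
    return "".join(f">{rid}\n{seq}\n" for rid, seq in records)
```

For example, the reads `ACGT`, `TTTT`, `ACGT`, `acgt` collapse to `seq_1_x3` (ACGT) and `seq_2_x1` (TTTT).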

ARF format: a proprietary file format generated and processed by miRDeep2. It contains information about reads mapped to a reference genome. Each line in such a file contains 13 columns:

  1. read identifier
  2. length of read sequence
  3. start position in read sequence that is mapped
  4. end position in read sequence that is mapped
  5. read sequence
  6. identifier of the genome part to which the read is mapped. This is either a scaffold ID or a chromosome name
  7. length of the genome sequence the read is mapped to
  8. start position in the genome where the read is mapped
  9. end position in the genome where the read is mapped
  10. genome sequence to which a read is mapped
  11. genome strand information. Plus means the read is aligned to the sense-strand of the genome. Minus means it is aligned to the antisense-strand of the genome.
  12. Number of mismatches in the read mapping
  13. Edit string that indicates matches by lowercase 'm' and mismatches by uppercase 'M'
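The column layout above can be turned into a small record type. The sketch below assumes that columns are tab-separated and that the strand is written as "+" or "-"; both are assumptions about the concrete encoding, not statements from this page. It also checks that the number of uppercase 'M' characters in the edit string equals the mismatch count in column 12. The names `ArfRecord` and `parse_arf_line` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ArfRecord:
    read_id: str        # 1
    read_length: int    # 2
    read_start: int     # 3
    read_end: int       # 4
    read_seq: str       # 5
    genome_id: str      # 6: scaffold ID or chromosome name
    genome_length: int  # 7
    genome_start: int   # 8
    genome_end: int     # 9
    genome_seq: str     # 10
    strand: str         # 11: "+" sense, "-" antisense (assumed symbols)
    mismatches: int     # 12
    edit_string: str    # 13: 'm' = match, 'M' = mismatch


def parse_arf_line(line: str) -> ArfRecord:
    """Parse one ARF line, assuming tab-separated columns."""
    f = line.rstrip("\n").split("\t")
    if len(f) != 13:
        raise ValueError(f"expected 13 columns, got {len(f)}")
    rec = ArfRecord(
        f[0], int(f[1]), int(f[2]), int(f[3]), f[4],
        f[5], int(f[6]), int(f[7]), int(f[8]), f[9],
        f[10], int(f[11]), f[12],
    )
    if rec.strand not in ("+", "-"):
        raise ValueError(f"unexpected strand value: {rec.strand!r}")
    # Each mismatch is marked by an uppercase 'M', so the counts should agree
    if rec.edit_string.count("M") != rec.mismatches:
        raise ValueError("edit string disagrees with mismatch count")
    return rec
```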