RagTag (version 2.1.0+galaxy1)

Purpose

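RagTag is a collection of software tools for scaffolding and improving modern genome assemblies. Tasks include:

  • Misassembly correction (correct mode)
  • Scaffolding (scaffold mode)
  • Patching one assembly with another (patch mode)
  • Merging multiple scaffoldings of the same assembly (merge mode)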


Correct mode

RagTag offers a correction module that uses a reference genome to identify and correct potential misassemblies in a query assembly. RagTag also provides the option to verify putative misassemblies by aligning reads (from the same genotype) to the query assembly and observing read coverage near misassembly break points. In all cases, sequence is never added or subtracted. Query sequences are only broken at points of putative misassembly.
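
As a rough illustration of the "break only" behaviour described above, the following Python sketch splits a query sequence at a list of putative misassembly breakpoints. The function name, the 0-based coordinates, and the name:start-end labels for the pieces are assumptions made for this example; this is not RagTag's implementation.

```python
def break_sequence(name, seq, breakpoints):
    """Split seq at 0-based breakpoints (hypothetical helper).

    Bases are never added or removed: concatenating the returned
    pieces in order reproduces the input sequence exactly.
    """
    # Ignore breakpoints at or beyond the sequence ends, and drop duplicates.
    inner = sorted({b for b in breakpoints if 0 < b < len(seq)})
    cuts = [0] + inner + [len(seq)]
    pieces = []
    for start, end in zip(cuts, cuts[1:]):
        # Label each piece with its half-open interval in the original sequence.
        pieces.append((f"{name}:{start}-{end}", seq[start:end]))
    return pieces


pieces = break_sequence("contig1", "ACGTACGTAC", [4, 7])
# The pieces still add up to the original sequence.
assert "".join(s for _, s in pieces) == "ACGTACGTAC"
```

Because the pieces are plain slices of the input, the total length of the assembly is unchanged by this kind of correction.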

Misassemblies vs true variation

Reference-guided misassembly signatures are sometimes caused by true biological structural variation if the reference and query assemblies represent distinct genotypes (or haplotypes). The read validation feature should help to avoid some of these misassembly false positives, and the validation sensitivity can be tuned with command line parameters. However, it is ultimately up to the discretion of the user to decide if misassembly correction is appropriate. One should validate all RagTag results with independent data (usually physical, optical, or genetic maps), when possible.


Scaffold mode

Scaffolding is the process of ordering and orienting draft assembly (query) sequences into longer sequences. Gaps (stretches of "N" characters) are placed between adjacent query sequences to indicate the presence of unknown sequence. RagTag uses whole-genome alignments to a reference assembly to scaffold query sequences. RagTag does not alter input query sequence in any way and only orders and orients sequences, joining them with gaps.
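
The idea of ordering, orienting, and joining with gaps can be sketched in a few lines of Python. This is a minimal illustration only: the placements list, the fixed gap_size, and the helper names are assumptions for the example, and deriving the order and orientation from whole-genome alignments (the part RagTag actually performs) is not shown.

```python
# Translation table for complementing bases; N stays N.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def build_scaffold(placements, sequences, gap_size=100):
    """Join query sequences into one scaffold (hypothetical helper).

    placements: list of (query_name, orientation) tuples in scaffold order,
                where orientation is "+" or "-".
    sequences:  dict mapping query_name to its sequence.
    gap_size:   number of "N" characters placed between adjacent sequences
                (an assumed fixed value for this sketch).
    """
    parts = []
    for name, orientation in placements:
        seq = sequences[name]
        # Query sequence is only re-oriented, never edited.
        parts.append(seq if orientation == "+" else reverse_complement(seq))
    return ("N" * gap_size).join(parts)


queries = {"a": "AAAC", "b": "GGTT"}
scaffold = build_scaffold([("b", "-"), ("a", "+")], queries, gap_size=3)
```

Every base of every placed query sequence appears in the scaffold, either as-is or reverse complemented; only the "N" gaps are new.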


Patch mode

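This mode uses one genome assembly (the query) to patch another genome assembly (the target). We define two types of patches:

  • Fills: gaps in the target assembly are filled with sequence from the query assembly
  • Joins: separate target sequences are joined using sequence from the query assembly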


Merge mode

Draft genome assemblies are often scaffolded multiple times using different approaches. For example, one might scaffold an assembly using different genome maps (physical, linkage, Hi-C, etc.), different methods, or different method parameters. RagTag merge is a tool to merge and reconcile different scaffoldings of the same assembly. In this way, one can leverage the advantages of multiple techniques to synergistically improve scaffolding.

Most tools write scaffolding results in the AGP file format, which encodes adjacency and gap information in a plain text file. To run RagTag merge, one must supply the assembly in FASTA format and at least two AGP files, each defining a scaffolding of that assembly. Each AGP file can optionally be assigned a weight, which sets the relative influence of that AGP on the final result.
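
To make the adjacency and gap information concrete, the sketch below parses a small tab-delimited AGP example and tallies how much weighted support each adjacency receives across several AGP files. The example records, the function names, and the simple "sum the weights" tally are assumptions for illustration; they are not RagTag's merging algorithm, which builds and solves a scaffold graph.

```python
from collections import defaultdict

# A made-up AGP example: two contigs joined by a 100 bp gap of unknown size.
EXAMPLE_AGP = "\n".join([
    "scaf1\t1\t1000\t1\tW\tctg1\t1\t1000\t+",
    "scaf1\t1001\t1100\t2\tU\t100\tscaffold\tyes\talign_genus",
    "scaf1\t1101\t1600\t3\tW\tctg2\t1\t500\t-",
])


def parse_agp(text):
    """Parse AGP text into a list of dicts (comment lines start with '#')."""
    records = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        f = line.split("\t")
        rec = {"object": f[0], "object_beg": int(f[1]), "object_end": int(f[2]),
               "part_number": int(f[3]), "type": f[4]}
        if f[4] in ("N", "U"):
            # Gap line: length, gap type, linkage, linkage evidence.
            rec.update(gap_length=int(f[5]), gap_type=f[6],
                       linkage=f[7], evidence=f[8])
        else:
            # Component line: id, begin, end, orientation.
            rec.update(component_id=f[5], component_beg=int(f[6]),
                       component_end=int(f[7]), orientation=f[8])
        records.append(rec)
    return records


def adjacencies(records):
    """Return consecutive (component, orientation) pairs within each object."""
    pairs, prev = [], None
    for rec in records:
        if rec["type"] in ("N", "U"):
            continue  # gaps separate components but are not components
        if prev is not None and prev["object"] == rec["object"]:
            pairs.append(((prev["component_id"], prev["orientation"]),
                          (rec["component_id"], rec["orientation"])))
        prev = rec
    return pairs


def weighted_support(agp_texts, weights):
    """Sum, per adjacency, the weights of the AGP files that contain it."""
    support = defaultdict(float)
    for text, weight in zip(agp_texts, weights):
        for pair in adjacencies(parse_agp(text)):
            support[pair] += weight
    return dict(support)
```

In this toy tally, an adjacency found in two AGP files with weights 1.0 and 2.0 receives a support of 3.0, which shows how a larger weight gives one scaffolding more say when the inputs disagree.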

If available, users can supply Hi-C alignments to the draft assembly to resolve conflicts in the merging graph. In this scenario, the input AGP files are used to build the initial graph, but then Hi-C alignments are used to re-weight the graph before computing the scaffolding solution.

List of accepted restriction enzymes

List of all accepted restriction enzymes and their restriction sites:

  • HindIII: AAGCTT
  • Sau3AI: GATC
  • MboI: GATC
  • DpnII: GATC
  • HinfI: GA[ATCG]TC
  • DdeI: CT[ATCG]AG
  • MseI: TTAA

For RagTag, use a comma-separated list of enzymes, restriction sites, or a mix of both. For example:

  • Arima Hi-C v1.0: Sau3AI,HinfI or GATC,GA[ATCG]TC
  • Arima Hi-C v2.0: Sau3AI,HinfI,DdeI,MseI or GATC,GA[ATCG]TC,CT[ATCG]AG,TTAA

Note that wildcards in restriction sites are written in Python regular-expression syntax, not as IUPAC ambiguity codes. For example, use '[ATCG]' instead of 'N'.

The enzymes listed above are not necessarily the ones used for sample prep. Each is simply an enzyme that cuts at the corresponding restriction site.
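
The following Python sketch shows how a comma-separated list of enzyme names and sites could be turned into a single regular expression and used to locate cut sites in a sequence. The enzyme table is copied from the list above; the function names and the parsing logic are assumptions for illustration and are not RagTag's own code.

```python
import re

# Enzyme names and restriction sites, as listed above.
ENZYME_SITES = {
    "HindIII": "AAGCTT",
    "Sau3AI": "GATC",
    "MboI": "GATC",
    "DpnII": "GATC",
    "HinfI": "GA[ATCG]TC",
    "DdeI": "CT[ATCG]AG",
    "MseI": "TTAA",
}


def sites_to_regex(spec):
    """Compile a comma-separated enzyme/site list into one regex.

    Known enzyme names are replaced by their site; anything else is
    treated as a site that is already written in regex syntax.
    """
    patterns = []
    for token in spec.split(","):
        token = token.strip()
        site = ENZYME_SITES.get(token, token.upper())
        if site not in patterns:  # drop duplicates, keep order
            patterns.append(site)
    # A lookahead lets overlapping sites be reported as well.
    return re.compile("(?=(?:" + "|".join(patterns) + "))")


def find_sites(spec, seq):
    """Return 0-based start positions of all restriction sites in seq."""
    return [m.start() for m in sites_to_regex(spec).finditer(seq.upper())]


# Enzyme names and explicit sites describe the same pattern.
by_name = sites_to_regex("Sau3AI,HinfI")
by_site = sites_to_regex("GATC,GA[ATCG]TC")
assert by_name.pattern == by_site.pattern
```

Splitting on commas is safe here because a character class such as '[ATCG]' contains no commas, which is one practical reason to prefer it over a form like 'N' that would need separate expansion.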