# HG changeset patch
# User iuc
# Date 1651671952 0
# Node ID 04d05400d3a602cc615a01cdd8fb5f88174a5ed3
"planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/adapter_removal/ commit 138d7e0d844a783f1e8100d264d57540199f290f"

diff -r 000000000000 -r 04d05400d3a6 adapter_removal.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/adapter_removal.xml Wed May 04 13:45:52 2022 +0000
@@ -0,0 +1,452 @@
+ from HTS data
+ macros.xml


+ input_type_cond['input_type'] == 'single'
+ input_type_cond['input_type'] in ['pair', 'paired'] and input_type_cond['interleaved_output'] == 'no'
+ input_type_cond['input_type'] in ['pair', 'paired'] and input_type_cond['interleaved_output'] == 'no'
+ input_type_cond['input_type'] in ['pair', 'paired'] and input_type_cond['interleaved_output'] == 'yes'
+ 'output_singleton' in output_select
+ 'output_collapsed' in output_select
+ 'output_collapsed_truncated' in output_select
+ 'output_discarded' in output_select
+**What it does**
+
+Searches for and removes adapter sequences from high-throughput sequencing (HTS) data and optionally trims low-quality
+bases from the 3' end of reads following adapter removal. This tool can analyze both single-end and paired-end data and
+can be used to merge overlapping paired-end reads into longer consensus sequences. The tool can also construct a consensus
+adapter sequence for paired-end reads.
+
+The following outputs are produced; some are optional or are only produced under certain settings.
+
+ * **settings** - File containing the parameters used in the run as well as overall statistics on the reads after trimming.
+ * **truncated** - File containing trimmed single-end reads.
+ * **forward truncated** - File containing trimmed mate 1 reads.
+ * **reverse truncated** - File containing trimmed mate 2 reads.
+ * **interleaved truncated** - File containing trimmed mate 1 and mate 2 reads interleaved in a single file; produced only when interleaved output is enabled.
+ * **singleton truncated** - File containing trimmed paired reads for which the mate has been discarded.
+ * **collapsed** - If --collapse is set, contains overlapping mate pairs which have been merged into a single read (PE mode), or reads for which the adapter was identified by a minimum overlap, indicating that the entire template molecule is present (SE mode). This does not include reads which have subsequently been trimmed due to low-quality or ambiguous nucleotides.
+ * **collapsed truncated** - Collapsed reads (see --outputcollapsed) which were trimmed due to the presence of low-quality or ambiguous nucleotides.
+ * **discarded** - File containing reads discarded due to the --minlength, --maxlength or --maxns options.
+
+**General Options**
+
+ * **Attempt to build a consensus adapter sequence from fully overlapping pairs of paired-end reads** - The minimum overlap is controlled by --minalignmentlength. The result will be compared with the values set using --adapter1 and --adapter2. No trimming is performed in this mode.
+
+**FASTQ Trimming Options**
+
+ * **Adapter sequence expected to be found in mate 1/2 reads, specified in read direction** - For a detailed description of how to provide the appropriate adapter sequences, see the "Adapters" section of the online documentation.
+ * **Tabular file containing adapter sequences** - The first two columns of each line in the file are expected to correspond to the values passed to --adapter1 and --adapter2. In single-end mode, only column one is required. Lines starting with # are ignored. When multiple rows are found in the table, AdapterRemoval will try each adapter (pair) and select the best-aligning adapters for each FASTQ read processed.
+ * **Trim reads only if the overlap between read and the adapter is at least this number of bases long** - In single-end mode, reads are only trimmed if the overlap between the read and the adapter is at least the given number of bases long, not counting ambiguous nucleotides (N). This is independent of --minalignmentlength when using --collapse, allowing a conservative selection of putative complete inserts in single-end mode while ensuring that all possible adapter contamination is trimmed.
+ * **The fraction of mismatches allowed in the aligned region** - Maximum fraction of mismatches allowed in the aligned region. If the value is less than 1, it is used directly. If --mismatchrate is greater than 1, the rate is set to 1 / --mismatchrate. The default setting is 3 when trimming adapters, corresponding to a maximum mismatch rate of 1/3, and 10 when using --identify-adapters.
+ * **Slip the alignment by this number of bases in the 5' end to allow for missing bases in the 5' end of the read** - To allow for missing bases in the 5' end of the read, the program can let the alignment slip by up to --shift bases in the 5' end. This corresponds to starting the alignment at most --shift nucleotides into read2 (for paired-end) or the adapter (for single-end).
+ * **Trim the 5' of reads by a fixed amount after removing adapters, but before carrying out quality based trimming** - Specify one value to trim mate 1 and mate 2 reads by the same amount, or two values separated by a space to trim each mate by a different amount.
+ * **Trim the 3' of reads by a fixed amount after removing adapters, but before carrying out quality based trimming** - Trims the 3' end of reads by a fixed amount. See the description of the trim5p parameter immediately above.
+ * **Trim consecutive Ns from the 5' and 3' termini** - If quality trimming is also enabled (--trimqualities), then stretches of mixed low-quality bases and/or Ns are trimmed.
+ * **Window based quality trimming** - If window_size is greater than or equal to 1, that number is used as the window size for all reads. If window_size is greater than or equal to 0 and less than 1, it is multiplied by the length of each read to determine the window size. If the window length is zero or greater than the current read length, the read length is used instead. Reads are trimmed as follows for a given window size:
+
+ 1. The new 5' end is determined by locating the first window where both the average quality and the quality of the first base in the window are greater than --minquality.
+ 2. The new 3' end is located by sliding the first window to the right until the average quality becomes less than or equal to --minquality. The new 3' end is placed at the last base in that window whose quality is greater than or equal to --minquality.
+ 3. If no 5' position could be determined, the read is discarded.
+
+**FASTQ Merging Options**
+
+ * **Output complete reads with an 'M\_' name prefix and trimmed reads with an 'MT\_' name prefix** - In paired-end mode, merge overlapping mates into a single read and recalculate the quality scores. In single-end mode, attempt to identify templates for which the entire sequence is available. In both cases, complete "collapsed" reads are written with an 'M\_' name prefix, and "collapsed" reads which are trimmed due to quality settings are written with an 'MT\_' name prefix. The overlap needs to be at least --minalignmentlength nucleotides, with a maximum number of mismatches determined by --mm.
+ * **Collapse deterministically** - Enable deterministic mode. This currently only affects --collapse: differing overlapping bases with equal quality are set to N with quality 0 instead of being randomly sampled. Setting this option also sets --collapse.
+ * **Collapse conservatively** - Alternative merging algorithm inspired by FASTQ-join: for matching overlapping bases, the higher of the two quality scores is used. For mismatching overlapping bases, the higher-quality base is used and its quality is set to the absolute difference in Phred score between the two bases. For mismatching bases with identical quality scores, the base is set to 'N' and the quality score to 0 (Phred-encoded). Setting this option also sets --collapse.
diff -r 000000000000 -r 04d05400d3a6 macros.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/macros.xml Wed May 04 13:45:52 2022 +0000
@@ -0,0 +1,44 @@
+ 2.3.3
+ 0
+ 20.09
+ fastqsanger.gz,fastqsanger
+ adapterremoval
+ adapterremoval
+ 10.1186/s13104-016-1900-2
diff -r 000000000000 -r 04d05400d3a6 test-data/reads1.fastq.gz
Binary file test-data/reads1.fastq.gz has changed
diff -r 000000000000 -r 04d05400d3a6 test-data/reads2.fastq.gz
Binary file test-data/reads2.fastq.gz has changed
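The mismatch-rate rule in the help text's **FASTQ Trimming Options** section (values below 1 used directly, larger values treated as 1 / value) can be expressed as a small conversion. This is a minimal Python sketch written from that description, not code taken from AdapterRemoval; the function name `effective_mismatch_rate` is hypothetical.

```python
def effective_mismatch_rate(mismatchrate: float) -> float:
    """Convert a --mismatchrate value into a fraction of mismatches.

    Hypothetical helper following the help text: values below 1 are used
    directly, larger values are treated as a denominator (3 -> 1/3).
    A value of exactly 1 gives 1.0 under either reading.
    """
    if mismatchrate < 1:
        return mismatchrate
    return 1.0 / mismatchrate


if __name__ == "__main__":
    # Defaults quoted in the help text: 3 when trimming, 10 with --identify-adapters.
    for value in (3, 10, 0.2):
        print(value, "->", effective_mismatch_rate(value))
```

With the defaults, this yields a maximum mismatch fraction of 1/3 for adapter trimming and 1/10 for adapter identification.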
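The help text's **Tabular file containing adapter sequences** option maps column one to --adapter1 and column two to --adapter2, and ignores lines starting with #. The Python sketch below shows one way to read such a table. The name `parse_adapter_table` is hypothetical. Skipping blank lines and rejecting single-column rows in paired-end mode are assumptions of this sketch, not documented AdapterRemoval behaviour.

```python
from typing import List, Optional, Tuple

# (adapter1, adapter2); adapter2 is None when the row has a single column.
AdapterPair = Tuple[str, Optional[str]]


def parse_adapter_table(text: str, paired: bool) -> List[AdapterPair]:
    """Parse an adapter table as described in the help text (hypothetical helper).

    Column 1 plays the role of --adapter1 and column 2 of --adapter2.
    Lines starting with '#' are ignored; skipping blank lines is an assumption.
    """
    adapters: List[AdapterPair] = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        if line.startswith("#") or not line.strip():
            continue
        columns = line.split()  # splits on tabs and/or spaces
        if paired and len(columns) < 2:
            # Assumption: this sketch rejects rows missing adapter2 in paired-end mode.
            raise ValueError(f"line {line_no}: paired-end mode needs two adapter columns")
        adapter2 = columns[1] if len(columns) > 1 else None
        adapters.append((columns[0], adapter2))
    return adapters


if __name__ == "__main__":
    # Made-up sequences, for illustration only.
    table = "# adapter1\tadapter2\nAAAACCCC\tGGGGTTTT\nACACACAC\tTGTGTGTG\n"
    for pair in parse_adapter_table(table, paired=True):
        print(pair)
```

Each resulting pair corresponds to one candidate that AdapterRemoval, per the help text, tries against every read before selecting the best-aligning one.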
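The fixed 5'/3' trimming options in the help text accept either one value (applied to both mates) or two space-separated values (mate 1, then mate 2). The Python sketch below parses such a value and applies it to a sequence. `parse_trim_amounts` and `fixed_trim` are hypothetical names. The sketch does not validate negative values. Per the help text, this step runs after adapter removal and before quality-based trimming.

```python
from typing import Tuple


def parse_trim_amounts(value: str) -> Tuple[int, int]:
    """Turn a trim5p/trim3p style value into (mate 1, mate 2) amounts.

    One number applies to both mates; two space-separated numbers apply
    to mate 1 and mate 2 respectively.
    """
    parts = [int(token) for token in value.split()]
    if len(parts) == 1:
        return parts[0], parts[0]
    if len(parts) == 2:
        return parts[0], parts[1]
    raise ValueError("expected one or two space-separated integers")


def fixed_trim(seq: str, trim5p: int, trim3p: int) -> str:
    """Drop trim5p bases from the 5' end and trim3p bases from the 3' end.

    The same slice would be applied to the quality string of the read.
    """
    end = max(len(seq) - trim3p, 0)
    return seq[trim5p:end] if trim5p < end else ""


if __name__ == "__main__":
    mate1_5p, mate2_5p = parse_trim_amounts("2 4")
    print(fixed_trim("ACGTACGTAC", mate1_5p, 0))
    print(fixed_trim("ACGTACGTAC", mate2_5p, 0))
```

Returning an empty string when the two trims overlap keeps the sketch total; a real pipeline would then drop the read via its minimum-length filter.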
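The **Window based quality trimming** steps in the help text can be sketched in Python as follows. This follows the written description only and is not AdapterRemoval's implementation. The names `window_length` and `trim_windows` are hypothetical. Several details are assumptions: truncating fractional window sizes to an integer, using a floating-point mean, and searching back past the window start when the final window contains no base with quality >= minquality. Ns and --trimns are ignored here.

```python
from typing import Optional, Sequence, Tuple


def window_length(window_size: float, read_length: int) -> int:
    """Window length rule from the help text.

    >= 1: used as-is; [0, 1): fraction of the read length. Truncating to an
    integer is an assumption of this sketch.
    """
    if window_size >= 1:
        winlen = int(window_size)
    elif window_size >= 0:
        winlen = int(window_size * read_length)
    else:
        raise ValueError("window_size must be >= 0")
    if winlen == 0 or winlen > read_length:
        winlen = read_length
    return winlen


def trim_windows(quals: Sequence[int], window_size: float,
                 minquality: int) -> Optional[Tuple[int, int]]:
    """Return (start, end) of the retained region (end exclusive), or None
    if the read would be discarded. `quals` are Phred scores, not ASCII.
    """
    n = len(quals)
    if n == 0:
        return None
    w = window_length(window_size, n)

    def mean(i: int) -> float:
        return sum(quals[i:i + w]) / w

    # Step 1: first window whose mean and first base both exceed minquality.
    start = next((i for i in range(n - w + 1)
                  if mean(i) > minquality and quals[i] > minquality), None)
    if start is None:
        return None  # Step 3: no 5' position found, so the read is discarded.

    # Step 2: slide right until the mean drops to <= minquality,
    # stopping at the last full window if it never does.
    i = start
    while i + w < n and mean(i) > minquality:
        i += 1

    # New 3' end: last base in that window with quality >= minquality.
    # Searching back past the window start (never past `start`, whose
    # quality exceeds minquality) is an assumption for windows with no such base.
    last = next(j for j in range(i + w - 1, start - 1, -1) if quals[j] >= minquality)
    return start, last + 1


if __name__ == "__main__":
    quals = [10, 30, 30, 30, 5, 5]
    print(trim_windows(quals, window_size=2, minquality=20))
```

For fastqsanger input (Phred+33), the integer scores can be obtained from a quality line with `[ord(c) - 33 for c in qual_line]`. The returned slice is then applied to both the sequence and the quality string.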
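The per-base rules of **Collapse conservatively** in the help text can be illustrated with a short Python sketch. It assumes the two overlapping segments have already been aligned and put in the same orientation (for example, mate 2 reverse-complemented). It does not show the alignment or the non-overlapping flanks. The help text does not say how N is handled, so this sketch treats N like any other base. `merge_base_conservative` and `merge_overlap` are hypothetical names, not AdapterRemoval functions.

```python
from typing import List, Sequence, Tuple


def merge_base_conservative(b1: str, q1: int, b2: str, q2: int) -> Tuple[str, int]:
    """Merge one overlapping position following the 'Collapse conservatively' rules."""
    if b1 == b2:
        return b1, max(q1, q2)  # agreement: keep the higher score
    if q1 == q2:
        return "N", 0           # disagreement with tied scores: N with Phred 0
    if q1 > q2:
        return b1, q1 - q2      # disagreement: higher-quality base wins,
    return b2, q2 - q1          # with quality = absolute Phred difference


def merge_overlap(seq1: str, quals1: Sequence[int],
                  seq2: str, quals2: Sequence[int]) -> Tuple[str, List[int]]:
    """Merge two already aligned, same-orientation overlap segments (Phred scores)."""
    if not (len(seq1) == len(quals1) == len(seq2) == len(quals2)):
        raise ValueError("overlap segments and qualities must have equal length")
    merged = [merge_base_conservative(b1, q1, b2, q2)
              for b1, q1, b2, q2 in zip(seq1, quals1, seq2, quals2)]
    bases = "".join(base for base, _ in merged)
    quals = [qual for _, qual in merged]
    return bases, quals


if __name__ == "__main__":
    print(merge_overlap("ACGTA", [30, 30, 30, 25, 30], "ACCTG", [20, 20, 20, 25, 30]))
```

The tie rule here (differing bases with equal quality become N with quality 0) is the same outcome the help text describes for **Collapse deterministically**. Under that option, however, the non-tied cases follow the default merging, which this sketch does not model.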