# HG changeset patch
# User bgruening
# Date 1564497182 14400
# Node ID b6aa3b6ba129761e95b72dcd261ba9b3da9f6f62
# Parent f211753166bd49a4e62020d0773d7cebaae10031
planemo upload for repository https://github.com/bgruening/galaxytools/tree/master/tools/bismark commit f57cd875407ce987c4897fc352c5db0eeb8e9efe

diff -r f211753166bd -r b6aa3b6ba129 bismark_bowtie2_wrapper.xml
--- a/bismark_bowtie2_wrapper.xml Tue Jul 30 06:30:36 2019 -0400
+++ b/bismark_bowtie2_wrapper.xml Tue Jul 30 10:33:02 2019 -0400
@@ -1,4 +1,4 @@
- + Bisulfite reads mapper bismark
@@ -6,6 +6,8 @@
bowtie2 + + + + + + + + + + + + + + + + + + + + + + + + + + +
diff -r f211753166bd -r b6aa3b6ba129 test-data/mapped_reads_mate.bam
Binary file test-data/mapped_reads_mate.bam has changed
diff -r f211753166bd -r b6aa3b6ba129 test-data/mapping_report_mate.txt
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/mapping_report_mate.txt Tue Jul 30 10:33:02 2019 -0400
@@ -0,0 +1,42 @@
+Bismark report for: input1_fq_1.fq and input1_fq_2.fq (version: v0.22.1)
+Bismark was run with Bowtie 2 against the bisulfite genome of /tmp/tmpAHSx4i/ with the specified options: -q -L 20 -D 15 -R 2 --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --minins 0 --maxins 500 --quiet
+Option '--directional' specified (default mode): alignments to complementary strands (CTOT, CTOB) were ignored (i.e. not performed)
+
+Final Alignment report
+======================
+Sequence pairs analysed in total: 1000
+Number of paired-end alignments with a unique best hit: 0
+Mapping efficiency: 0.0%
+Sequence pairs with no alignments under any condition: 1000
+Sequence pairs did not map uniquely: 0
+Sequence pairs which were discarded because genomic sequence could not be extracted: 0
+
+Number of sequence pairs with unique best (first) alignment came from the bowtie output:
+CT/GA/CT: 0 ((converted) top strand)
+GA/CT/CT: 0 (complementary to (converted) top strand)
+GA/CT/GA: 0 (complementary to (converted) bottom strand)
+CT/GA/GA: 0 ((converted) bottom strand)
+
+Number of alignments to (merely theoretical) complementary strands being rejected in total: 0
+
+Final Cytosine Methylation Report
+=================================
+Total number of C's analysed: 0
+
+Total methylated C's in CpG context: 0
+Total methylated C's in CHG context: 0
+Total methylated C's in CHH context: 0
+Total methylated C's in Unknown context: 0
+
+Total unmethylated C's in CpG context: 0
+Total unmethylated C's in CHG context: 0
+Total unmethylated C's in CHH context: 0
+Total unmethylated C's in Unknown context: 0
+
+Can't determine percentage of methylated Cs in CpG context if value was 0
+Can't determine percentage of methylated Cs in CHG context if value was 0
+Can't determine percentage of methylated Cs in CHH context if value was 0
+Can't determine percentage of methylated Cs in unknown context (CN or CHN) if value was 0
+
+
+Bismark completed in 0d 0h 0m 5s
diff -r f211753166bd -r b6aa3b6ba129 test-data/summary_mate.txt
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/summary_mate.txt Tue Jul 30 10:33:02 2019 -0400
@@ -0,0 +1,500 @@
+Create a temporary index with the offered files from the user. Utilizing the script: bismark_genome_preparation
+Generating index with: 'bismark_genome_preparation --bowtie2 /tmp/tmpAHSx4i'
+Writing bisulfite genomes out into a single MFA (multi FastA) file
+
+Bisulfite Genome Indexer version v0.22.1 (last modified: 14 April 2019)
+
+Step I - Prepare genome folders - completed
+
+
+
+Total number of conversions performed:
+C->T: 146875
+G->A: 150504
+
+Step II - Genome bisulfite conversions - completed
+
+
+Bismark Genome Preparation - Step III: Launching the Bowtie 2 indexer
+Please be aware that this process can - depending on genome size - take several hours!
+Settings:
+ Output files: "BS_CT.*.bt2"
+ Line rate: 6 (line is 64 bytes)
+ Lines per side: 1 (side is 64 bytes)
+ Offset rate: 4 (one in 16)
+ FTable chars: 10
+ Strings: unpacked
+ Max bucket size: default
+ Max bucket size, sqrt multiplier: default
+ Max bucket size, len divisor: 4
+ Difference-cover sample period: 1024
+ Endianness: little
+ Actual local endianness: little
+ Sanity checking: disabled
+ Assertions: disabled
+ Random seed: 0
+ Sizeofs: void*:8, int:4, long:8, size_t:8
+Input files DNA, FASTA:
+ genome_mfa.CT_conversion.fa
+Building a SMALL index
+Reading reference sizes
+ Time reading reference sizes: 00:00:00
+Calculating joined length
+Writing header
+Reserving space for joined string
+Joining reference sequences
+ Time to join reference sequences: 00:00:00
+bmax according to bmaxDivN setting: 189039
+Using parameters --bmax 141780 --dcv 1024
+ Doing ahead-of-time memory usage test
+ Passed! Constructing with these parameters: --bmax 141780 --dcv 1024
+Constructing suffix-array element generator
+Building DifferenceCoverSample
+ Building sPrime
+ Building sPrimeOrder
+ V-Sorting samples
+ V-Sorting samples time: 00:00:00
+ Allocating rank array
+ Ranking v-sort output
+ Ranking v-sort output time: 00:00:00
+ Invoking Larsson-Sadakane on ranks
+ Invoking Larsson-Sadakane on ranks time: 00:00:00
+ Sanity-checking and returning
+Building samples
+Reserving space for 12 sample suffixes
+Generating random suffixes
+QSorting 12 sample offsets, eliminating duplicates
+QSorting sample offsets, eliminating duplicates time: 00:00:00
+Multikey QSorting 12 samples
+ (Using difference cover)
+ Multikey QSorting samples time: 00:00:00
+Calculating bucket sizes
+Splitting and merging
+ Splitting and merging time: 00:00:00
+Avg bucket size: 756159 (target: 141779)
+Converting suffix-array elements to index image
+Allocating ftab, absorbFtab
+Entering Ebwt loop
+Getting block 1 of 1
+ No samples; assembling all-inclusive block
+ Sorting block of length 756159 for bucket 1
+ (Using difference cover)
+ Sorting block time: 00:00:00
+Returning block of 756160 for bucket 1
+Exited Ebwt loop
+fchr[A]: 0
+fchr[C]: 235897
+fchr[G]: 235897
+fchr[T]: 386401
+fchr[$]: 756159
+Exiting Ebwt::buildToDisk()
+Returning from initFromVector
+Wrote 4446745 bytes to primary EBWT file: BS_CT.1.bt2
+Wrote 189044 bytes to secondary EBWT file: BS_CT.2.bt2
+Re-opening _in1 and _in2 as input streams
+Returning from Ebwt constructor
+Headers:
+ len: 756159
+ bwtLen: 756160
+ sz: 189040
+ bwtSz: 189040
+ lineRate: 6
+ offRate: 4
+ offMask: 0xfffffff0
+ ftabChars: 10
+ eftabLen: 20
+ eftabSz: 80
+ ftabLen: 1048577
+ ftabSz: 4194308
+ offsLen: 47260
+ offsSz: 189040
+ lineSz: 64
+ sideSz: 64
+ sideBwtSz: 48
+ sideBwtLen: 192
+ numSides: 3939
+ numLines: 3939
+ ebwtTotLen: 252096
+ ebwtTotSz: 252096
+ color: 0
+ reverse: 0
+Total time for call to driver() for forward index: 00:00:00
+Reading reference sizes
+ Time reading reference sizes: 00:00:00
+Calculating joined length
+Writing header
+Reserving space for joined string
+Joining reference sequences
+ Time to join reference sequences: 00:00:00
+ Time to reverse reference sequence: 00:00:00
+bmax according to bmaxDivN setting: 189039
+Using parameters --bmax 141780 --dcv 1024
+ Doing ahead-of-time memory usage test
+ Passed! Constructing with these parameters: --bmax 141780 --dcv 1024
+Constructing suffix-array element generator
+Building DifferenceCoverSample
+ Building sPrime
+ Building sPrimeOrder
+ V-Sorting samples
+ V-Sorting samples time: 00:00:00
+ Allocating rank array
+ Ranking v-sort output
+ Ranking v-sort output time: 00:00:00
+ Invoking Larsson-Sadakane on ranks
+ Invoking Larsson-Sadakane on ranks time: 00:00:00
+ Sanity-checking and returning
+Building samples
+Reserving space for 12 sample suffixes
+Generating random suffixes
+QSorting 12 sample offsets, eliminating duplicates
+QSorting sample offsets, eliminating duplicates time: 00:00:00
+Multikey QSorting 12 samples
+ (Using difference cover)
+ Multikey QSorting samples time: 00:00:00
+Calculating bucket sizes
+Splitting and merging
+ Splitting and merging time: 00:00:00
+Avg bucket size: 756159 (target: 141779)
+Converting suffix-array elements to index image
+Allocating ftab, absorbFtab
+Entering Ebwt loop
+Getting block 1 of 1
+ No samples; assembling all-inclusive block
+ Sorting block of length 756159 for bucket 1
+ (Using difference cover)
+ Sorting block time: 00:00:00
+Returning block of 756160 for bucket 1
+Exited Ebwt loop
+fchr[A]: 0
+fchr[C]: 235897
+fchr[G]: 235897
+fchr[T]: 386401
+fchr[$]: 756159
+Exiting Ebwt::buildToDisk()
+Returning from initFromVector
+Wrote 4446745 bytes to primary EBWT file: BS_CT.rev.1.bt2
+Wrote 189044 bytes to secondary EBWT file: BS_CT.rev.2.bt2
+Re-opening _in1 and _in2 as input streams
+Returning from Ebwt constructor
+Headers:
+ len: 756159
+ bwtLen: 756160
+ sz: 189040
+ bwtSz: 189040
+ lineRate: 6
+ offRate: 4
+ offMask: 0xfffffff0
+ ftabChars: 10
+ eftabLen: 20
+ eftabSz: 80
+ ftabLen: 1048577
+ ftabSz: 4194308
+ offsLen: 47260
+ offsSz: 189040
+ lineSz: 64
+ sideSz: 64
+ sideBwtSz: 48
+ sideBwtLen: 192
+ numSides: 3939
+ numLines: 3939
+ ebwtTotLen: 252096
+ ebwtTotSz: 252096
+ color: 0
+ reverse: 1
+Total time for backward call to driver() for mirror index: 00:00:01
+Settings:
+ Output files: "BS_GA.*.bt2"
+ Line rate: 6 (line is 64 bytes)
+ Lines per side: 1 (side is 64 bytes)
+ Offset rate: 4 (one in 16)
+ FTable chars: 10
+ Strings: unpacked
+ Max bucket size: default
+ Max bucket size, sqrt multiplier: default
+ Max bucket size, len divisor: 4
+ Difference-cover sample period: 1024
+ Endianness: little
+ Actual local endianness: little
+ Sanity checking: disabled
+ Assertions: disabled
+ Random seed: 0
+ Sizeofs: void*:8, int:4, long:8, size_t:8
+Input files DNA, FASTA:
+ genome_mfa.GA_conversion.fa
+Building a SMALL index
+Reading reference sizes
+ Time reading reference sizes: 00:00:00
+Calculating joined length
+Writing header
+Reserving space for joined string
+Joining reference sequences
+ Time to join reference sequences: 00:00:00
+bmax according to bmaxDivN setting: 189039
+Using parameters --bmax 141780 --dcv 1024
+ Doing ahead-of-time memory usage test
+ Passed! Constructing with these parameters: --bmax 141780 --dcv 1024
+Constructing suffix-array element generator
+Building DifferenceCoverSample
+ Building sPrime
+ Building sPrimeOrder
+ V-Sorting samples
+ V-Sorting samples time: 00:00:00
+ Allocating rank array
+ Ranking v-sort output
+ Ranking v-sort output time: 00:00:00
+ Invoking Larsson-Sadakane on ranks
+ Invoking Larsson-Sadakane on ranks time: 00:00:00
+ Sanity-checking and returning
+Building samples
+Reserving space for 12 sample suffixes
+Generating random suffixes
+QSorting 12 sample offsets, eliminating duplicates
+QSorting sample offsets, eliminating duplicates time: 00:00:00
+Multikey QSorting 12 samples
+ (Using difference cover)
+ Multikey QSorting samples time: 00:00:00
+Calculating bucket sizes
+Splitting and merging
+ Splitting and merging time: 00:00:00
+Avg bucket size: 756159 (target: 141779)
+Converting suffix-array elements to index image
+Allocating ftab, absorbFtab
+Entering Ebwt loop
+Getting block 1 of 1
+ No samples; assembling all-inclusive block
+ Sorting block of length 756159 for bucket 1
+ (Using difference cover)
+ Sorting block time: 00:00:00
+Returning block of 756160 for bucket 1
+Exited Ebwt loop
+fchr[A]: 0
+fchr[C]: 386401
+fchr[G]: 533276
+fchr[T]: 533276
+fchr[$]: 756159
+Exiting Ebwt::buildToDisk()
+Returning from initFromVector
+Wrote 4446745 bytes to primary EBWT file: BS_GA.1.bt2
+Wrote 189044 bytes to secondary EBWT file: BS_GA.2.bt2
+Re-opening _in1 and _in2 as input streams
+Returning from Ebwt constructor
+Headers:
+ len: 756159
+ bwtLen: 756160
+ sz: 189040
+ bwtSz: 189040
+ lineRate: 6
+ offRate: 4
+ offMask: 0xfffffff0
+ ftabChars: 10
+ eftabLen: 20
+ eftabSz: 80
+ ftabLen: 1048577
+ ftabSz: 4194308
+ offsLen: 47260
+ offsSz: 189040
+ lineSz: 64
+ sideSz: 64
+ sideBwtSz: 48
+ sideBwtLen: 192
+ numSides: 3939
+ numLines: 3939
+ ebwtTotLen: 252096
+ ebwtTotSz: 252096
+ color: 0
+ reverse: 0
+Total time for call to driver() for forward index: 00:00:00
+Reading reference sizes
+ Time reading reference sizes: 00:00:00
+Calculating joined length
+Writing header
+Reserving space for joined string
+Joining reference sequences
+ Time to join reference sequences: 00:00:00
+ Time to reverse reference sequence: 00:00:00
+bmax according to bmaxDivN setting: 189039
+Using parameters --bmax 141780 --dcv 1024
+ Doing ahead-of-time memory usage test
+ Passed! Constructing with these parameters: --bmax 141780 --dcv 1024
+Constructing suffix-array element generator
+Building DifferenceCoverSample
+ Building sPrime
+ Building sPrimeOrder
+ V-Sorting samples
+ V-Sorting samples time: 00:00:00
+ Allocating rank array
+ Ranking v-sort output
+ Ranking v-sort output time: 00:00:00
+ Invoking Larsson-Sadakane on ranks
+ Invoking Larsson-Sadakane on ranks time: 00:00:00
+ Sanity-checking and returning
+Building samples
+Reserving space for 12 sample suffixes
+Generating random suffixes
+QSorting 12 sample offsets, eliminating duplicates
+QSorting sample offsets, eliminating duplicates time: 00:00:00
+Multikey QSorting 12 samples
+ (Using difference cover)
+ Multikey QSorting samples time: 00:00:00
+Calculating bucket sizes
+Splitting and merging
+ Splitting and merging time: 00:00:00
+Avg bucket size: 756159 (target: 141779)
+Converting suffix-array elements to index image
+Allocating ftab, absorbFtab
+Entering Ebwt loop
+Getting block 1 of 1
+ No samples; assembling all-inclusive block
+ Sorting block of length 756159 for bucket 1
+ (Using difference cover)
+ Sorting block time: 00:00:00
+Returning block of 756160 for bucket 1
+Exited Ebwt loop
+fchr[A]: 0
+fchr[C]: 386401
+fchr[G]: 533276
+fchr[T]: 533276
+fchr[$]: 756159
+Exiting Ebwt::buildToDisk()
+Returning from initFromVector
+Wrote 4446745 bytes to primary EBWT file: BS_GA.rev.1.bt2
+Wrote 189044 bytes to secondary EBWT file: BS_GA.rev.2.bt2
+Re-opening _in1 and _in2 as input streams
+Returning from Ebwt constructor
+Headers:
+ len: 756159
+ bwtLen: 756160
+ sz: 189040
+ bwtSz: 189040
+ lineRate: 6
+ offRate: 4
+ offMask: 0xfffffff0
+ ftabChars: 10
+ eftabLen: 20
+ eftabSz: 80
+ ftabLen: 1048577
+ ftabSz: 4194308
+ offsLen: 47260
+ offsSz: 189040
+ lineSz: 64
+ sideSz: 64
+ sideBwtSz: 48
+ sideBwtLen: 192
+ numSides: 3939
+ numLines: 3939
+ ebwtTotLen: 252096
+ ebwtTotSz: 252096
+ color: 0
+ reverse: 1
+Total time for backward call to driver() for mirror index: 00:00:01
+Running bismark with: 'bismark --bam --gzip --temp_dir /tmp/tmp86syD7 -o /tmp/tmp86syD7/results --quiet --fastq -L 20 -D 15 -R 2 --un --ambiguous /tmp/tmpAHSx4i -1 input1_fq_1.fq -2 input1_fq_2.fq -I 0 -X 500'
+Bowtie 2 seems to be working fine (tested command 'bowtie2 --version' [2.3.5])
+Output format is BAM (default)
+Alignments will be written out in BAM format. Samtools found here: '/home/abretaud/miniconda3/envs/mulled-v1-9f2317dbfb405ed6926c55752e5c11678eee3256a6ea680d1c0f912251153030/bin/samtools'
+Reference genome folder provided is /tmp/tmpAHSx4i/ (absolute path is '/tmp/tmpAHSx4i/)'
+FastQ format specified
+
+Input files to be analysed (in current folder '/tmp/tmpFC2FCZ/job_working_directory/000/4/working'):
+input1_fq_1.fq
+input1_fq_2.fq
+Library is assumed to be strand-specific (directional), alignments to strands complementary to the original top or bottom strands will be ignored (i.e. not performed!)
+Created output directory /tmp/tmp86syD7/results/!
+
+Output will be written into the directory: /tmp/tmp86syD7/results/
+
+Using temp directory: /tmp/tmp86syD7
+Temporary files will be written into the directory: /tmp/tmp86syD7/
+Setting parallelization to single-threaded (default)
+
+Summary of all aligner options: -q -L 20 -D 15 -R 2 --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --minins 0 --maxins 500 --quiet
+Current working directory is: /tmp/tmpFC2FCZ/job_working_directory/000/4/working
+
+Now reading in and storing sequence information of the genome specified in: /tmp/tmpAHSx4i/
+
+chr chrY_JH584300_random (182347 bp)
+chr chrY_JH584301_random (259875 bp)
+chr chrY_JH584302_random (155838 bp)
+chr chrY_JH584303_random (158099 bp)
+
+Single-core mode: setting pid to 1
+
+Paired-end alignments will be performed
+=======================================
+
+The provided filenames for paired-end alignments are input1_fq_1.fq and input1_fq_2.fq
+Input files are in FastQ format
+Writing a C -> T converted version of the input file input1_fq_1.fq to /tmp/tmp86syD7/input1_fq_1.fq_C_to_T.fastq.gz
+
+Created C -> T converted version of the FastQ file input1_fq_1.fq (1000 sequences in total)
+
+Writing a G -> A converted version of the input file input1_fq_2.fq to /tmp/tmp86syD7/input1_fq_2.fq_G_to_A.fastq.gz
+
+Created G -> A converted version of the FastQ file input1_fq_2.fq (1000 sequences in total)
+
+Input files are input1_fq_1.fq_C_to_T.fastq.gz and input1_fq_2.fq_G_to_A.fastq.gz (FastQ)
+Now running 2 instances of Bowtie 2 against the bisulfite genome of /tmp/tmpAHSx4i/ with the specified options: -q -L 20 -D 15 -R 2 --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --minins 0 --maxins 500 --quiet
+
+Now starting a Bowtie 2 paired-end alignment for CTread1GAread2CTgenome (reading in sequences from /tmp/tmp86syD7/input1_fq_1.fq_C_to_T.fastq.gz and /tmp/tmp86syD7/input1_fq_2.fq_G_to_A.fastq.gz, with the options: -q -L 20 -D 15 -R 2 --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --minins 0 --maxins 500 --quiet --norc))
+Found first alignment:
+1_1/1 77 * 0 0 * * 0 0 TTGTATATATTAGATAAATTAATTTTTTTTGTTTGTATGTTAAATTTTTTAATTAATTTATTAATATTTTGTGAATTTTTAGATA AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEAEEEEEE YT:Z:UP
+1_1/2 141 * 0 0 * * 0 0 TTATATATATTAAATAAATTAATTTTTTTTATTTATATATTAAATTTTTTAATTAATTTATTAATATTTTATAAATTTTTAAATA AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEAEEEEEE YT:Z:UP
+Now starting a Bowtie 2 paired-end alignment for CTread1GAread2GAgenome (reading in sequences from /tmp/tmp86syD7/input1_fq_1.fq_C_to_T.fastq.gz and /tmp/tmp86syD7/input1_fq_2.fq_G_to_A.fastq.gz, with the options: -q -L 20 -D 15 -R 2 --score-min L,0,-0.2 --ignore-quals --no-mixed --no-discordant --dovetail --minins 0 --maxins 500 --quiet --nofw))
+Found first alignment:
+1_1/1 77 * 0 0 * * 0 0 TTGTATATATTAGATAAATTAATTTTTTTTGTTTGTATGTTAAATTTTTTAATTAATTTATTAATATTTTGTGAATTTTTAGATA AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEAEEEEEE YT:Z:UP
+1_1/2 141 * 0 0 * * 0 0 TTATATATATTAAATAAATTAATTTTTTTTATTTATATATTAAATTTTTTAATTAATTTATTAATATTTTATAAATTTTTAAATA AAAAAEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEEAEEAEEEEEE YT:Z:UP
+
+>>> Writing bisulfite mapping results to input1_fq_1_bismark_bt2_pe.bam <<<
+
+Unmapped sequences will be written to input1_fq_1.fq_unmapped_reads_1.fq.gz and input1_fq_2.fq_unmapped_reads_2.fq.gz
+Ambiguously mapping sequences will be written to input1_fq_1.fq_ambiguous_reads_1.fq.gz and input1_fq_2.fq_ambiguous_reads_2.fq.gz
+
+Reading in the sequence files input1_fq_1.fq and input1_fq_2.fq
+Processed 1000 sequences in total
+
+
+Successfully deleted the temporary files /tmp/tmp86syD7/input1_fq_1.fq_C_to_T.fastq.gz and /tmp/tmp86syD7/input1_fq_2.fq_G_to_A.fastq.gz
+
+Final Alignment report
+======================
+Sequence pairs analysed in total: 1000
+Number of paired-end alignments with a unique best hit: 0
+Mapping efficiency: 0.0%
+
+Sequence pairs with no alignments under any condition: 1000
+Sequence pairs did not map uniquely: 0
+Sequence pairs which were discarded because genomic sequence could not be extracted: 0
+
+Number of sequence pairs with unique best (first) alignment came from the bowtie output:
+CT/GA/CT: 0 ((converted) top strand)
+GA/CT/CT: 0 (complementary to (converted) top strand)
+GA/CT/GA: 0 (complementary to (converted) bottom strand)
+CT/GA/GA: 0 ((converted) bottom strand)
+
+Number of alignments to (merely theoretical) complementary strands being rejected in total: 0
+
+Final Cytosine Methylation Report
+=================================
+Total number of C's analysed: 0
+
+Total methylated C's in CpG context: 0
+Total methylated C's in CHG context: 0
+Total methylated C's in CHH context: 0
+Total methylated C's in Unknown context: 0
+
+Total unmethylated C's in CpG context: 0
+Total unmethylated C's in CHG context: 0
+Total unmethylated C's in CHH context: 0
+Total unmethylated C's in Unknown context: 0
+
+Can't determine percentage of methylated Cs in CpG context if value was 0
+Can't determine percentage of methylated Cs in CHG context if value was 0
+Can't determine percentage of methylated Cs in CHH context if value was 0
+Can't determine percentage of methylated Cs in unknown context (CN or CHN) if value was 0
+
+
+Bismark completed in 0d 0h 0m 5s
+
+====================
+Bismark run complete
+====================
+