# HG changeset patch
# User iuc
# Date 1486575905 18000
# Node ID bf27106652f3c91922c0ddb5069ad38d5a58d50e
# Parent da6e10dee68b6836b74499afb399a83b2995b3e6
planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/master/tools/freebayes commit 2bfbb5ae6b801e43355fdc3f964a5111fe3fe3a1
diff -r da6e10dee68b -r bf27106652f3 freebayes.xml
--- a/freebayes.xml Sun Sep 25 09:48:42 2016 -0400
+++ b/freebayes.xml Wed Feb 08 12:45:05 2017 -0500
@@ -1,7 +1,10 @@
-
-
- bayesian genetic variant detector
+
+ bayesian genetic variant detector
+
+ macros.xml
+
- freebayes
+ freebayes
 samtools
 gawk
 parallel
@@ -9,32 +12,36 @@
-
-&1 || echo "Error running samtools faidx for FreeBayes" >&2 &&
+ ln -s -f '${reference_source.ref_file}' '${reference_fasta_filename}' &&
+ samtools faidx '${reference_fasta_filename}' 2>&1 || echo "Error running samtools faidx for FreeBayes" >&2 &&
 #else:
 #set $reference_fasta_filename = str( $reference_source.ref_file.fields.path )
 #end if
 #for $bam_count, $input_bam in enumerate( $reference_source.input_bams ):
- ln -s -f "${input_bam}" "b_${bam_count}.bam" &&
- ln -s -f "${input_bam.metadata.bam_index}" "b_${bam_count}.bam.bai" &&
+ ln -s -f '${input_bam}' 'b_${bam_count}.bam' &&
+ ln -s -f '${input_bam.metadata.bam_index}' 'b_${bam_count}.bam.bai' &&
 #end for
- ## Tabixize optional input_varinat_vcf file (for --variant-input option)
+ ## Tabixize optional input_variant_vcf file (for --variant-input option)
 #if ( str( $options_type.options_type_selector ) == 'cline' or str( $options_type.options_type_selector ) == 'full' ) and str( $options_type.optional_inputs.optional_inputs_selector ) == 'set' and str( $options_type.optional_inputs.input_variant_type.input_variant_type_selector ) == "provide_vcf":
- ln -s -f "${options_type.optional_inputs.input_variant_type.input_variant_vcf}" "input_variant_vcf.vcf.gz" &&
- ln -s -f "${Tabixized_input}" "input_variant_vcf.vcf.gz.tbi" &&
+ ln -s -f
'${options_type.optional_inputs.input_variant_type.input_variant_vcf}' 'input_variant_vcf.vcf.gz' && + ln -s -f '${Tabixized_input}' 'input_variant_vcf.vcf.gz.tbi' && #end if #for $bam_count, $input_bam in enumerate( $reference_source.input_bams ): - samtools view -H b_${bam_count}.bam | grep "^@SQ" | cut -f 2- | awk '{ gsub("^SN:","",$1); gsub("^LN:","",$2); print $1"\t0\t"$2; }' >> regions_all.bed && + samtools view -H b_${bam_count}.bam | + grep "^@SQ" | + cut -f 2- | + awk '{ gsub("^SN:","",$1); + gsub("^LN:","",$2); + print $1"\t0\t"$2; }' >> regions_all.bed && #end for sort -u regions_all.bed > regions_uniq.bed && @@ -50,182 +57,188 @@ for i in `cat regions_uniq.bed | awk '{print $1":"$2".."$3}'`; do - echo " + echo " + + ## COMMAND LINE STARTS HERE + + freebayes - ## COMMAND LINE STARTS HERE + --region '\$i' - freebayes + #for $bam_count, $input_bam in enumerate( $reference_source.input_bams ): + --bam 'b_${bam_count}.bam' + #end for + --fasta-reference '${reference_fasta_filename}' - --region '\$i' + ## Outputs + --vcf './vcf_output/part_\$i.vcf' - #for $bam_count, $input_bam in enumerate( $reference_source.input_bams ): - --bam 'b_${bam_count}.bam' - #end for - --fasta-reference '${reference_fasta_filename}' + #if str( $target_limit_type.target_limit_type_selector ) == "limit_by_target_file": + --targets '${target_limit_type.input_target_bed}' + #elif str( $target_limit_type.target_limit_type_selector ) == "limit_by_region": + --region '${target_limit_type.region_chromosome}:${target_limit_type.region_start}..${target_limit_type.region_end}' + #end if + + ##advanced options + #if str( $options_type.options_type_selector ) == "simple": + ##do nothing as command like build up to this point is sufficinet for simple diploid calling - ## Outputs - --vcf './vcf_output/part_\$i.vcf' + #elif str( $options_type.options_type_selector ) == "simple_w_filters": + --standard-filters + --min-coverage '${options_type.min_coverage}' + #elif str( 
$options_type.options_type_selector ) == "naive": + --haplotype-length 0 + --min-alternate-count 1 + --min-alternate-fraction 0 + --pooled-continuous + --report-monomorphic + #elif str( $options_type.options_type_selector ) == "naive_w_filters": + --haplotype-length 0 + --min-alternate-count 1 + --min-alternate-fraction 0 + --pooled-continuous + --report-monomorphic + --standard-filters + --min-coverage '${options_type.min_coverage}' + + ## Command line direct text entry is not allowed at this time for security reasons + #elif str( $options_type.options_type_selector ) == "full": + #if str( $options_type.optional_inputs.optional_inputs_selector ) == 'set': + ${options_type.optional_inputs.report_monomorphic} - #if str( $target_limit_type.target_limit_type_selector ) == "limit_by_target_file": - --targets '${target_limit_type.input_target_bed}' - #elif str( $target_limit_type.target_limit_type_selector ) == "limit_by_region": - --region '${target_limit_type.region_chromosome}:${target_limit_type.region_start}..${target_limit_type.region_end}' - #end if - - ##advanced options - #if str( $options_type.options_type_selector ) == "simple": - ##do nothing as command like build up to this point is sufficinet for simple diploid calling + #if $options_type.optional_inputs.output_trace_option: + --trace ./trace/part_'\$i'.txt + #end if + #if $options_type.optional_inputs.output_failed_alleles_option: + --failed-alleles ./failed_alleles/part_'\$i'.bed + #end if + #if $options_type.optional_inputs.samples: + --samples '${options_type.optional_inputs.samples}' + #end if + #if $options_type.optional_inputs.populations: + --populations '${options_type.optional_inputs.populations}' + #end if + #if $options_type.optional_inputs.A: + --cnv-map '${options_type.optional_inputs.A}' + #end if + #if str( $options_type.optional_inputs.input_variant_type.input_variant_type_selector ) == "provide_vcf": + --variant-input 'input_variant_vcf.vcf.gz' ## input_variant_vcf.vcf.gz is symlinked to 
a galaxy-generated dataset in "Tabixize optional input_variant_vcf file" section of the command line above + ${options_type.optional_inputs.input_variant_type.only_use_input_alleles} + #end if + #if $options_type.optional_inputs.haplotype_basis_alleles: + --haplotype-basis-alleles '${options_type.optional_inputs.haplotype_basis_alleles}' + #end if + #if $options_type.optional_inputs.observation_bias: + --observation-bias '${options_type.optional_inputs.observation_bias}' + #end if + #if $options_type.optional_inputs.contamination_estimates: + --contamination-estimates '${options_type.optional_inputs.contamination_estimates}' + #end if + #end if - #elif str( $options_type.options_type_selector ) == "simple_w_filters": - --standard-filters - --min-coverage '${options_type.min_coverage}' - #elif str( $options_type.options_type_selector ) == "naive": - --haplotype-length 0 - --min-alternate-count 1 - --min-alternate-fraction 0 - --pooled-continuous - --report-monomorphic - #elif str( $options_type.options_type_selector ) == "naive_w_filters": - --haplotype-length 0 - --min-alternate-count 1 - --min-alternate-fraction 0 - --pooled-continuous - --report-monomorphic - --standard-filters - --min-coverage '${options_type.min_coverage}' + ## REPORTING + #if str( $options_type.reporting.reporting_selector ) == "set": + --pvar ${options_type.reporting.pvar} + #end if + ## POPULATION MODEL + #if str( $options_type.population_model.population_model_selector ) == "set": + --theta '${options_type.population_model.T}' + --ploidy '${options_type.population_model.P}' + ${options_type.population_model.J} + ${options_type.population_model.K} + #end if + + ## REFERENCE ALLELE + #if str( $options_type.reference_allele.reference_allele_selector ) == "set": + ${options_type.reference_allele.Z} + --reference-quality '${options_type.reference_allele.reference_quality}' + #end if - ## Command line direct text entry is not allowed at this time for security reasons - #elif str( 
$options_type.options_type_selector ) == "full": - #if str( $options_type.optional_inputs.optional_inputs_selector ) == 'set': - ${options_type.optional_inputs.report_monomorphic} + ## ALLELE SCOPE + #if str( $options_type.allele_scope.allele_scope_selector ) == "set": + ${options_type.allele_scope.I} + ${options_type.allele_scope.i} + ${options_type.allele_scope.X} + ${options_type.allele_scope.u} + ${options_type.allele_scope.no_partial_observations} + + -n '${options_type.allele_scope.n}' + + --haplotype-length '${options_type.allele_scope.haplotype_length}' + --min-repeat-size '${options_type.allele_scope.min_repeat_length}' + --min-repeat-entropy '${options_type.allele_scope.min_repeat_entropy}' + #end if + + ## REALIGNMENT + ${options_type.O} - #if $options_type.optional_inputs.output_trace_option: - --trace ./trace/part_'\$i'.txt - #end if - #if $options_type.optional_inputs.output_failed_alleles_option: - --failed-alleles ./failed_alleles/part_'\$i'.bed - #end if - #if $options_type.optional_inputs.samples: - --samples '${options_type.optional_inputs.samples}' - #end if - #if $options_type.optional_inputs.populations: - --populations '${options_type.optional_inputs.populations}' + ##INPUT FILTERS + #if str( $options_type.input_filters.input_filters_selector ) == "set": + ${options_type.input_filters.use_duplicate_reads} + -m '${options_type.input_filters.m}' + -q '${options_type.input_filters.q}' + -R '${options_type.input_filters.R}' + -Y '${options_type.input_filters.Y}' + -e '${options_type.input_filters.e}' + -F '${options_type.input_filters.F}' + -C '${options_type.input_filters.C}' + -G '${options_type.input_filters.G}' + + #if str( $options_type.input_filters.mismatch_filters.mismatch_filters_selector ) == "set": + -Q '${options_type.input_filters.mismatch_filters.Q}' + -U '${options_type.input_filters.mismatch_filters.U}' + -z '${options_type.input_filters.mismatch_filters.z}' + + --read-snp-limit 
'${options_type.input_filters.mismatch_filters.read_snp_limit}' + #end if + + --min-coverage '${options_type.input_filters.min_coverage}' + --min-alternate-qsum "${options_type.input_filters.min_alternate_qsum}" #end if - #if $options_type.optional_inputs.A: - --cnv-map '${options_type.optional_inputs.A}' + + ## POPULATION AND MAPPABILITY PRIORS + #if str( $options_type.population_mappability_priors.population_mappability_priors_selector ) == "set": + ${options_type.population_mappability_priors.k} + ${options_type.population_mappability_priors.w} + ${options_type.population_mappability_priors.V} + ${options_type.population_mappability_priors.a} #end if - #if str( $options_type.optional_inputs.input_variant_type.input_variant_type_selector ) == "provide_vcf": - --variant-input 'input_variant_vcf.vcf.gz' ## input_variant_vcf.vcf.gz is symlinked to a galaxy-generated dataset in "Tabixize optional input_varinat_vcf file" section of the command line above - ${options_type.optional_inputs.input_variant_type.only_use_input_alleles} + + ## GENOTYPE LIKELIHOODS + #if str( $options_type.genotype_likelihoods.genotype_likelihoods_selector ) == "set": + ${$options_type.genotype_likelihoods.experimental_gls} + + --base-quality-cap '${$options_type.genotype_likelihoods.base_quality_cap}' + --prob-contamination '${$options_type.genotype_likelihoods.prob_contamination}' #end if - #if $options_type.optional_inputs.haplotype_basis_alleles: - --haplotype-basis-alleles '${options_type.optional_inputs.haplotype_basis_alleles}' - #end if - #if $options_type.optional_inputs.observation_bias: - --observation-bias '${options_type.optional_inputs.observation_bias}' - #end if - #if $options_type.optional_inputs.contamination_estimates: - --contamination-estimates '${options_type.optional_inputs.contamination_estimates}' + + ## ALGORITHMIC FEATURES + #if str( $options_type.algorithmic_features.algorithmic_features_selector ) == "set": + -B '${options_type.algorithmic_features.B}' + -W 
'${options_type.algorithmic_features.W}' + -D '${options_type.algorithmic_features.D}' + + #if str( $options_type.algorithmic_features.genotype_variant_threshold.genotype_variant_threshold_selector ) == "set": + -S '${options_type.algorithmic_features.genotype_variant_threshold.S}' + #end if + + ${options_type.algorithmic_features.N} + ${options_type.algorithmic_features.j} + ${options_type.algorithmic_features.H} + ${options_type.algorithmic_features.genotype_qualities} + ${options_type.algorithmic_features.report_genotype_likelihood_max} + + --genotyping-max-banddepth '${options_type.algorithmic_features.genotyping_max_banddepth}' #end if #end if - ## REPORTING - #if str( $options_type.reporting.reporting_selector ) == "set": - --pvar ${options_type.reporting.pvar} - #end if - ## POPULATION MODEL - #if str( $options_type.population_model.population_model_selector ) == "set": - --theta '${options_type.population_model.T}' - --ploidy '${options_type.population_model.P}' - ${options_type.population_model.J} - ${options_type.population_model.K} - #end if - - ## REFERENCE ALLELE - #if str( $options_type.reference_allele.reference_allele_selector ) == "set": - ${options_type.reference_allele.Z} - --reference-quality '${options_type.reference_allele.reference_quality}' - #end if - - ## ALLELE SCOPE - #if str( $options_type.allele_scope.allele_scope_selector ) == "set": - ${options_type.allele_scope.I} - ${options_type.allele_scope.i} - ${options_type.allele_scope.X} - ${options_type.allele_scope.u} - -n '${options_type.allele_scope.n}' - --haplotype-length '${options_type.allele_scope.haplotype_length}' - --min-repeat-size '${options_type.allele_scope.min_repeat_length}' - --min-repeat-entropy '${options_type.allele_scope.min_repeat_entropy}' - ${options_type.allele_scope.no_partial_observations} - #end if - - ## REALIGNMENT - ${options_type.O} - - ##INPUT FILTERS - #if str( $options_type.input_filters.input_filters_selector ) == "set": - 
${options_type.input_filters.use_duplicate_reads} - -m '${options_type.input_filters.m}' - -q '${options_type.input_filters.q}' - -R '${options_type.input_filters.R}' - -Y '${options_type.input_filters.Y}' + "; + done > freebayes_commands.sh && - #if str( $options_type.input_filters.mismatch_filters.mismatch_filters_selector ) == "set": - -Q '${options_type.input_filters.mismatch_filters.Q}' - -U '${options_type.input_filters.mismatch_filters.U}' - -z '${options_type.input_filters.mismatch_filters.z}' - --read-snp-limit '${options_type.input_filters.mismatch_filters.read_snp_limit}' - #end if - - -e '${options_type.input_filters.e}' - -F '${options_type.input_filters.F}' - -C '${options_type.input_filters.C}' - --min-alternate-qsum "${options_type.input_filters.min_alternate_qsum}" - -G '${options_type.input_filters.G}' - --min-coverage '${options_type.input_filters.min_coverage}' - #end if - - ## POPULATION AND MAPPABILITY PRIORS - #if str( $options_type.population_mappability_priors.population_mappability_priors_selector ) == "set": - ${options_type.population_mappability_priors.k} - ${options_type.population_mappability_priors.w} - ${options_type.population_mappability_priors.V} - ${options_type.population_mappability_priors.a} - #end if - - ## GENOTYPE LIKELIHOODS - #if str( $options_type.genotype_likelihoods.genotype_likelihoods_selector ) == "set": - --base-quality-cap '${$options_type.genotype_likelihoods.base_quality_cap}' - ${$options_type.genotype_likelihoods.experimental_gls} - --prob-contamination '${$options_type.genotype_likelihoods.prob_contamination}' - #end if - - ## ALGORITHMIC FEATURES - #if str( $options_type.algorithmic_features.algorithmic_features_selector ) == "set": - ${options_type.algorithmic_features.report_genotype_likelihood_max} - -B '${options_type.algorithmic_features.B}' - --genotyping-max-banddepth '${options_type.algorithmic_features.genotyping_max_banddepth}' - -W '${options_type.algorithmic_features.W}' - 
${options_type.algorithmic_features.N} - - #if str( $options_type.algorithmic_features.genotype_variant_threshold.genotype_variant_threshold_selector ) == "set": - -S '${options_type.algorithmic_features.genotype_variant_threshold.S}' - #end if - - ${options_type.algorithmic_features.j} - ${options_type.algorithmic_features.H} - -D '${options_type.algorithmic_features.D}' - ${options_type.algorithmic_features.genotype_qualities} - #end if - #end if - - "; - done > freebayes_commands.sh && - cat freebayes_commands.sh | parallel --no-notice -j \${GALAXY_SLOTS:-1} && + cat freebayes_commands.sh | + parallel --no-notice -j \${GALAXY_SLOTS:-1} && ## make VCF header - grep "^#" "./vcf_output/part_\$i.vcf" > header.txt && for i in `cat regions_uniq.bed | awk '{print $1":"$2".."$3}'`; @@ -233,7 +246,7 @@ ## if this fails then it bails out the script cat "./vcf_output/part_\$i.vcf" | grep -v "^#" || true ; - done | sort -k1,1 -k2,2n -k5,5 -u | cat header.txt - > "${output_vcf}" + done | sort -k1,1 -k2,2n -k5,5 -u | cat header.txt - > '${output_vcf}' #if str( $options_type.options_type_selector ) == "full": #if str( $options_type.optional_inputs.optional_inputs_selector ) == 'set': @@ -256,13 +269,12 @@ #end if #end if #end if -]]> - + ]]> - - + + @@ -278,7 +290,7 @@ + help="You can upload a FASTA sequence to the history and use it as reference" /> @@ -287,93 +299,91 @@ - - - + - + - + - - - - - - - + + + + + + + help="Sets --samples, --populations, --cnv-map, --trace, --failed-alleles, --varinat-input, --only-use-input-alleles, --haplotype-basis-alleles, + --report-all-haplotype-alleles, --report-monomorphic options, --observation-bias, and --contamination-estimates"> + label="Write out failed alleles file" argument="--failed-alleles" /> + label="Write out algorithm trace file" argument="--trace"/> + help="default=By default FreeBayes will analyze all samples in its input BAM files" argument="--samples"/> + help="Each line of FILE should list a sample and a population which 
it is part of. The population-based bayesian inference model will + then be partitioned on the basis of the populations. [default=False]" + argument="--populations" /> + help="default=copy number is set to as specified by --ploidy. Read a copy number map from the BED file FILE, which has the format: + reference sequence, start, end, sample name, copy number ... for each region in each sample which does not have the default copy number as set by --ploidy." + argument="--cnv-map" /> - - - + - + - + + argument="--haplotype-basis-alleles" /> + label="Report even loci which appear to be monomorphic, and report all considered alleles, even those which are not in called genotypes." + argument="--report-monomorphic" /> + help="The format is [length] [alignment efficiency relative to reference] where the efficiency is 1 if there is no relative observation bias" + argument="--observation-bias" /> + help="The format should be: sample p(read=R|genotype=AR) p(read=A|genotype=AA) Sample '*' can be used to set default contamination estimates." + argument="--contamination-estimates" /> - - - + + - + - - - + + + - - - + + - + help="This serves as the single parameter to the Ewens Sampling Formula prior model. [default = 0.001]" argument='--theta'/> + + help="Model pooled samples using discrete genotypes across pools. When using this flag, set --ploidy to the number of alleles in each sample or use the --cnv-map to define per-sample ploidy. [default=False]" + argument="--pooled-discrete"/> + help="default=False." argument="--poled-continuous" /> - - - + + + help="default=False" argument="--use-reference-allele" /> + help="default=100,60" argument="--reference-quality" /> - - - + + Set alleic scope options - - - + + + + help="default=False" argument="--no-complex" /> + help="Alleles are ranked by the sum of supporting quality scores. Set to 0 to evaluate all. [default=0 (all)]" + argument="--use-best-n-alleles" /> + help="-E --max-complex-gap --haplotype-length; default=3." 
/> + help="default=5." argument="--min-repeat-size" /> + help="default=0 (off)." argument="--min-repeat-entropy" /> + label="Exclude observations which do not fully span the dynamically-determined detection window" + help="default=use all observations, dividing partial support across matching haplotypes when generating haplotypes." + argument="--no-partial-observations" /> - - - + + - + + + help="Sets -4, -m, -q, -R, -Y, -Q, -U, -z, -$, -e, -0, -F, -C, -3, -G, and -! options."> - + + label="Include duplicate-marked alignments in the analysis." + help="default=False (exclude duplicates marked as such in alignments)." argument="--use-duplicate-reads" /> + help="default=1" argument="--min-mapping-quality" /> + help="default=0" argument="--min-base-quality" /> + help="default=0" argument="--min-supporting-allele-qsum" /> + help="default=0" argument="--min-supporting-mapping-qsum" /> + help="Sets -Q, -U, -z, and $ options"> - - + + + label="Exclude reads with more than N [0,1] fraction of mismatches where each mismatch has base quality >= Q (second option above)" + help="default=1.0" argument="--read-max-mismatch-fraction" /> + value="1000" label="Exclude reads with more than N base mismatches, ignoring gaps with quality >= Q (third option abobe)" + argument="--read-snp-limit" /> - - - + - + help="default=~unbounded" argument="--read-snp-limit" /> + + label="Require at least this fraction of observations supporting an alternate allele within a single individual in the in order to evaluate the position" + help="default=0.2" argument="--min-alternate-fraction" /> + label="Require at least this count of observations supporting an alternate allele within a single individual in order to evaluate the position" + help="default=2" argument="--min-alternate-count" /> + label="Require at least this sum of quality of observations supporting an alternate allele within a single individual in order to evaluate the position" + help="default=0" argument="--min-alternate-qsum" /> - + 
label="Require at least this count of observations supporting an alternate allele within the total population in order to use the allele in analysis" + help="default=1" argument="--min-alternate-total" /> + - - - + + + help="default=False. Equivalent to --pooled-discrete --hwe-priors-off and removal of Ewens Sampling Formula component of priors." + argument="--no-population-priors" /> + label="Disable estimation of the probability of the combination arising under HWE given the allele frequency as estimated by observation frequency" + help="default=False" argument="--hwe-priors-off" /> + help="default=False. Uses read placement probability, strand balance probability, and read position (5''-3'') probability." + argument="--binomial-obs-priors-off" /> + label="Disable use of aggregate probability of observation balance between alleles as a component of the priors" + help="default=False" + argument="--allele-balance-priors-off" /> - - - + + + help="Sets --base-quality-cap, --experimental-gls, and --prob-contamination options."> - + + label="Generate genotype likelihoods using 'effective base depth' metric qual = 1-BaseQual * 1-MapQual" + help="Incorporate partial observations. This is the default when contamination estimates are provided. Optimized for diploid samples." + argument="--experimental-gls" /> + help="default=10e-9." argument="--prob-contamination" /> - - - + + + help="Sets --report-genotypes-likelihood-max, -B, --genotyping-max-banddepth, -W, -N, S, -j, -H, -D, -= options"> + label="Report genotypes using the maximum-likelihood estimate provided from genotype likelihoods." + help="default=False" argument="--report-genotype-likelihood-max" /> + help="default=1000." 
argument="--genotyping-max-iterations" /> + help="default=6" argument="--genotyping-max-banddepth" /> + label="Integrate all genotype combinations in our posterior space which include no more than N (1) samples with their Mth (3) best data likelihood" + help="default=1,3" argument="--posterior-integration-limits" /> + label="Skip sample genotypings for which the sample has no supporting reads" + help="default=False" argument="--exclude-unobserved-genotypes" /> + label="Limit posterior integration" argument="--genotype-variant-threshold"> - - - + + label="Limit posterior integration to samples where the second-best genotype likelihood is no more than log(N) from the highest genotype likelihood for the sample." + help="default=~unbounded" argument="--genotype-variant-threshold" /> - + + label="Use a weighted sum of base qualities around an indel, scaled by the distance from the indel" + help="default=use a minimum Base Quality in flanking sequence." argument="--harmonic-indel-quality" /> + help="default=0.9." argument="--read-dependence-factor" /> + label="Calculate the marginal probability of genotypes and report as GQ in each sample field in the VCF output" + help="-= --genotype-qualities; default=False " /> - - - + - - - + - + - + @@ -687,11 +697,11 @@ **Galaxy-specific options** -Galaxy allows six levels of control over FreeBayes options provided by **Choose parameter selection level** menu option. These are: +Galaxy allows five levels of control over FreeBayes options provided by **Choose parameter selection level** menu option. These are: 1. *Simple diploid calling*: The simples possible FreeBayes application. Equvalent of using FreeBayes with only a BAM input and no other parameter options. - 2. *Simple diploid calling with filtering and coverage*: Same as #1 plus two additional options: -0 (standard filters: --min-mapping-quality 30 --min-base-quality 20 --min-supporting-allele-qsum 0 --genotype-varinat-threshold 0) and --min-coverage. - 3. 
*Frequency-based pooled calling*: This is equivalent to using FreeBayes with the following options: --haplotype-length 0 --min-alternate-count 1 --min-alternate-fraction 0 --pooled-continuous --report-monomorphic. This is the best choice for calling varinats in mixtures such as viral, bacterial, or organellar genomes.
+ 2. *Simple diploid calling with filtering and coverage*: Same as #1 plus two additional options: -0 (standard filters: --min-mapping-quality 30 --min-base-quality 20 --min-supporting-allele-qsum 0 --genotype-varinat-threshold 0) and --min-coverage.
+ 3. *Frequency-based pooled calling*: This is equivalent to using FreeBayes with the following options: --haplotype-length 0 --min-alternate-count 1 --min-alternate-fraction 0 --pooled-continuous --report-monomorphic. This is the best choice for calling varinats in mixtures such as viral, bacterial, or organellar genomes.
 4. *Frequency-based pooled calling with filtering and coverage*: Same as #3 but adds -0 and --min-coverage like in #2.
 5. *Complete list of all options*: Gives you full control by exposing all FreeBayes options as Galaxy widgets.
@@ -945,21 +955,10 @@
 ------
-**Citation**
-
-For the underlying tool, please cite `Erik Garrison and Gabor Marth. Haplotype-based variant detection from short-read sequencing <http://arxiv.org/abs/1207.3907>`_.
+**Acknowledgments**
 The initial version of the wrapper was produced by Dan Blankenberg and upgraded by Anton Nekrutenko.
 TNG was developed by Bjoern Gruening
-
-
- @misc{1207.3907,
-Author = {Erik Garrison},
-Title = {Haplotype-based variant detection from short-read sequencing},
-Year = {2012},
-Eprint = {arXiv:1207.3907},
-url = {http://arxiv.org/abs/1207.3907},
-}
-
+
diff -r da6e10dee68b -r bf27106652f3 leftalign.xml
--- a/leftalign.xml Sun Sep 25 09:48:42 2016 -0400
+++ b/leftalign.xml Wed Feb 08 12:45:05 2017 -0500
@@ -1,6 +1,9 @@
-
+
 indels in BAM datasets
+
+ macros.xml
+
 freebayes
 samtools
@@ -8,48 +11,48 @@
-
+ &1 || echo "Error running samtools faidx for leftalign" >&2 &&
 #else:
 #set $reference_fasta_filename = str( $reference_source.ref_file.fields.path )
 #end if
- ##finished setting up inputs
- ##start leftalign commandline
- samtools view -bh "${input_bam}" | bamleftalign
- --fasta-reference "${reference_fasta_filename}"
- -c
- --max-iterations "${iterations}"
- ##outputs
- > "${output_bam}"
-
+ cat '${input_bam}' |
+ bamleftalign
+ --fasta-reference '${reference_fasta_filename}'
+ -c
+ --max-iterations "${iterations}"
+ > '${output_bam}'
+ ]]>
-
+
-
+
-
+
-
+
-
+
-
+
@@ -67,17 +70,7 @@
 When calling indels, it is important to homogenize the positional distribution of insertions and deletions in the input by using left realignment. Left realignment will place all indels in homopolymer and microsatellite repeats at the same position, provided that doing so does not introduce mismatches between the read and reference other than the indel. This method is computationally inexpensive and handles the most common classes of alignment inconsistency.
-This is leftalign utility from FreeBayes package developed and maintained by Erik Garrison (https://github.com/ekg/freebayes).
+This is leftalign utility from FreeBayes package.
-
-
- @misc{1207.3907,
- Author = {Erik Garrison},
- Title = {Haplotype-based variant detection from short-read sequencing},
- Year = {2012},
- Eprint = {arXiv:1207.3907},
- url = {http://arxiv.org/abs/1207.3907}
- }
-
-
+
diff -r da6e10dee68b -r bf27106652f3 macros.xml
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/macros.xml Wed Feb 08 12:45:05 2017 -0500
@@ -0,0 +1,22 @@
+
+ 1.0.2.29
+
+
+
+
+ @misc{1207.3907,
+ Author = {Erik Garrison},
+ Title = {Haplotype-based variant detection from short-read sequencing},
+ Year = {2012},
+ Eprint = {arXiv:1207.3907},
+ url = {http://arxiv.org/abs/1207.3907}
+ }
+
+
+
+
+
+
+
+