
pyCalculateMutationFrequencies (version 1.0.0)
GFF file containing read data
GFF file containing interval co-ordinates
This file should have two columns: the first column contains the chromosome names and the second column contains the chromosome lengths. Use the pyCRAC utility pyCalculateChromosomeLengths to create it.
Sets the minimum mutation frequency that a position in an interval must have to be written to the output file.

pyCalculateMutationFrequencies

pyCalculateMutationFrequencies is part of the pyCRAC package. It takes an interval file and a pyReadCounters GTF file and calculates (cross-linking induced) mutation frequencies for each interval. This tool can be used to calculate mutation frequencies for significant intervals (pyCalculateFDRs output file) or over-represented motifs (pyMotif GTF output file). It expects a pyCRAC count_output_reads.gtf file and a GTF file with the intervals.

For example:

This pyCalculateFDRs GTF output file::

    # generated by pyCalculateFDRs version 0.0.3, Sat Jun  1 21:16:23 2013
    # pyCalculateFDRs.py -f test_count_output_reads.gtf -r 200 -o test_count_output_FDRs_005.gtf -v -m 0.05
    # chromosome    feature source  start   end     minimal_coverage        strand  .       attributes
    chrII   protein_coding  exon    203838  203887  3       +       .       gene_id "YBL011W"; gene_name "SCT1";
    chrII   intergenic_region       exon    407669  407708  3       +       .       gene_id "INT_0_445"; gene_name "INT_0_445";
    chrII   intergenic_region       exon    585158  585195  2       +       .       gene_id "INT_0_562"; gene_name "INT_0_562";
    chrII   protein_coding  exon    372390  372433  4       -       .       gene_id "YBR067C"; gene_name "TIP1";
    chrII   intergenic_region       exon    380754  380815  6       -       .       gene_id "INT_0_431"; gene_name "INT_0_431";
    chrIII  protein_coding  exon    138001  138044  5       +       .       gene_id "YCR012W"; gene_name "PGK1";
    chrIII  intergenic_region       exon    227997  228036  5       +       .       gene_id "INT_0_885"; gene_name "INT_0_885";
    chrIII  intergenic_region       exon    227997  228037  4       +       .       gene_id "INT_0_887"; gene_name "INT_0_887";
    chrIII  tRNA    exon    227997  228037  4       +       .       gene_id "tS(CGA)C"; gene_name "SUP61";

Will be converted into::

    # generated by pyCalculateFDRs version 0.0.3, Sat Jun  1 21:16:23 2013
    # /Library/Frameworks/EPD64.framework/Versions/Current/bin/pyCalculateFDRs.py -f test_count_output_reads.gtf -r 200 -o test_count_output_FDRs_005.gtf -v -m 0.05
    # chromosome    feature source  start   end     minimal_coverage        strand  .       attributes
    chrII   protein_coding  exon    203838  203887  3       +       .       gene_id "YBL011W"; gene_name "SCT1"; # 203882D33.3,203883D33.3,203884D33.3;
    chrII   intergenic_region       exon    407669  407708  3       +       .       gene_id "INT_0_445"; gene_name "INT_0_445"; # 407680D33.3,407681D33.3;
    chrII   intergenic_region       exon    585158  585195  2       +       .       gene_id "INT_0_562"; gene_name "INT_0_562"; # 585171D100.0,585172D100.0,585173D100.0;
    chrII   protein_coding  exon    372390  372433  4       -       .       gene_id "YBR067C"; gene_name "TIP1"; # 372412D50.0,372413D50.0;
    chrII   intergenic_region       exon    380754  380815  6       -       .       gene_id "INT_0_431"; gene_name "INT_0_431"; # 380786D90.2,380787D90.2;
    chrIII  protein_coding  exon    138001  138044  5       +       .       gene_id "YCR012W"; gene_name "PGK1"; # 138025D40.0,138026D30.0,138027D40.0;
    chrIII  intergenic_region       exon    227997  228036  5       +       .       gene_id "INT_0_885"; gene_name "INT_0_885"; # 228006D85.7,228007D100.0;
    chrIII  intergenic_region       exon    227997  228037  4       +       .       gene_id "INT_0_887"; gene_name "INT_0_887"; # 228006D85.7,228007D100.0;
    chrIII  tRNA    exon    227997  228037  4       +       .       gene_id "tS(CGA)C"; gene_name "SUP61"; # 228006D85.7,228007D100.0;

The text after the hash character (#) at the end of each line lists the chromosomal coordinates of the mutated nucleotides within the cluster interval, together with their mutation frequencies.

For example::

    # 228007D100.0

indicates that the nucleotide at position 228007 was deleted in 100% of the reads covering that position in the interval.
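The annotation can be read back programmatically. The following is a minimal Python sketch, assuming each entry has the form position, one-letter mutation code, frequency in percent, as in the example output above. The function name ``parse_mutations`` is hypothetical and not part of pyCRAC, and only the ``D`` (deletion) code is confirmed by this page.

```python
import re

# Assumed entry layout: <position><one-letter code><percentage>, e.g. 228007D100.0
MUTATION_RE = re.compile(r"(\d+)([A-Za-z])(\d+(?:\.\d+)?)")


def parse_mutations(gtf_line):
    """Return (position, code, frequency) tuples from the trailing '# ...' annotation."""
    # Header lines start with '#' and carry no mutation annotation.
    if gtf_line.startswith("#") or "#" not in gtf_line:
        return []
    # Everything after the last '#' is treated as the annotation.
    annotation = gtf_line.rsplit("#", 1)[1]
    return [(int(pos), code, float(freq))
            for pos, code, freq in MUTATION_RE.findall(annotation)]


# Line taken from the example output above (tab-separated columns).
line = ('chrIII\ttRNA\texon\t227997\t228037\t4\t+\t.\t'
        'gene_id "tS(CGA)C"; gene_name "SUP61"; # 228006D85.7,228007D100.0;')
print(parse_mutations(line))
# prints [(228006, 'D', 85.7), (228007, 'D', 100.0)]
```

Lines without a trailing annotation, such as the header lines, yield an empty list.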

The --mutsfreq flag sets the lowest mutation frequency that will be reported. This makes it relatively easy to select those significant regions that contain nucleotides with high mutation frequencies.
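The effect of the threshold can be illustrated with a short Python sketch. It assumes mutations are held as (position, code, frequency) tuples, a hypothetical representation chosen for this example, and that a mutation is reported when its frequency is at least the threshold. The helper ``filter_by_frequency`` is not part of pyCRAC.

```python
def filter_by_frequency(mutations, min_freq=0.0):
    """Keep (position, code, frequency) tuples whose frequency is at least min_freq percent."""
    return [m for m in mutations if m[2] >= min_freq]


# Values taken from the PGK1 interval in the example output above.
pgk1 = [(138025, "D", 40.0), (138026, "D", 30.0), (138027, "D", 40.0)]
print(filter_by_frequency(pgk1, min_freq=40))
# prints [(138025, 'D', 40.0), (138027, 'D', 40.0)]
```

With the default threshold of 0, all three positions would be kept.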


Parameter list

Options:

-i intervals.gtf, --intervaldatafile=intervals.gtf
                      provide the path to your GTF interval data file.
-r reads.gtf, --readdatafile=reads.gtf
                      provide the path to your GTF read data file.
-c yeast.txt, --chromfile=yeast.txt
                      Location of the chromosome info file. This file should
                      have two columns: the first column contains the names
                      of the chromosomes, the second column contains their
                      lengths. Default is yeast.
-o intervals_with_muts.gtf, --output_file=intervals_with_muts.gtf
                      provide a name for an output file. By default it
                      writes to the standard output
--mutsfreq=10, --mutationfrequency=10
                      sets the minimum mutation frequency that a position
                      in an interval must have to be written to the output
                      file. Default = 0%. Example: if --mutsfreq is set to
                      10 and an interval position is mutated in fewer than
                      10% of the reads, then the mutation will not be
                      reported.
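
As an illustration of how these options combine, the Python sketch below assembles a command line from the flags listed above. The script name ``pyCalculateMutationFrequencies.py`` is an assumption based on the naming of ``pyCalculateFDRs.py`` in the example header, the file names are the placeholders from the option list, and ``build_command`` is a hypothetical helper.

```python
import shlex


def build_command(intervals, reads, chromfile, output, mutsfreq=0):
    """Assemble an argument list using the options documented above."""
    return [
        "pyCalculateMutationFrequencies.py",  # assumed script name
        "-i", intervals,             # GTF interval data file
        "-r", reads,                 # GTF read data file
        "-c", chromfile,             # two-column chromosome length file
        "-o", output,                # output file (stdout if the option is omitted)
        f"--mutsfreq={mutsfreq}",    # minimum mutation frequency in percent
    ]


cmd = build_command("intervals.gtf", "reads.gtf", "yeast.txt",
                    "intervals_with_muts.gtf", mutsfreq=10)
print(shlex.join(cmd))
```

The resulting list can be passed to ``subprocess.run(cmd, check=True)`` on a system where pyCRAC is installed.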