pyCalculateFDRs

By default the FDR threshold is set to 0.05, meaning that there is a 5% chance that a reported interval is not significantly enriched. The tool reports significant intervals in GTF format, together with the genomic features they overlap. Mutation frequencies are not included, but they can be added using the pyCalculateMutationFrequencies tool.

NOTE! By default the tool labels each significant interval an "exon", but this label has no meaning: the interval may overlap with an intron. Use bedtools to extract the intervals that overlap with introns or other features.
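The overlap test that such a filtering step relies on can be sketched in a few lines of Python. This is not part of pyCRAC and is not a replacement for bedtools; it only illustrates the check, assuming 1-based inclusive GTF coordinates. The function names and the intron coordinates are hypothetical, invented for illustration.

```python
def overlaps(start_a, end_a, start_b, end_b):
    """Return True if two 1-based, inclusive intervals share at least one base."""
    return start_a <= end_b and start_b <= end_a


def intervals_in_feature(intervals, feature):
    """Keep the (chrom, start, end, strand) intervals that overlap one feature.

    Only intervals on the same chromosome and strand as the feature are kept.
    """
    chrom, start, end, strand = feature
    return [
        iv for iv in intervals
        if iv[0] == chrom and iv[3] == strand and overlaps(iv[1], iv[2], start, end)
    ]


# Two significant intervals copied from the example output of this tool.
significant = [
    ("chrI", 140846, 140860, "-"),
    ("chrII", 203838, 203887, "+"),
]

# Hypothetical intron coordinates, NOT taken from any real annotation.
hypothetical_intron = ("chrII", 203800, 203850, "+")

hits = intervals_in_feature(significant, hypothetical_intron)
```

Here `hits` contains only the chrII interval, because the chrI interval lies on a different chromosome and strand.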

Example of an output file:

# generated by pyCalculateFDRs version 0.0.3, Sat Jun  1 21:16:23 2013
# pyCalculateFDRs.py -f test_count_output_reads.gtf -r 200 -o test_count_output_FDRs_005.gtf -v -m 0.05
# chromosome        feature source  start   end     minimal_coverage        strand  .       attributes
chrI        protein_coding  exon    140846  140860  5       -       .       gene_id "YAL005C"; gene_name "SSA1";
chrI        intergenic_region       exon    223118  223164  4       -       .       gene_id "INT_0_179"; gene_name "INT_0_179";
chrI        intergenic_region       exon    71889   71922   3       +       .       gene_id "INT_0_94"; gene_name "INT_0_94";
chrII       intergenic_region       exon    296127  296158  3       -       .       gene_id "INT_0_365"; gene_name "INT_0_365";
chrII       intergenic_region       exon    680697  680722  4       -       .       gene_id "INT_0_626"; gene_name "INT_0_626";
chrII       intergenic_region       exon    680827  680846  4       -       .       gene_id "INT_0_626"; gene_name "INT_0_626";
chrII       snRNA   exon    680827  680838  5       -       .       gene_id "LSR1"; gene_name "LSR1";
chrII       snRNA   exon    680951  681001  5       -       .       gene_id "LSR1"; gene_name "LSR1";
chrII       intergenic_region       exon    577985  577996  3       -       .       gene_id "INT_0_556"; gene_name "INT_0_556";
chrII       protein_coding  exon    203838  203887  3       +       .       gene_id "YBL011W"; gene_name "SCT1";
chrII       protein_coding  exon    296127  296158  3       -       .       gene_id "YBR028C"; gene_name "YBR028C";
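For downstream processing, an output file like the one above can be read with a short Python script. The sketch below is not part of pyCRAC. It assumes the nine columns named in the header comment, separated by whitespace, with the attributes in the last column; the `Interval` type, the field name `dot` for the `.` column, and the function names are hypothetical.

```python
from collections import namedtuple

# Field names follow the header comment of the output file; "dot" stands
# for the column that always contains "." in the example.
Interval = namedtuple(
    "Interval",
    "chromosome feature source start end minimal_coverage strand dot attributes",
)


def parse_attributes(text):
    """Turn 'gene_id "X"; gene_name "Y";' into {'gene_id': 'X', 'gene_name': 'Y'}."""
    attributes = {}
    for item in text.strip().split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attributes[key] = value.strip().strip('"')
    return attributes


def parse_line(line):
    """Parse one data line; return None for comment and blank lines."""
    if not line.strip() or line.startswith("#"):
        return None
    # Split the first eight columns on whitespace; the remainder is the
    # attributes column, which itself contains spaces.
    fields = line.split(None, 8)
    if len(fields) != 9:
        raise ValueError("expected 9 columns, got %d" % len(fields))
    chrom, feature, source, start, end, coverage, strand, dot, attrs = fields
    return Interval(
        chrom, feature, source, int(start), int(end), int(coverage),
        strand, dot, parse_attributes(attrs),
    )


# One line copied from the example output.
sample = ('chrII       snRNA   exon    680827  680838  5       -       .'
          '       gene_id "LSR1"; gene_name "LSR1";')
record = parse_line(sample)
```

After this runs, `record.start` and `record.end` are integers and `record.attributes["gene_name"]` is the gene name, so the records can be filtered or sorted directly.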

pyCalculateFDRs is part of the pyCRAC package. It takes interval information in GTF or BED format and calculates False Discovery Rates (FDRs).

Parameter list

Options:

-f read_file, --readdatafile=read_file
                      Name of the bed/gff/gtf file containing the read/cDNA
                      coordinates
--file_type=FILE_TYPE
                      this tool supports bed6, gtf and gff input files.
                      Please select from 'bed','gtf' or 'gff'. Default=gtf
-o outfile.gtf, --outfile=outfile.gtf
                      Optional. Provide the name of the output file. Default
                      is 'selected_intervals.gtf'
-r 100, --range=100
                      allows you to set the length of the UTR regions. If
                      you set '-r 50' or '--range=50', then the program will
                      set a fixed length (50 bp) regardless of whether the
                      GTF file has genes with annotated UTRs.
-a protein_coding, --annotation=protein_coding
                      select which annotation (e.g. protein_coding, ncRNA,
                      sRNA, rRNA, snoRNA, snRNA, depending on the source of
                      your GTF file) you would like to focus your analysis
                      on. Default = all annotations
-c yeast.txt, --chromfile=yeast.txt
                      Location of the chromosome info file. This file should
                      have two columns: first column is the names of the
                      chromosomes, second column is length of the
                      chromosomes. Default is yeast
--gtf=yeast.gtf
                      Name of the annotation file. Default is
                      /usr/local/pyCRAC/db/Saccharomyces_cerevisiae.EF2.59.1.2.gtf
-m MINFDR, --minfdr=MINFDR
                      To set a minimal FDR threshold for filtering interval
                      data. Default is 0.05
--min=MIN
                      to set a minimal read coverage for a region. Regions
                      with coverage less than the minimum will be ignored.
--iterations=ITERATIONS
                      to set the number of iterations for randomization of
                      read coordinates. Default=100
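
When the tool is driven from a script rather than typed by hand, the options above can be assembled programmatically. The following Python sketch is not part of pyCRAC: the helper `build_command` and its keyword names are hypothetical, and it assumes the script is invoked as `pyCalculateFDRs.py`, as in the header of the example output. It uses only the long option names listed above; the `-v` flag seen in the example header is not described in the parameter list, so it is left out.

```python
def build_command(read_file, outfile=None, file_type=None, range_=None,
                  annotation=None, chromfile=None, gtf=None, minfdr=None,
                  min_coverage=None, iterations=None):
    """Build an argument list for pyCalculateFDRs.py from the documented options.

    Options left as None are omitted, so the tool's own defaults apply.
    """
    command = ["pyCalculateFDRs.py", "--readdatafile=%s" % read_file]
    optional = [
        ("--outfile", outfile),
        ("--file_type", file_type),
        ("--range", range_),
        ("--annotation", annotation),
        ("--chromfile", chromfile),
        ("--gtf", gtf),
        ("--minfdr", minfdr),
        ("--min", min_coverage),
        ("--iterations", iterations),
    ]
    for flag, value in optional:
        if value is not None:
            command.append("%s=%s" % (flag, value))
    return command


# Same settings as the command recorded in the example output header,
# minus the undocumented -v flag.
command = build_command(
    "test_count_output_reads.gtf",
    outfile="test_count_output_FDRs_005.gtf",
    range_=200,
    minfdr=0.05,
)
```

The resulting list can be passed to `subprocess.run(command, check=True)`, assuming the script is installed and on the PATH.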