pyCalculateFDRs
By default the FDR threshold is set to 0.05, meaning that there is at most a 5% chance that a reported interval is not significantly enriched. The tool reports significant intervals in GTF format, together with the genomic features they overlap. Mutation frequencies are not included, but they can be added using the pyCalculateMutationFrequencies tool.
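To make the 0.05 threshold concrete, here is a minimal sketch of how nucleotides that pass it could be merged into reported intervals, each annotated with its lowest coverage (compare the minimal_coverage column of the example output file). It assumes an FDR has already been assigned to every nucleotide of a gene. The function names are hypothetical; this is an illustration, not pyCRAC's own code.

```python
from typing import List, Tuple


def _close_run(coverage: List[int], first: int, last: int, start: int) -> Tuple[int, int, int]:
    """Turn a run of passing positions into (start, end, minimal_coverage)."""
    return (start + first, start + last, min(coverage[first:last + 1]))


def call_intervals(coverage: List[int], fdrs: List[float], start: int,
                   max_fdr: float = 0.05) -> List[Tuple[int, int, int]]:
    """Merge consecutive positions with FDR <= max_fdr into intervals.

    coverage and fdrs hold one value per nucleotide, beginning at the 1-based
    genomic coordinate `start`. Returned coordinates are 1-based and
    inclusive, as in GTF.
    """
    intervals = []
    run_start = None  # index of the first position in the current passing run
    for i, fdr in enumerate(fdrs):
        if fdr <= max_fdr:
            if run_start is None:
                run_start = i
        elif run_start is not None:
            intervals.append(_close_run(coverage, run_start, i - 1, start))
            run_start = None
    if run_start is not None:  # a run that reaches the last position
        intervals.append(_close_run(coverage, run_start, len(fdrs) - 1, start))
    return intervals
```

Raising max_fdr (the role played by the -m/--minfdr option) lets more positions pass and therefore yields longer or more numerous intervals.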
NOTE! By default the tool labels each significant interval an "exon", but this label has no biological meaning: the interval may, for example, overlap with an intron. Use bedtools to extract the intervals that overlap with introns or other features.
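With bedtools, a command such as `bedtools intersect -u -s -a intervals.gtf -b introns.gtf` keeps each interval that overlaps at least one intron on the same strand (the file names are placeholders). For small files the same overlap test can be sketched in Python; the helper names below are hypothetical, and the pairwise loop is only practical for modest numbers of intervals.

```python
from typing import List, Sequence, Tuple

# (chromosome, start, end, strand) with 1-based, inclusive GTF coordinates
Interval = Tuple[str, int, int, str]


def overlaps(a: Interval, b: Interval, stranded: bool = True) -> bool:
    """True if two intervals share at least one nucleotide."""
    if a[0] != b[0]:
        return False
    if stranded and a[3] != b[3]:
        return False
    return a[1] <= b[2] and b[1] <= a[2]


def intervals_in_features(intervals: Sequence[Interval], features: Sequence[Interval],
                          stranded: bool = True) -> List[Interval]:
    """Keep each interval that overlaps any feature (similar to bedtools intersect -u)."""
    return [iv for iv in intervals
            if any(overlaps(iv, f, stranded) for f in features)]
```

Passing stranded=False ignores the strand column, which corresponds to running bedtools intersect without -s.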
Example of an output file:
# generated by pyCalculateFDRs version 0.0.3, Sat Jun 1 21:16:23 2013
# pyCalculateFDRs.py -f test_count_output_reads.gtf -r 200 -o test_count_output_FDRs_005.gtf -v -m 0.05
# chromosome feature source start end minimal_coverage strand . attributes
chrI	protein_coding	exon	140846	140860	5	-	.	gene_id "YAL005C"; gene_name "SSA1";
chrI	intergenic_region	exon	223118	223164	4	-	.	gene_id "INT_0_179"; gene_name "INT_0_179";
chrI	intergenic_region	exon	71889	71922	3	+	.	gene_id "INT_0_94"; gene_name "INT_0_94";
chrII	intergenic_region	exon	296127	296158	3	-	.	gene_id "INT_0_365"; gene_name "INT_0_365";
chrII	intergenic_region	exon	680697	680722	4	-	.	gene_id "INT_0_626"; gene_name "INT_0_626";
chrII	intergenic_region	exon	680827	680846	4	-	.	gene_id "INT_0_626"; gene_name "INT_0_626";
chrII	snRNA	exon	680827	680838	5	-	.	gene_id "LSR1"; gene_name "LSR1";
chrII	snRNA	exon	680951	681001	5	-	.	gene_id "LSR1"; gene_name "LSR1";
chrII	intergenic_region	exon	577985	577996	3	-	.	gene_id "INT_0_556"; gene_name "INT_0_556";
chrII	protein_coding	exon	203838	203887	3	+	.	gene_id "YBL011W"; gene_name "SCT1";
chrII	protein_coding	exon	296127	296158	3	-	.	gene_id "YBR028C"; gene_name "YBR028C";
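For downstream analysis it can help to read this output back into Python. The sketch below assumes the column order given in the example's header line and that the first eight columns contain no spaces, as in GTF; the function name read_fdr_intervals is hypothetical.

```python
import re
from typing import Dict, Iterable, Iterator, Union

# Matches GTF attribute pairs such as: gene_id "YAL005C";
ATTRIBUTE = re.compile(r'(\S+) "([^"]*)";')


def read_fdr_intervals(lines: Iterable[str]) -> Iterator[Dict[str, Union[str, int]]]:
    """Yield one dict per significant interval in a pyCalculateFDRs output file."""
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and the '#' header lines
        # Split on whitespace at most 8 times so the attribute column,
        # which itself contains spaces, stays in one piece.
        fields = line.split(None, 8)
        attributes = dict(ATTRIBUTE.findall(fields[8]))
        yield {
            "chromosome": fields[0],
            "annotation": fields[1],  # e.g. protein_coding, snRNA, intergenic_region
            "start": int(fields[3]),
            "end": int(fields[4]),
            "minimal_coverage": int(fields[5]),
            "strand": fields[6],
            "gene_id": attributes.get("gene_id", ""),
            "gene_name": attributes.get("gene_name", ""),
        }
```

An open file handle can be passed directly; for example, `collections.Counter(r["gene_name"] for r in read_fdr_intervals(handle))` counts significant intervals per gene.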
pyCalculateFDRs is part of the pyCRAC package. It takes interval information in GTF, GFF or BED format and calculates false discovery rates (FDRs).
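One common way to estimate such FDRs is randomization: the reads of a gene are repeatedly placed at random positions within it, and the coverage they reach by chance is compared with the observed coverage. The sketch below shows that idea for a single gene, assuming FDR(h) = (average number of positions reaching coverage h in the randomizations) / (number of positions reaching h in the observed data). Whether pyCalculateFDRs uses exactly this formula is an assumption, and the function names are hypothetical; the iterations parameter plays the role of the --iterations option (default 100).

```python
import random
from typing import List, Optional, Sequence, Tuple


def coverage_from_reads(length: int, reads: Sequence[Tuple[int, int]]) -> List[int]:
    """Per-nucleotide coverage of a gene from (0-based start, read length) pairs."""
    coverage = [0] * length
    for start, read_length in reads:
        for i in range(start, min(start + read_length, length)):
            coverage[i] += 1
    return coverage


def positional_fdrs(length: int, read_lengths: Sequence[int], observed: Sequence[int],
                    iterations: int = 100, seed: Optional[int] = None) -> List[float]:
    """Estimate an FDR for every nucleotide of one gene by shuffling its reads.

    Assumes every read is no longer than the gene and iterations >= 1.
    """
    rng = random.Random(seed)
    max_height = max(observed, default=0)

    # observed_at_least[h] = number of positions with observed coverage >= h
    observed_at_least = [0] * (max_height + 1)
    for c in observed:
        for h in range(1, c + 1):
            observed_at_least[h] += 1

    # The same tally, summed over all randomizations of the read start positions.
    random_at_least = [0] * (max_height + 1)
    for _ in range(iterations):
        shuffled = [(rng.randint(0, length - rl), rl) for rl in read_lengths]
        for c in coverage_from_reads(length, shuffled):
            for h in range(1, min(c, max_height) + 1):
                random_at_least[h] += 1

    # FDR per coverage height, capped at 1; index 0 (no coverage) stays 1.0.
    fdr_by_height = [1.0] * (max_height + 1)
    for h in range(1, max_height + 1):
        expected = random_at_least[h] / iterations
        fdr_by_height[h] = min(1.0, expected / observed_at_least[h])

    return [fdr_by_height[c] for c in observed]
```

More iterations give a smoother estimate of the random background at the cost of run time, which is the trade-off the --iterations option controls.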
Parameter list
Options:
-f read_file, --readdatafile=read_file
    Name of the bed/gff/gtf file containing the read/cDNA coordinates.
--file_type=FILE_TYPE
    This tool supports bed6, gtf and gff input files. Please select from 'bed', 'gtf' or 'gff'. Default = gtf.
-o outfile.gtf, --outfile=outfile.gtf
    Optional. Provide the name of the output file. Default is 'selected_intervals.gtf'.
-r 100, --range=100
    Sets the length of the UTR regions. If you set '-r 50' or '--range=50', the program will use a fixed length (50 bp) regardless of whether the GTF file has genes with annotated UTRs.
-a protein_coding, --annotation=protein_coding
    Selects which annotation (e.g. protein_coding, ncRNA, sRNA, rRNA, snoRNA, snRNA, depending on the source of your GTF file) you would like to focus your analysis on. Default = all annotations.
-c yeast.txt, --chromfile=yeast.txt
    Location of the chromosome info file. This file should have two columns: the first contains the chromosome names, the second the chromosome lengths. Default is yeast.
--gtf=yeast.gtf
    Name of the annotation file. Default is /usr/local/pyCRAC/db/Saccharomyces_cerevisiae.EF2.59.1.2.gtf.
-m MINFDR, --minfdr=MINFDR
    Sets the FDR threshold used to filter interval data. Default is 0.05.
--min=MIN
    Sets a minimal read coverage for a region. Regions with coverage below the minimum will be ignored …ve an FDR of zero.
--iterations=ITERATIONS
    Sets the number of iterations used to randomize read coordinates. Default = 100.
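The -r/--range and -c/--chromfile options can be illustrated together. The sketch below assumes that a fixed UTR length means a window of that many nucleotides immediately 5' and 3' of a gene, clipped to the chromosome length read from the two-column chromosome file. The function names are hypothetical, and pyCRAC may define these regions differently.

```python
from typing import Dict, Iterable, Optional, Tuple

# 1-based, inclusive (start, end), or None when the window would be empty
Span = Optional[Tuple[int, int]]


def read_chromosome_lengths(lines: Iterable[str]) -> Dict[str, int]:
    """Parse a two-column chromosome file: chromosome name, then length."""
    lengths = {}
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip blank or malformed lines
            lengths[fields[0]] = int(fields[1])
    return lengths


def _span(lo: int, hi: int) -> Span:
    return (lo, hi) if lo <= hi else None


def fixed_utrs(chromosome: str, start: int, end: int, strand: str,
               utr_length: int, chrom_lengths: Dict[str, int]) -> Tuple[Span, Span]:
    """Return (5' UTR, 3' UTR) windows of fixed length flanking a gene."""
    chrom_end = chrom_lengths[chromosome]
    upstream = _span(max(1, start - utr_length), start - 1)
    downstream = _span(end + 1, min(chrom_end, end + utr_length))
    # On the minus strand the 5' end of the gene lies at the higher coordinate.
    if strand == "-":
        return downstream, upstream
    return upstream, downstream
```

Clipping to the chromosome ends is why the chromosome lengths are needed: a gene near the start of a chromosome may have a shorter, or empty, upstream window.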