pyBinCollector
pyBinCollector is part of the pyCRAC package. It allows the user to generate genome-wide coverage plots. The program normalises gene lengths by dividing each gene into a fixed number of bins and then calculates the hit density in each bin. It also lets the user specify bin numbers in order to extract the blocks/clusters present in those bins.
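The length normalisation described above can be illustrated with a minimal Python sketch. This is not the pyCRAC implementation: the function name bin_hit_density and its arguments are hypothetical, coordinates are assumed to be 1-based and inclusive (as in GTF files), strand is ignored, and "density" is taken to be a simple hit count per bin.

```python
def bin_hit_density(gene_start, gene_end, hit_positions, number_of_bins=20):
    """Count hits per bin for one gene (hypothetical helper, not pyCRAC code).

    Coordinates are assumed to be 1-based and inclusive; strand is ignored.
    """
    gene_length = gene_end - gene_start + 1
    counts = [0] * number_of_bins
    for position in hit_positions:
        if gene_start <= position <= gene_end:
            # Scale the offset within the gene to a bin index
            # between 0 and number_of_bins - 1.
            index = (position - gene_start) * number_of_bins // gene_length
            counts[index] += 1
    return counts


# A 100 bp gene and a 1000 bp gene both end up with four bins,
# so their profiles can be compared or summed directly.
short_gene = bin_hit_density(100, 199, [100, 124, 125, 199, 250], number_of_bins=4)
long_gene = bin_hit_density(1, 1000, [10, 300, 990], number_of_bins=4)
print(short_gene, long_gene)
```

Because every gene is reduced to the same number of bins, profiles from genes of very different lengths can be combined position by position.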
Parameter list
File input options:
-f FILE, --input_file=FILE
    Provide the path and name of the pyReadCounters.py or pyMotif.py GTF file. By default the program expects data from the standard input.
-o OUTPUT_FILE, --output_file=OUTPUT_FILE
    To set an output file name. Do not add a file extension. By default, if the --outputall flag is not used, the program writes to the standard output.
--gtf=yeast.gtf
    Type the path to the GTF annotation file that you want to use. Default is /usr/local/pyCRAC/db/Saccharomyces_cerevisiae.EF2.59.1.2.gtf
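As a rough illustration of how the file options combine with the options described in the rest of this parameter list, the sketch below assembles a command line as a Python list. The helper build_command and all file names are hypothetical; only the flags themselves are taken from this page, and the script is assumed to be installed and on the PATH.

```python
def build_command(input_gtf, annotation_gtf, output_prefix,
                  annotation="protein_coding", number_of_bins=20):
    """Assemble a pyBinCollector.py argument list (hypothetical helper)."""
    return [
        "pyBinCollector.py",
        "-f", input_gtf,            # pyReadCounters.py or pyMotif.py GTF file
        "--gtf=" + annotation_gtf,  # annotation GTF file
        "-a", annotation,           # annotation class to analyse
        "-n", str(number_of_bins),  # number of bins per gene
        "-o", output_prefix,        # output name, without a file extension
    ]


# File names below are placeholders chosen for this example.
command = build_command("my_reads.gtf", "yeast.gtf", "my_coverage")
print(" ".join(command))
```

If pyCRAC is installed, a list like this could be passed to subprocess.run(command, check=True) from the standard library.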
pyBinCollector.py specific options:
-a protein_coding, --annotation=protein_coding
    Select which annotation (i.e. protein_coding, ncRNA, sRNA, rRNA, tRNA, snoRNA, all) you would like to focus your search on. Default = all
--min_length=20
    To set a minimum length threshold for genes. Genes shorter than the minimum length will be discarded. Default = 1
--max_length=10000
    To set a maximum length threshold for genes. Genes larger than the maximum length will be discarded. Default = 100000000
-n 20, --numberofbins=20
    Select the number of bins you want to generate. Default = 20
--binselect=2 4
    Allows selection of sequences that were mapped to specific bins. This option expects two numbers, one for each bin, separated by a space. For example: --binselect 20 30.
--outputall
    Use this flag to output the normalized distribution for each individual gene, rather than making a cumulative coverage plot. Useful for making box plots or for making heat maps.
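The length thresholds and the difference between per-gene output (--outputall) and the cumulative profile can be sketched in Python as follows. This is an illustration only, not the program's code: the function names, gene names and counts are invented for the example, and the cumulative profile is assumed to be a plain bin-by-bin sum.

```python
def filter_by_length(gene_lengths, min_length=1, max_length=100000000):
    """Keep genes whose length lies within [min_length, max_length]."""
    return {gene: length for gene, length in gene_lengths.items()
            if min_length <= length <= max_length}


def cumulative_profile(per_gene_counts):
    """Sum per-gene bin counts bin by bin into a single profile."""
    return [sum(column) for column in zip(*per_gene_counts.values())]


# Made-up example data: gene lengths in bp and per-gene counts for four bins.
lengths = {"geneA": 15, "geneB": 1200, "geneC": 25000}
kept = filter_by_length(lengths, min_length=20, max_length=10000)

per_gene = {"geneB": [2, 1, 0, 1], "geneD": [0, 3, 1, 0]}
print(kept)                          # genes that pass both thresholds
print(cumulative_profile(per_gene))  # one value per bin across all genes
```

The per_gene dictionary corresponds to the kind of table that is convenient for box plots or heat maps, while the summed list corresponds to a single cumulative coverage curve.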
Common options:
-r 100, --range=100
    Allows you to set the length of the UTR regions. If you set '-r 50' or '--range=50', then the program will set a fixed length (50 bp) regardless of whether the GTF file has genes with annotated UTRs.
-s intron, --sequence=intron
    With this option you can select whether you want to generate bins from the coding or genomic sequence, or from intron, exon, CDS or UTR coordinates. Default = genomic
--ignorestrand
    Ignore strand information; all reads overlapping with genomic features will be considered sense reads. Useful for analysing ChIP or RIP data.
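A small Python sketch may help to picture the fixed-length UTR setting and the effect of ignoring strand. Both functions are hypothetical and rest on stated assumptions: coordinates are 1-based and inclusive, the flanks are placed directly next to the feature, and the 5' flank lies upstream relative to the feature's strand. How pyCRAC places the regions internally is not described on this page.

```python
def fixed_flanks(start, end, strand, flank_length):
    """Return fixed-length flanking intervals (assumed layout, not pyCRAC code)."""
    # An interval whose start exceeds its end is empty (feature at position 1).
    left = (max(1, start - flank_length), start - 1)
    right = (end + 1, end + flank_length)
    if strand == "+":
        return {"5UTR": left, "3UTR": right}
    # On the minus strand the 5' end is at the higher coordinate.
    return {"5UTR": right, "3UTR": left}


def is_sense_hit(read_strand, feature_strand, ignore_strand=False):
    """Treat every overlapping read as sense when strand is ignored."""
    return ignore_strand or read_strand == feature_strand


print(fixed_flanks(1000, 2000, "+", 50))
print(fixed_flanks(1000, 2000, "-", 50))
print(is_sense_hit("-", "+"), is_sense_hit("-", "+", ignore_strand=True))
```

With ignore_strand=True a read on either strand counts towards the feature, which matches the use case of ChIP or RIP data, where the reads carry no information about the transcribed strand.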