
pyBinCollector

pyBinCollector is part of the pyCRAC package. It lets the user generate genome-wide coverage plots. To make genes of different lengths comparable, it divides each gene into a fixed number of bins and then calculates the hit density in each bin. The program also accepts specific bin numbers so that the blocks/clusters present in those bins can be extracted.
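The length normalisation described above can be illustrated with a minimal Python sketch. This is not the pyCRAC implementation: the function names `bin_index` and `bin_hit_density` are hypothetical, coordinates are assumed to be 1-based and inclusive, strand is ignored, and raw hit counts stand in for whatever density measure the program actually reports.

```python
def bin_index(position, gene_start, gene_end, n_bins):
    """Map a position inside a gene to a 0-based bin index."""
    gene_length = gene_end - gene_start + 1
    offset = position - gene_start  # 0 .. gene_length - 1
    # Integer scaling keeps the result in the range 0 .. n_bins - 1.
    return offset * n_bins // gene_length


def bin_hit_density(gene_start, gene_end, hit_positions, n_bins=20):
    """Count hits per bin for one gene; hits outside the gene are ignored."""
    counts = [0] * n_bins
    for pos in hit_positions:
        if gene_start <= pos <= gene_end:
            counts[bin_index(pos, gene_start, gene_end, n_bins)] += 1
    return counts


# A 100 nt gene and a 200 nt gene with hits at the same relative positions
# give the same binned profile, which is the point of the normalisation.
short_gene = bin_hit_density(1, 100, [1, 25, 26, 100], n_bins=4)
long_gene = bin_hit_density(1001, 1200, [1001, 1050, 1051, 1200], n_bins=4)
print(short_gene)  # [2, 1, 0, 1]
print(long_gene)   # [2, 1, 0, 1]
```

Because every gene is reduced to the same number of bins, profiles from genes of different lengths can be summed or compared bin by bin.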

Parameter list

File input options:

-f FILE, --input_file=FILE
                    Provide the path and name of the pyReadCounters.py or
                    pyMotif.py GTF file. By default the program expects
                    data from the standard input.
-o OUTPUT_FILE, --output_file=OUTPUT_FILE
                    To set an output file name. Do not add a file
                    extension. By default, if the --outputall flag is not
                    used, the program writes to the standard output.
--gtf=yeast.gtf
                    type the path to the gtf annotation file that you want
                    to use. Default is
                    /usr/local/pyCRAC/db/Saccharomyces_cerevisiae.EF2.59.1.2.gtf

pyBinCollector.py specific options:

-a protein_coding, --annotation=protein_coding
                    select which annotation (i.e. protein_coding, ncRNA,
                    sRNA, rRNA, tRNA, snoRNA, all) you would like to focus
                    your search on. Default = all
--min_length=20
                    to set a minimum length threshold for genes. Genes
                    shorter than the minimum length will be discarded.
                    Default = 1
--max_length=10000
                    to set a maximum length threshold for genes. Genes
                    longer than the maximum length will be discarded.
                    Default = 100000000
-n 20, --numberofbins=20
                    select the number of bins you want to generate.
                    Default=20
--binselect=2 4
                    allows selection of sequences that were mapped to
                    specific bins. This option expects two numbers, one
                    for each bin, separated by a space. For example:
                    --binselect 20 30.
--outputall
                    use this flag to output the normalized distribution
                    for each individual gene, rather than making a
                    cumulative coverage plot. Useful for making box plots
                    or for making heat maps.
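
As a rough illustration of how the length thresholds and the --outputall flag relate, the sketch below filters genes by length and then either keeps the per-gene bin counts or sums them into one cumulative profile. It is a sketch under assumptions, not the program's code: the names `passes_length_filter` and `cumulative_profile`, the gene names, and the dictionary layout are all hypothetical.

```python
def passes_length_filter(gene_length, min_length=1, max_length=100000000):
    """Keep genes that are neither shorter than min_length nor longer than max_length."""
    return min_length <= gene_length <= max_length


def cumulative_profile(per_gene_counts):
    """Sum per-gene bin counts into a single profile across all genes."""
    if not per_gene_counts:
        return []
    n_bins = len(next(iter(per_gene_counts.values())))
    totals = [0] * n_bins
    for counts in per_gene_counts.values():
        for i, count in enumerate(counts):
            totals[i] += count
    return totals


# Hypothetical input: gene name -> (gene length, hit counts in 4 bins).
genes = {
    "geneA": (1500, [2, 1, 0, 1]),
    "geneB": (900, [0, 3, 1, 0]),
    "geneC": (15, [5, 5, 5, 5]),  # shorter than the threshold used below
}

kept = {
    name: counts
    for name, (length, counts) in genes.items()
    if passes_length_filter(length, min_length=20, max_length=10000)
}

print(kept)                      # per-gene rows, comparable to --outputall
print(cumulative_profile(kept))  # one summed profile: [2, 4, 1, 1]
```

The per-gene rows are the kind of table a box plot or heat map needs, while the summed profile corresponds to a single coverage curve.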

Common options:

-r 100, --range=100
                    allows you to set the length of the UTR regions. If
                    you set '-r 50' or '--range=50', then the program will
                    set a fixed length (50 bp) regardless of whether the
                    GTF file has genes with annotated UTRs.
-s intron, --sequence=intron
                    with this option you can select whether you want to
                    generate bins from the coding or genomic sequence, or
                    from intron, exon, CDS, or UTR coordinates. Default =
                    genomic
--ignorestrand
                    ignore strand information. All reads overlapping with
                    genomic features will be considered sense reads.
                    Useful for analysing ChIP or RIP data
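
The effect of --range and --ignorestrand can be sketched in a few lines of Python. This is an illustration under assumptions rather than the pyCRAC code: `fixed_utrs` and `is_sense` are hypothetical names, coordinates are assumed to be 1-based and inclusive, and the fixed-length UTRs are assumed to sit directly upstream and downstream of the CDS.

```python
def fixed_utrs(cds_start, cds_end, strand, utr_length=100):
    """Return fixed-length UTR intervals flanking a CDS, ignoring annotated UTRs."""
    low_side = (max(1, cds_start - utr_length), cds_start - 1)
    high_side = (cds_end + 1, cds_end + utr_length)
    if strand == "+":
        return {"5UTR": low_side, "3UTR": high_side}
    # On the minus strand the 5' end lies at the higher coordinates.
    return {"5UTR": high_side, "3UTR": low_side}


def is_sense(read_strand, gene_strand, ignore_strand=False):
    """Treat a read as sense if strands match, or always when strand is ignored."""
    return ignore_strand or read_strand == gene_strand


print(fixed_utrs(1000, 2000, "+", utr_length=50))
# {'5UTR': (950, 999), '3UTR': (2001, 2050)}
print(fixed_utrs(1000, 2000, "-", utr_length=50))
# {'5UTR': (2001, 2050), '3UTR': (950, 999)}
print(is_sense("-", "+"))                      # False
print(is_sense("-", "+", ignore_strand=True))  # True
```

Ignoring strand suits ChIP or RIP data because such reads are not expected to carry the strand of the overlapping feature.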