pyPileup

pyPileup is part of the pyCRAC package. It produces pileups containing the number of hits, substitutions and deletions for each nucleotide covered by reads in specific genes or genomic regions.

Parameter list

File input options:

-f FILE, --input_file=FILE
                    Input file. Accepted formats are Novoalign native
                    output, SAM, and pyMotif or pyReadCounters GTF files.
                    By default the program reads from the standard input.
                    Make sure to specify the type of the file you want
                    analyzed using the --file_type option!
-o OUTPUT_FILE, --output_file=OUTPUT_FILE
                    Use this flag to override the standard output file
                    names. All pileups will be written to one output file.
-g FILE, --genes_file=FILE
                    here you need to type in the name of your gene list
                    file (1 column) or the hittable file
--chr=FILE
                    if you simply would like to align reads against a
                    genomic sequence you should generate a tab delimited
                    file containing an identifier, chromosome name, start
                    position, end position and strand
--gtf=annotation_file.gtf
                    type the path to the gtf annotation file that you want
                    to use
--tab=tab_file.tab
                    type the path to the tab file that contains the
                    genomic reference sequence
--file_type=FILE_TYPE
                    use this option to specify the file type (i.e. 'novo',
                    'sam', 'gtf'). This will tell the program which
                    parsers to use for processing the files. Default =
                    'novo'
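
The tab-delimited layout described for the --chr option can be sketched in Python as follows. This is only an illustration: the region names, chromosome names and coordinates are made up, and the assumption that the file has no header line is not confirmed by the text above.

```python
import csv
import io

# Hypothetical regions: (identifier, chromosome name, start, end, strand).
regions = [
    ("region_1", "chrI", 1000, 1500, "+"),
    ("region_2", "chrII", 20000, 20350, "-"),
]


def format_chr_file(rows):
    """Return the rows as tab-delimited text, one region per line."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerows(rows)
    return buffer.getvalue()


chr_text = format_chr_file(regions)
print(chr_text, end="")
```

Writing chr_text to a file and passing that file's path to --chr would be the intended use.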

pyPileup specific options:

--limit=500
                    with this option you can select how many reads mapped
                    to a particular gene/ORF/region you want to count.
                    Default = All
--iCLIP
                    This turns on the iCLIP mode and the pileups will
                    report cross-linking site frequencies in iCLIP data in
                    reference sequences

Common options:

-v, --verbose
                    prints all the status messages to a file rather than
                    the standard output
--ignorestrand
                    this flag tells the program to ignore strand
                    information and all overlapping reads will be considered
                    sense reads. Useful for analysing ChIP or RIP data
--zip=FILE
                    use this option to compress all the output files in a
                    single zip file
--overlap=1
                    sets the number of nucleotides a read has to overlap
                    with a gene before it is considered a hit. Default =
                    1 nucleotide
-s genomic, --sequence=genomic
                    with this option you can select whether you want the
                    reads aligned to the genomic or the coding sequence.
                    Default = genomic
-r 100, --range=100
                    allows you to set the length of the UTR regions. If
                    you set '-r 50' or '--range=50', then the program will
                    set a fixed length (50 bp) regardless of whether the
                    GTF file has genes with annotated UTRs.
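
The idea behind the --overlap threshold can be illustrated with a short Python sketch. It assumes 1-based, inclusive start and end coordinates and is not taken from the pyPileup source code; the function names are hypothetical.

```python
def overlap_length(read_start, read_end, gene_start, gene_end):
    """Count positions shared by a read and a gene (1-based, inclusive)."""
    shared = min(read_end, gene_end) - max(read_start, gene_start) + 1
    return max(0, shared)


def is_hit(read, gene, min_overlap=1):
    """Return True if the read overlaps the gene by at least min_overlap nt."""
    return overlap_length(read[0], read[1], gene[0], gene[1]) >= min_overlap


gene = (100, 200)
read = (95, 105)  # shares positions 100..105 with the gene
print(overlap_length(*read, *gene), is_hit(read, gene, min_overlap=6))
```

With the default of 1 nucleotide, any read that touches the gene at a single position would count as a hit under this sketch.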

Options for novo, SAM and BAM files:

--align_quality=100, --mapping_quality=100
                    with these options you can set the alignment quality
                    (Novoalign) or mapping quality (SAM) threshold. Reads
                    with qualities lower than the threshold will be
                    ignored. Default = 0
--align_score=100
                    with this option you can set the alignment score
                    threshold. Reads with alignment scores lower than the
                    threshold will be ignored. Default = 0
-l 100, --length=100
                    sets the read length threshold. Default = 1000
-m 100000, --max=100000
                    maximum number of mapped reads that will be analyzed.
                    Default = All
--unique
                    with this option reads with multiple alignment
                    locations will be removed. Default = Off
--blocks
                    with this option reads with the same start and end
                    coordinates on a chromosome will only be counted once.
                    Default = Off
--discarded=FILE
                    prints the lines from the alignments file that were
                    discarded by the parsers. This file contains reads
                    that were unmapped (NM), of poor quality (i.e. QC) or
                    paired reads that were mapped to different chromosomal
                    locations or were too far apart on the same
                    chromosome. Useful for debugging purposes
-d 1000, --distance=1000
                    this option allows you to set the maximum number of
                    base-pairs allowed between two non-overlapping paired
                    reads. Default = 1000
--mutations=delsonly
                    Use this option to only track mutations that are of
                    interest. For CRAC data this is usually deletions
                    (--mutations=delsonly). For PAR-CLIP data this is
                    usually T-C mutations (--mutations=TC). Other options:
                    --mutations=nomuts reports no mutations, and
                    --mutations=[TCG] reports only mutations at specific
                    bases, for example only at T's, C's and G's. The
                    brackets are essential. Other nucleotide combinations
                    are also possible.
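
As a rough illustration of how the options above combine, the following Python sketch assembles a command line from the flags listed on this page. The executable name "pyPileup.py", the file names and the helper function are assumptions made for the example; which options must be given together is not specified here, so check your installation before running the result.

```python
import shlex


def build_pypileup_command(input_file, file_type, genes_file=None, gtf=None,
                           tab=None, output_file=None, mutations=None,
                           extra_flags=()):
    """Assemble an argument list from options documented on this page."""
    # "pyPileup.py" is an assumed executable name; adjust as needed.
    cmd = ["pyPileup.py", "-f", input_file, "--file_type=" + file_type]
    if genes_file:
        cmd += ["-g", genes_file]
    if gtf:
        cmd.append("--gtf=" + gtf)
    if tab:
        cmd.append("--tab=" + tab)
    if output_file:
        cmd += ["-o", output_file]
    if mutations:
        cmd.append("--mutations=" + mutations)
    cmd.extend(extra_flags)
    return cmd


# All file names below are placeholders.
cmd = build_pypileup_command(
    "reads.sam", "sam",
    genes_file="genes.txt", gtf="annotation.gtf", tab="genome.tab",
    output_file="pileups.txt", mutations="[TCG]",
    extra_flags=["--unique", "--blocks"],
)
print(shlex.join(cmd))
```

shlex.join quotes the bracketed --mutations value, which matters because most shells treat unquoted square brackets as a filename pattern.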