Galaxy | Tool Preview

pyReadAligner (version 1.0.0)
Single column gene ID file
GTF file containing gene ID co-ordinates
Tab file containing genomic reference sequence
Alignment file of type .sam or .bam

pyReadAligner

pyReadAligner is part of the pyCRAC package. It generates multiple sequence alignments for reads mapped to individual genes or genomic regions and writes them to a FASTA output file.


Parameter list

File input options:

-f FILE, --input_file=FILE
                                        Input can be Novoalign native output or SAM files.
                                        By default the program expects data from the
                                        standard input. Make sure to specify the type of
                                        the file you want to have analyzed using the
                                        --file_type option!
-o OUTPUT_FILE, --output_file=OUTPUT_FILE
                                        Use this flag to override the standard output file
                                        names. All alignments will be written to one output
                                        file.
-g FILE, --genes_file=FILE
                                        here you need to type in the name of your gene list
                                        file (1 column) or the hittable file
--chr=FILE
                                        if you simply want to align reads against a genomic
                                        sequence, generate a tab-delimited file containing
                                        an identifier, chromosome name, start position, end
                                        position and strand
--gtf=annotation_file.gtf
                                        type the path to the gtf annotation file that you want
                                        to use
--tab=tab_file.tab
                                        type the path to the tab file that contains the
                                        genomic reference sequence
--file_type=FILE_TYPE
                                        use this option to specify the file type (i.e. 'novo',
                                        'sam', 'gtf'). This will tell the program which
                                        parsers to use for processing the files. Default =
                                        'novo'
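
The coordinate file described for --chr can be produced with a few lines of Python. This is a minimal sketch: the region names, chromosomes, coordinates and the file name regions.tab are hypothetical, and the column order simply follows the description above (identifier, chromosome name, start position, end position, strand).

```python
import csv

# Hypothetical regions: (identifier, chromosome, start, end, strand).
regions = [
    ("region_1", "chrI", 1000, 1500, "+"),
    ("region_2", "chrIV", 20500, 21000, "-"),
]


def write_regions(path, rows):
    """Write one tab-delimited line per region, in the column order listed for --chr."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)


# The resulting file would then be passed as --chr=regions.tab.
write_regions("regions.tab", regions)
```

Writing the file through the csv module rather than by hand avoids stray spaces, which are easy to mistake for tabs.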

pyReadAligner specific options:

--limit=500
                                        with this option you can select how many reads mapped
                                        to a particular gene/ORF/region you want to count.
                                        Default = All

Common options:

--ignorestrand
                                        this flag tells the program to ignore strand
                                        information; all overlapping reads will be considered
                                        sense reads. Useful for analysing ChIP or RIP data
--overlap=1
                                        sets the number of nucleotides a read has to overlap
                                        with a gene before it is considered a hit. Default =
                                        1 nucleotide
-s genomic, --sequence=genomic
                                        with this option you can select whether you want the
                                        reads aligned to the genomic or the coding sequence.
                                        Default = genomic
-r 100, --range=100
                                        allows you to set the length of the UTR regions. If
                                        you set '-r 50' or '--range=50', then the program will
                                        set a fixed length (50 bp) regardless of whether the
                                        GTF file has genes with annotated UTRs.
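
To make --overlap and --ignorestrand concrete, the sketch below counts the nucleotides shared by a read and a gene and applies the threshold. It is an illustration only, not pyCRAC's implementation: it assumes inclusive start and end coordinates, and the function names are hypothetical.

```python
def overlap_length(read_start, read_end, gene_start, gene_end):
    """Number of positions shared by two inclusive intervals (0 if they are disjoint)."""
    return max(0, min(read_end, gene_end) - max(read_start, gene_start) + 1)


def is_sense_hit(read, gene, min_overlap=1, ignore_strand=False):
    """Decide whether a read counts as a sense hit on a gene.

    read and gene are (start, end, strand) tuples. With ignore_strand=True
    the strand comparison is skipped, mirroring the idea behind --ignorestrand.
    """
    if not ignore_strand and read[2] != gene[2]:
        return False
    return overlap_length(read[0], read[1], gene[0], gene[1]) >= min_overlap


gene = (100, 200, "+")
read = (95, 105, "-")  # overlaps the gene by 6 nucleotides, on the opposite strand

print(is_sense_hit(read, gene))                                     # False: strands differ
print(is_sense_hit(read, gene, ignore_strand=True))                 # True: 6 >= 1
print(is_sense_hit(read, gene, min_overlap=10, ignore_strand=True)) # False: 6 < 10
```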

Options for novo, SAM and BAM files:

--align_quality=100, --mapping_quality=100
                                        with these options you can set the alignment quality
                                        (Novoalign) or mapping quality (SAM) threshold. Reads
                                        with qualities lower than the threshold will be
                                        ignored. Default = 0
--align_score=100
                                        with this option you can set the alignment score
                                        threshold. Reads with alignment scores lower than the
                                        threshold will be ignored. Default = 0
-l 100, --length=100
                                        sets the read length threshold. Default = 1000
-m 100000, --max=100000
                                        maximum number of mapped reads that will be analyzed.
                                        Default = All
--unique
                                        with this option reads with multiple alignment
                                        locations will be removed. Default = Off
--blocks
                                        with this option reads with the same start and end
                                        coordinates on a chromosome will only be counted once.
                                        Default = Off
--discarded=FILE
                                        prints the lines from the alignments file that were
                                        discarded by the parsers. This file contains reads
                                        that were unmapped (NM), of poor quality (i.e. QC) or
                                        paired reads that were mapped to different chromosomal
                                        locations or were too far apart on the same
                                        chromosome. Useful for debugging purposes
-d 1000, --distance=1000
                                        this option allows you to set the maximum number of
                                        base-pairs allowed between two non-overlapping paired
                                        reads. Default = 1000
--mutations=delsonly
                                        Use this option to only track mutations that are of
                                        interest. For CRAC data this is usually deletions
                                        (--mutations=delsonly). For PAR-CLIP data this is
                                        usually T-C mutations (--mutations=TC). Other options
                                        are: do not report any mutations: --mutations=nomuts.
                                        Only report specific base mutations, for example only
                                        in T's, C's and G's: --mutations=[TCG]. The brackets
                                        are essential. Other nucleotide combinations are also
                                        possible.
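
The square brackets in values such as --mutations=[TCG] are also glob characters in most shells, so one way to avoid quoting problems is to pass the arguments as a list from Python. The following is a minimal sketch: it assumes the script is installed as pyReadAligner.py, all file names are hypothetical, the restriction to A, C, G and T is an assumption, and only flags documented above are used.

```python
import subprocess


def mutations_option(bases):
    """Build the bracketed form, e.g. 'TCG' -> '--mutations=[TCG]'."""
    bases = bases.upper()
    # Assumption: only the four standard nucleotides are meaningful here.
    if not bases or set(bases) - set("ACGT"):
        raise ValueError("expected one or more of A, C, G, T")
    return "--mutations=[" + bases + "]"


def build_command(sam, gtf, tab, genes, out, bases):
    """Assemble an argument list using only options from the parameter list."""
    return [
        "pyReadAligner.py",  # assumed executable name
        "-f", sam,
        "--file_type=sam",
        "--gtf=" + gtf,
        "--tab=" + tab,
        "-g", genes,
        "-o", out,
        mutations_option(bases),
    ]


cmd = build_command("reads.sam", "annotation.gtf", "genome.tab",
                    "genes.txt", "alignments.fasta", "TCG")
print(cmd)

# Uncomment to run. No shell is involved, so the brackets are passed literally.
# subprocess.run(cmd, check=True)
```

When typing the command directly in a shell instead, quoting the value (for example '--mutations=[TCG]') serves the same purpose.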