pyReadAligner

pyReadAligner is part of the pyCRAC package. It generates multiple sequence alignments for reads mapped to individual genes or genomic regions and writes them to a FASTA output file.

Parameter list

File input options:

-f FILE, --input_file=FILE
                                        Novoalign native output or SAM files can be used as
                                        input. By default the program reads data from the
                                        standard input. Make sure to specify the type of the
                                        file you want to have analyzed using the --file_type
                                        option!
-o OUTPUT_FILE, --output_file=OUTPUT_FILE
                                        Use this flag to override the standard output file
                                        names. All alignments will be written to one output
                                        file.
-g FILE, --genes_file=FILE
                                        here you need to type in the name of your gene list
                                        file (1 column) or the hittable file
--chr=FILE
                                        if you simply would like to align reads against a
                                        genomic sequence you should generate a tab-delimited
                                        file containing an identifier, chromosome name, start
                                        position, end position and strand
--gtf=annotation_file.gtf
                                        type the path to the gtf annotation file that you want
                                        to use
--tab=tab_file.tab
                                        type the path to the tab file that contains the
                                        genomic reference sequence
--file_type=FILE_TYPE
                                        use this option to specify the file type (i.e. 'novo',
                                        'sam', 'gtf'). This will tell the program which
                                        parsers to use for processing the files. Default =
                                        'novo'
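
The five-column file described under --chr can be produced with a few lines of Python. This is a minimal sketch, not part of pyCRAC: the region names and coordinates are made up, the strand is assumed to be written as '+' or '-', and the help text does not state whether coordinates are 0- or 1-based, so check that against your own data.

```python
import csv
import io

# Hypothetical regions: (identifier, chromosome name, start, end, strand).
regions = [
    ("region_A", "chrI", 1000, 1500, "+"),
    ("region_B", "chrII", 20500, 21000, "-"),
]

def write_regions(handle, rows):
    """Write one tab-delimited line per region, columns in the order listed for --chr."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    for identifier, chromosome, start, end, strand in rows:
        if start > end:
            raise ValueError(f"{identifier}: start {start} is greater than end {end}")
        # Assumption: strand is encoded as '+' or '-'.
        if strand not in ("+", "-"):
            raise ValueError(f"{identifier}: strand must be '+' or '-'")
        writer.writerow([identifier, chromosome, start, end, strand])

# Written to an in-memory buffer here so the result can be inspected.
buffer = io.StringIO()
write_regions(buffer, regions)
chr_file_text = buffer.getvalue()
```

To create a file on disk, pass a handle opened with open("regions.tab", "w", newline="") instead of the buffer, then give that path to --chr.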

pyReadAligner specific options:

--limit=500
                                        with this option you can select how many reads mapped
                                        to a particular gene/ORF/region you want to count.
                                        Default = All

Common options:

--ignorestrand
                                        this flag tells the program to ignore strand
                                        information, and all overlapping reads will be
                                        considered sense reads. Useful for analysing ChIP or
                                        RIP data
--overlap=1
                                        sets the number of nucleotides a read has to overlap
                                        with a gene before it is considered a hit. Default =
                                        1 nucleotide
-s genomic, --sequence=genomic
                                        with this option you can select whether you want the
                                        reads aligned to the genomic or the coding sequence.
                                        Default = genomic
-r 100, --range=100
                                        allows you to set the length of the UTR regions. If
                                        you set '-r 50' or '--range=50', then the program will
                                        set a fixed length (50 bp) regardless of whether the
                                        GTF file has genes with annotated UTRs.
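
The effect of the --overlap threshold can be illustrated with a small interval calculation. This is only a sketch of the rule as worded above, not pyCRAC's own code: the function names are hypothetical and coordinates are assumed to be inclusive at both ends.

```python
def overlap_length(read_start, read_end, gene_start, gene_end):
    """Number of positions shared by two inclusive intervals (0 if they are disjoint)."""
    return max(0, min(read_end, gene_end) - max(read_start, gene_start) + 1)

def is_hit(read, gene, min_overlap=1):
    """A read counts as a hit when it shares at least min_overlap positions with the gene."""
    return overlap_length(read[0], read[1], gene[0], gene[1]) >= min_overlap

gene = (100, 200)                               # hypothetical gene interval
reads = [(90, 100), (95, 104), (201, 230)]      # hypothetical read intervals

# Default threshold of 1 nucleotide versus a stricter threshold of 5.
hits_default = [is_hit(read, gene) for read in reads]
hits_five = [is_hit(read, gene, min_overlap=5) for read in reads]
```

With the default of 1, a read that touches the gene by a single nucleotide is still a hit; raising the threshold removes such edge cases.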

Options for novo, SAM and BAM files:

--align_quality=100, --mapping_quality=100
                                        with these options you can set the alignment quality
                                        (Novoalign) or mapping quality (SAM) threshold. Reads
                                        with qualities lower than the threshold will be
                                        ignored. Default = 0
--align_score=100
                                        with this option you can set the alignment score
                                        threshold. Reads with alignment scores lower than the
                                        threshold will be ignored. Default = 0
-l 100, --length=100
                                        to set read length threshold. Default = 1000
-m 100000, --max=100000
                                        maximum number of mapped reads that will be analyzed.
                                        Default = All
--unique
                                        with this option reads with multiple alignment
                                        locations will be removed. Default = Off
--blocks
                                        with this option reads with the same start and end
                                        coordinates on a chromosome will only be counted once.
                                        Default = Off
--discarded=FILE
                                        prints the lines from the alignments file that were
                                        discarded by the parsers. This file contains reads
                                        that were unmapped (NM), of poor quality (i.e. QC) or
                                        paired reads that were mapped to different chromosomal
                                        locations or were too far apart on the same
                                        chromosome. Useful for debugging purposes
-d 1000, --distance=1000
                                        this option allows you to set the maximum number of
                                        base-pairs allowed between two non-overlapping paired
                                        reads. Default = 1000
--mutations=delsonly
                                        Use this option to only track mutations that are of
                                        interest. For CRAC data this is usually deletions
                                        (--mutations=delsonly). For PAR-CLIP data this is
                                        usually T-C mutations (--mutations=TC). Other options
                                        are: do not report any mutations: --mutations=nomuts.
                                        Only report specific base mutations, for example only
                                        in T's, C's and G's: --mutations=[TCG]. The brackets
                                        are essential. Other nucleotide combinations are also
                                        possible
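
The options above can be assembled into a command line programmatically. The following is a sketch under stated assumptions: the executable name "pyReadAligner.py", the helper functions and all file names are hypothetical, and the check on --mutations only mirrors the values listed in this help text (restricting bracketed sets to A, C, G and T is a guess). Because the brackets are essential and are special characters in most shells, the sketch uses shlex.join to quote them when a shell command string is needed.

```python
import re
import shlex

# Named values taken from the help text above.
NAMED_MODES = {"delsonly", "TC", "nomuts"}

def check_mutations(value):
    """Accept a named mode, or a bracketed nucleotide set such as [TCG]."""
    # Assumption: bracketed sets contain only upper-case A, C, G and T.
    if value in NAMED_MODES or re.fullmatch(r"\[[ACGT]+\]", value):
        return value
    raise ValueError(f"unsupported --mutations value: {value!r}")

def build_command(input_file, genes_file, gtf, tab,
                  file_type="sam", mutations=None, unique=False):
    """Build an argument list using only flags documented in this help text."""
    # Assumption: the tool is installed as "pyReadAligner.py"; adjust as needed.
    cmd = ["pyReadAligner.py", "-f", input_file, f"--file_type={file_type}",
           "-g", genes_file, f"--gtf={gtf}", f"--tab={tab}"]
    if mutations is not None:
        cmd.append(f"--mutations={check_mutations(mutations)}")
    if unique:
        cmd.append("--unique")
    return cmd

# Hypothetical file names, for illustration only.
cmd = build_command("reads.sam", "genes.txt", "annotation.gtf", "genome.tab",
                    mutations="[TCG]", unique=True)

# Shell-safe string; the bracketed argument is quoted automatically.
command_line = shlex.join(cmd)
```

The list in cmd can be handed to subprocess.run directly, which bypasses the shell and makes the quoting unnecessary; command_line is only needed when the command is pasted into a terminal or a script.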