pyMotif
pyMotif is part of the pyCRAC package. It looks for enriched sequence motifs in high-throughput sequencing data and produces a GTF-format output file with coordinates and Z-scores for the enriched motifs. The GTF file can be visualised in genome browsers.
Parameter list
File input options:
-f intervals.gtf, --input_file=intervals.gtf
    Provide the path to an interval GTF file. By default it expects data from the standard input.
-o OUTPUT_FILE, --output_file=OUTPUT_FILE
    Use this flag to override the standard file names. Do NOT add an extension.
--gtf=annotation_file.gtf
    Type the path to the GTF annotation file that you want to use.
--tab=tab_file.tab
    Type the path to the tab file that contains the genomic reference sequence.
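As a rough illustration of how the file options combine, the Python sketch below assembles a command line from them. The script name `pyMotif.py`, the helper `build_pymotif_command`, and all file names are assumptions made for this example. Nothing is executed; the command is only printed.

```python
import shlex

def build_pymotif_command(intervals, gtf, tab, output=None, script="pyMotif.py"):
    """Assemble an argument list from the file input options listed above.

    The script name and this helper are hypothetical; only the flags
    (-f, -o, --gtf, --tab) come from the option list.
    """
    cmd = [script, "-f", intervals, f"--gtf={gtf}", f"--tab={tab}"]
    if output is not None:
        # The option text says NOT to add an extension to the output name.
        cmd += ["-o", output]
    return cmd

cmd = build_pymotif_command(
    "intervals.gtf", "annotation_file.gtf", "tab_file.tab", output="my_motifs"
)
print(shlex.join(cmd))
```

The resulting list could be passed to `subprocess.run` once the paths point at real files.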
pyMotif specific options:
--k_min=4
    This option allows you to set the shortest k-mer length. Default = 4.
--k_max=6
    This option allows you to set the longest k-mer length. Default = 8.
-n 100, --numberofkmers=100
    Choose the maximum number of enriched k-mer sequences you want to have reported in output files. Default = 1000.
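To make the interplay of `--k_min`, `--k_max` and `-n` more concrete, here is a minimal Python sketch that counts all k-mers in the chosen length range, scores them, and keeps the top n. It is not pyMotif's implementation: the function names are hypothetical, and the Z-score used here (each count standardised against the other k-mers of the same length) is an assumption chosen for illustration, since the text above does not say how pyMotif computes its Z-scores.

```python
from collections import Counter, defaultdict
from statistics import mean, pstdev

def count_kmers(sequences, k_min=4, k_max=8):
    """Count every k-mer with k_min <= k <= k_max across all sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for k in range(k_min, k_max + 1):
            for i in range(len(seq) - k + 1):
                counts[seq[i:i + k]] += 1
    return counts

def zscores_by_length(counts):
    """Illustrative Z-score: standardise each count within its k-mer length.

    This is an assumed statistic for the sketch, not pyMotif's own.
    """
    by_len = defaultdict(list)
    for kmer, c in counts.items():
        by_len[len(kmer)].append(c)
    stats = {k: (mean(v), pstdev(v)) for k, v in by_len.items()}
    scores = {}
    for kmer, c in counts.items():
        mu, sd = stats[len(kmer)]
        scores[kmer] = (c - mu) / sd if sd > 0 else 0.0
    return scores

def top_kmers(scores, n=1000):
    """Return at most n (k-mer, score) pairs, highest score first."""
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:n]

counts = count_kmers(["TTTTTTAC"], k_min=4, k_max=4)
best = top_kmers(zscores_by_length(counts), n=1)
print(best[0][0])
```

Widening the range between `k_min` and `k_max` increases the number of candidate k-mers considerably, which is where a cap such as `-n` becomes useful.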
pyCRAC common options:
-a protein_coding, --annotation=protein_coding
    Select which annotation (e.g. protein_coding, ncRNA, sRNA, rRNA, snoRNA, snRNA, depending on the source of your GTF file) you would like to focus your search on. Default = all annotations.
-r 100, --range=100
    Allows you to add regions flanking the genomic feature. If you set '-r 50' or '--range=50', the program will add 50 nucleotides to each side of each feature, regardless of whether the GTF file has genes with annotated UTRs.
--overlap=1
    Sets the number of nucleotides a motif has to overlap with a genomic feature before it is considered a hit. Default = 1 nucleotide.
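The effect of `--range` and `--overlap` can be sketched in a few lines of Python. This is an illustration under stated assumptions, not pyCRAC code: the function names are hypothetical, coordinates are treated as 1-based and inclusive (as in GTF), and clamping the extended start at position 1 is an assumption made for this example.

```python
def extend_feature(start, end, flank=0):
    """Add `flank` nucleotides to each side of a feature (cf. --range).

    Clamping the start at 1 is an assumption made for this sketch.
    """
    return max(1, start - flank), end + flank

def overlap_length(a_start, a_end, b_start, b_end):
    """Number of nucleotides shared by two 1-based inclusive intervals."""
    return max(0, min(a_end, b_end) - max(a_start, b_start) + 1)

def is_hit(motif, feature, flank=0, min_overlap=1):
    """True if the motif overlaps the (extended) feature by at least
    min_overlap nucleotides (cf. --overlap)."""
    f_start, f_end = extend_feature(feature[0], feature[1], flank=flank)
    return overlap_length(motif[0], motif[1], f_start, f_end) >= min_overlap

feature = (100, 200)
print(is_hit((205, 210), feature))                 # motif lies outside the feature
print(is_hit((205, 210), feature, flank=50))       # flanks now cover the motif
print(is_hit((198, 203), feature, min_overlap=4))  # only 3 nt overlap
```

With no flank the first motif misses the feature entirely; with a 50-nucleotide flank it falls inside the extended region. The third motif shares only positions 198 to 200 with the feature, so it fails a 4-nucleotide overlap requirement.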