This is a wrapper for findMotifsGenome.pl from HOMER. Not all options of the original program are included.
The program finds de novo and known motifs in regions of the genome.
Usage:
findMotifsGenome.pl <pos file> <genome> <output directory> [additional options]
Example:
findMotifsGenome.pl peaks.txt mm8r peakAnalysis -size 200 -len 8
Possible Genomes:
    -- or --
    Custom: provide the path to genome FASTA files (directory or single file)
        Heads up: will create the directory "preparsed/" in the same location.
Basic options:
    -mask (mask repeats/lower case sequence, can also add 'r' to genome, e.g. mm9r)
    -bg <background position file> (genomic positions to be used as background, default=automatic)
        removes background positions overlapping with target positions unless -keepOverlappingBg is used
    -chopify (chop up large background regions to the avg size of target regions)
    -len <#>[,<#>,<#>...] (motif length, default=8,10,12)
        [NOTE: values greater than 12 may cause the program to run out of memory. In these cases
        decrease the number of sequences analyzed (-N), or try analyzing shorter sequence
        regions (e.g. -size 100)]
    -size <#> (fragment size to use for motif finding, default=200)
    -size <#,#> (e.g. -size -100,50 will get sequences from -100 to +50 relative to the center)
    -size given (uses the exact regions you give it)
    -S <#> (Number of motifs to optimize, default: 25)
    -mis <#> (global optimization: searches for strings with # mismatches, default: 2)
    -norevopp (don't search reverse strand for motifs)
    -nomotif (don't search for de novo motif enrichment)
    -rna (output RNA motif logos and compare to RNA motif database, automatically sets -norevopp)
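The offset form of -size can be checked with a little arithmetic. The sketch below only illustrates the description above (offsets are added to the region center); the center value is hypothetical, and HOMER's exact handling of strand and coordinate conventions is not reproduced here.

```shell
# Hypothetical peak center (illustrative value, not taken from any real file).
center=5000

# Offsets exactly as they would be passed on the command line: -size -100,50
size_arg="-100,50"
left=${size_arg%,*}     # text before the comma: -100
right=${size_arg#*,}    # text after the comma:  50

# Region boundaries relative to the center.
start=$((center + left))
end=$((center + right))

echo "region: ${start}-${end} (length $((end - start)))"
```

With these values the region runs from 4900 to 5050, i.e. 150 bp, and it is not symmetric around the center.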
Scanning sequence for motifs:
    -find <motif file> (This will cause the program to only scan for motifs)
Known Motif Options/Visualization:
    -mset <vertebrates|insects|worms|plants|yeast|all> (check against motif collections, default: auto)
    -basic (just visualize de novo motifs, don't check similarity with known motifs)
    -bits (scale sequence logos by information content, default: doesn't scale)
    -nocheck (don't search for de novo vs. known motif similarity)
    -mcheck <motif file> (known motifs to check against de novo motifs)
    -float (allow adjustment of the degeneracy threshold for known motifs to improve p-value [dangerous])
    -noknown (don't search for known motif enrichment, default: -known)
    -mknown <motif file> (known motifs to check for enrichment)
    -nofacts (omit humor)
    -seqlogo (use weblogo/seqlogo/ghostscript to generate logos, default uses SVG now)
Sequence normalization options:
    -gc (use GC% for sequence content normalization, now the default)
    -cpg (use CpG% instead of GC% for sequence content normalization)
    -noweight (no CG correction)
    Also -nlen <#>, -olen <#>, see homer2 section below.
Advanced options:
    -h (use hypergeometric for p-values, binomial is default)
    -N <#> (Number of sequences to use for motif finding, default=max(50k, 2x input))
    -local <#> (use local background, # of equal size regions around peaks to use, e.g. 2)
    -redundant <#> (Remove redundant sequences matching greater than # percent, e.g. -redundant 0.5)
    -maxN <#> (maximum percentage of N's in sequence to consider for motif finding, default: 0.7)
    -maskMotif <motif file1> [motif file 2]... (motifs to mask before motif finding)
    -opt <motif file1> [motif file 2]... (motifs to optimize or change length of)
    -rand (randomize target and background sequence labels)
    -ref <peak file> (use file for target and background - first argument is list of peak ids for targets)
    -oligo (perform analysis of individual oligo enrichment)
    -dumpFasta (Dump fasta files for target and background sequences for use with other programs)
    -preparse (force new background files to be created)
    -preparsedDir <directory> (location to search for preparsed file and/or place new files)
    -keepFiles (keep temporary files)
    -fdr <#> (Calculate empirical FDR for de novo discovery, #=number of randomizations)
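The -N default of max(50k, 2x input) can be worked through by hand. This is a sketch of the formula as written in the option text only; it assumes "input" means the number of target regions, and the count used here is made up.

```shell
# Hypothetical number of target regions in the position file.
n_input=30000

# Default as stated in the -N help text: max(50k, 2 x input).
twice=$((2 * n_input))
if [ "$twice" -gt 50000 ]; then
    n_default=$twice
else
    n_default=50000
fi

echo "default -N for ${n_input} input regions: ${n_default}"
```

Under that reading, 30000 input regions give 60000, while any input of 25000 regions or fewer stays at the 50000 floor.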
homer2 specific options:
    -homer2 (use homer2 instead of original homer, default)
    -nlen <#> (length of lower-order oligos to normalize in background, default: -nlen 3)
    -nmax <#> (Max normalization iterations, default: 160)
    -neutral (weight sequences to neutral frequencies, i.e. 25%, 6.25%, etc.)
    -olen <#> (lower-order oligo normalization for oligo table, use if -nlen isn't working well)
    -p <#> (Number of processors to use, default: 1)
    -e <#> (Maximum expected motif instance per bp in random sequence, default: 0.01)
    -cache <#> (size in MB for statistics cache, default: 500)
    -quickMask (skip full masking after finding motifs, similar to original homer)
    -minlp <#> (stop looking for motifs when seed logp score gets above #, default: -10)
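As an illustration of how the options combine, the sketch below assembles a command line from the usage pattern and a few of the flags listed in this help text. The file, genome, and directory names are placeholders, and the command is printed rather than executed, so it runs without HOMER installed.

```shell
# Placeholder inputs; substitute real paths and a genome that is installed.
peaks="peaks.txt"
genome="mm9r"          # trailing 'r' requests repeat-masked sequence
outdir="peakAnalysis"

# Build the command as positional parameters so each argument stays one word.
set -- findMotifsGenome.pl "$peaks" "$genome" "$outdir" \
    -size given -len 8,10 -p 4 -nlen 3

# Print only; on a machine with HOMER installed, run "$@" instead.
cmd="$*"
echo "$cmd"
```

Keeping -len at 12 or below and raising -p are the two settings most likely to matter for memory use and run time, going by the notes in the option descriptions.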
Original homer specific options:
    -homer1 (to force the use of the original homer)
    -depth [low|med|high|allnight] (time spent on local optimization, default: med)