DESCRIPTION
Search a fasta file for matches to a regular expression and return a bed file with the coordinates of the match and the matched sequence itself.
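Output bed file has columns:
1. Name of the fasta sequence (e.g. chromosome)
2. Start of the match (0-based)
3. End of the match
4. ID of the match (e.g. mychr_0_4_for)
5. Length of the match
6. Strand
7. Matched sequence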
For matches on the reverse strand, the start position, the end position and the matched string are all reported as they appear on the forward strand (so the G4 'GGGAGGGT' present on the reverse strand is reported as ACCCTCCC).
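The reverse-strand convention above can be sketched as follows. This is only an illustration, not the script's actual code: the helper names reverse_complement and reverse_strand_matches are hypothetical, and the sketch assumes reverse-strand matches are found by searching the reverse complement and mapping the offsets back.

```python
import re

# Complement table for DNA bases; other characters (e.g. 'n') are left unchanged.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]

def reverse_strand_matches(seq, regex):
    """Yield (start, end, forward_string) for matches on the reverse strand.

    Coordinates are 0-based, end-exclusive, and expressed on the forward strand.
    Hypothetical helper: assumes non-overlapping matches as found by re.finditer.
    """
    n = len(seq)
    for m in re.finditer(regex, reverse_complement(seq)):
        start = n - m.end()    # map reverse-strand offsets back to the forward strand
        end = n - m.start()
        yield start, end, seq[start:end]

# The G4 'GGGAGGGT' sits on the reverse strand of this toy sequence.
hits = list(reverse_strand_matches("TACCCTCCCAAAA", "GGGAGGGT"))
```

Here hits holds a single tuple whose third element is the forward-strand string ACCCTCCC, matching the convention described above.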
Note: Fasta sequences (chroms) are read into memory one at a time along with the matches for that chromosome. The order of the output is: chroms as they are found in the input fasta, matches sorted within chroms by position.
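A minimal sketch of the one-sequence-at-a-time behaviour described in the note, assuming a plain fasta layout with '>' header lines. The functions read_fasta and matches_in_input_order are hypothetical names used for illustration and only cover forward-strand matches.

```python
import re

def read_fasta(lines):
    """Yield (name, sequence) one record at a time from an iterable of lines.

    Only the current record is held in memory.
    """
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:].strip(), []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)

def matches_in_input_order(lines, regex):
    """Yield (name, start, end): chroms in input order, matches sorted by position."""
    for name, seq in read_fasta(lines):
        hits = sorted((m.start(), m.end()) for m in re.finditer(regex, seq))
        for start, end in hits:
            yield name, start, end

# chr2 comes first in the input, so it also comes first in the output.
example = [">chr2", "ACTG", "nACTG", ">chr1", "ACTG"]
ordered = list(matches_in_input_order(example, "ACTG"))
```

Because read_fasta is a generator, a caller can pass an open file handle and never hold more than one sequence at once.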
ARGUMENTS:
EXAMPLE:
Test data::

    >mychr
    ACTGnACTGnACTGnTGAC
Example1 regex=ACTG:
    mychr  0   4   mychr_0_4_for    4  +  ACTG
    mychr  5   9   mychr_5_9_for    4  +  ACTG
    mychr  10  14  mychr_10_14_for  4  +  ACTG
Example2 regex=ACTG maxstr=3:
    mychr  0   4   mychr_0_4_for    4  +  ACT[3,4]
    mychr  5   9   mychr_5_9_for    4  +  ACT[3,4]
    mychr  10  14  mychr_10_14_for  4  +  ACT[3,4]
Example3 regex=A\w\wGn:
    mychr  0   5   mychr_0_5_for    5  +  ACTGn
    mychr  5   10  mychr_5_10_for   5  +  ACTGn
    mychr  10  15  mychr_10_15_for  5  +  ACTGn
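
The rows in the examples can be reproduced with a short sketch like the one below. It is not the script's own code: bed_rows is a hypothetical helper, it handles the forward strand only, and it assumes (from the maxstr example) that a match longer than maxstr is truncated to maxstr characters and suffixed with [maxstr,length].

```python
import re

def bed_rows(name, seq, regex, maxstr=None):
    """Return tab-separated bed rows for forward-strand matches of regex in seq.

    Hypothetical helper. Assumption: the '[3,4]' suffix in the maxstr example
    means [maxstr, full match length].
    """
    rows = []
    for m in re.finditer(regex, seq):
        start, end = m.start(), m.end()
        matched = m.group()
        length = len(matched)
        if maxstr is not None and length > maxstr:
            matched = "%s[%d,%d]" % (matched[:maxstr], maxstr, length)
        match_id = "%s_%d_%d_for" % (name, start, end)
        rows.append("\t".join([name, str(start), str(end), match_id,
                               str(length), "+", matched]))
    return rows

test_seq = "ACTGnACTGnACTGnTGAC"
full = bed_rows("mychr", test_seq, "ACTG")
short = bed_rows("mychr", test_seq, "ACTG", maxstr=3)
```

Printing the elements of full or short gives one bed line per match, with the matched string truncated only when maxstr is set and shorter than the match.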