pySolexaBarcodeFilter
pySolexaBarcodeFilter is part of the pyCRAC package. It filters sequence files by barcode.
This tool requires FASTA or FASTQ input files containing the raw data and a text file containing barcode information. To process paired-end data, use the -f and -r flags to indicate the paths to the forward and reverse sequencing reactions, respectively.
The barcodes file should contain two columns separated by a tab (see the table below). The first column should contain the barcode nucleotide sequences. The second column should contain an identifier, for example, the name of the barcode or the name of the experiment. An 'N' in the barcode sequence indicates a random nucleotide. Create the file with a simple text editor such as TextEdit (Mac OS X), gedit (Linux/Unix) or a text editor in the terminal. The program is case sensitive: all nucleotide sequences should be upper case.
You can freely combine different barcodes. NOTE: if you are mixing samples containing random-nucleotide barcodes and regular barcodes, make sure to place the regular barcode sequences below the sequences with random nucleotides, and make sure the shortest sequence is ALWAYS at the bottom of the column (see below).
Example of a barcode text file:
NNNCGCTTAGC    mutant2
NNNGCGCAGC     mutant1
NNNATTAG       control
NNNTAAGC       myfavprotein
AGC            oldcontrol
AC             veryfirstbarcodedsample
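The layout rules described above can be checked before running the tool. The following is a minimal sketch, not part of pyCRAC: the function name check_barcode_table is hypothetical, and the checks reflect one reading of the rules (two tab-separated columns, upper-case A/C/G/T/N only, random-nucleotide barcodes above regular ones, shortest barcode on the last line).

```python
import re


def check_barcode_table(text):
    """Return a list of problems found in a barcode table (empty if none).

    Hypothetical helper, not part of pyCRAC; it only encodes the layout
    rules stated in this document.
    """
    problems = []
    barcodes = []
    for number, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            problems.append(f"line {number}: expected 2 tab-separated columns")
            continue
        barcode, _name = fields
        if not re.fullmatch(r"[ACGTN]+", barcode):
            problems.append(
                f"line {number}: barcode must be upper-case A, C, G, T or N"
            )
        barcodes.append((number, barcode))

    # Regular barcodes must come after all random-nucleotide (N) barcodes.
    seen_regular = False
    for number, barcode in barcodes:
        if "N" in barcode:
            if seen_regular:
                problems.append(
                    f"line {number}: random-nucleotide barcode placed "
                    "below a regular barcode"
                )
        else:
            seen_regular = True

    # The shortest barcode must be on the last line.
    if barcodes:
        shortest = min(len(barcode) for _, barcode in barcodes)
        if len(barcodes[-1][1]) != shortest:
            problems.append("the shortest barcode is not on the last line")
    return problems


example = "\n".join([
    "NNNCGCTTAGC\tmutant2",
    "NNNGCGCAGC\tmutant1",
    "NNNATTAG\tcontrol",
    "NNNTAAGC\tmyfavprotein",
    "AGC\toldcontrol",
    "AC\tveryfirstbarcodedsample",
])
print(check_barcode_table(example))  # prints []
```

An empty list means the table follows the rules as interpreted here; each string in a non-empty list names the offending line.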
Parameter list
Options:
  -f FILE, --input_file=FILE
                        name of the FASTQ or FASTA input file
  -r FILE, --reverse_input_file=FILE
                        name of the paired (or reverse) FASTQ or FASTA input
                        file
  --file_type=FASTQ     type of file, uncompressed (fasta or fastq) or
                        compressed (fasta.gz or fastq.gz, gzip/gunzip
                        compressed). Default is fastq
  -b FILE, --barcode_list=FILE
                        name of tab-delimited file containing barcodes and
                        barcode names
  -m 1, --mismatches=1  to set the number of allowed mismatches in a barcode.
                        A maximum of one mismatch is allowed. Default = 0
  -i, --index           use this option if you want to split the data using
                        the Illumina indexing barcode information
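As an illustration of how these options combine, the sketch below assembles an argument list from the flags listed above. It is an assumption-laden example rather than an official interface: the helper build_filter_command is hypothetical, the executable name pySolexaBarcodeFilter.py is assumed and may differ in your installation, and the file names are placeholders. The helper only builds the list and does not run the tool.

```python
def build_filter_command(input_file, barcode_list, reverse_input_file=None,
                         file_type="fastq", mismatches=0, index=False,
                         program="pySolexaBarcodeFilter.py"):
    """Build an argument list from the options documented above.

    Hypothetical helper; 'program' is an assumed executable name.
    """
    allowed_types = ("fasta", "fastq", "fasta.gz", "fastq.gz")
    if file_type not in allowed_types:
        raise ValueError(f"file_type must be one of {allowed_types}")
    if mismatches not in (0, 1):
        # The option list states that at most one mismatch is allowed.
        raise ValueError("mismatches must be 0 or 1")

    command = [program, "-f", input_file, "-b", barcode_list,
               "--file_type=" + file_type, "-m", str(mismatches)]
    if reverse_input_file is not None:
        command += ["-r", reverse_input_file]  # paired-end data
    if index:
        command.append("-i")  # split on Illumina indexing barcodes
    return command


# Placeholder file names, for illustration only.
paired = build_filter_command("forward.fastq", "barcodes.txt",
                              reverse_input_file="reverse.fastq",
                              mismatches=1)
print(" ".join(paired))
```

The resulting list can be passed to subprocess.run, or joined with spaces as shown to obtain a command line for the terminal.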