Galaxy | Tool Preview

pyBarcodeFilter (version 1.0.0)
FastQ format
Tab delimited file with barcodes and barcode names
Set the number of allowed mismatches in a barcode

pySolexaBarcodeFilter

pySolexaBarcodeFilter is part of the pyCRAC package. It filters sequence files by barcode.

This tool requires FASTA or FASTQ input files containing the raw data and a text file containing barcode information. To process paired-end data, use the -f and -r flags to indicate the paths to the forward and reverse sequencing reactions, respectively. The barcode file should contain two columns separated by a tab (see the table below). The first column should contain the barcode nucleotide sequences. The second column should contain an identifier, for example, the name of the barcode or the name of the experiment. An 'N' in the barcode sequence indicates a random nucleotide. Make sure to edit the file with a simple text editor such as TextEdit (Mac OS X), gedit (Linux/Unix) or a text editor in the terminal. The program is case sensitive: all nucleotide sequences should be upper case.

You can freely combine different barcodes. NOTE: if you are mixing samples containing random nucleotide barcodes and regular barcodes, make sure to place the regular barcode sequences below the sequences with random nucleotides, and make sure the shortest sequence is ALWAYS at the bottom of the column (see below).

Example of a barcode text file:

NNNCGCTTAGC mutant2
NNNGCGCAGC  mutant1
NNNATTAG    control
NNNTAAGC    myfavprotein
AGC         oldcontrol
AC          veryfirstbarcodedsample
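
The formatting rules above can be checked before running the tool. The following is a minimal sketch, not part of pyCRAC: the function names read_barcodes and check_order are hypothetical, and the checks only cover the rules stated in this help text (two tab-separated columns, upper-case sequences, regular barcodes below random-nucleotide barcodes, shortest sequence last).

```python
import io

VALID_BASES = set("ACGTN")  # upper case only; 'N' marks a random nucleotide


def read_barcodes(handle):
    """Parse a two-column, tab-delimited barcode table into (sequence, name) pairs."""
    entries = []
    for line_no, line in enumerate(handle, start=1):
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore blank lines
        fields = line.split("\t")
        if len(fields) != 2:
            raise ValueError(
                f"line {line_no}: expected 2 tab-separated columns, got {len(fields)}"
            )
        sequence, name = fields[0].strip(), fields[1].strip()
        if not sequence or set(sequence) - VALID_BASES:
            raise ValueError(
                f"line {line_no}: barcode {sequence!r} must contain only upper-case A, C, G, T or N"
            )
        entries.append((sequence, name))
    return entries


def check_order(entries):
    """Return a list of ordering problems (an empty list means none were found)."""
    problems = []
    seen_regular = False
    for sequence, name in entries:
        if "N" in sequence and seen_regular:
            problems.append(f"{name}: random-nucleotide barcode listed below a regular barcode")
        if "N" not in sequence:
            seen_regular = True
    if entries:
        shortest = min(len(sequence) for sequence, _ in entries)
        if len(entries[-1][0]) > shortest:
            problems.append(f"{entries[-1][1]}: last barcode is not the shortest")
    return problems


# The example table from this help text, written with real tab characters.
example = (
    "NNNCGCTTAGC\tmutant2\n"
    "NNNGCGCAGC\tmutant1\n"
    "NNNATTAG\tcontrol\n"
    "NNNTAAGC\tmyfavprotein\n"
    "AGC\toldcontrol\n"
    "AC\tveryfirstbarcodedsample\n"
)
entries = read_barcodes(io.StringIO(example))
problems = check_order(entries)
```

For a file on disk, pass an open file handle to read_barcodes instead of the io.StringIO object.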

Parameter list

Options:

-f FILE, --input_file=FILE
                          name of the FASTQ or FASTA input file
-r FILE, --reverse_input_file=FILE
                          name of the paired (or reverse) FASTQ or FASTA input file
--file_type=FASTQ
                          type of file: uncompressed (fasta or fastq) or gzip-compressed (fasta.gz or fastq.gz). Default is fastq
-b FILE, --barcode_list=FILE
                          name of tab-delimited file containing barcodes and barcode names
-m 1, --mismatches=1
                          to set the number of allowed mismatches in a barcode. A maximum of one mismatch is allowed. Default = 0
-i, --index
                          use this option if you want to split the data using the Illumina indexing barcode information
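
As an illustration of how the options above combine, here is a minimal sketch that assembles an argument list from them. The helper build_command is hypothetical, and the executable name pyBarcodeFilter.py is an assumption based on the tool title; adjust it to match your installation. Only flags listed above are used.

```python
import shlex

ALLOWED_FILE_TYPES = ("fasta", "fastq", "fasta.gz", "fastq.gz")


def build_command(forward, barcodes, reverse=None, file_type="fastq",
                  mismatches=0, index=False,
                  executable="pyBarcodeFilter.py"):  # assumed executable name
    """Assemble an argument list using only the options documented above."""
    if mismatches not in (0, 1):
        raise ValueError("a maximum of one mismatch is allowed")
    if file_type not in ALLOWED_FILE_TYPES:
        raise ValueError(f"unsupported file type: {file_type!r}")
    command = [executable, "-f", forward]
    if reverse is not None:
        command += ["-r", reverse]  # paired (reverse) input file
    command += [f"--file_type={file_type}", "-b", barcodes, "-m", str(mismatches)]
    if index:
        command.append("-i")  # split on Illumina indexing barcode information
    return command


# Hypothetical file names, for illustration only.
paired = build_command("reads_1.fastq", "barcodes.txt",
                       reverse="reads_2.fastq", mismatches=1)
command_line = shlex.join(paired)  # shell-quoted string, requires Python 3.8+
```

The resulting list can be passed to subprocess.run, which avoids shell quoting problems with file names that contain spaces.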