What it does
ALFA provides a global overview of the distribution of features in Next-Generation Sequencing (NGS) dataset(s). Given a set of aligned reads (BAM files) and an annotation file (GTF format), the tool produces plots of the raw and normalized distributions of those reads among genomic categories (stop codon, 5'-UTR, CDS, intergenic, etc.) and biotypes (protein coding genes, miRNA, tRNA, etc.). It works whatever the sequencing technique and whatever the organism.
ALFA acronym
Official documentation of the tool
Detailed example
Nota Bene
Input 1: Annotation File
ALFA requires as its first input an annotation file (sequence, genome...) in GTF format, from which it generates the ALFA indexes needed in a second step of the program. Indexes are files listing the coordinates of all the categories (stop codon, 5'-UTR, CDS, intergenic...) and biotypes (protein coding genes, miRNA, tRNA...) found in the annotated sequence. The GTF file must be sorted.
Generating indexes from an annotation file can be time-consuming (e.g. ~10 min for the human genome). ALFA therefore allows the user to submit indexes generated in previous runs directly as inputs for a new run. ALFA can also use built-in indexes to save even more computation time. To easily generate these built-in indexes, install the data manager tool `ALFA_data_manager`_ available on the Tool Shed.
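The sorting requirement can be illustrated with a small Python sketch. It assumes an order of chromosome name (plain string comparison), then start coordinate ascending, then end coordinate descending, similar to what the shell command `sort -k1,1 -k4,4n -k5,5nr` produces under `LC_ALL=C`. This ordering is an assumption, so check the official ALFA documentation for the exact order it expects. The helper names `gtf_sort_key`, `sort_gtf_lines` and `is_sorted_gtf` are hypothetical and not part of ALFA.

```python
# Hypothetical helpers, not part of ALFA: sort GTF lines by
# (chromosome, start ascending, end descending), an assumed order.


def gtf_sort_key(line):
    """Sort key for one tab-separated GTF data line."""
    fields = line.rstrip("\n").split("\t")
    # GTF columns used: 1 = seqname, 4 = start, 5 = end (1-based, inclusive).
    return (fields[0], int(fields[3]), -int(fields[4]))


def _data_lines(lines):
    """Keep non-empty, non-comment lines."""
    return [ln for ln in lines if ln.strip() and not ln.startswith("#")]


def sort_gtf_lines(lines):
    """Return comment lines first, then data lines in sorted order."""
    comments = [ln for ln in lines if ln.startswith("#")]
    return comments + sorted(_data_lines(lines), key=gtf_sort_key)


def is_sorted_gtf(lines):
    """True if the data lines are already in gtf_sort_key order."""
    keys = [gtf_sort_key(ln) for ln in _data_lines(lines)]
    return all(a <= b for a, b in zip(keys, keys[1:]))


if __name__ == "__main__":
    example = [
        "chr2\tsrc\texon\t100\t200\t.\t+\t.\tgene_id \"g2\";\n",
        "chr1\tsrc\tgene\t50\t500\t.\t+\t.\tgene_id \"g1\";\n",
        "chr1\tsrc\texon\t50\t120\t.\t+\t.\tgene_id \"g1\";\n",
    ]
    print(is_sorted_gtf(example))  # False
    for line in sort_gtf_lines(example):
        print(line, end="")
```

Sorting the end coordinate in descending order places a gene before the exons that start at the same position, so enclosing features come first.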
Input 2: Reads
ALFA requires as its second input one or more mapped-read files in either BAM or bedGraph format. The coordinates of the mapped reads are intersected with the corresponding categories and biotypes listed in the indexes. The strandedness option determines which strand of the annotated sequence is taken into account during this intersection. BAM and bedGraph files must be sorted. Chromosome names must be identical in the read files and in the annotation (GTF or indexes); otherwise, no intersection can occur.
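The intersection and the role of the strandedness option can be sketched in Python as below. This is a deliberately naive illustration, not ALFA's implementation: the `Interval` type, the `count_nucleotides` and `missing_chromosomes` helpers, and the option values `"unstranded"`, `"forward"` and `"reverse"` are all hypothetical. Each read nucleotide overlapping a category interval on an accepted strand is credited to that category; nucleotides overlapping several categories are credited to each of them here, whereas the real tool may resolve such overlaps differently.

```python
from collections import Counter
from typing import NamedTuple


class Interval(NamedTuple):
    """A genomic interval: 0-based start (inclusive), end (exclusive)."""
    chrom: str
    start: int
    end: int
    strand: str  # "+" or "-"


def count_nucleotides(reads, categories, strandedness="unstranded"):
    """Count read nucleotides overlapping each category (naive sketch).

    reads: iterable of Interval
    categories: dict mapping category name -> list of Interval
    strandedness (hypothetical values):
      "unstranded" - annotations on either strand are counted
      "forward"    - only annotations on the read's strand
      "reverse"    - only annotations on the opposite strand
    """
    if strandedness not in ("unstranded", "forward", "reverse"):
        raise ValueError(f"unknown strandedness: {strandedness!r}")
    counts = Counter()
    for read in reads:
        if strandedness == "forward":
            wanted = read.strand
        elif strandedness == "reverse":
            wanted = "-" if read.strand == "+" else "+"
        else:
            wanted = None  # accept both strands
        for name, intervals in categories.items():
            for iv in intervals:
                if iv.chrom != read.chrom:
                    continue  # different chromosome names never intersect
                if wanted is not None and iv.strand != wanted:
                    continue
                overlap = min(read.end, iv.end) - max(read.start, iv.start)
                if overlap > 0:
                    counts[name] += overlap
    return counts


def missing_chromosomes(reads, categories):
    """Chromosome names used by the reads but absent from the annotation."""
    annotated = {iv.chrom for ivs in categories.values() for iv in ivs}
    return {read.chrom for read in reads} - annotated
```

The nested loops compare every read with every interval, which is only acceptable for toy data; with sorted inputs, both lists can instead be scanned in a single pass, which is presumably one reason sorted files are required. `missing_chromosomes` shows why consistent names matter: a read on `chrX` cannot match an annotation that names the same chromosome `X`.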
Output files
The result of the intersection is a count file giving, for each genomic category and biotype, the number of read nucleotides that fall in it. From this count file, ALFA generates plots of the raw and normalized distributions of the reads among these categories.
In the output files section, users can choose which files ALFA should produce. The Categories Count File and the Plots are selected by default. Users can also select the 'indexes' option as output, which is useful if you plan to run ALFA again with the same annotation file. See Nota Bene/Input 1: Annotation File for more information.
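The difference between the raw and normalized distributions can be sketched as follows. The raw value is each category's share of the counted read nucleotides. For the normalized value, this sketch assumes that the share is divided by the category's share of the genome, so that 1.0 means "as expected from the category size", above 1 means enrichment and below 1 depletion; consult the official documentation for ALFA's exact formula. The `distributions` function and its input dictionaries are hypothetical.

```python
def distributions(counts, category_sizes):
    """Compute raw and normalized read distributions over categories.

    counts: dict category -> number of read nucleotides in that category
            (as found in a count file)
    category_sizes: dict category -> size of the category in the genome (nt)

    Assumption: normalized = (share of read nucleotides) / (share of genome).
    """
    total_reads = sum(counts.get(c, 0) for c in category_sizes)
    total_size = sum(category_sizes.values())
    if total_reads == 0 or total_size == 0:
        raise ValueError("no read nucleotides or empty categories")

    # Raw distribution: fraction of read nucleotides falling in each category.
    raw = {c: counts.get(c, 0) / total_reads for c in category_sizes}

    # Normalized distribution: raw share relative to the category's genome share.
    normalized = {
        c: raw[c] / (size / total_size)
        for c, size in category_sizes.items()
        if size > 0
    }
    return raw, normalized


if __name__ == "__main__":
    raw, norm = distributions(
        {"CDS": 500, "intergenic": 500},
        {"CDS": 25, "intergenic": 75},
    )
    print(raw)   # CDS and intergenic each hold half of the read nucleotides
    print(norm)  # CDS is enriched (~2.0), intergenic depleted (~0.67)
```

In this toy example, CDS occupies 25% of the genome but receives 50% of the read nucleotides, giving a normalized value of 2.0.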
ALFA Developers
Benoît Noël and Mathieu Bahin: compbio team, Institut de Biologie de l'Ecole Normale Supérieure de Paris