What it does
ChIPseeker is a Bioconductor package for the annotation and visualisation of ChIP-seq peaks. Peak annotation is performed by the annotatePeak function. For each peak, the position and strand of the nearest gene are reported, together with the distance from the peak to the TSS (transcription start site) of that gene. Users can define the TSS region under Advanced Options; by default, it spans -3kb to +3kb around the TSS. The genomic region of the peak is reported in the annotation column. Because a peak may overlap more than one type of region, ChIPseeker assigns annotations using the following priority:
- Promoter
- 5’ UTR
- 3’ UTR
- Exon
- Intron
- Downstream
- Intergenic
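The priority rule and the default TSS window can be illustrated with a short Python sketch. This is not ChIPseeker's implementation (annotatePeak is R code, and it also decides which features a peak overlaps). The names `PRIORITY`, `resolve_annotation` and `in_tss_region` are hypothetical, and the window is assumed to be inclusive at both ends.

```python
from __future__ import annotations

# Hypothetical sketch, not ChIPseeker code: choose one label for a peak that
# overlaps several region types, following the priority order described above.
PRIORITY = ["Promoter", "5' UTR", "3' UTR", "Exon", "Intron", "Downstream", "Intergenic"]


def resolve_annotation(overlapping: set[str]) -> str:
    """Return the highest-priority region type among those a peak overlaps."""
    for region in PRIORITY:
        if region in overlapping:
            return region
    # A peak that overlaps no annotated feature is treated as intergenic.
    return "Intergenic"


def in_tss_region(distance: int, upstream: int = 3000, downstream: int = 3000) -> bool:
    """Assumed inclusive window: -upstream <= distance <= +downstream (default -3kb to +3kb)."""
    return -upstream <= distance <= downstream


print(resolve_annotation({"Intron", "Exon"}))       # Exon
print(in_tss_region(-2500), in_tss_region(3869))    # True False
```

Iterating over a fixed, ordered list keeps the rule explicit: the first region type in the list that the peak overlaps wins.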
ChIPseeker also produces plots, such as vennpie and upsetplot, that help users visualise how the annotations of a peak overlap. See the ChIPseeker vignette for more information.
Inputs
A peaks file in BED, Interval or Tabular format, e.g. from MACS2 or DiffBind. An option specifies whether the input peaks file has a header row. By default, no header row is assumed, which is usually the case for BED format (e.g. MACS narrowPeak); other formats, such as the MACS tabular format, may contain a header row.
Example:
Chrom  Start   End     Name      Score  Strand
18     394599  396513  DiffBind  0      .
18     111566  112005  DiffBind  0      .
18     346463  347342  DiffBind  0      .
18     399013  400382  DiffBind  0      .
18     371109  372102  DiffBind  0      .
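The effect of the header option can be sketched in Python, assuming whitespace- or tab-separated columns with chrom, start and end first. `read_peaks` and `has_header` are hypothetical names, not the tool's parameters.

```python
from __future__ import annotations


def read_peaks(text: str, has_header: bool = False) -> list[tuple[str, int, int]]:
    """Parse chrom/start/end from BED-like text, skipping the first line if it is a header."""
    lines = [line for line in text.splitlines() if line.strip()]
    if has_header:
        lines = lines[1:]  # e.g. "Chrom Start End Name Score Strand"
    peaks = []
    for line in lines:
        chrom, start, end = line.split()[:3]
        peaks.append((chrom, int(start), int(end)))
    return peaks


example = """Chrom Start End Name Score Strand
18 394599 396513 DiffBind 0 .
18 111566 112005 DiffBind 0 ."""

print(read_peaks(example, has_header=True))
# [('18', 394599, 396513), ('18', 111566, 112005)]
```

Calling `read_peaks(example)` with the default `has_header=False` raises `ValueError` on the header line, which shows why the setting must match the file.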
A GTF file for annotation. The GTF file must have attributes called "gene_id" and "gene_name".
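Before running the tool, a GTF can be checked for the required attributes with a small Python sketch. It assumes the standard GTF layout (nine tab-separated columns, with `key "value";` pairs in column 9). The function names are hypothetical, and the sample line is made up from the COLEC12 values in the examples below (source "example", minus strand assumed).

```python
from __future__ import annotations

import re

# Matches key "value" pairs in the GTF attribute column (column 9).
ATTR_RE = re.compile(r'(\w+)\s+"([^"]*)"')
REQUIRED = ("gene_id", "gene_name")


def parse_attributes(column9: str) -> dict[str, str]:
    """Return the quoted attributes of one GTF record as a dict."""
    return {key: value for key, value in ATTR_RE.findall(column9)}


def missing_required(gtf_line: str) -> list[str]:
    """List the required attributes absent from one tab-separated GTF line."""
    fields = gtf_line.rstrip("\n").split("\t")
    attrs = parse_attributes(fields[8])
    return [key for key in REQUIRED if key not in attrs]


line = '18\texample\tgene\t346465\t400382\t.\t-\t.\tgene_id "ENSG00000158270"; gene_name "COLEC12";'
print(missing_required(line))  # []
```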
Outputs
This tool outputs:
- a file of annotated peaks in Interval or Tabular format
- a PDF of plots (plotAnnoPie, plotAnnoBar, vennpie, upsetplot, plotDistToTSS)
Optionally, you can choose to output:
- the R script used by this tool
- an RData file
Annotated peaks
Annotation similar to the examples below is added to the input file.
Example - Interval format:
Chrom  Start   End     Comment
18     394599  396513  DiffBind|0|.|Intron (ENST00000400256/ENSG00000158270, intron 1 of 1)|1|346465|400382|53918|2|ENST00000400256| 3869|COLEC12|ENSG00000158270
18     346463  347342  DiffBind|0|.|Exon (ENST00000400256/ENSG00000158270, exon 1 of 1)|1|346465|400382|53918|2|ENST00000400256|53040|COLEC12|ENSG00000158270
18     399013  400382  DiffBind|0|.|Promoter (<=1kb)|1|346465|400382|53918|2|ENST00000400256| 0|COLEC12|ENSG00000158270
18     371109  372102  DiffBind|0|.|Intron (ENST00000400256/ENSG00000158270, intron 1 of 1)|1|346465|400382|53918|2|ENST00000400256|28280|COLEC12|ENSG00000158270
18     111566  112005  DiffBind|0|.|Promoter (<=1kb)|1|111568|112005| 438|1|ENST00000608049| 0|ROCK1P1|ENSG00000263006

Columns contain the following data:
Chrom: Chromosome name
Start: Start position of site
End: End position of site
Comment: The pipe-separated ("|") values in this column are, in order:
- <Any additional input columns>
- annotation (Promoter, 5’ UTR, 3’ UTR, Exon, Intron, Downstream, Intergenic)
- geneChr
- geneStart
- geneEnd
- geneLength
- geneStrand
- transcriptId
- distanceToTSS
- geneName
- geneId
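The Comment column can be unpacked, and distanceToTSS recomputed, with the Python sketch below. The names are hypothetical, and the distance rule is an assumption rather than ChIPseeker's documented algorithm: it treats geneStrand 1 as "+" and 2 as "-", puts the TSS at geneEnd on the minus strand, and returns 0 when the peak covers the TSS. Under those assumptions it reproduces the distanceToTSS values in the Interval example above.

```python
from __future__ import annotations

# The ten annotation fields that end the Comment column, in the order listed above.
FIELDS = ["annotation", "geneChr", "geneStart", "geneEnd", "geneLength",
          "geneStrand", "transcriptId", "distanceToTSS", "geneName", "geneId"]


def parse_comment(comment: str) -> tuple[list[str], dict[str, str]]:
    """Split a Comment value into (additional input columns, annotation fields)."""
    parts = [p.strip() for p in comment.split("|")]  # strip padding such as " 3869"
    extra, annot = parts[:-len(FIELDS)], parts[-len(FIELDS):]
    return extra, dict(zip(FIELDS, annot))


def distance_to_tss(peak_start: int, peak_end: int, gene_start: int,
                    gene_end: int, gene_strand: int) -> int:
    """Assumed rule: negative = upstream, positive = downstream, 0 if the peak covers the TSS."""
    minus = gene_strand == 2
    tss = gene_end if minus else gene_start
    if peak_start <= tss <= peak_end:
        return 0
    if minus:
        # On the minus strand, upstream lies at higher coordinates.
        return tss - peak_end if peak_end < tss else tss - peak_start
    return peak_start - tss if peak_start > tss else peak_end - tss


comment = ("DiffBind|0|.|Intron (ENST00000400256/ENSG00000158270, intron 1 of 1)"
           "|1|346465|400382|53918|2|ENST00000400256| 3869|COLEC12|ENSG00000158270")
extra, f = parse_comment(comment)
print(extra)  # ['DiffBind', '0', '.']
print(distance_to_tss(394599, 396513, int(f["geneStart"]), int(f["geneEnd"]), int(f["geneStrand"])))  # 3869
```

Slicing from the end works because the additional input columns always come first, so their number can vary from file to file.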
Example - Tabular format:
Chrom  Start   End     Name      Score  Strand  Comment  annotation  geneChr  geneStart  geneEnd  geneLength  geneStrand  transcriptId  distanceToTSS  geneName  geneId
18  394599  396513  DiffBind  0  .  1914|7.15|5.55|7.89|-2.35|7.06e-24|9.84e-21  Intron (ENST00000400256/ENSG00000158270, intron 1 of 1)  1  346465  400382  53918  2  ENST00000400256  3869  COLEC12  ENSG00000158270
18  346463  347342  DiffBind  0  .  879|5|5.77|3.24|2.52|6.51e-06|0.00303  Exon (ENST00000400256/ENSG00000158270, exon 1 of 1)  1  346465  400382  53918  2  ENST00000400256  53040  COLEC12  ENSG00000158270
18  399013  400382  DiffBind  0  .  1369|7.62|7|8.05|-1.04|1.04e-05|0.00364  Promoter (<=1kb)  1  346465  400382  53918  2  ENST00000400256  0  COLEC12  ENSG00000158270
18  371109  372102  DiffBind  0  .  993|4.63|3.07|5.36|-2.3|8.1e-05|0.0226  Intron (ENST00000400256/ENSG00000158270, intron 1 of 1)  1  346465  400382  53918  2  ENST00000400256  28280  COLEC12  ENSG00000158270
18  111566  112005  DiffBind  0  .  439|5.71|6.53|3.63|2.89|1.27e-08|8.88e-06  Promoter (<=1kb)  1  111568  112005  438  1  ENST00000608049  0  ROCK1P1  ENSG00000263006
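The Tabular output can be filtered downstream with Python's standard `csv` module. The sketch below assumes the file is tab-separated with the header row shown above, so annotation values that contain spaces and commas stay in one column. `promoter_peaks` is a hypothetical helper, and the sample is built from two of the example rows.

```python
from __future__ import annotations

import csv
import io


def promoter_peaks(tsv_text: str) -> list[dict[str, str]]:
    """Return the rows whose annotation starts with "Promoter"."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    return [row for row in reader if row["annotation"].startswith("Promoter")]


header = ["Chrom", "Start", "End", "Name", "Score", "Strand", "Comment", "annotation",
          "geneChr", "geneStart", "geneEnd", "geneLength", "geneStrand", "transcriptId",
          "distanceToTSS", "geneName", "geneId"]
rows = [
    ["18", "399013", "400382", "DiffBind", "0", ".", "1369|7.62|7|8.05|-1.04|1.04e-05|0.00364",
     "Promoter (<=1kb)", "1", "346465", "400382", "53918", "2", "ENST00000400256", "0",
     "COLEC12", "ENSG00000158270"],
    ["18", "371109", "372102", "DiffBind", "0", ".", "993|4.63|3.07|5.36|-2.3|8.1e-05|0.0226",
     "Intron (ENST00000400256/ENSG00000158270, intron 1 of 1)", "1", "346465", "400382",
     "53918", "2", "ENST00000400256", "28280", "COLEC12", "ENSG00000158270"],
]
text = "\n".join("\t".join(r) for r in [header] + rows)

for row in promoter_peaks(text):
    print(row["Chrom"], row["Start"], row["End"], row["geneName"])  # 18 399013 400382 COLEC12
```

Matching on the prefix "Promoter" keeps all promoter sub-categories, such as "Promoter (<=1kb)".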