
Minfi pipeline (version 1.0)
Input data needs to be a list of dataset pairs, where the files are in IDAT format

What it does

The minfi package provides tools for analyzing Illumina’s methylation arrays, with a special focus on the new 450k array for humans. The functionality covered by this wrapper includes preprocessing, QC assessment, identification of interesting methylation loci, and plotting.

INPUTS:

Case: Dataset collection containing all samples of one phenotype (Example: Cancer, Disease state, Phenotype 1)

Control: Dataset collection containing all samples of the baseline normal phenotype (Example: Normals, Non-Disease state, Phenotype 2)

Select Preprocessing Method:

Choose one of the available preprocessing methods. For more information on the different preprocessing methods, refer to the minfi manual: https://www.bioconductor.org/packages/release/bioc/manuals/minfi/man/minfi.pdf

NOTE: Many people ask which normalization they should apply to their dataset. A good rule recommended by the authors of the package is the following. If there are global biological methylation differences between your samples, as in a dataset with cancer and normal samples or a dataset with different tissues or cell types, use the preprocessFunnorm function, which is aimed at such datasets. If you do not expect global differences between your samples, as in a blood dataset or a one-tissue dataset, use the preprocessQuantile function. In their experience, these two normalization procedures always perform better than the functions preprocessNoob, preprocessIllumina and preprocessSWAN, which remain available in the minfi package for convenience. This section is taken from the guide provided by Jean-Philippe Fortin and Kasper Daniel Hansen.
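The rule of thumb above can be written down as a small decision helper. This is only an illustrative sketch in Python: the function name `choose_preprocessing` and its boolean argument are hypothetical and are not part of minfi or of this tool. The returned strings are the names of the minfi functions mentioned in the note.

```python
def choose_preprocessing(global_differences_expected: bool) -> str:
    """Return the name of the minfi function suggested by the rule of thumb.

    global_differences_expected: True for designs such as cancer vs normal
    or mixed tissues/cell types; False for single-tissue designs such as
    blood-only datasets. (Hypothetical helper, for illustration only.)
    """
    if global_differences_expected:
        return "preprocessFunnorm"
    return "preprocessQuantile"


# Cancer vs normal: global differences are expected.
print(choose_preprocessing(True))
# Blood-only cohort: no global differences are expected.
print(choose_preprocessing(False))
```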

OUTPUTS:

Plots:

Output 1: PDF file of the QC Report.

Output 2: PDF file of the MDS plot.

CSV files:

Output 1: CSV file containing differentially methylated positions.

Output 2: CSV file containing differentially methylated regions calculated using Bumphunter.

Output 3: CSV file containing large-scale differentially methylated regions.

HOW TO USE

The tool takes IDAT files for both the Red and Green channels, organized as paired dataset collections.

Step 1: Upload the IDAT files (both Red and Green channels) using the upload tool in Galaxy.

Step 2: Once the upload is complete, select "Operations on Multiple Datasets" in the history panel.

Step 3: Select the list of IDAT files to be analyzed, and click "For all selected".

Step 4:

Choose "Build List of Dataset Pairs", then make the pairs and label the dataset collections. In the "Create a collection of paired datasets" dialog box, click "Clear filters", then choose the Green channel files as "Forward" and the Red channel files as "Reverse". The matched pairs should appear in green in the bottom panel.

Edit the common prefix of each pair by removing the trailing underscore "_", then name your collection. You should end up with one dataset collection for "Case" and another for "Control" (for example, Cancer vs Normal or Treatment vs Wildtype).
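Before building the collections, it can help to check that every sample has both channel files. The sketch below groups file names by their common prefix, following the Forward == Green and Reverse == Red convention of Step 4. It assumes the file names end in `_Grn.idat` and `_Red.idat`, which is the usual Illumina naming but may differ in your data. The function name and the example file names are hypothetical.

```python
from collections import defaultdict

# Assumed naming convention: <prefix>_Grn.idat and <prefix>_Red.idat.
CHANNEL_SUFFIXES = {"_Grn.idat": "forward", "_Red.idat": "reverse"}


def pair_idat_files(filenames):
    """Group IDAT file names by common prefix.

    Returns (pairs, unpaired): pairs maps each prefix (without a trailing
    underscore) to its forward (Green) and reverse (Red) files; unpaired
    lists files that lack a partner or do not match either suffix.
    """
    by_prefix = defaultdict(dict)
    unrecognized = []
    for name in filenames:
        for suffix, role in CHANNEL_SUFFIXES.items():
            if name.endswith(suffix):
                by_prefix[name[: -len(suffix)]][role] = name
                break
        else:
            unrecognized.append(name)

    pairs = {p: roles for p, roles in by_prefix.items() if len(roles) == 2}
    unpaired = sorted(
        name
        for roles in by_prefix.values()
        if len(roles) < 2
        for name in roles.values()
    )
    return pairs, unpaired + sorted(unrecognized)


# Hypothetical file names for illustration.
files = [
    "1234567890_R01C01_Grn.idat",
    "1234567890_R01C01_Red.idat",
    "1234567890_R02C01_Grn.idat",
]
pairs, unpaired = pair_idat_files(files)
print(pairs)
print(unpaired)
```

Any file reported as unpaired would be missing its partner in the Galaxy pairing dialog, so it is worth resolving before running the tool.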

Step 5: Once the two dataset collections are prepared, run the tool to execute the minfi pipeline.

ADVANCED PARAMETERS:

Variance shrinkage (‘shrinkVar=TRUE’) is recommended when sample sizes are small (<10). The sample variances are squeezed by computing empirical Bayes posterior means using the ‘limma’ package.
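To give an intuition for what "squeezing" means, the sketch below computes a moderated variance as a weighted average of a sample variance and a prior variance, weighted by their degrees of freedom. This is the general form of the empirical Bayes moderated variance used by limma. In limma the prior variance and prior degrees of freedom are estimated from all probes; here they are simply passed in as assumed values, and the function name is hypothetical.

```python
def shrink_variance(sample_var, df, prior_var, prior_df):
    """Moderated variance: df-weighted average of sample and prior variance.

    sample_var, df: per-probe sample variance and its degrees of freedom.
    prior_var, prior_df: prior values (assumed given in this sketch; limma
    estimates them from the data).
    """
    return (prior_df * prior_var + df * sample_var) / (prior_df + df)


# A noisy variance from a small sample is pulled toward the prior.
print(shrink_variance(sample_var=4.0, df=4, prior_var=1.0, prior_df=4))  # 2.5
# With no prior information, the sample variance is returned unchanged.
print(shrink_variance(sample_var=4.0, df=4, prior_var=1.0, prior_df=0))  # 4.0
```

With few samples, `df` is small, so the prior carries more weight. This is why shrinkage matters most for small sample sizes.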

B: An integer denoting the number of resamples to use when computing null distributions. This defaults to 0. If ‘permutations’ is supplied, it defines the number of permutations/bootstraps and ‘B’ is ignored.

smooth: A logical value. If TRUE, the estimated profile will be smoothed with the smoother defined by ‘smoothFunction’.

cutoff: A numeric value. Values of the estimate of the genomic profile above the cutoff or below the negative of the cutoff will be used as candidate regions. It is possible to give two separate values (upper and lower bounds). If one value is given, the lower bound is minus the value.
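The cutoff rule can be illustrated with a short sketch. It is not Bumphunter's implementation: the function names and the example profile values are hypothetical. The sketch only shows how a single cutoff or a pair of bounds selects candidate positions from a profile, and it does not group adjacent positions into regions.

```python
def normalize_cutoff(cutoff):
    """Return (lower, upper) bounds from a single value or a pair of values."""
    if isinstance(cutoff, (int, float)):
        # One value given: the lower bound is minus the value.
        return -cutoff, cutoff
    lower, upper = min(cutoff), max(cutoff)
    return lower, upper


def candidate_positions(profile, cutoff):
    """Indices whose profile value is above the upper or below the lower bound."""
    lower, upper = normalize_cutoff(cutoff)
    return [i for i, value in enumerate(profile) if value > upper or value < lower]


# Hypothetical estimated profile values at five consecutive positions.
profile = [0.05, 0.3, -0.25, 0.1, -0.1]
print(candidate_positions(profile, 0.2))          # [1, 2]
print(candidate_positions(profile, (-0.3, 0.2)))  # [1]
```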