What it does
CONTRA is a tool for copy number variation (CNV) detection in targeted resequencing data, such as data from whole-exome capture. CONTRA calls copy number gains and losses for each target region. Its key strategies include the use of base-level log-ratios to remove GC-content bias, correction for the effect of imbalanced library sizes on log-ratios, and estimation of log-ratio variation via binning and interpolation. It takes standard alignment formats (BAM/SAM) as input and writes its output in variant call format (VCF 4.0) for easy integration with other next-generation sequencing analysis packages.
Required Parameters
-t, --target  Target region definition file [BED format]
-s, --test  Alignment file for the test sample [BAM/SAM]
-c, --control  Alignment file for the control sample [BAM/SAM, or a BED baseline file]
--bed  **This option must be supplied when the control is a baseline file.**
-f, --fasta  Reference genome [FASTA]
-o, --outFolder  Name (and path) of the folder in which to store the output of the analysis. This new folder will be created; an error message occurs if the folder already exists.
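As an illustration of how the required options fit together, the following minimal sketch assembles a CONTRA command line from Python. The executable name `contra.py`, the helper name `build_contra_command`, and all file paths are assumptions made for this example; only the long option names are taken from the list above.

```python
import shlex


def build_contra_command(target, test, control, fasta, out_folder,
                         control_is_baseline=False, executable="contra.py"):
    """Return the argument list for a run that uses only the required options.

    `executable` is an assumed script name; adjust it to match the local
    installation.
    """
    cmd = [
        executable,
        "--target", target,        # target regions, BED format
        "--test", test,            # test sample alignment, BAM/SAM
        "--control", control,      # control alignment, or a BED baseline file
        "--fasta", fasta,          # reference genome, FASTA
        "--outFolder", out_folder, # must not exist yet
    ]
    if control_is_baseline:
        # --bed has to accompany a baseline control file.
        cmd.append("--bed")
    return cmd


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    command = build_contra_command(
        target="targets.bed",
        test="tumour.bam",
        control="normal.bam",
        fasta="reference.fa",
        out_folder="contra_out",
    )
    print(shlex.join(command))
```

The list returned by the helper can be passed directly to `subprocess.run` once the executable name has been checked against the actual installation.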
Optional Parameters
--numBin  Number of bins used to group the regions. Multiple experiments with different numbers of bins can be specified (comma separated). [Default: 20]
--minReadDepth  Minimum read depth threshold for each base (see Step 2 in CONTRA workflow). [Default: 10]
--minNBases  Minimum number of bases threshold for each target region (see Step 2 in CONTRA workflow). [Default: 10]
--sam  Specify if the test and control samples are in SAM format; otherwise BAM is assumed. [Default: False]
--bed  If specified, the control is a baseline file in BED format. [Default: False] Please refer to the Baseline Script section for instructions on how to create baseline files from a set of BAM files. A set of baseline files from different platforms is also provided on the CONTRA download page.
--pval  The p-value threshold for filtering, based on adjusted p-values. Only regions that pass this threshold are included in the VCF file. [Default: 0.05]
--sampleName  A name to be prepended to the default output name. By default, nothing is prepended.
--nomultimapped  Remove multi-mapped reads (using SAMtools with mapping quality > 0). [Default: False]
-p, --plot  If specified, plots of the log-ratio distribution for each bin are included in the output folder. [Default: False]
--minExon  Minimum number of exons in one bin (a bin that contains fewer exons than this is merged into the adjacent bins). [Default: 2000]
--minControlRdForCall  Minimum control read depth for a call. [Default: 5]
--minTestRdForCall  Minimum test read depth for a call. [Default: 0]
--minAvgForCall  Minimum average coverage for a call. [Default: 20]
--maxRegionSize  Maximum region size in the target regions, used to break large regions into smaller ones. The default, maxRegionSize=0, means no breakdown. [Default: 0]
--targetRegionSize  Target region size for the breakdown (used if maxRegionSize is non-zero). [Default: 200]
-l, --largeDeletion  If specified, CONTRA runs large deletion analysis (CBS). The DNAcopy R library must be installed to run this analysis. [Default: False]
--smallSegment  CBS segment size for calling large variations. [Default: 1]
--largeSegment  CBS segment size for calling large variations. [Default: 25]
--lrCallStart  Start of the log-ratio range used to call CNVs. [Default: -0.3]
--lrCallEnd  End of the log-ratio range used to call CNVs. [Default: 0.3]
--passSize  Size of exons that passed the p-value threshold compared to the original exon size. [Default: 0.5]
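To make the interaction between --maxRegionSize and --targetRegionSize concrete, here is a minimal sketch of one plausible breakdown scheme. It is not CONTRA's own code: the function name `break_down_region`, the half-open coordinate convention, the choice to split only regions strictly longer than the maximum, and the handling of a shorter final piece are all assumptions made for illustration.

```python
def break_down_region(start, end, max_region_size=0, target_region_size=200):
    """Split the half-open interval [start, end) into consecutive pieces.

    Assumed behaviour (not confirmed against CONTRA itself):
    - max_region_size == 0 disables the breakdown, as stated in the option list;
    - only regions longer than max_region_size are split;
    - pieces are target_region_size long, except possibly the last one.
    """
    if target_region_size <= 0:
        raise ValueError("target_region_size must be positive")

    length = end - start
    if max_region_size == 0 or length <= max_region_size:
        return [(start, end)]

    pieces = []
    pos = start
    while pos < end:
        nxt = min(pos + target_region_size, end)
        pieces.append((pos, nxt))
        pos = nxt
    return pieces


if __name__ == "__main__":
    # With the defaults (maxRegionSize=0) the region is left untouched.
    print(break_down_region(100, 1000))
    # A 500-base region with a 400-base maximum is split into 200-base pieces.
    print(break_down_region(0, 500, max_region_size=400))
```

Under these assumptions, `break_down_region(0, 500, max_region_size=400)` returns `[(0, 200), (200, 400), (400, 500)]`, while any call that leaves `max_region_size` at 0 returns the region unchanged.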