Illumina runs
ngs_simulation.py
#if $in_type.input_type == "built-in"
--input="${ [ x for x in $__app__.tool_data_tables[ 'ngs_sim_fasta' ].get_fields() if str( x[0] ) == str( $in_type.genome ) ][0][-1] }"
--genome=$in_type.genome
#else
--input=$in_type.input1
#end if
--read_len=$read_len
--avg_coverage=$avg_coverage
--error_rate=$error_rate
--num_sims=$num_sims
--polymorphism=$polymorphism
--detection_thresh=$detection_thresh
--output_png=$output_png
--summary_out=$summary_out
--output_summary=$output_summary
--new_file_path=$__new_file_path__
summary_out == True
**What it does**
This tool simulates an Illumina run and plots the resulting false positives and false negatives. A range of simulation parameters can be set. Note that each simulation makes only one (randomly chosen) position in the genome polymorphic, with the minor allele at the specified frequency. Superimposed on this are "sequencing errors", which are distributed uniformly at random across the genome. Positions are called polymorphic using the detection threshold, so if the detection threshold is set equal to the minor allele frequency, the expected false negative rate is about 50%.
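The model described above can be sketched in a few lines of Python. This is a minimal sketch under simplifying assumptions, not the implementation in ``ngs_simulation.py``. It assumes that every site is covered by exactly ``coverage`` reads, that sequencing errors are ignored at the polymorphic site, and that a site is called polymorphic when its non-reference read fraction is at least the detection threshold. ``SimResult``, ``binomial``, ``simulate_once`` and ``fn_rate_at_threshold`` are hypothetical names.

```python
import random
from dataclasses import dataclass


@dataclass
class SimResult:
    false_positives: int   # non-polymorphic sites that were called polymorphic
    false_negative: bool   # True if the single polymorphic site was missed


def binomial(rng: random.Random, n: int, p: float) -> int:
    """Number of successes in n independent Bernoulli(p) draws (stdlib only)."""
    return sum(1 for _ in range(n) if rng.random() < p)


def simulate_once(genome_size: int, coverage: int, maf: float,
                  error_rate: float, threshold: float,
                  rng: random.Random) -> SimResult:
    """One simulated run with a single randomly placed polymorphic site.

    Assumptions (hypothetical, not taken from ngs_simulation.py): fixed depth
    of `coverage` reads per site, no errors at the polymorphic site, and a
    call whenever the non-reference fraction is >= `threshold`.
    """
    poly_site = rng.randrange(genome_size)
    false_positives = 0
    missed = True
    for site in range(genome_size):
        # Non-reference reads: the true minor allele at the polymorphic site,
        # uniformly distributed sequencing errors everywhere else.
        p_alt = maf if site == poly_site else error_rate
        called = binomial(rng, coverage, p_alt) / coverage >= threshold
        if site == poly_site:
            missed = not called
        elif called:
            false_positives += 1
    return SimResult(false_positives, missed)


def fn_rate_at_threshold(coverage: int, maf: float, threshold: float,
                         trials: int, rng: random.Random) -> float:
    """Fraction of trials in which the polymorphic site falls below threshold."""
    misses = sum(binomial(rng, coverage, maf) / coverage < threshold
                 for _ in range(trials))
    return misses / trials


if __name__ == "__main__":
    rng = random.Random(42)
    print(simulate_once(5386, 200, 0.004, 0.001, 0.003, rng))
    # Threshold equal to the minor allele frequency.
    print(fn_rate_at_threshold(200, 0.05, 0.05, 2000, rng))
```

The second printed value illustrates the rule of thumb for a detection threshold equal to the minor allele frequency. Read counts are discrete, and a site exactly at the threshold is still called under this sketch's ``>=`` rule, so the estimate typically lands a little below 0.5 rather than exactly on it.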
**Parameter list**
These are the parameters that should be set for the simulation::

  Read length (which is the same for all reads)
  Average Coverage
  Frequency for Minor Allele
  Sequencing Error Rate
  Detection Threshold
  Number of Simulations

You should also choose either a built-in genome or your own FASTA file.
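The genome choice determines the ``--input`` and ``--genome`` arguments in the command template at the top of this file. A built-in genome is resolved to a FASTA path through the ``ngs_sim_fasta`` tool data table. A user-supplied file is passed to ``--input`` directly. The Python sketch here mirrors that logic under stated assumptions. Table rows are treated as plain sequences whose first column is the genome key and whose last column is the path, as the wrapper's ``x[0]`` and ``[-1]`` indexing implies. Passing multi-valued frequencies and thresholds comma-separated is an assumption, and the output-file flags are omitted. ``SIM_FLAGS``, ``resolve_builtin_fasta``, ``build_args`` and the phiX table row are hypothetical.

```python
from typing import List, Mapping, Optional, Sequence

# Hypothetical list of simulation settings, named after the wrapper's flags.
SIM_FLAGS = ("read_len", "avg_coverage", "error_rate", "num_sims",
             "polymorphism", "detection_thresh")


def resolve_builtin_fasta(table_rows: Sequence[Sequence[str]], genome: str) -> str:
    """Return the last column of the first row whose first column equals genome.

    This mirrors the x[0] / [-1] indexing in the wrapper's --input expression.
    """
    for row in table_rows:
        if str(row[0]) == str(genome):
            return row[-1]
    raise KeyError(f"genome {genome!r} not found in the ngs_sim_fasta table")


def build_args(params: Mapping[str, object], *, genome: Optional[str] = None,
               fasta: Optional[str] = None,
               table_rows: Sequence[Sequence[str]] = ()) -> List[str]:
    """Assemble ngs_simulation.py arguments; output-file flags are omitted."""
    if (genome is None) == (fasta is None):
        raise ValueError("choose exactly one of a built-in genome or a FASTA file")
    if genome is not None:
        args = [f"--input={resolve_builtin_fasta(table_rows, genome)}",
                f"--genome={genome}"]
    else:
        args = [f"--input={fasta}"]
    for name in SIM_FLAGS:
        value = params[name]
        if isinstance(value, (list, tuple)):
            # Assumption: multi-valued settings are passed comma-separated.
            value = ",".join(str(v) for v in value)
        args.append(f"--{name}={value}")
    return args


if __name__ == "__main__":
    rows = [("phiX", "phiX174", "/data/phiX.fa")]  # hypothetical table row
    params = {"read_len": 76, "avg_coverage": 200, "error_rate": 0.001,
              "num_sims": 100, "polymorphism": [0.002, 0.004],
              "detection_thresh": [0.003, 0.005, 0.007]}
    print(build_args(params, genome="phiX", table_rows=rows))
    print(build_args(params, fasta="my_reference.fa"))
```

Requiring exactly one genome source matches the either/or choice in the ``#if``/``#else`` block of the template.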
**Output**
The tool produces one or two outputs. The first, always generated, is a PNG image containing two plots. The second is optional: a text file with summary information about the simulations that were run. Below are some example outputs for a 10-simulation run on phiX with the default settings::

  Read length                    76
  Average coverage               200
  Error rate/quality score       0.001
  Number of simulations          100
  Frequencies for minor allele   0.002
                                 0.004
  Detection thresholds           0.003
                                 0.005
                                 0.007
  Include summary file           Yes

Plot output (png):
.. image:: ./static/images/ngs_simulation.png
Summary output (txt)::

        FP             FN        GENOMESIZE.5386    fprate           hetcol          errcol
   Min.   : 71.0   Min.   :0.0   Mode:logical   Min.   :0.01318   Min.   :0.004   Min.   :0.007
   1st Qu.: 86.0   1st Qu.:1.0   NA's:10        1st Qu.:0.01597   1st Qu.:0.004   1st Qu.:0.007
   Median : 92.5   Median :1.0   NA             Median :0.01717   Median :0.004   Median :0.007
   Mean   : 93.6   Mean   :0.9   NA             Mean   :0.01738   Mean   :0.004   Mean   :0.007
   3rd Qu.:100.8   3rd Qu.:1.0   NA             3rd Qu.:0.01871   3rd Qu.:0.004   3rd Qu.:0.007
   Max.   :123.0   Max.   :1.0   NA             Max.   :0.02284   Max.   :0.004   Max.   :0.007

   False Positive Rate Summary
           0.003     0.005     0.007
   0.001   0.17711   0.10854   0.01673
   0.009   0.18049   0.10791   0.01738

   False Negative Rate Summary
           0.003     0.005     0.007
   0.001   1.0       0.8       1.0
   0.009   0.4       0.7       0.9
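The summary numbers can be cross-checked by hand. The ``fprate`` quantiles appear to equal the ``FP`` quantiles divided by the phiX genome size of 5386 given in the ``GENOMESIZE.5386`` header (for example, 93.6 / 5386 ≈ 0.01738). ``FN`` is 0 or 1 in every run (its minimum is 0 and its maximum is 1), so its mean of 0.9 means the polymorphic site was missed in 9 of the 10 runs indicated by ``NA's:10``. This reading is inferred from the example values, not from the tool's source. The Python check here allows for the rounding in the printed output, and its constant and function names are hypothetical.

```python
GENOME_SIZE = 5386  # from the GENOMESIZE.5386 column header (phiX)

# Quantiles copied from the FP and fprate columns of the example summary.
FP = {"Min.": 71.0, "1st Qu.": 86.0, "Median": 92.5, "Mean": 93.6,
      "3rd Qu.": 100.8, "Max.": 123.0}
FPRATE = {"Min.": 0.01318, "1st Qu.": 0.01597, "Median": 0.01717,
          "Mean": 0.01738, "3rd Qu.": 0.01871, "Max.": 0.02284}

# FP is printed to 1 decimal place and fprate to 5, so allow for both roundings.
TOL = 0.05 / GENOME_SIZE + 0.5e-5


def matches_fp_over_genome_size(stat: str) -> bool:
    """True if fprate for this statistic agrees with FP / genome size."""
    return abs(FP[stat] / GENOME_SIZE - FPRATE[stat]) <= TOL


# FN is a 0/1 indicator per run, so its mean is the fraction of runs missed.
RUNS = 10      # inferred from NA's:10 in the all-NA logical column
FN_MEAN = 0.9
missed_runs = round(FN_MEAN * RUNS)

if __name__ == "__main__":
    for stat in FP:
        print(stat, matches_fp_over_genome_size(stat))
    print("runs with the polymorphic site missed:", missed_runs)
```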