comparison utilities/README.md @ 0:6e75a84e9338 draft
planemo upload commit e96b43f96afce6a7b7dfd4499933aad7d05c955e-dirty
author | thondeboer |
---|---|
date | Tue, 15 May 2018 02:39:53 -0400 |
# computeGC.py

Takes .genomecov files produced by bedtools genomecov (with the -d option).

```
bedtools genomecov \
    -d \
    -ibam normal.bam \
    -g reference.fa
```

```
python computeGC.py \
    -r reference.fa \
    -i genomecovfile \
    -w [sliding window length] \
    -o /path/to/model.p
```
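
To illustrate the kind of calculation a GC-bias model rests on, here is a minimal sketch that averages coverage per window GC count. It is not the code in computeGC.py: the function name `gc_coverage_bins` is hypothetical, the windows are non-overlapping for simplicity, and the per-base depths (the third column of `bedtools genomecov -d` output) are assumed to be already loaded into a list aligned with the reference sequence.

```python
from collections import defaultdict


def gc_coverage_bins(sequence, depths, window):
    """Return {gc_count: mean depth} over non-overlapping windows.

    Hypothetical helper for illustration; computeGC.py may differ.
    """
    if len(sequence) != len(depths):
        raise ValueError("sequence and depths must have the same length")
    totals = defaultdict(float)   # summed mean depth per GC count
    counts = defaultdict(int)     # number of windows per GC count
    for start in range(0, len(sequence) - window + 1, window):
        chunk = sequence[start:start + window].upper()
        if "N" in chunk:          # skip windows with unknown bases
            continue
        gc = chunk.count("G") + chunk.count("C")
        mean_depth = sum(depths[start:start + window]) / window
        totals[gc] += mean_depth
        counts[gc] += 1
    return {gc: totals[gc] / counts[gc] for gc in totals}


# Toy input: three windows of length 4 with GC counts 4, 0 and 2.
example = gc_coverage_bins(
    "GGCCAATTGCAT",
    [10, 10, 10, 10, 2, 2, 2, 2, 6, 6, 6, 6],
    4,
)
print(example)
```

In this toy input the all-GC window has the highest mean depth and the all-AT window the lowest, which is the kind of relationship a GC-bias model would capture.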

# computeFraglen.py

Takes a SAM file via stdin:

    ./samtools view toy.bam | python computeFraglen.py

and creates the fraglen.p model in the working directory.
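
As a rough illustration of how fragment lengths can be read from SAM records, the sketch below tallies the template length field (TLEN, the ninth SAM column). The helper name `tally_fragment_lengths` and the rule of counting only positive TLEN values are assumptions for this example; the filtering in computeFraglen.py and the layout of fraglen.p may differ.

```python
from collections import Counter


def tally_fragment_lengths(sam_lines):
    """Count positive template lengths (SAM column 9, TLEN).

    Hypothetical helper for illustration only.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):      # header line
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:          # not a complete alignment record
            continue
        tlen = int(fields[8])
        if tlen > 0:                  # count each pair once, via the leftmost mate
            counts[tlen] += 1
    return counts


# Toy records: one header line and two read pairs (made-up values).
records = [
    "@HD\tVN:1.6",
    "r1\t99\tchr1\t100\t60\t50M\t=\t300\t250\t*\t*",
    "r1\t147\tchr1\t300\t60\t50M\t=\t100\t-250\t*\t*",
    "r2\t99\tchr1\t500\t60\t50M\t=\t720\t270\t*\t*",
]
fragment_counts = tally_fragment_lengths(records)
print(sorted(fragment_counts.items()))
```

Passing `sys.stdin` in place of the list would consume `samtools view` output in the same way.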

# genMutModel.py

Takes a reference genome and a TSV file of variants to generate mutation models:

```
python genMutModel.py \
    -r hg19.fa \
    -m inputVariants.tsv \
    -o /home/me/models.p
```

Trinucleotides are identified in the reference genome and the variant file. Frequencies of each trinucleotide transition are calculated and output as a pickle (.p) file.
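
The trinucleotide bookkeeping described above can be sketched as follows. This is an illustrative simplification, not the logic of genMutModel.py: the function names are hypothetical, and normalising each transition by the number of observed variants with the same reference trinucleotide is an assumption about how the frequencies are defined.

```python
from collections import Counter


def count_trinucleotides(sequence):
    """Count overlapping trinucleotides, skipping any that contain N."""
    sequence = sequence.upper()
    counts = Counter()
    for i in range(len(sequence) - 2):
        tri = sequence[i:i + 3]
        if "N" not in tri:
            counts[tri] += 1
    return counts


def transition_frequencies(transitions):
    """Turn (ref_tri, alt_tri) observations into per-ref_tri frequencies."""
    pair_counts = Counter(transitions)
    ref_totals = Counter(ref for ref, _ in transitions)
    return {
        (ref, alt): n / ref_totals[ref]
        for (ref, alt), n in pair_counts.items()
    }


tri_counts = count_trinucleotides("ACGTACGN")

# Made-up variant observations: (reference trinucleotide, mutated trinucleotide).
observed = [("ACG", "ATG"), ("ACG", "ATG"), ("ACG", "AAG"), ("CGT", "CAT")]
freqs = transition_frequencies(observed)
print(tri_counts, freqs)
```

A table like `freqs` could then be serialised with Python's `pickle` module, which is the file type the script writes, although the actual structure of the .p model is not documented here.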

# genSeqErrorModel.py

Generates a sequence error model for the genReads.py -e option.

```
python genSeqErrorModel.py \
    -i input_read1.fq (.gz) / input_read1.sam \
    -o output.p \
    -i2 input_read2.fq (.gz) / input_read2.sam \
    -p input_alignment.pileup \
    -q quality score offset [33] \
    -Q maximum quality score [41] \
    -n maximum number of reads to process [all] \
    -s number of simulation iterations [1000000] \
    --plot perform some optional plotting
```
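
For readers unfamiliar with the -q and -Q options, the sketch below shows how a FASTQ quality string maps to Phred scores under an offset of 33 and a maximum score of 41 (the bracketed defaults listed above). The function names are hypothetical, and rejecting out-of-range scores with an exception is an assumption; genSeqErrorModel.py may handle such values differently.

```python
def decode_qualities(quality_string, offset=33, max_quality=41):
    """Convert a FASTQ quality string to integer Phred scores."""
    scores = []
    for char in quality_string:
        score = ord(char) - offset
        if not 0 <= score <= max_quality:
            raise ValueError(f"quality {score} outside 0..{max_quality}")
        scores.append(score)
    return scores


def error_probability(score):
    """Phred definition: P(error) = 10 ** (-Q / 10)."""
    return 10 ** (-score / 10)


scores = decode_qualities("!5IJ")
print(scores, [error_probability(q) for q in scores])
```

With offset 33, the characters `!`, `5`, `I` and `J` decode to 0, 20, 40 and 41, so `J` is the highest quality character accepted under the default maximum.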

# plotMutModel.py

Performs plotting and comparison of mutation models generated from genMutModel.py.

```
python plotMutModel.py \
    -i model1.p [model2.p] [model3.p]... \
    -l legend_label1 [legend_label2] [legend_label3]... \
    -o path/to/pdf_plot_prefix
```

# vcf_compare_OLD.py

Tool for comparing VCF files.

```
python vcf_compare_OLD.py \
    --version          show program's version number and exit \
    -h, --help         show this help message and exit \
    -r <ref.fa>        * Reference Fasta \
    -g <golden.vcf>    * Golden VCF \
    -w <workflow.vcf>  * Workflow VCF \
    -o <prefix>        * Output Prefix \
    -m <track.bed>     Mappability Track \
    -M <int>           Maptrack Min Len \
    -t <regions.bed>   Targeted Regions \
    -T <int>           Min Region Len \
    -c <int>           Coverage Filter Threshold [15] \
    -a <float>         Allele Freq Filter Threshold [0.3] \
    --vcf-out          Output Match/FN/FP variants [False] \
    --no-plot          No plotting [False] \
    --incl-homs        Include homozygous ref calls [False] \
    --incl-fail        Include calls that failed filters [False] \
    --fast             No equivalent variant detection [False]
```

Mappability track examples: https://github.com/zstephens/neat-repeat/tree/master/example_mappabilityTracks

## Controlled Data and Germline-Reference Allele Mismatch Information

ICGC's "Access Controlled Data" documentation can be found at http://docs.icgc.org/access-controlled-data. To access controlled germline data, a DACO application must be submitted. Open tier data can be obtained without a DACO application, but germline alleles that do not match the reference genome are masked and replaced with the reference allele. Controlled data includes unmasked germline alleles.