comparison check.xml @ 1:e3c4e5ff7f74 draft
Uploaded
| author | mkhan1980 |
|---|---|
| date | Mon, 04 Mar 2013 06:37:58 -0500 |
| parents | |
| children |
| 0:ebad609b8a6d | 1:e3c4e5ff7f74 |
|---|---|
| 1 <tool id="fa_gc_content_1" name="Discover CTCF Sites for Forward Strand"> | |
| 2 <description></description> | |
| 3 <command interpreter="perl">check.pl $input $input2 $output</command> | |
| 4 <inputs> | |
| 5 <param format="fasta" name="input" type="data" label="Forward Strand Sequence File"/> | |
| 6 <param format="fasta" name="input2" type="data" label="Forward Strand Coordinate file"/> | |
| 7 </inputs> | |
| 8 | |
| 9 | |
| 10 <outputs> | |
| 11 <data format="tabular" name="output" /> | |
| 12 </outputs> | |
| 13 | |
| 14 <tests> | |
| 15 <test> | |
| 16 <param name="input" value="fa_gc_content_input.fa"/> | |
| 17 <param name="input2" value="fa_gc_content_input2.fa"/> | |
| 18 <output name="output" file="concatenated.txt"/> | |
| 19 </test> | |
| 20 </tests> | |
| 21 | |
| 22 <help> | |
| 23 Background: | |
| 24 This tool computationally predicts CTCF sites in a nucleotide sequence located on the forward strand. Two input files are required. The first is the nucleotide sequence of interest on the + strand in FASTA format (this can be obtained from the UCSC Genome Browser or Ensembl). The second must be a FASTA-formatted file containing the chromosome and the genomic position of the first nucleotide of the sequence, separated by a tab. For example, if the sequence of interest is located on chromosome 3 with a starting genomic position of 1850000, the first line of the second input file must be a FASTA header line, and the second line will be: chr3 1850000 | |
| 25 | |
| 26 Details of Algorithm: | |
| 27 CTCF sites are predicted by applying the following equation: | |
| 28 w(σ,j) = log2 (((f(σ,j) + sqrt(N) x b(σ)) / (N + sqrt(N))) / b(σ)) | |
| 29 | |
| 30 where w(σ,j) is the weight of nucleotide σ at position j, f(σ,j) is the number of occurrences of nucleotide σ at position j, N is the total number of binding sites (equivalently, the sum of all nucleotide occurrences in the column), and b(σ) is the prior background frequency of nucleotide σ. | |
| 31 | |
| 32 The sum of the weights of the corresponding nucleotides over all m columns of the matrix then estimates the likelihood that a sequence of length m is an instance of a CTCF binding site, and takes into account the GC content of the genomic region being scanned. | |
| 33 | |
| 34 | |
| 35 Citation and further help: For further details of the algorithm, please refer to | |
| 36 | |
| 37 Khan MA, Soto-Jimenez LM, Howe T, Streit A, Sosinsky A, Stern CD (2013). Computational tools and resources for prediction and analysis of gene regulatory regions in the chick genome. Genesis. doi:10.1002/dvg.22375 | |
| 38 | |
| 39 For queries/questions, email ucbtmaf@ucl.ac.uk | |
| 40 </help> | |
| 41 | |
| 42 | |
| 43 </tool> |
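
The weight equation in the help text can be illustrated with a short Python sketch. This is not the code in check.pl, which is not shown here; it is a minimal reading of the formula under stated assumptions. The function names (`pwm_weight`, `build_weights`, `scan`), the toy count matrix, and the uniform background are all hypothetical, and f(σ,j) is taken to be the count of nucleotide σ in column j.

```python
import math


def pwm_weight(count, n_sites, background):
    """w = log2(((f + sqrt(N) * b) / (N + sqrt(N))) / b) for one matrix cell."""
    pseudo = math.sqrt(n_sites)  # sqrt(N) acts as the pseudocount mass
    corrected = (count + pseudo * background) / (n_sites + pseudo)
    return math.log2(corrected / background)


def build_weights(counts, background):
    """Turn per-column nucleotide counts into per-column weights.

    counts: list of dicts, one per column, mapping A/C/G/T to a count.
    background: dict mapping A/C/G/T to a prior frequency b.
    """
    weights = []
    for column in counts:
        n_sites = sum(column.values())  # N: sum of occurrences in the column
        weights.append(
            {base: pwm_weight(column[base], n_sites, background[base])
             for base in "ACGT"}
        )
    return weights


def scan(sequence, weights):
    """Score every window of length m = len(weights) along the sequence.

    Returns (offset, score) pairs with 0-based offsets. Assumes the
    sequence contains only upper-case A, C, G, T.
    """
    m = len(weights)
    results = []
    for i in range(len(sequence) - m + 1):
        window = sequence[i:i + m]
        score = sum(col[base] for col, base in zip(weights, window))
        results.append((i, score))
    return results


# Toy two-column matrix (hypothetical data, not a real CTCF matrix).
toy_counts = [
    {"A": 4, "C": 0, "G": 0, "T": 0},
    {"A": 0, "C": 4, "G": 0, "T": 0},
]
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
toy_weights = build_weights(toy_counts, uniform)
toy_scores = scan("GACT", toy_weights)
```

With these toy inputs, a nucleotide observed exactly as often as the background predicts gets a weight of zero, and the window "AC" scores highest. How the tool derives b(σ) from the GC content of the scanned region is not specified in the help text, so the uniform background above is only a placeholder.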
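
The layout of the second input file described in the help text (a FASTA header line followed by a line such as chr3 1850000) can be checked with a small parser. The sketch below is an assumption about how such a file might be read, not a description of check.pl. `parse_coordinate_file` and `genomic_position` are hypothetical helpers, and the 0-based offset convention is an assumption.

```python
def parse_coordinate_file(text):
    """Parse a two-line coordinate file: a FASTA header, then 'chrom<TAB>start'."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2 or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA header line followed by a coordinate line")
    # The help text specifies a tab; splitting on any whitespace is more lenient.
    fields = lines[1].split()
    if len(fields) != 2:
        raise ValueError("coordinate line must hold a chromosome and a start position")
    chrom, start = fields
    return chrom, int(start)


def genomic_position(start, offset):
    """Map a 0-based offset within the sequence to a genomic coordinate.

    Assumes 'start' is the genomic position of the first nucleotide.
    """
    return start + offset


# Example content matching the case described in the help text.
example = ">region_of_interest\nchr3\t1850000\n"
chrom, start = parse_coordinate_file(example)
```

Here `chrom` is "chr3" and `start` is 1850000, so a predicted site beginning 10 nucleotides into the sequence would map to position 1850010 under the stated offset convention.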
