changeset 1:e3c4e5ff7f74 draft

Uploaded
author mkhan1980
date Mon, 04 Mar 2013 06:37:58 -0500
parents ebad609b8a6d
children 2cceb9398d33
files check.xml
diffstat 1 files changed, 43 insertions(+), 0 deletions(-) [+]
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/check.xml	Mon Mar 04 06:37:58 2013 -0500
@@ -0,0 +1,43 @@
+    <tool id="fa_gc_content_1" name="Discover CTCF Sites for Forward Strand">
+      <description></description>
+      <command interpreter="perl">check.pl $input $input2 $output</command>
+      <inputs>
+        <param format="fasta" name="input" type="data" label="Forward Strand Sequence File"/>
+        <param format="fasta" name="input2" type="data" label="Forward Strand Coordinate File"/>
+      </inputs>
+
+
+      <outputs>
+        <data format="tabular" name="output" />
+      </outputs>
+  
+    <tests>
+      <test>
+        <param name="input" value="fa_gc_content_input.fa"/>
+        <param name="input2" value="fa_gc_content_input2.fa"/>
+        <output name="output" file="concatenated.txt"/>
+      </test>
+    </tests>
+  
+    <help>
+Background:
+This tool computationally predicts CTCF binding sites in a nucleotide sequence located on the forward strand. Two input files are required. The first is the nucleotide sequence of interest on the + strand in FASTA format (this can be obtained from the UCSC Genome Browser or Ensembl). The second is a FASTA-formatted file containing the chromosome and the genomic position of the first nucleotide of that sequence, separated by a tab. For example, if the sequence of interest is located on chromosome 3 and starts at genomic position 1850000, the first line of the second file must be a FASTA header line (beginning with a greater-than sign), and the second line must be chr3, a tab, then 1850000.
+
+Details of Algorithm:
+CTCF sites are predicted by applying the following equation:
+w(σ,j) = log2(((f(σ,j) + sqrt(N) × b(σ)) / (N + sqrt(N))) / b(σ))
+
+where w(σ,j) is the weight of nucleotide σ at position j, f(σ,j) is the number of times nucleotide σ occurs at position j of the aligned binding sites, N is the total number of binding sites (equivalently, the sum of all nucleotide occurrences in the column), and b(σ) is the prior background frequency of nucleotide σ.
+
+For any sequence of length m, the sum of the weights of its nucleotides across the m columns of the matrix estimates the likelihood that the sequence is an instance of a CTCF binding site. This score takes into account the GC content of the genomic region being scanned.
+
+
+Citation and further help: For further details of the algorithm, please refer to
+
+Khan MA, Soto-Jimenez LM, Howe T, Streit A, Sosinsky A, Stern CD (2013). Computational tools and resources for prediction and analysis of gene regulatory regions in the chick genome. Genesis. doi:10.1002/dvg.22375
+ 
+For queries/questions, email ucbtmaf@ucl.ac.uk    
+    </help>
+
+  
+  </tool>
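
The weight equation in the help text above can be illustrated with a short Python sketch. This is not the code in check.pl, which is not part of this changeset. The function names (background_from_gc, pwm_weights, scan) and the toy count matrix are hypothetical, and deriving b(σ) by splitting the GC fraction evenly between G and C is an assumption made only for illustration.

```python
import math


def background_from_gc(sequence):
    """Background frequencies b(sigma) from the GC content of `sequence`.

    Assumption (not from the tool): G and C share the GC fraction equally,
    as do A and T. The sequence must contain only A, C, G, T and at least
    one base from each pair, otherwise a frequency of 0 breaks the log ratio.
    """
    gc = (sequence.count("G") + sequence.count("C")) / len(sequence)
    at = 1.0 - gc
    return {"A": at / 2, "C": gc / 2, "G": gc / 2, "T": at / 2}


def pwm_weights(counts, background):
    """Apply w = log2(((f + sqrt(N) * b) / (N + sqrt(N))) / b) per column.

    `counts` is a list of dicts, one per matrix column, mapping a
    nucleotide to its count f; each column needs at least one count.
    """
    weights = []
    for column in counts:
        n = sum(column.values())  # N: all nucleotide occurrences in the column
        root_n = math.sqrt(n)
        column_weights = {}
        for base, b in background.items():
            f = column.get(base, 0)
            corrected = (f + root_n * b) / (n + root_n)
            column_weights[base] = math.log2(corrected / b)
        weights.append(column_weights)
    return weights


def scan(weights, sequence):
    """Score every window of length m by summing one weight per column."""
    m = len(weights)
    results = []
    for start in range(len(sequence) - m + 1):
        window = sequence[start:start + m]
        score = sum(col[base] for col, base in zip(weights, window))
        results.append((start, score))
    return results


if __name__ == "__main__":
    toy_counts = [{"A": 4}, {"C": 4}]  # hypothetical 2-column count matrix
    uniform = {base: 0.25 for base in "ACGT"}
    for start, score in scan(pwm_weights(toy_counts, uniform), "GACG"):
        print(start, round(score, 3))
```

With the toy matrix, the window AC at offset 1 receives the highest score, because both of its nucleotides match the most frequent base in their columns. A real scan would use the CTCF count matrix from the cited paper and a background computed from the region being scanned.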