# HG changeset patch
# User petr-novak
# Date 1580471723 18000
# Node ID c2c69c6090f06bfcb56d4d835ba5a09801249646
# Parent 99569eccc58386209cc8407b7ea74d049ef81607
Uploaded

diff -r 99569eccc583 -r c2c69c6090f0 ChipSeqRatioDef.xml
--- a/ChipSeqRatioDef.xml Mon Dec 09 04:14:48 2019 -0500
+++ b/ChipSeqRatioDef.xml Fri Jan 31 06:55:23 2020 -0500
@@ -22,12 +22,12 @@
-
-
-
-
-
+
+
+
+
+
 **What it does**
-Analysis of NGS sequences from Chromatin Imunoprecipitation. ChiP
-and Input reads are mapped to contigs obtained from graph based
-repetitive sequence clustering(`Novak et al. 2013`__) to enriched repeats. Reads from input
-and ChIP should be ideally short illumina reads with uniform length
-above 80 nt. It is sufficiant to use about 1 milion of reads for both Input and Chip.
+The ChIP-seq Mapper evaluates the enrichment of repetitive sequences in sequencing data from chromatin
+immunoprecipitation experiments, using repeats identified by RepeatExplorer as the reference. The tool
+performs BLASTN similarity search of the read sequences to the reference,
+and the reads producing hits that passed the user-specified similarity threshold are assigned to the
+repeat clusters. The assignment is made to the cluster that produced the best similarity hit, and every
+read is assigned to only a single cluster. Following read mapping, the numbers of reads from the
+INPUT and ChIP samples are evaluated, and ChIP/INPUT ratios of the normalized read counts are reported
+for individual clusters.
+ChIP and INPUT reads should be of uniform lengths of at least 40 nt. The bit score threshold value should be
+adjusted based on the length of the analyzed reads (the value equal to the read length is recommended for a start).
 This method was first used in (`Neumann et al. 2012`__) for
-identification of repetitive sequences associated with cetromeric
-region. If you use this method, reference:
+identification of repetitive sequences associated with centromeres:
 `PLoS Genet. Epub 2012 Jun 21.
 Stretching the rules: monocentric chromosomes with multiple centromere domains. Neumann P, Navrátilová A, Schroeder-Reiter E, Koblížková A, Steinbauerová V, Chocholová E, Novák P, Wanner G, Macas J.`__.
-.. __: http://bioinformatics.oxfordjournals.org/content/29/6/792.full
-
 .. __: http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002777
 .. __: http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1002777
diff -r 99569eccc583 -r c2c69c6090f0 RM_custom_search.xml
--- a/RM_custom_search.xml Mon Dec 09 04:14:48 2019 -0500
+++ b/RM_custom_search.xml Fri Jan 31 06:55:23 2020 -0500
@@ -10,7 +10,7 @@
-
+
diff -r 99569eccc583 -r c2c69c6090f0 extract_contigs_from_archive.xml
--- a/extract_contigs_from_archive.xml Mon Dec 09 04:14:48 2019 -0500
+++ b/extract_contigs_from_archive.xml Fri Jan 31 06:55:23 2020 -0500
@@ -10,7 +10,7 @@
-
-
+
+
diff -r 99569eccc583 -r c2c69c6090f0 fasta_affixer.xml
--- a/fasta_affixer.xml Mon Dec 09 04:14:48 2019 -0500
+++ b/fasta_affixer.xml Fri Jan 31 06:55:23 2020 -0500
@@ -1,19 +1,19 @@
- Tool appending suffix and prefix to sequences names
+ Appending suffix and prefix to the read names
 fasta_affixer.py -f $input -p "$prefix" -s "$suffix" -n $nspace -o $output
-
-
-
-
+
+
+
+
-
+
diff -r 99569eccc583 -r c2c69c6090f0 fasta_interlacer.xml
--- a/fasta_interlacer.xml Mon Dec 09 04:14:48 2019 -0500
+++ b/fasta_interlacer.xml Fri Jan 31 06:55:23 2020 -0500
@@ -12,8 +12,8 @@
-
-
+
+
diff -r 99569eccc583 -r c2c69c6090f0 fasta_manual_input.xml
--- a/fasta_manual_input.xml Mon Dec 09 04:14:48 2019 -0500
+++ b/fasta_manual_input.xml Fri Jan 31 06:55:23 2020 -0500
@@ -7,12 +7,12 @@
-
+
-
+
diff -r 99569eccc583 -r c2c69c6090f0 fastq_name_affixer.xml
--- a/fastq_name_affixer.xml Mon Dec 09 04:14:48 2019 -0500
+++ b/fastq_name_affixer.xml Fri Jan 31 06:55:23 2020 -0500
@@ -5,15 +5,15 @@
-
+
-
+
-
+
diff -r 99569eccc583 -r c2c69c6090f0 pairScan.xml
--- a/pairScan.xml Mon Dec 09 04:14:48 2019 -0500
+++ b/pairScan.xml Fri Jan 31 06:55:23 2020 -0500
@@ -1,6 +1,6 @@
-
- Scan paired reads for overlap
+
+ Scan paired-end reads for overlap
 python-levenshtein
@@ -9,8 +9,8 @@
-
-
+
+
@@ -38,8 +38,8 @@
-
-
+
+
diff -r 99569eccc583 -r c2c69c6090f0 paired_fastq_filtering.xml
--- a/paired_fastq_filtering.xml Mon Dec 09 04:14:48 2019 -0500
+++ b/paired_fastq_filtering.xml Fri Jan 31 06:55:23 2020 -0500
@@ -1,9 +1,9 @@
-
+
- Preprocessing of paired-end reads fastq files
+ Preprocessing of paired-end reads in FASTQ format including trimming, quality filtering, cutadapt filtering and interlacing. Broken pairs are discarded.
@@ -40,41 +40,41 @@
-
+
-
+
-
-
+
+
-
+
-
-
+
+
-
+
-
+
-
+ >
@@ -87,17 +87,17 @@
-
+
-
+
- "
+ "
diff -r 99569eccc583 -r c2c69c6090f0 renameSequences.xml
--- a/renameSequences.xml Mon Dec 09 04:14:48 2019 -0500
+++ b/renameSequences.xml Fri Jan 31 06:55:23 2020 -0500
@@ -5,21 +5,21 @@
-
-
-
+
+
+
-
+
 **What is does**
 Use this tool to rename your sequences with numerical counter while keeping sequence name prefex as part of the name.
-If paired sequences are used, last character in sequence name is used to distinguish pairs.
+If paired-end reads are used, the last character in sequence name is used to distinguish pairs.
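The renaming rule described in the renameSequences.xml help text (a numerical counter, the name prefix kept, and the last character of the name used to tell mates apart) can be illustrated with a minimal Python sketch. It is not the tool's code: the function name `rename_sequences`, the `prefix_length` parameter (how many leading characters count as the "prefix") and the assumption that mates are interlaced (mate 1 directly followed by mate 2) are all hypothetical.

```python
from typing import Iterable, Iterator, Tuple

Record = Tuple[str, str]  # (sequence name, sequence)


def rename_sequences(records: Iterable[Record], prefix_length: int,
                     paired: bool = False) -> Iterator[Record]:
    """Rename sequences as <name prefix><counter>[<mate character>].

    Hypothetical sketch, not the renameSequences.xml implementation:
    - prefix_length (assumed parameter) is how many leading characters of
      the original name are kept as the prefix;
    - with paired=True the records are assumed to be interlaced
      (mate 1 immediately followed by mate 2), both mates share one counter,
      and the last character of the original name is appended so that the
      mates stay distinguishable.
    """
    for index, (name, sequence) in enumerate(records):
        # Mates of one pair share a counter; single reads get their own.
        counter = index // 2 + 1 if paired else index + 1
        new_name = f"{name[:prefix_length]}{counter}"
        if paired:
            new_name += name[-1]  # keep the mate identifier, e.g. 'f'/'r'
        yield new_name, sequence
```

For interlaced reads named sampleA_xyzf, sampleA_xyzr, sampleA_abcf and sampleA_abcr with `prefix_length=7` and `paired=True`, the sketch yields sampleA1f, sampleA1r, sampleA2f and sampleA2r; the sequences themselves are passed through unchanged.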
diff -r 99569eccc583 -r c2c69c6090f0 sampleFasta.xml
--- a/sampleFasta.xml Mon Dec 09 04:14:48 2019 -0500
+++ b/sampleFasta.xml Fri Jan 31 06:55:23 2020 -0500
@@ -1,5 +1,5 @@
-
- Tool for creating samples of sequences from larger dataset
+
+ Tool for random sampling subsets of reads from larger dataset
 seqkit
@@ -7,24 +7,26 @@
+
+ ]]>
+
-
-
-
-
+
+
+
+
diff -r 99569eccc583 -r c2c69c6090f0 single_fastq_filtering.xml
--- a/single_fastq_filtering.xml Mon Dec 09 04:14:48 2019 -0500
+++ b/single_fastq_filtering.xml Fri Jan 31 06:55:23 2020 -0500
@@ -1,9 +1,9 @@
-
+
- Preprocessing of fastq files
+ Preprocessing of FASTQ read files including trimming, quality filtering, cutadapt filtering and sampling
@@ -35,43 +35,43 @@
-
+
-
+
-
+
-
-
+
+
-
+
-
-
+
+
-
+
-
+
-
+ >
@@ -84,7 +84,7 @@
-
+
@@ -92,8 +92,8 @@
-
-
+
+ "
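The read assignment and ratio calculation described in the ChIP-seq Mapper help text (ChipSeqRatioDef.xml) can be sketched in Python as below. This is an illustration under stated assumptions, not the tool's implementation. BLASTN hits are assumed to be already parsed into hypothetical `(read_id, cluster_id, bit_score)` tuples. A hit "passes" when its bit score is at least the threshold, and ties keep the first hit seen. Read counts are normalized by the total number of reads analyzed per sample; the help text does not specify the normalization. The names `assign_reads` and `chip_input_ratios` are likewise hypothetical.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical pre-parsed BLASTN hit: (read_id, cluster_id, bit_score)
Hit = Tuple[str, str, float]


def assign_reads(hits: Iterable[Hit], min_bitscore: float) -> Dict[str, str]:
    """Assign every read to the single cluster of its best hit above threshold."""
    best: Dict[str, Tuple[float, str]] = {}
    for read_id, cluster_id, bit_score in hits:
        if bit_score < min_bitscore:
            continue  # hit did not pass the similarity threshold
        # Keep the best-scoring hit; on ties the first hit seen wins.
        if read_id not in best or bit_score > best[read_id][0]:
            best[read_id] = (bit_score, cluster_id)
    return {read_id: cluster_id for read_id, (_, cluster_id) in best.items()}


def chip_input_ratios(chip_hits: Iterable[Hit], input_hits: Iterable[Hit],
                      n_chip_reads: int, n_input_reads: int,
                      min_bitscore: float) -> Dict[str, float]:
    """ChIP/INPUT ratio of normalized read counts for each cluster.

    Assumption: counts are normalized by the total number of reads
    analyzed in each sample (n_chip_reads, n_input_reads).
    """
    chip_counts = Counter(assign_reads(chip_hits, min_bitscore).values())
    input_counts = Counter(assign_reads(input_hits, min_bitscore).values())
    ratios: Dict[str, float] = {}
    for cluster in sorted(set(chip_counts) | set(input_counts)):
        chip_norm = chip_counts[cluster] / n_chip_reads
        input_norm = input_counts[cluster] / n_input_reads
        # A cluster hit only by ChIP reads has no finite ratio.
        ratios[cluster] = chip_norm / input_norm if input_norm > 0 else math.inf
    return ratios
```

Following the help text, a reasonable starting `min_bitscore` equals the read length, for example 100 for 100 nt reads. Because each read keeps only its best passing hit, a read never contributes to more than one cluster count.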