changeset 0:28d1a6f8143f draft
planemo upload for repository https://github.com/portiahollyoak/Tools commit 132bb96bba8e7aed66a102ed93b7744f36d10d37-dirty
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/Manual Mon Apr 25 13:08:56 2016 -0400
@@ -0,0 +1,186 @@
+TEMP (Transposable Element Movement in Population) Manual
+
+
+2015.01.09
+
+
+TEMP is a software package designed to 1) detect transposable element (TE) insertions and absences relative to the reference genome, 2) define the genome-TE junctions at up to base-pair resolution when possible, and 3) estimate the population frequency of the detected insertions and absences.
+This document describes how to run TEMP, which options to use, and how to interpret the outputs. If you have any questions or find any bugs, please contact Jiali Zhuang at jiali.zhuang@umassmed.edu.
+
+
+
+Requirements and installation
+
+
+TEMP runs on Linux x86_64 systems.
+The following software is required by TEMP and should be included in the path:
+Samtools (http://samtools.sourceforge.net/),
+bedtools (http://code.google.com/p/bedtools/),
+bwa (http://sourceforge.net/projects/bio-bwa/),
+twoBitToFa (http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/twoBitToFa),
+The Perl package BioPerl is also required for running TEMP (http://www.bioperl.org/wiki/Main_Page).
+
+To install TEMP, just unzip and untar the file.
+In the directory TEMP_v1.01/ there are two bash scripts, TEMP_Insertion.sh and TEMP_Absence.sh, for TE insertion and absence analysis, respectively.
+
+
+
+
+Options
+
+
+For TEMP_Insertion.sh the arguments to the options are explained below:
+
+
+    -i  Input file in BAM format with full path. Users need to map the reads to the reference genome using mapping software such as BWA (http://bio-bwa.sourceforge.net/). Please sort and index the BAM files before calling TEMP. Sorting and indexing can be done with 'samtools sort' and 'samtools index'.
+
+
+    -s  The full path to the scripts in directory TEMP_v1.0/.
+
+
+    -o  The full path to the output directory. Default is the current directory.
+
+
+    -r  Transposon consensus sequences in fasta format with full path. Such files can be downloaded from Repbase (http://www.girinst.org/repbase/).
+
+
+    -t  Annotated transposon positions in the genome (e.g., from RepeatMasker) in BED6 format with full path.
+
+
+    -u  Families of transposable elements in tab-delimited format (the first column is the name of the element and the second column is its family). Use only together with -t.
+
+
+    -x  The minimum score difference between the best hit and the second-best hit for considering a read as uniquely mapped. The higher the score, the more stringent the criterion. Use this option for alignments produced by BWA mem, which does not produce the XT:A: tag.
+
+
+    -m  Number of mismatches allowed when mapping to TE consensus sequences. Default is 3.
+
+
+    -f  An integer specifying the length of the fragments (inserts) of the library. Default is 500.
+
+
+    -c  An integer specifying the number of CPUs used. Default is 8.
+
+
+    -h  Show help message.
+
+
+
+
+For TEMP_Absence.sh the arguments to the options are explained below:
+
+
+    -i  Input file in BAM format with full path. Users need to map the reads to the reference genome using mapping software such as BWA (http://bio-bwa.sourceforge.net/). Please sort and index the BAM files before calling TEMP. Sorting and indexing can be done with 'samtools sort' and 'samtools index'.
+
+
+    -s  The full path to the scripts in directory TEMP_v1.0/.
+
+
+    -o  Path to the output directory. Default is the current directory.
+
+
+    -r  Annotated transposon positions in the genome (e.g., from RepeatMasker) in BED6 format with full path. For major model organisms such a file can be downloaded from the UCSC Genome Browser. On the Table Browser page, choose “variation and repeats” in the group tab and “RepeatMasker” in the track tab.
+
+
+    -t  2bit file for the reference genome. Such a file can be downloaded from the UCSC Genome Browser. On the Downloads page, choose the right genome, click on the “Full data set” link and download the *.2bit file.
+
+
+    -f  An integer specifying the length of the fragments (inserts) of the library. Default is 500.
+
+
+    -c  An integer specifying the number of CPUs used. Default is 4.
+
+
+    -h  Show help message.
+
+
+
+
+Output files
+
+
+For TE insertion analysis, the summary output file has the suffix: .insertion.refined.bp.summary.
+
+
+There are 14 columns in the summary file and their meanings are listed below:
+Column 1: The chromosome where the detected insertion happens.
+Column 2: The coordinate of the start position of the detected insertion.
+Column 3: The coordinate of the end position of the detected insertion.
+Column 4: The TE family that the detected insertion belongs to.
+Column 5: The direction of the insertion. “Plus” means that the TE is integrated on the plus strand of the genome while “minus” means the TE is integrated on the minus strand.
+Column 6: The class of the insertion. “1p1” means that the detected insertion is supported by reads at both sides. “2p” means the detected insertion is supported by more than 1 read at only 1 side. “Singleton” means the detected insertion is supported by only 1 read at 1 side.
+Column 7: The total number of read pairs that support the detected insertion.
+Column 8: The estimated population frequency of the detected insertion.
+Columns 9 & 10: The coordinate of a junction and the number of reads supporting it. If the junction is not found, column 9 will be the arithmetic mean of the start and end coordinates and column 10 will have the value 0.
+Columns 11 & 12: Same as Columns 9 & 10 except for the junction on the other strand.
+Column 13: The number of reads supporting the detected insertion at the 5’ end of the TE (not including junction-spanning reads).
+Column 14: The number of reads supporting the detected insertion at the 3’ end of the TE (not including junction-spanning reads).
+
+
+
+
+For TE absence analysis, the summary output file has the suffix: .absence.refined.bp.summary.
+
+
+There are 9 columns in the summary file and their meanings are listed below:
+Column 1: The chromosome where the detected absence happens.
+Column 2: The coordinate of the start position of the detected absence.
+Column 3: The coordinate of the end position of the detected absence.
+Column 4: The TE family that the detected absence belongs to.
+Column 5: Junctions at the 5’ end of the excised TE. The two numbers are the coordinates of the junctions on the two strands.
+Column 6: Junctions at the 3’ end of the excised TE. The two numbers are the coordinates of the junctions on the two strands.
+Column 7: The number of reads supporting the absence.
+Column 8: The number of reads supporting the reference (no absence).
+Column 9: Estimated population frequency of the detected absence event.
+
+
+
+
+
+Visualization
+
+Since v1.01, TEMP can visualize the distribution of predicted TE insertions across the genome using Dr. Xiaopeng Zhu's visualization tool "circosjs".
+
+The procedure involves two steps:
+1) Generate the JSON objects file from the TE insertions detected by TEMP.
+This can be done by running the script "generate_density_json.pl": e.g.
+perl generate_density_json.pl input.insertion.bp.summary ref.chromInfo 500000
+
+This script takes 3 parameters: (1) the TE insertions predicted by TEMP (i.e., the output file produced by TEMP_Insertion.sh);
+                                (2) a file containing the sizes of all the chromosomes in the reference genome (the chromInfo files for model organism genomes can be downloaded from the UCSC Genome Browser);
+                                (3) the size of the genomic bins (500 kb in the above example); the total number of predicted TE insertions in each bin will be calculated and plotted later.
+
+2) Visualize the distribution of TE insertions across the genome.
+Dr. Xiaopeng Zhu (https://twitter.com/nimezhu) at UMass Medical School developed a powerful web-based visualization tool that is available at: http://circos.zhu.land/
+The user only needs to upload the JSON file generated in step 1 in the "read local file" section.
+
+Please forward any questions and suggestions about the website to Dr. Zhu: xiaopeng.zhu@umassmed.edu
+
+
+
+
+
+
+Test datasets
+
+We put together two datasets for testing TEMP.
+
+One is a simulated set generated using Drosophila melanogaster chromosome 2L as the template. It is distributed along with this package.
+
+The recommended command-line invocation for this test set is:
+git clone https://github.com/JialiUMassWengLab/TEMP.git
+cd TEMP
+tar -xvzf test_dataset.tar.gz
+cd test_dataset/
+bash ../scripts/TEMP_Insertion.sh -i ./test_chromosome.sorted.bam -s ../scripts -r ./test_concensus.fa -t ./test_TE_annotation.bed -m 3 -f 500 -c 8
+bash ../scripts/TEMP_Absence.sh -i ./test_chromosome.sorted.bam -s ../scripts -r ./test_TE_annotation.bed -t ./dm3_chr2L.2bit -f 500 -c 4
+
+The other one is derived from chromosome 11 of 8 individuals from the 1000 Genomes Project. It is available at http://zlab.umassmed.edu/~zhuangj/TEMP_resources/Human_test_dataset.tar.gz.
+The recommended command-line invocation for this test set is:
+git clone https://github.com/JialiUMassWengLab/TEMP.git
+cd TEMP
+wget http://bib.umassmed.edu/~zhuangj/TEMP_resources/Human_test_dataset.tar.gz
+tar -zxvf Human_test_dataset.tar.gz
+cd Human_test_dataset
+bash ../scripts/TEMP_Insertion.sh -i ./chrom11.test.sorted.bam -s ../scripts -r ./HomoSapienRepbaseTEConcensus.fa -t ./hg19_rpmk.bed -m 3 -f 500 -c 8
+bash ../scripts/TEMP_Absence.sh -i ./chrom11.test.sorted.bam -s ../scripts -r ./hg19_rpmk.bed -t ./hg19.2bit -f 500 -c 4
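The column layout of the insertion summary described in the Manual lends itself to simple post-filtering. The following is a minimal sketch and not part of TEMP: the function name `filter_insertions` and the example threshold are assumptions for illustration. It assumes a tab-delimited summary in which column 6 holds the class and column 8 the estimated population frequency, and it skips any line whose column 8 is not a plain number (for example a header line, should one be present).

```shell
# Hypothetical helper, not shipped with TEMP.
# Usage: filter_insertions <summary file> <minimum frequency>
# Prints insertions of class "1p1" (supported at both sides) whose
# estimated population frequency is at least the given minimum.
filter_insertions() {
    local summary="$1" min_freq="$2"
    awk -F '\t' -v min="$min_freq" '
        # Skip lines whose frequency column is not numeric (e.g. a header).
        $8 !~ /^[0-9.]+$/ { next }
        # Keep two-sided insertions at or above the frequency threshold.
        $6 == "1p1" && $8 + 0 >= min + 0
    ' "$summary"
}
```

For example, `filter_insertions sample.insertion.refined.bp.summary 0.5` would keep only two-sided insertions estimated to be present in at least half of the population; the file name here is a placeholder.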
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/TEMP_Absence.sh Mon Apr 25 13:08:56 2016 -0400
@@ -0,0 +1,138 @@
+#!/bin/bash -x
+# TEMP (Transposable Element Movement present in a Population)
+# 2013-06-14
+# Jiali Zhuang(jiali.zhuang@umassmed.edu)
+# Zhiping Weng Lab
+# Programs in Bioinformatics and Integrative Biology
+# University of Massachusetts Medical School
+
+#usage function
+usage() {
+echo -en "\e[1;36m"
+cat <<EOF
+
+usage: $0 -i input_file.sorted.bam -s scripts_directory -o output_directory -r transposon_rpmk.bed -t reference.2bit -f fragment_size -c CPUs -h
+
+TEMP is a software package for detecting transposable elements (TEs)
+insertions and excisions from pooled high-throughput sequencing data.
+Please send questions, suggestions and bug reports to:
+jiali.zhuang@umassmed.edu
+
+Options:
+        -i     Input file in bam format with full path. Please sort and index the file before calling this program.
+               Sorting and indexing can be done by 'samtools sort' and 'samtools index'
+        -s     Directory where all the scripts are
+        -o     Path to output directory. Default is current directory
+        -r     Annotated transposon positions in the genome (e.g., RepeatMasker) in bed6 format with full path
+        -t     2bit file for the reference genome (can be downloaded from UCSC Genome Browser)
+        -f     An integer specifying the length of the fragments (inserts) of the library. Default is 500
+        -c     An integer specifying the number of CPUs used. Default is 4
+        -h     Show help message
+
+EOF
+echo -en "\e[0m"
+}
+
+# taking options
+while getopts "hi:c:f:o:r:s:t:" OPTION
+do
+        case $OPTION in
+                h)
+                        usage && exit 1
+                ;;
+                i)
+                        BAM=$OPTARG
+                ;;
+                f)
+                        INSERT=$OPTARG
+                ;;
+                o)
+                        OUTDIR=$OPTARG
+                ;;
+                c)
+                        CPU=$OPTARG
+                ;;
+                s)
+                        BINDIR=$OPTARG
+                ;;
+                r)
+                        TEBED=$OPTARG
+                ;;
+                t)
+                        REF=$OPTARG
+                ;;
+                ?)
+                        usage && exit 1
+                ;;
+        esac
+done
+
+if [[ -z $BAM ]] || [[ -z $BINDIR ]] || [[ -z $TEBED ]] || [[ -z $REF ]]
+then
+        usage && exit 1
+fi
+[ ! -z "${CPU##*[!0-9]*}" ] || CPU=4
+[ ! -z "${INSERT##*[!0-9]*}" ] || INSERT=500
+[ ! -z $OUTDIR ] || OUTDIR=$PWD
+
+mkdir -p "${OUTDIR}" || echo -e "\e[1;31mWarning: Cannot create directory ${OUTDIR}. Using the directory of input fastq file\e[0m"
+cd ${OUTDIR} || { echo -e "\e[1;31mError: Cannot access directory ${OUTDIR}... Exiting...\e[0m"; exit 1; }
+touch ${OUTDIR}/.writting_permission && rm -rf ${OUTDIR}/.writting_permission || { echo -e "\e[1;31mError: Cannot write in directory ${OUTDIR}... Exiting...\e[0m"; exit 1; }
+
+function checkExist {
+        echo -ne "\e[1;32m\"${1}\" is using: \e[0m" && which "$1"
+        [[ $? != 0 ]] && echo -e "\e[1;36mError: cannot find software/function ${1}! Please make sure that you have installed the pipeline correctly.\nExiting...\e[0m" && exit 1
+}
+echo -e "\e[1;35mTesting required softwares/scripts:\e[0m"
+checkExist "echo"
+checkExist "rm"
+checkExist "mkdir"
+checkExist "date"
+checkExist "mv"
+checkExist "sort"
+checkExist "touch"
+checkExist "awk"
+checkExist "grep"
+checkExist "bwa"
+checkExist "samtools"
+echo -e "\e[1;35mDone with testing required softwares/scripts, starting pipeline...\e[0m"
+
+name=`basename $BAM`
+i=${name/.sorted.bam/}
+echo $name
+echo $i
+if [[ ! -s $name ]]
+then
+    cp $BAM ./
+fi
+if [[ ! -s $name.bai ]]
+then cp $BAM.bai ./
+fi
+
+#Detect excision sites
+samtools view -XF 0x2 $name > $i.unpair.sam
+awk -F "\t" '{OFS="\t"; if ($9 != 0) print $0}' $i.unpair.sam > temp1.sam
+perl $BINDIR/pickUniqIntervalPos.pl temp1.sam $INSERT > $i.unproper.uniq.interval.bed
+
+rm temp1.sam $i.unpair.sam
+
+# Sometimes $i.unproper.uniq.interval.bed contains malformed bed entries
+# These must be removed to prevent the script failing
+awk '{if ($3>=$2 && $3 > 0 && $2 > 0) print $0}' $i.unproper.uniq.interval.bed > $i.unproper.uniq.interval.fixed.bed
+mv $i.unproper.uniq.interval.fixed.bed $i.unproper.uniq.interval.bed
+
+# Map to transposons
+bedtools intersect -a $TEBED -b $i.unproper.uniq.interval.bed -f 1.0 -wo > temp
+perl $BINDIR/filterFalsePositive.ex.pl temp $INSERT $i.final.pairs.rpmk.bed
+bedtools intersect -a $TEBED -b $i.final.pairs.rpmk.bed -f 1.0 -wo > temp2
+
+perl $BINDIR/excision.clustering.pl temp2 $i.excision.cluster.rpmk
+rm temp temp2 $i.unproper.uniq.interval.bed $i.final.pairs.rpmk.bed
+
+# Identify breakpoints using soft-clipping information
+perl $BINDIR/pickSoftClipping.over.pl $i.excision.cluster.rpmk $REF > $i.excision.cluster.rpmk.sfcp
+perl $BINDIR/refine_breakpoint.ex.pl
+
+# Estimate excision sites frequencies
+perl $BINDIR/pickOverlapPair.ex.pl $i.excision.cluster.rpmk.refined.bp > $i.excision.cluster.rpmk.refined.bp.refsup
+perl $BINDIR/summarize_excision.pl
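A note on the output-directory checks: in a chain of the form `cmd || echo msg || exit 1`, the `exit 1` runs only if `echo` itself fails, so a script written that way continues after printing the error. The following is a minimal sketch of one way to group the message and the exit; the helper names `die` and `enter_outdir` are hypothetical and not part of TEMP.

```shell
# Hypothetical helper, not part of TEMP: print an error and stop the script.
die() {
    echo "Error: $*" >&2
    exit 1
}

# Create the output directory, enter it, and confirm it is writable.
# Each step aborts through die, which calls exit itself, so a failure
# cannot be masked by a successful echo.
enter_outdir() {
    local outdir="$1"
    mkdir -p "$outdir" || die "cannot create directory $outdir"
    cd "$outdir" || die "cannot access directory $outdir"
    touch .write_test && rm -f .write_test || die "cannot write in directory $outdir"
}
```

An equivalent inline form is `cd "$OUTDIR" || { echo "..."; exit 1; }`; the function form simply avoids repeating the braces at every check.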
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/TEMP_Insertion.sh Mon Apr 25 13:08:56 2016 -0400
@@ -0,0 +1,176 @@
+#!/bin/bash -x
+# TEMP (Transposable Element Movement present in a Population)
+# 2013-06-14
+# Jiali Zhuang(jiali.zhuang@umassmed.edu)
+# Zhiping Weng Lab
+# Programs in Bioinformatics and Integrative Biology
+# University of Massachusetts Medical School
+
+#usage function
+usage() {
+echo -en "\e[1;36m"
+cat <<EOF
+
+usage: $0 -i input_file.sorted.bam -s scripts_directory -o output_directory -r transposon_database.fa -t annotated_TEs.bed -m MISMATCH -f fragment_size -c CPUs -h
+
+TEMP is a software package for detecting transposable elements (TEs)
+insertions and excisions from pooled high-throughput sequencing data.
+Please send questions, suggestions and bug reports to:
+jiali.zhuang@umassmed.edu
+
+Options:
+        -i     Input file in bam format with full path. Please sort and index the file before calling this program.
+               Sorting and indexing can be done by 'samtools sort' and 'samtools index'
+        -s     Directory where all the scripts are
+        -o     Path to output directory. Default is current directory
+        -r     Transposon sequence database in fasta format with full path
+        -t     Annotated TEs in BED6 format with full path. Detected insertions that overlap with annotated TEs will be filtered.
+        -u     TE families annotations. If supplied, detected insertions that overlap with annotated TEs of the same family will be filtered. Only use with -t.
+        -m     Number of mismatches allowed when mapping to TE consensus sequences. Default is 3
+        -x     The minimum score difference between the best hit and the second best hit for considering a read as uniquely mapped. For BWA mem.
+        -f     An integer specifying the length of the fragments (inserts) of the library. Default is 500
+        -c     An integer specifying the number of CPUs used. Default is 8
+        -h     Show help message
+
+EOF
+echo -en "\e[0m"
+}
+
+# taking options
+while getopts "hi:c:f:m:o:r:s:t:u:x:" OPTION
+do
+        case $OPTION in
+                h)
+                        usage && exit 1
+                ;;
+                i)
+                        BAM=$OPTARG
+                ;;
+                f)
+                        INSERT=$OPTARG
+                ;;
+                m)
+                        MM=$OPTARG
+                ;;
+                o)
+                        OUTDIR=$OPTARG
+                ;;
+                c)
+                        CPU=$OPTARG
+                ;;
+                s)
+                        BINDIR=$OPTARG
+                ;;
+                r)
+                        TESEQ=$OPTARG
+                ;;
+                t)
+                        ANNO=$OPTARG
+                ;;
+                u)
+                        FAMI=$OPTARG
+                ;;
+                x)
+                        SCORE=$OPTARG
+                ;;
+                ?)
+                        usage && exit 1
+                ;;
+        esac
+done
+
+if [[ -z $BAM ]] || [[ -z $BINDIR ]] || [[ -z $TESEQ ]]
+then
+        usage && exit 1
+fi
+[ ! -z "${CPU##*[!0-9]*}" ] || CPU=8
+[ ! -z "${INSERT##*[!0-9]*}" ] || INSERT=500
+[ ! -z "${MM##*[!0-9]*}" ] || MM=3
+[ ! -z "${SCORE##*[!0-9]*}" ] || SCORE=0
+[ ! -z $OUTDIR ] || OUTDIR=$PWD
+
+mkdir -p "${OUTDIR}" || echo -e "\e[1;31mWarning: Cannot create directory ${OUTDIR}. Using the directory of input fastq file\e[0m"
+cd ${OUTDIR} || { echo -e "\e[1;31mError: Cannot access directory ${OUTDIR}... Exiting...\e[0m"; exit 1; }
+touch ${OUTDIR}/.writting_permission && rm -rf ${OUTDIR}/.writting_permission || { echo -e "\e[1;31mError: Cannot write in directory ${OUTDIR}... Exiting...\e[0m"; exit 1; }
+
+function checkExist {
+        echo -ne "\e[1;32m\"${1}\" is using: \e[0m" && which "$1"
+        [[ $? != 0 ]] && echo -e "\e[1;36mError: cannot find software/function ${1}! Please make sure that you have installed the pipeline correctly.\nExiting...\e[0m" && exit 1
+}
+echo -e "\e[1;35mTesting required softwares/scripts:\e[0m"
+checkExist "echo"
+checkExist "rm"
+checkExist "mkdir"
+checkExist "date"
+checkExist "mv"
+checkExist "sort"
+checkExist "touch"
+checkExist "awk"
+checkExist "grep"
+checkExist "bwa"
+checkExist "samtools"
+echo -e "\e[1;35mDone with testing required softwares/scripts, starting pipeline...\e[0m"
+
+cp $TESEQ ./
+name=`basename $BAM`
+te=`basename $TESEQ`
+i=${name/.sorted.bam/}
+echo $name
+echo $i
+if [[ ! -s $name ]]
+then
+    cp $BAM ./
+fi
+if [[ ! -s $name.bai ]]
+then cp $BAM.bai ./
+fi
+
+# Get the mate seq of the uniq-unpaired reads
+samtools view -XF 0x2 $name > $i.unpair.sam
+if [[ $SCORE -eq 0 ]]
+then
+    perl $BINDIR/pickUniqPairFastq.pl $i.unpair.sam $i.unpair.uniq
+    perl $BINDIR/pickUniqPos.pl $i.unpair.sam > $i.unpair.uniq.bed
+else
+    perl $BINDIR/pickUniqPairFastq_MEM.pl $i.unpair.sam $i.unpair.uniq $SCORE
+    perl $BINDIR/pickUniqPos_MEM.pl $i.unpair.sam $SCORE > $i.unpair.uniq.bed
+fi
+
+# Map to transposons
+bwa index -a is $te
+bwa aln -t $CPU -n $MM -l 100 -R 1000 $te $i.unpair.uniq.1.fastq > $i.unpair.uniq.1.sai
+bwa aln -t $CPU -n $MM -l 100 -R 1000 $te $i.unpair.uniq.2.fastq > $i.unpair.uniq.2.sai
+bwa sampe -P $te $i.unpair.uniq.1.sai $i.unpair.uniq.2.sai $i.unpair.uniq.1.fastq $i.unpair.uniq.2.fastq > $i.unpair.uniq.transposons.sam
+
+
+#Summary
+samtools view -hSXF 0x2 $i.unpair.uniq.transposons.sam > $i.unpair.uniq.transposons.unpair.sam
+perl $BINDIR/pickUniqMate.pl $i.unpair.uniq.transposons.unpair.sam $i.unpair.uniq.bed > $i.unpair.uniq.transposons.bed
+cp $i.unpair.uniq.transposons.bed $i.unpair.uniq.transposons.filtered.bed
+
+
+#Prepare for insertion breakpoints identification
+awk -F "\t" -v sample=$i '{OFS="\t"; print $1,$2,$3,sample,$5,$6}' $i.unpair.uniq.transposons.filtered.bed >> tmp
+perl $BINDIR/mergeTagsWithoutGap.pl tmp > $i.uniq.transposons.filtered.woGap.bed
+perl $BINDIR/mergeTagsWithGap.pl $i.uniq.transposons.filtered.woGap.bed $INSERT > $i.uniq.transposons.filtered.wGap.bed
+rm tmp
+perl $BINDIR/get_class.pl $i.uniq.transposons.filtered.wGap.bed $i > $i.uniq.transposons.filtered.wGap.class.bed
+perl $BINDIR/make.bp.bed.pl $i.uniq.transposons.filtered.wGap.class.bed $ANNO $FAMI
+
+#rm $i.unpair.sam $i.unpair.uniq.bed $i.unpair.uniq.?.fastq $i.unpair.uniq.?.sai
+rm $i.unpair.uniq.transposons.sam $i.unpair.uniq.transposons.unpair.sam $i.uniq.transposons.filtered.woGap.bed $i.uniq.transposons.filtered.wGap.bed
+
+
+#Detect insertion breakpoints using soft-clipping information
+perl $BINDIR/pickClippedFastq.pl $i $te
+perl $BINDIR/refine_breakpoint.in.pl
+
+
+#Estimate insertion frequencies
+perl $BINDIR/pickOverlapPair.in.pl $i.insertion.refined.bp $INSERT > $i.insertion.refined.bp.summary
+
+
+################################
+##End of processing insertions##
+################################
+
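Both wrapper scripts set their numeric defaults with the idiom `[ ! -z "${VAR##*[!0-9]*}" ] || VAR=default`. The expansion `${VAR##*[!0-9]*}` deletes the whole value if it contains any non-digit character, so the result is non-empty only when the value is a non-empty string of digits. A minimal sketch of the same check wrapped in a function follows; the name `int_or_default` is hypothetical and not part of TEMP.

```shell
# Hypothetical helper, not part of TEMP.
# Usage: int_or_default <value> <default>
# Echoes <value> if it is a non-empty string of digits, otherwise <default>.
int_or_default() {
    local value="$1" default="$2"
    # The pattern *[!0-9]* matches any string containing a non-digit;
    # with ## the whole string is removed on a match, leaving it empty.
    if [ -n "${value##*[!0-9]*}" ]; then
        echo "$value"
    else
        echo "$default"
    fi
}
```

For instance, `CPU=$(int_or_default "$CPU" 8)` would reproduce the behavior of the corresponding line in TEMP_Insertion.sh: unset, empty, negative, or non-numeric values all fall back to the default.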
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/scripts/cmd.total.sh Mon Apr 25 13:08:56 2016 -0400 @@ -0,0 +1,26 @@ +rm tmp +#for i in flamBGFM flamKGFM flamEmbryo flamHets flamTranshets +#for i in harwich wXh24 wXh14 wXh21 whXw14 whXw21 +#for i in armiTranshetsOvary armiTranshetsSoma armiHetsOvary armiHetsSoma rhinoTranshetsOvary rhinoTranshetsSoma rhinoHetsOvary rhinoHetsSoma qinTranshetsOvary qinHetsOvary w1118Ovary w1118Soma orerOvary orerSoma orerEmbryo +#for i in w1118Ovary w1118Soma qinTranshetsOvary qinHetsOvary qintestTranshetsOvary qintestHetsOvary +for i in harwich.ovary harwichG20.ovary W1.ovary W1G20.ovary wXh1g.ovary whXh3g.ovary whXh5g.ovary whXh7g.ovary whXw3g.ovary whXw5g.ovary whXw7g.ovary +#for i in W1.ovary wXh1g.ovary whXh3g.ovary whXh5g.ovary whXw3g.ovary whXw5g.ovary armiHets.ovary armiHets.carcass armiTranshets.ovary armiTranshets.carcass flamBGFM.ovary flam.embryo flamHets.ovary flamKGFM.ovary flamTranshets.ovary harwich.ovary introgression2.ovary introgression2X3.ovary introgression3.ovary orer.embryo orer.ovary orer.carcass qinDf.ovary qinHets.ovary qinTMB.ovary qinTranshets.ovary rhinoHets.ovary rhinoHets.carcass rhinoTranshets.ovary rhinoTranshets.carcass w1.ovary w1.carcass whXw14d.ovary whXw21d.ovary wXh14d.ovary wXh21d.ovary wXh2_4d.ovary + +do + + awk -F "\t" -v sample=$i '{OFS="\t"; print $1,$2,$3,sample,$5,$6}' $i.downsample.bam.unpair.uniq.transposons.filtered.bed >> tmp + + ## Filter BS A{36} + grep FBgn0000224_BS tmp | egrep "\+51|\-51" > tmp.BS + + ## Merge Stalker + ediff tmp diff tmp.BS > tmp2 + +done + +perl /home/wangj2/jpp_findTransposonJumping/mergeTagsWithoutGap.pl tmp2 > dysgenic.uniq.transposons.filtered.woGap.bed +perl /home/wangj2/jpp_findTransposonJumping/mergeTagsWithGap.pl dysgenic.uniq.transposons.filtered.woGap.bed 500 > dysgenic.uniq.transposons.filtered.wGap.bed + +rm tmp2 tmp.BS tmp + +perl get_class.pl dysgenic.uniq.transposons.filtered.wGap.bed > dysgenic.uniq.transposons.filtered.wGap.class.bed
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/scripts/excision.clustering.pl Mon Apr 25 13:08:56 2016 -0400 @@ -0,0 +1,40 @@ +#! /usr/bin/perl + +use strict; + +my %position=(); +my %names=(); +open (input, "<$ARGV[0]") or die "Can't open $ARGV[0] since $!\n"; +while (my $line=<input>) { + chomp($line); + my @a=split(/\t/, $line); + my @b=split(/\#/, $a[8]); + + if (defined $position{$b[0]}) { + my @c=split(/\:/, $position{$b[0]}); + if (($c[0] eq $a[0])&&($a[1] < $c[1])) { + $position{$b[0]} =~ s/$c[1]/$a[1]/; + } + if (($c[0] eq $a[0])&&($a[2] > $c[2])) { + $position{$b[0]} =~ s/$c[2]/$a[2]/; + } + my $transposon=$a[3]; + if ($names{$b[0]} !~ /$transposon/) {$names{$b[0]}=$names{$b[0]}.",$transposon";} + } + else { + $position{$b[0]}="$a[0]\:$a[1]\:$a[2]"; + $names{$b[0]}=$a[3]; + } +} +close input; + +open (output, ">>temp_for_sort") or die "Can't open temp_for_sort since $!\n"; +while ((my $key, my $value) = each (%position)) { + my @z=split(/\:/, $value); + print output "$z[0]\t$z[1]\t$z[2]\t$names{$key}\n"; +} +close output; + +system("sort +0 -1 +1n -2 +2n -3 temp_for_sort > sorted"); +system("uniq -c sorted > $ARGV[1]"); +system("rm sorted temp_for_sort");
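excision.clustering.pl finishes by calling `sort +0 -1 +1n -2 +2n -3`, which uses the obsolete zero-based key syntax that newer versions of `sort` may not accept. The following sketch shows what is, to the best of my understanding, the equivalent `-k` form: field 1 as text, then fields 2 and 3 numerically, followed by `uniq -c` as in the script. The function name `sort_and_count` is hypothetical.

```shell
# Hypothetical helper, not part of TEMP.
# Usage: sort_and_count <tab-delimited file: chr, start, end, TE names>
# Sorts by chromosome (text), then start and end (numeric), and prefixes
# each distinct line with the number of times it occurs.
sort_and_count() {
    sort -k1,1 -k2,2n -k3,3n "$1" | uniq -c
}
```

The count that `uniq -c` places in front of each line corresponds to the number of identical merged intervals in the sorted input.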
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/scripts/filterFalsePositive.ex.pl Mon Apr 25 13:08:56 2016 -0400 @@ -0,0 +1,31 @@ +#! /usr/bin/perl + +use strict; + +open (input, "<$ARGV[0]") or die "Can't open $ARGV[0] since $!\n"; +my %leng=(); +my %trans=(); +my %coordinate=(); +while (my $line=<input>) { + chomp($line); + my @a=split(/\t/, $line); + if (defined $leng{$a[9]}) { + $trans{$a[9]} += $a[2]-$a[1]; + } + else { + $leng{$a[9]}=$a[8]-$a[7]-10; + $trans{$a[9]}=$a[2]-$a[1]; + $coordinate{$a[9]}="$a[6]\:$a[7]\:$a[8]"; + } +} +close input; + +open (output, ">>$ARGV[2]") or die "Can't open $ARGV[2] since $!\n"; +while ((my $key, my $value) = each (%coordinate)) { + if ((($leng{$key}-$trans{$key}) <= $ARGV[1])&&(($leng{$key}-$trans{$key}) >= 0)) { +# if (($leng{$key}-$trans{$key}) <= 500) { + my @b=split(/\:/, $value); + print output "$b[0]\t$b[1]\t$b[2]\t$key\n"; + } +} +close output;
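The filter in filterFalsePositive.ex.pl can be restated as follows: for each read-pair interval (keyed by its name in column 10 of the `bedtools intersect -wo` output), take the interval length minus 10, subtract the total length of the annotated TEs it contains, and keep the interval if the remainder lies between 0 and the fragment size. The awk sketch below restates that logic under the same column assumptions as the Perl script (TE coordinates in columns 2 and 3, interval in columns 7 to 9); the function name `filter_pairs` is hypothetical, and unlike the Perl version it prints intervals in input order.

```shell
# Hypothetical restatement of the filter, not part of TEMP.
# Usage: filter_pairs <bedtools intersect -wo output> <fragment size>
filter_pairs() {
    local intersect_file="$1" insert="$2"
    awk -F '\t' -v insert="$insert" '
        {
            key = $10
            if (!(key in span)) {
                # First time this interval is seen: record its length
                # (minus 10, as in the Perl script) and its coordinates.
                span[key]  = $9 - $8 - 10
                coord[key] = $7 "\t" $8 "\t" $9
                order[++n] = key
            }
            # Accumulate the length of every TE contained in the interval.
            te_len[key] += $3 - $2
        }
        END {
            for (j = 1; j <= n; j++) {
                key  = order[j]
                rest = span[key] - te_len[key]
                # Keep intervals whose non-TE remainder fits one fragment.
                if (rest >= 0 && rest <= insert)
                    print coord[key] "\t" key
            }
        }
    ' "$intersect_file"
}
```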
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/scripts/filterFalsePositive.in.pl Mon Apr 25 13:08:56 2016 -0400 @@ -0,0 +1,39 @@ +#!/share/bin/perl +use List::Util qw(max min); +#system("windowBed -a $ARGV[0] -b /home/wangj2/flycommon/all_transposons.dml.rmskCrossmatch.bed -sw -r 1000 -l 0 > tmp"); +#system("windowBed -a $ARGV[0] -b /home/wangj2/flycommon/soo.trnalnpos.map2.sort.bed -sw -r 1000 -l 0 > tmp"); +system("bedtools window -a $ARGV[0] -b $ARGV[1] -sw -r 1000 -l 0 > tmp"); + +open in,"tmp"; +my %read; +while(<in>) +{ + chomp; + split/\t/; + + ## if the same tpye of transposons + my @loc=map { [/(.*?),(\+|-)(.*)/] } split/;/,$_[4]; + foreach my $l (@loc) + { + if($$l[0] eq $_[9]) + { + ## if the same strand of transposons + if((($_[5] eq $$l[1]) && ($_[11] eq "-")) || (($_[5] ne $$l[1]) && ($_[11] eq "+"))) + { + ## if the fragments of the exists transposons + { + my $s=max($$l[2],$_[12]); + my $e=min(($$l[2]+$_[2]-$_[1]),$_[13]); + if($s<$e) + { + print join("\t",@_[0..5]),"\n" if not exists $read{$_[3]}; + $read{$_[3]}=1; + } + } + } + } + } +} +close in; + +system("rm tmp");
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/scripts/generate_density_json.pl Mon Apr 25 13:08:56 2016 -0400 @@ -0,0 +1,80 @@ +#! /usr/bin/perl + +use strict; +die "perl $0 <input.insertion.bp.summary> <chromInfo file> <genomic bin size>\n" if @ARGV<2; + +my @colors=("blue","green","red","yellow","grey","orange","purple","brown", "black"); + +my $op_title=$ARGV[0]; +$op_title =~ s/summary/json/; + +my %chrs=(); +system("cut -f1 $ARGV[0] | uniq > chr"); +open (input, "<chr") or die "Can't open chr since $!\n"; +while (my $line=<input>) { + chomp($line); + $chrs{$line}=1; +} +close input; +system("rm chr"); + +open (output, ">>$op_title") or die "Can't open $op_title since $!\n"; +print output "{\"ideograms\":[\n"; + +my $i=0; +open (input, "<$ARGV[1]") or die "Can't open $ARGV[1] since $!\n"; +while (my $line=<input>) { + chomp($line); + my @a=split(/\t/, $line); + if ($chrs{$a[0]}==1) { + my $len=int($a[1]/$ARGV[2])+1; + if ($len < 5) { + $chrs{$a[0]}=0; + next; + } + if ($i > 0) {print output ",\n";} + print output "{\"id\":\"$a[0]\",\"length\":$len,\"color\":\"$colors[$i % 9]\"}"; + $i++; + } +} +close input; + +print output "\n],\n\"tracks\":[\n{\n"; +print output "\"name\": \"Density\",\n"; +print output "\"type\": \"plot\",\n"; +print output "\"values\":\n[\n"; + +my @hist=(); +my $last_chr=""; +my $i=0; +my $k=0; +open (input, "<$ARGV[0]") or die "Can't open $ARGV[0] since $!\n"; +#my $header=<input>; +while (my $line=<input>) { + chomp($line); + my @a=split(/\t/, $line); + if ($a[0] eq $last_chr) { + my $mid=int(($a[1]+$a[2])/2); + if (int($mid/$ARGV[2]) > $i) { + $i++; + $hist[$i]=1; + } + else {$hist[$i]++;} + } + else { + if (($last_chr ne "") && ($chrs{$last_chr} == 1)) { + if ($k > 0) {print output ",\n";} + print output "{\"color\":\"$colors[$k % 9]\",\"chr\":\"$last_chr\",\"values\":["; + for my $j (0..$i-1) {print output "$hist[$j],";} + print output "$hist[$i]]}"; + $k++; + } + $i=0; + $hist[0]=1; + $last_chr=$a[0]; + } +} +close input; + 
+print output "\n]}\n]\n}\n"; +close output;
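The density values in generate_density_json.pl come from assigning each insertion to a genomic bin by the midpoint of its start and end coordinates. The sketch below shows that binning step on its own, as a per-chromosome, per-bin count; it is an illustration rather than a port of the script (it does not write JSON, and it reports only bins that contain at least one insertion). The function name `bin_counts` is hypothetical.

```shell
# Hypothetical helper, not part of TEMP.
# Usage: bin_counts <tab-delimited file: chr, start, end, ...> <bin size>
# Prints: chromosome, zero-based bin index, number of insertions in the bin.
bin_counts() {
    local summary="$1" bin_size="$2"
    awk -F '\t' -v bin="$bin_size" '
        {
            mid = int(($2 + $3) / 2)      # midpoint of the insertion
            b   = int(mid / bin)          # zero-based bin index
            count[$1 "\t" b]++
        }
        END {
            for (k in count) print k "\t" count[k]
        }
    ' "$summary" | sort -k1,1 -k2,2n
}
```

With a bin size of 500000, as in the Manual's example, an insertion spanning 600000 to 600100 falls into bin 1 of its chromosome.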
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/scripts/get_class.pl Mon Apr 25 13:08:56 2016 -0400 @@ -0,0 +1,45 @@ +#!/share/bin/perl + +# chr2L 19384 20049 FBgn0001283_jockey wXh24,-1;harwich,-8,+5;whXw21,-4,+1;whXw14,-5;wXh14,+2; +- sense chr2L:19562.19645 + +my @sample=("$ARGV[1]"); + +print "chr\tstart\tend\ttransposonName\tstrand\ttransposonStrand\tbreak\tclass"; +print "\t$_\_class\t$_\_plus\t$_\_minus" foreach @sample; +print "\n"; +open in,$ARGV[0]; +while(<in>) +{ + chomp; + my($chrom,$start,$end,$transposonName,$class,$strand,$transposonStrand,$break)=split/\t/; + my %classCounts; + my ($tcplus,$tcminus)=(0,0); + foreach $s (split/;/,$class) + { + my ($name,@counts)=split/,/,$s; + foreach my $c (@counts) + { + my $strand=($c>0)?"+":"-"; + $classCounts{$name}{$strand}=$c; + $tcplus+=$c if $c>0; + $tcminus+=$c if $c<0; + } + } + print "$chrom\t$start\t$end\t$transposonName\t$strand\t$transposonStrand\t$break"; + print "\t1p1" if $tcplus>0 && $tcminus<0; + print "\t2p" if ($tcplus>1 && $tcminus==0) || ($tcplus==0 && $tcminus<-1); + print "\tsingleton" if ($tcplus<=1 && $tcminus==0 && $tcplus>0) || ($tcplus==0 && $tcminus>=-1 && $tcminus<0); + print "\tNone" if ($tcminus==0 && $tcplus==0); + foreach my $s (@sample) + { + $classCounts{$s}{"+"}=0 if not exists $classCounts{$s}{"+"}; + $classCounts{$s}{"-"}=0 if not exists $classCounts{$s}{"-"}; + print "\t1p1" if $classCounts{$s}{"+"}>0 && $classCounts{$s}{"-"}<0; + print "\t2p" if ($classCounts{$s}{"+"}>1 && $classCounts{$s}{"-"}==0) || ($classCounts{$s}{"+"}==0 && $classCounts{$s}{"-"}<-1); + print "\tsingleton" if ($classCounts{$s}{"+"}<=1 && $classCounts{$s}{"-"}==0 && $classCounts{$s}{"+"}>0) || ($classCounts{$s}{"+"}==0 && $classCounts{$s}{"-"}>=-1 && $classCounts{$s}{"-"}<0); + print "\tNone" if $classCounts{$s}{"+"}==0 && $classCounts{$s}{"-"}==0; + print "\t",$classCounts{$s}{"+"},"\t",$classCounts{$s}{"-"}; + } + print "\n"; +} +close in;
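The class labels printed by get_class.pl follow from two totals: the summed plus-strand read count (positive) and the summed minus-strand read count (stored as a negative number). A minimal sketch of the same decision, assuming the plus total is never negative and the minus total is never positive, is shown below; the function name `classify` is hypothetical.

```shell
# Hypothetical restatement of the classification, not part of TEMP.
# Usage: classify <plus total (>= 0)> <minus total (<= 0)>
classify() {
    local plus="$1" minus="$2"
    if [ "$plus" -gt 0 ] && [ "$minus" -lt 0 ]; then
        echo "1p1"          # supported on both sides
    elif [ "$plus" -gt 1 ] || [ "$minus" -lt -1 ]; then
        echo "2p"           # more than one read, one side only
    elif [ "$plus" -eq 1 ] || [ "$minus" -eq -1 ]; then
        echo "singleton"    # exactly one read on one side
    else
        echo "None"         # no supporting reads
    fi
}
```

This matches the class definitions given in the Manual for column 6 of the insertion summary.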
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/scripts/make.bp.bed.pl Mon Apr 25 13:08:56 2016 -0400 @@ -0,0 +1,110 @@ +#! /usr/bin/perl + +use strict; + +my @sample=(); +open (in, "<$ARGV[0]") or die "Can't open $ARGV[0] since $!\n"; +my $line=<in>; +close in; + +my %chrs=(); +my @a=split(/\t/, $line); +for my $i (0..$#a) { + if ($a[$i] =~ /_class$/) { + my $name=$a[$i]; + $name =~ s/_class//; + my $j=$i+1; + my $k=$i+2; + my $l=$i+3; + system("cut -f7,4,6,$j,$k,$l $ARGV[0] > temp"); + open (input, "<temp") or die "Can't open temp since $!\n"; + open (output, ">>$name.insertion.bp.bed") or die "Can't open $name.insertion.bp.bed since $!\n"; + my $header=<input>; + while (my $line=<input>) { + chomp($line); + my @b=split(/\t/, $line); + if (($b[4] ne "0")||($b[5] ne "0")) { + my @c=split(/\:/, $b[2]); + my @d=split(/\./, $c[1]); + if ($d[0] > $d[1]) { + my $temp=$d[0]; + $d[0]=$d[1]; + $d[1]=$temp; + } + my $lower=$d[0]; + my $upper=$d[1]; + if (($lower >= 0) && ($upper >= 0)) { + print output "$c[0]\t$lower\t$upper\t$b[0]\t$b[1]\t$b[3]\t$b[4]\t$b[5]\n"; + } + $chrs{$c[0]}=1; + } + } + close input; + close output; + system("rm temp"); + + if ($ARGV[1] ne "") { + open (input, "<$name.insertion.bp.bed") or die "Can't open $name.insertion.bp.bed since $!\n"; + open (output, ">tmp") or die "Can't tmp since $!\n"; + while (my $line=<input>) { + chomp($line); + my @a=split(/\t/, $line); + if (($a[0] =~ /^\d{1,2}$/) || ($a[0] eq "X") || ($a[0] eq "Y")) {$a[0]="chr$a[0]";} + my $strand="+"; + if ($a[4] eq "antisense") {$strand="-";} + print output "$a[0]\t$a[1]\t$a[2]\t$a[3]\t\.\t$strand\t$a[5]\t$a[6]\t$a[7]\n"; + } + close input; + close output; + + system("bedtools intersect -a tmp -b $ARGV[1] -f 0.1 -wo -s > tmp1"); + if ($ARGV[2] eq "") { + system("awk -F \"\\t\" '{OFS=\"\\t\"; if ((\$4==\$13)&&(\$6==\$15)) print \$1,\$2,\$3,\$4,\$5,\$6}' tmp1 > tmp2"); + } + else { + my %family=(); + open (input, "<$ARGV[2]") or die "Can't open $ARGV[2] since $!\n"; + while (my 
$line=<input>) { + chomp($line); + my @a=split(/\t/, $line); + $family{$a[0]}=$a[1]; + } + close input; + + open (input, "<tmp1") or die "Can't open tmp1 since $!\n"; + open (output, ">>tmp2") or die "Can't open tmp2 since $!\n"; + while (my $line=<input>) { + chomp($line); + my @a=split(/\t/, $line); + if (($family{$a[3]} eq $family{$a[12]}) && ($a[5] eq $a[14])) { + print output "$a[0]\t$a[1]\t$a[2]\t$a[3]\t$a[4]\t$a[5]\n"; + } + } + close input; + close output; + } + + if (-s "tmp2") { + system("bedtools subtract -a tmp -b tmp2 -f 1.0 > tmp3"); + open (input, "<tmp3") or die "Can't open tmp3 since $!\n"; + open (output, ">$name.insertion.bp.bed") or die "Can't open $name.insertion.bp.bed since $!\n"; + while (my $line=<input>) { + chomp($line); + my @a=split(/\t/, $line); + my $direction="sense"; + if ($a[5] eq "-") {$direction="antisense";} + my $chr_num=$a[0]; + $chr_num =~ s/chr//; + if (($chrs{$a[0]} == 1) && (! defined $chrs{$chr_num})) {$chr_num=$a[0];} + print output "$chr_num\t$a[1]\t$a[2]\t$a[3]\t$direction\t$a[6]\t$a[7]\t$a[8]\n"; + } + close input; + close output; + } + } + + system("rm tmp*"); + + } +} +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/mergeTagsWithGap.pl Mon Apr 25 13:08:56 2016 -0400
@@ -0,0 +1,196 @@
+#!/share/bin/perl
+#chr2L 114333 114409 FBgn0003055_P-element harwich,1; + antisense
+#chr2L 114443 114567 FBgn0003055_P-element harwich,3; + antisense
+#chr2L 114636 114712 FBgn0003055_P-element harwich,1; - antisense
+#chr2L 131640 131929 FBgn0003055_P-element harwich,42; + sense
+#chr2L 131948 132274 FBgn0003055_P-element harwich,18; - sense
+#chr2L 132027 132103 FBgn0003055_P-element harwich,1; - antisense
+
+use warnings;
+use strict;
+use List::Util qw(max min);
+
+if(scalar(@ARGV)<2 || grep {/^-h/} @ARGV)
+{
+    die "
+usage: mergeTagsWithGap.pl inputFile maxGap
+Expects BED-like input with 7 tab-separated fields. For each {chr,name,transposon strand}
+group, merges ranges separated by less than maxGap (same strand) and prints sorted records to stdout.
+inputFile can be - or stdin to read from stdin.
+";
+}
+
+my $input=shift @ARGV;
+my $maxgap=shift @ARGV;
+grep {s/^stdin$/-/i} $input;
+
+my %item2coords;
+open IN,$input;
+while (<IN>)
+{
+    chomp;
+    my ($chrom,$start,$end,$transposonName,$count,$strand,$transposonStrand)=split/\t/;
+    push @{$item2coords{"$chrom;$transposonName;$transposonStrand"}},[$start,$end,$count,$strand];
+}
+close IN;
+
+my @results;
+foreach my $item (keys %item2coords)
+{
+    my @sortedCoords=sort{ $a->[0]<=>$b->[0] } @{$item2coords{$item}};
+    my ($chrom,$tName,$tStrand)=split(/;/,$item);
+    my ($mergeStart,$mergeEnd,$mergeCounts,$mergeStrand)=@{shift @sortedCoords};
+    my %sampleCounts=();
+    my ($breakStart,$breakEnd)=(0,0);
+    foreach my $sa (split/;/,$mergeCounts)
+    {
+        my ($s,$c)=split/,/,$sa;
+        $sampleCounts{$s}{$mergeStrand}=$c;
+    }
+    foreach my $rangeRef (@sortedCoords)
+    {
+        my ($rangeStart,$rangeEnd,$rangeCounts,$rangeStrand)=@{$rangeRef};
+        if($mergeStrand=~/\Q$rangeStrand\E$/)
+        {
+            if($rangeStart>=$mergeEnd+$maxgap)
+            {
+                $mergeCounts="";
+                foreach my $s (keys %sampleCounts)
+                {
+                    $mergeCounts.=$s;
+                    $mergeCounts.=",".$_.$sampleCounts{$s}{$_} foreach keys %{$sampleCounts{$s}};
+                    $mergeCounts.=";";
+                }
+                if($mergeStrand eq "+")
+                {
+                    $breakStart=$mergeEnd;
+                    $breakEnd=$mergeEnd+$maxgap;
+                }
+                if($mergeStrand eq "-")
+                {
+                    $breakStart=$mergeStart-$maxgap;
+                    $breakEnd=$mergeStart;
+                }
+                push @results,[$chrom,$mergeStart,$mergeEnd,$tName,$mergeCounts,$mergeStrand,$tStrand,"$chrom:$breakStart.$breakEnd"];
+                ($mergeStart,$mergeEnd,$mergeStrand)=($rangeStart,$rangeEnd,$rangeStrand);
+                %sampleCounts=();
+                foreach my $sa (split/;/,$rangeCounts)
+                {
+                    my ($s,$c)=split/,/,$sa;
+                    $sampleCounts{$s}{$rangeStrand}=$c;
+                }
+            }
+            else
+            {
+                $mergeEnd=max($rangeEnd,$mergeEnd);
+                foreach my $sa (split/;/,$rangeCounts)
+                {
+                    my ($s,$c)=split/,/,$sa;
+                    $sampleCounts{$s}{$rangeStrand}+=$c;
+                }
+            }
+        }
+        elsif($rangeStrand eq "+")
+        {
+            $mergeCounts="";
+            foreach my $s (keys %sampleCounts)
+            {
+                $mergeCounts.=$s;
+                $mergeCounts.=",".$_.$sampleCounts{$s}{$_} foreach keys %{$sampleCounts{$s}};
+                $mergeCounts.=";";
+            }
+            if($mergeStrand eq "+")
+            {
+                $breakStart=$mergeEnd;
+                $breakEnd=$mergeEnd+$maxgap;
+            }
+            if($mergeStrand eq "-")
+            {
+                $breakStart=$mergeStart-$maxgap;
+                $breakEnd=$mergeStart;
+            }
+            push @results,[$chrom,$mergeStart,$mergeEnd,$tName,$mergeCounts,$mergeStrand,$tStrand,"$chrom:$breakStart.$breakEnd"];
+            ($mergeStart,$mergeEnd,$mergeStrand)=($rangeStart,$rangeEnd,$rangeStrand);
+            %sampleCounts=();
+            foreach my $sa (split/;/,$rangeCounts)
+            {
+                my ($s,$c)=split/,/,$sa;
+                $sampleCounts{$s}{$rangeStrand}=$c;
+            }
+        }
+        else
+        {
+            if($rangeStart>=$mergeEnd+$maxgap*2)
+            {
+                $mergeCounts="";
+                foreach my $s (keys %sampleCounts)
+                {
+                    $mergeCounts.=$s;
+                    $mergeCounts.=",".$_.$sampleCounts{$s}{$_} foreach keys %{$sampleCounts{$s}};
+                    $mergeCounts.=";";
+                }
+                if($mergeStrand eq "+")
+                {
+                    $breakStart=$mergeEnd;
+                    $breakEnd=$mergeEnd+$maxgap;
+                }
+                if($mergeStrand eq "-")
+                {
+                    $breakStart=$mergeStart-$maxgap;
+                    $breakEnd=$mergeStart;
+                }
+                push @results,[$chrom,$mergeStart,$mergeEnd,$tName,$mergeCounts,$mergeStrand,$tStrand,"$chrom:$breakStart.$breakEnd"];
+                ($mergeStart,$mergeEnd,$mergeStrand)=($rangeStart,$rangeEnd,$rangeStrand);
+                %sampleCounts=();
+                foreach my $sa (split/;/,$rangeCounts)
+                {
+                    my ($s,$c)=split/,/,$sa;
+                    $sampleCounts{$s}{$rangeStrand}=$c;
+                }
+            }
+            else
+            {
+                $breakStart=$mergeEnd;
+                $mergeEnd=max($rangeEnd,$mergeEnd);
+                $breakEnd=$rangeStart;
+                foreach my $sa (split/;/,$rangeCounts)
+                {
+                    my ($s,$c)=split/,/,$sa;
+                    $sampleCounts{$s}{$rangeStrand}+=$c;
+                }
+                $mergeStrand.=$rangeStrand;
+            }
+        }
+    }
+    $mergeCounts="";
+    foreach my $s (keys %sampleCounts)
+    {
+        $mergeCounts.=$s;
+        $mergeCounts.=",".$_.$sampleCounts{$s}{$_} foreach keys %{$sampleCounts{$s}};
+        $mergeCounts.=";";
+    }
+    if($mergeStrand eq "+")
+    {
+        $breakStart=$mergeEnd;
+        $breakEnd=$mergeEnd+$maxgap;
+    }
+    if($mergeStrand eq "-")
+    {
+        $breakStart=$mergeStart-$maxgap;
+        $breakEnd=$mergeStart;
+    }
+    push @results,[$chrom,$mergeStart,$mergeEnd,$tName,$mergeCounts,$mergeStrand,$tStrand,"$chrom:$breakStart.$breakEnd"] if $mergeEnd;
+}
+
+sub bed4Cmp
+{
+    # For sorting by chrom, chromStart, and names -- reverse order for names
+    return $a->[0] cmp $b->[0] ||
+        $a->[1] <=> $b->[1] ||
+        $b->[3] cmp $a->[3];
+}
+
+foreach my $r (sort bed4Cmp @results)
+{
+    print join("\t",@{$r}),"\n";
+}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/mergeTagsWithoutGap.pl Mon Apr 25 13:08:56 2016 -0400
@@ -0,0 +1,91 @@
+#!/share/bin/perl
+#chr2L 735929 736005 HWUSI-EAS1533_0002:1:73:4665:12371#0/2 FBgn0000155_roo,-58;FBgn0000155_roo,-8722; -
+use warnings;
+use strict;
+
+if(scalar(@ARGV)<1 || grep {/^-h/} @ARGV)
+{
+    die "
+usage: mergeTagsWithoutGap.pl inputFile
+Expects BED input with at least 4 fields. For each {chr,name} pair,
+merges overlapping ranges and prints out sorted BED4 to stdout.
+inputFile can be - or stdin to read from stdin.
+";
+}
+
+my $input=shift @ARGV;
+grep {s/^stdin$/-/i} $input;
+
+my %item2coords;
+open IN,$input;
+while (<IN>)
+{
+    chomp;
+    my ($chrom,$start,$end,$sample,$class,$strand)=split/\t/;
+    die "Sorry, input must have at least 4 fields of BED.\n" if ! $class;
+    # random choose one
+#    my @loc=$class=~/(.*?),(\+|-)(.*)/;
+#    my $transposonStrand=($strand eq $loc[1])?"antisense":"sense";
+#    push @{$item2coords{"$chrom;$strand;$loc[0];$transposonStrand"}},[$start,$end,$sample]
+
+    # norm by class
+    my @loc=map { [/(.*?),(\+|-)(.*)/] } split/;/,$class;
+    my %transposonName;
+    foreach my $l (@loc)
+    {
+        my $transposonStrand=($strand eq $$l[1])?"antisense":"sense";
+        $transposonName{$$l[0]}=$transposonStrand;
+    }
+    my $c=1/scalar(keys %transposonName);
+    push @{$item2coords{"$chrom;$strand;$_;$transposonName{$_}"}},[$start,$end,$sample,$c] foreach keys %transposonName;
+}
+close IN;
+
+my @results;
+foreach my $item (keys %item2coords)
+{
+    my @sortedCoords=sort{ $a->[0]<=>$b->[0] } @{$item2coords{$item}};
+    my ($chrom,$strand,$tName,$tStrand)=split(/;/,$item);
+    my ($mergeStart,$mergeEnd,$mergeSample,$mergeCounts)=@{shift @sortedCoords};
+    my %sampleCounts;
+    $sampleCounts{$mergeSample}=$mergeCounts;
+    foreach my $rangeRef (@sortedCoords)
+    {
+        my ($rangeStart,$rangeEnd,$rangeSample,$rangeCounts)=@{$rangeRef};
+        if($rangeEnd<=$mergeEnd)
+        {
+            $sampleCounts{$rangeSample}+=$rangeCounts;
+            next;
+        }
+        if($rangeStart>=$mergeEnd)
+        {
+            my $count="";
+            $count.=$_.",".$sampleCounts{$_}.";" foreach keys %sampleCounts;
+            push @results,[$chrom,$mergeStart,$mergeEnd,$tName,$count,$strand,$tStrand];
+            ($mergeStart,$mergeEnd,$mergeSample,$mergeCounts)=($rangeStart,$rangeEnd,$rangeSample,$rangeCounts);
+            %sampleCounts=();
+            $sampleCounts{$mergeSample}=$mergeCounts;
+        }
+        else
+        {
+            $mergeEnd=$rangeEnd;
+            $sampleCounts{$rangeSample}+=$rangeCounts;
+        }
+    }
+    my $count="";
+    $count.=$_.",".$sampleCounts{$_}.";" foreach keys %sampleCounts;
+    push @results,[$chrom,$mergeStart,$mergeEnd,$tName,$count,$strand,$tStrand] if $mergeEnd;
+}
+
+sub bed4Cmp
+{
+    # For sorting by chrom, chromStart, and names -- reverse order for names
+    return $a->[0] cmp $b->[0] ||
+        $a->[1] <=> $b->[1] ||
+        $b->[3] cmp $a->[3];
+}
+
+foreach my $r (sort bed4Cmp @results)
+{
+    print join("\t",@{$r}),"\n";
+}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/pickClippedFastq.pl Mon Apr 25 13:08:56 2016 -0400
@@ -0,0 +1,193 @@
+#!/share/bin/perl
+use List::Util qw(sum);
+use Bio::Seq;
+
+die "perl $0 <input_prefix> <TE sequence database>\n" if @ARGV<2;
+
+my %transposon_seq=();
+my %transposon_revcom_seq=();
+my $curr_seq="";
+my $curr_transposon="";
+open (input, "<$ARGV[1]") or die "Can't open $ARGV[1] since $!\n";
+# Load TE consensus sequences keyed by the first word of each FASTA header.
+while (my $line=<input>) {
+    chomp($line);
+    if ($line =~ /^\>/) {
+        my @a=split(/\s+/, $line);
+        $a[0] =~ s/\>//;
+        $curr_transposon=$a[0];
+        $transposon_seq{$curr_transposon}="";
+    }
+    elsif ($curr_transposon ne "") {$transposon_seq{$curr_transposon}.=uc($line);}
+}
+close input;
+# Reverse-complement every consensus after reading, so the last record is included.
+foreach my $name (keys %transposon_seq) {
+    my $seq=Bio::Seq->new(-seq=>$transposon_seq{$name}, -alphabet => 'dna');
+    $transposon_revcom_seq{$name}=uc($seq->revcom->seq);
+}
+
+open m1,">>$ARGV[0].clipped.reads.aln";
+
+open (input, "<$ARGV[0].insertion.bp.bed") or die "Can't open $ARGV[0].insertion.bp.bed since $!\n";
+while (my $line=<input>) {
+    chomp($line);
+    my @a=split(/\t/, $line);
+
+    my $lower=$a[1]-15;
+    my $upper=$a[2]+15;
+    if (($lower > 0)&&($upper > 0))
+    {
+        system("samtools view -hXf 0x2 $ARGV[0].sorted.bam $a[0]\:$lower\-$upper > temp.sam");
+
+        open in,"temp.sam";
+        my %pe1;
+        my %pe2;
+        while(<in>)
+        {
+            chomp;
+            my @f=split/\t/,$_,12;
+            ## read number 1 or 2
+            my ($rnum)=$f[1]=~/(\d)$/;
+
+            ## XT:A:*
+            my ($xt)=$f[11]=~/XT:A:(.)/;
+
+            if ($f[5]=~/S/) {
+
+                ## Coordinate
+                my $coor=-10;
+                my $strand="";
+                my $final="";
+                my $clipseq="";
+                my @z=split(/M/, $f[5]);
+
+                if (($f[5]=~/S$/)&&($f[1]=~/r/))
+                {
+                    my (@cigar_m)=$f[5]=~/(\d+)M/g;
+                    my (@cigar_d)=$f[5]=~/(\d+)D/g;
+                    my (@cigar_s)=$f[5]=~/(\d+)S/g;
+                    my (@cigar_i)=$f[5]=~/(\d+)I/g;
+                    my $aln_ln=sum(@cigar_m,@cigar_d);
+                    $coor=$f[3]+$aln_ln-1;
+                    $strand="-";
+
+                    my (@clipped)=$z[1]=~/(\d+)S/g;
+                    my $cliplen=sum(@clipped);
+                    if ($cliplen >= 15) {
+                        $clipseq=substr($f[9], length($f[9])-$cliplen, $cliplen);
+                    }
+                }
+
+                elsif (($f[1]=~/R/)&&($z[0]=~/S/))
+                {
+                    $coor=$f[3]; $strand="+";
+
+                    my (@clipped)=$z[0]=~/(\d+)S/g;
+                    my $cliplen=sum(@clipped);
+                    if ($cliplen >= 15) {
+                        $clipseq=substr($f[9], 0, $cliplen);
+                    }
+                }
+
+                if ($clipseq ne "") {
+                    my $flag=0;
+                    while ((my $key, my $value) = each (%transposon_seq)) {
+                        my $seq=$value;
+                        if ($a[4] eq "antisense") {
+                            $seq=$transposon_revcom_seq{$key};
+                        }
+                        if (($seq =~ /$clipseq/)&&($a[3] eq $key)&&($coor >= $lower)&&($coor <= $upper)) {
+#                            print "$clipseq\n";
+                            $final=$coor."\($strand\)";
+                            if (defined $pe1{$final}) {
+                                if (length($clipseq) > length($pe1{$final})) {
+                                    $pe1{$final}=$clipseq;
+                                }
+                            }
+                            else {$pe1{$final}=$clipseq; $pe2{$final}=0;}
+                            $flag=1;
+                            last;
+                        }
+                    }
+                }
+
+            }
+        }#while;
+        close in;
+
+        open in,"temp.sam";
+        while(<in>)
+        {
+            chomp;
+            my @f=split/\t/,$_,12;
+            my ($rnum)=$f[1]=~/(\d)$/;
+            my ($xt)=$f[11]=~/XT:A:(.)/;
+
+            if ($f[5]=~/S/) {
+
+                my $coor=-10;
+                my $strand="";
+                my $final="";
+                my $clipseq="";
+                my @z=split(/M/, $f[5]);
+
+                if (($f[5]=~/S$/)&&($f[1]=~/r/))
+                {
+                    my (@cigar_m)=$f[5]=~/(\d+)M/g;
+                    my (@cigar_d)=$f[5]=~/(\d+)D/g;
+                    my (@cigar_s)=$f[5]=~/(\d+)S/g;
+                    my (@cigar_i)=$f[5]=~/(\d+)I/g;
+                    my $aln_ln=sum(@cigar_m,@cigar_d);
+                    $coor=$f[3]+$aln_ln-1;
+                    $strand="-";
+
+                    my (@clipped)=$z[1]=~/(\d+)S/g;
+                    my $cliplen=sum(@clipped);
+                    if ($cliplen >= 6) {
+                        $clipseq=substr($f[9], length($f[9])-$cliplen, $cliplen);
+                    }
+                }
+
+                elsif (($f[1]=~/R/)&&($z[0]=~/S/))
+                {
+                    $coor=$f[3]; $strand="+";
+
+                    my (@clipped)=$z[0]=~/(\d+)S/g;
+                    my $cliplen=sum(@clipped);
+                    if ($cliplen >= 6) {
+                        $clipseq=substr($f[9], 0, $cliplen);
+                    }
+                }
+
+                if ($clipseq ne "") {
+                    foreach my $coor (keys %pe1) {
+                        if (($coor =~ /\+/) && (substr($pe1{$coor}, length($pe1{$coor})-length($clipseq), length($clipseq)) eq $clipseq)) {
+                            $pe2{$coor}++;
+                        }
+                        elsif (($coor =~ /\-/) && (substr($pe1{$coor}, 0, length($clipseq)) eq $clipseq)) {
+                            $pe2{$coor}++;
+                        }
+                    }
+                }
+            }
+        }
+        close in;
+
+        my $clip_site="";
+
+        foreach my $coor (keys %pe2)
+        {
+            $clip_site=$clip_site."$coor\:$pe2{$coor}\;";
+        }
+        chop($clip_site);
+        print m1 "$a[0]\t$lower\t$upper\t$a[3]\t$clip_site\n";
+        system("rm temp.sam");
+    }
+    else
+    {
+        print m1 "$line\n";
+    }
+}
+close input;
+close m1;
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/pickOverlapPair.ex.pl Mon Apr 25 13:08:56 2016 -0400
@@ -0,0 +1,120 @@
+#!/share/bin/perl
+use Bio::Seq;
+use List::Util qw(sum);
+
+die "perl $0 <*.excision.cluster.rpmk.refined.bp>\n" if @ARGV<1;
+
+my $title=$ARGV[0];
+if ($title =~ /annotation/) {
+    $title =~ s/excision.cluster.annotation.refined.bp/sorted.bam/;
+}
+else {$title =~ s/excision.cluster.rpmk.refined.bp/sorted.bam/;}
+#system("samtools index /home/wangj2/scratch/bill/bill_genomic/$title");
+
+my %chrs=();
+system("samtools view -H $title > header");
+open (input, "<header") or die "Can't open header since $!\n";
+while (my $line=<input>) {
+    if ($line =~ /^\@SQ/) {
+        my @a=split(/\t/, $line);
+        for my $j (0..$#a) {
+            if ($a[$j] =~ /^SN:/) {
+                $a[$j] =~ s/^SN://;
+                $chrs{$a[$j]}=1;
+            }
+        }
+    }
+}
+close input;
+system("rm header");
+
+open (input, "<$ARGV[0]") or die "Can't open $ARGV[0] since $!\n";
+my $header=<input>;
+while (my $line=<input>) {
+    chomp($line);
+    my @a=split(/\t/, $line);
+
+    my $left=0;
+    my $right=0;
+    if ($a[4] eq "") {$left=$a[1];}
+    else {
+        my @t=split(/\,/, $a[4]);
+        my @p=split(/\(/, $t[$#t]);
+        $left=$p[0];
+    }
+    if ($a[5] eq "") {$right=$a[2];}
+    else {
+        my @t=split(/\,/, $a[5]);
+        my @p=split(/\(/, $t[0]);
+        $right=$p[0];
+    }
+
+    my $leftlower=$left-500;
+    my $leftupper=$left+500;
+    my $rightlower=$right-500;
+    my $rightupper=$right+500;
+    my $chr_num=$a[0];
+    $chr_num =~ s/chr//;
+    if (($chrs{$a[0]} == 1) && (! defined $chrs{$chr_num})) {$chr_num=$a[0];}
+    system("samtools view -Xf 0x2 $title $chr_num\:$leftlower\-$leftupper $chr_num\:$rightlower\-$rightupper > temp.sam");
+
+    open in,"temp.sam";
+    my %ps=();
+    my %me=();
+    my %uniqp=();
+    my %uniqm=();
+    my $ref_sup=0;
+
+    while(<in>)
+    {
+        chomp;
+        my @f=split/\t/,$_,12;
+        ## read number 1 or 2
+        my ($rnum)=$f[1]=~/(\d)$/;
+
+        ## XT:A:*
+        my ($xt)=$f[11]=~/XT:A:(.)/;
+
+        ## Coordinate
+        my $coor=$f[3];
+        if ($f[1]=~/r/)
+        {
+            if ($xt eq "U") {$uniqm{$f[0]}=1;}
+            my (@cigar_m)=$f[5]=~/(\d+)M/g;
+            my (@cigar_d)=$f[5]=~/(\d+)D/g;
+            my (@cigar_s)=$f[5]=~/(\d+)S/g;
+            my (@cigar_i)=$f[5]=~/(\d+)I/g;
+            my $aln_ln=sum(@cigar_m,@cigar_d);
+            $me{$f[0]}=$f[3]+$aln_ln-1;
+        }
+        elsif ($f[1]=~/R/) {
+            $ps{$f[0]}=$f[3];
+            if ($xt eq "U") {$uniqp{$f[0]}=1;}
+        }
+
+#        ${$pe{$f[0]}}[$rnum-1]=[$xt,$coor];
+    }
+    close in;
+
+    foreach my $id (keys %ps)
+    {
+#        my @rid=@{$pe{$id}};
+
+#        if(($rid[0][0] eq "U" && $rid[1][0] eq "M") || ($rid[0][0] eq "M" && $rid[1][0] eq "U"))
+#        {
+#            $soft_clip++;
+#            print "$id\n";
+#        }
+
+        if ((defined $me{$id})&&((defined $uniqp{$id})||(defined $uniqm{$id})))
+        {
+            if (((($ps{$id}+5)<=$right)&&($me{$id}>$right)&&($uniqm{$id}==1)) || ((($me{$id}-5)>=$left)&&($ps{$id}<$left)&&($uniqp{$id}==1))) {
+                $ref_sup++;
+            }
+        }
+    }
+
+    print "$a[0]\t$a[1]\t$a[2]\t$a[3]\t$left\t$right\t$ref_sup\n";
+    system("rm temp.sam");
+}
+
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/pickOverlapPair.in.pl Mon Apr 25 13:08:56 2016 -0400
@@ -0,0 +1,92 @@
+#!/share/bin/perl
+use Bio::Seq;
+use List::Util qw(sum);
+
+die "perl $0 <input.insertion.refined.bp> <fragment size>\n" if @ARGV<2;
+
+my $title=$ARGV[0];
+$title =~ s/insertion.refined.bp/sorted.bam/;
+my $frag=$ARGV[1];
+
+open (input, "<$ARGV[0]") or die "Can't open $ARGV[0] since $!\n";
+print "Chr\tStart\tEnd\tTransposonName\tTransposonDirection\tClass\tVariantSupport\tFrequency\tJunction1\tJunction1Support\tJunction2\tJunction2Support\t5\'_Support\t3\'_Support\n";
+while (my $line=<input>) {
+    chomp($line);
+    my @a=split(/\t/, $line);
+
+    my @b=split(/\(/, $a[6]);
+    my @c=split(/\(/, $a[7]);
+    my $terminal="5\'";
+    my $reverse=0;
+    my $positive=$b[0];
+    my $negative=$c[0];
+    if ($b[0] > $c[0]) {
+        $terminal="3\'";
+        $reverse=1;
+        my $swap=$b[0];
+        $b[0]=$c[0];
+        $c[0]=$swap;
+    }
+    my $lower=$b[0]-$frag;
+    my $upper=$c[0]+$frag;
+    system("samtools view -Xf 0x2 $title $a[0]\:$lower\-$upper > temp.sam");
+
+    open in,"temp.sam";
+    my %ps=();
+    my %me=();
+    my $ref_sup=0;
+    my $soft_clip=0;
+    while(<in>)
+    {
+        chomp;
+        my @f=split/\t/,$_,12;
+        ## read number 1 or 2
+        my ($rnum)=$f[1]=~/(\d)$/;
+
+        ## XT:A:*
+        my ($xt)=$f[11]=~/XT:A:(.)/;
+
+        ## Coordinate
+        if ($f[1]=~/r/)
+        {
+            my (@cigar_m)=$f[5]=~/(\d+)M/g;
+            my (@cigar_d)=$f[5]=~/(\d+)D/g;
+            my (@cigar_s)=$f[5]=~/(\d+)S/g;
+            my (@cigar_i)=$f[5]=~/(\d+)I/g;
+            my $aln_ln=sum(@cigar_m,@cigar_d);
+            $me{$f[0]}=$f[3]+$aln_ln-1;
+        }
+        else
+        {
+            $ps{$f[0]}=$f[3];
+        }
+
+#        ${$pe{$f[0]}}[$rnum-1]=[$xt,$coor];
+    }
+    close in;
+
+
+    foreach my $id (keys %ps)
+    {
+        if (defined $me{$id})
+        {
+            if (((($ps{$id}+5)<=$positive)&&($me{$id}>$negative)) || ((($me{$id}-5)>=$negative)&&($ps{$id}<$positive))) {
+#            if (((($ps{$id}+5)<=$b[0])&&($me{$id}>$c[0])) || ((($me{$id}-5)>=$c[0])&&($ps{$id}<$b[0]))) {
+                $ref_sup++;
+#                print "$id\n";
+            }
+        }
+    }
+
+    my $variant=$a[8]+$a[9]+$a[10]+$a[11];
+    my $ratio=sprintf("%.4f", $variant/($variant+$ref_sup));
+    if (($a[0] =~ /^\d{1,2}$/) || ($a[0] eq "X") || ($a[0] eq "Y")) {$a[0]="chr$a[0]";}
+    if ($reverse == 0) {
+        print "$a[0]\t$a[1]\t$a[2]\t$a[3]\t$a[4]\t$a[5]\t$variant\t$ratio\t$b[0]\t$a[8]\t$c[0]\t$a[9]\t$a[10]\t$a[11]\n";
+    }
+    elsif ($reverse == 1) {
+        print "$a[0]\t$a[1]\t$a[2]\t$a[3]\t$a[4]\t$a[5]\t$variant\t$ratio\t$b[0]\t$a[9]\t$c[0]\t$a[8]\t$a[10]\t$a[11]\n";
+    }
+    system("rm temp.sam");
+}
+
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/pickSoftClipping.over.pl Mon Apr 25 13:08:56 2016 -0400
@@ -0,0 +1,163 @@
+#!/share/bin/perl
+use Bio::Seq;
+use List::Util qw(sum);
+
+die "perl $0 <*.excision.cluster.rpmk> <Reference.2bit>\n" if @ARGV<2;
+
+my $title=$ARGV[0];
+if ($title =~ /annotation/) {
+    $title =~ s/excision.cluster.annotation/sorted.bam/;
+}
+else {$title =~ s/excision.cluster.rpmk/sorted.bam/;}
+
+my %chrs=();
+system("samtools view -H $title > header");
+open (input, "<header") or die "Can't open header since $!\n";
+while (my $line=<input>) {
+    if ($line =~ /^\@SQ/) {
+        my @a=split(/\t/, $line);
+        for my $j (0..$#a) {
+            if ($a[$j] =~ /^SN:/) {
+                $a[$j] =~ s/^SN://;
+                $chrs{$a[$j]}=1;
+            }
+        }
+    }
+}
+close input;
+system("rm header");
+
+open (input, "<$ARGV[0]") or die "Can't open $ARGV[0] since $!\n";
+while (my $line=<input>) {
+    chomp($line);
+    my @a=split(/\s+/, $line);
+
+    my $lower=$a[3]-100;
+    my $upper=$a[4]+100;
+    my $chr_num=$a[2];
+    $chr_num =~ s/chr//;
+    if (($chrs{$a[2]} == 1) && (! defined $chrs{$chr_num})) {$chr_num=$a[2];}
+    system("samtools view -bu $title $chr_num\:$lower\-$upper > temp.bam");
+    system("samtools view -Xf 0x2 temp.bam > temp.sam");
+
+    my $leftseq="";
+    my $rightseq="";
+
+    my $ll=$a[3]-150;
+    my $lu=$a[3]+150;
+    system("twoBitToFa $ARGV[1] -seq=$a[2] -start=$ll -end=$lu left.fa");
+    open (seq, "<left.fa") or die "Can't open left.fa since $!\n";
+    my $head=<seq>;
+    for my $k (0..5) {
+        $head=<seq>;
+        chomp($head);
+        $leftseq=$leftseq."$head";
+    }
+    $leftseq=uc($leftseq);
+    close seq;
+    system("rm left.fa");
+
+    my $rl=$a[4]-150;
+    my $ru=$a[4]+150;
+    system("twoBitToFa $ARGV[1] -seq=$a[2] -start=$rl -end=$ru right.fa");
+    open (seq, "<right.fa") or die "Can't open right.fa since $!\n";
+    $head=<seq>;
+    for my $k (0..5) {
+        $head=<seq>;
+        chomp($head);
+        $rightseq=$rightseq."$head";
+    }
+    $rightseq=uc($rightseq);
+    close seq;
+    system("rm right.fa");
+
+
+    open in,"temp.sam";
+    my %pe=();
+    while(<in>)
+    {
+        chomp;
+        my @f=split/\t/,$_,12;
+        ## read number 1 or 2
+        my ($rnum)=$f[1]=~/(\d)$/;
+
+        ## XT:A:*
+        my ($xt)=$f[11]=~/XT:A:(.)/;
+
+        my $CIGAR=$f[5];
+        $CIGAR =~ s/S//g;
+        if ($f[5]=~/S/) {
+
+            ## Coordinate
+            my $coor=-10;
+            my $overcoor=-10;
+            my $strand="";
+            my @z=split(/M/, $f[5]);
+
+            if (($f[5]=~/S$/)&&($f[1]=~/r/))
+            {
+                my (@cigar_m)=$f[5]=~/(\d+)M/g;
+                my (@cigar_d)=$f[5]=~/(\d+)D/g;
+                my (@cigar_s)=$f[5]=~/(\d+)S/g;
+                my (@cigar_i)=$f[5]=~/(\d+)I/g;
+                my $aln_ln=sum(@cigar_m,@cigar_d);
+                $coor=$f[3]+$aln_ln-1;
+                $strand="-";
+
+                my (@clipped)=$z[1]=~/(\d+)S/g;
+                my $cliplen=sum(@clipped);
+#                print "$f[0]\n";
+#                print "$cliplen\t";
+                if ($cliplen >= 10) {
+                    my $clipseq=substr($f[9], length($f[9])-$cliplen, $cliplen);
+                    $overcoor = index($rightseq, $clipseq);
+#                    print "$clipseq\t$rightseq\t$overcoor\t";
+                    if ($overcoor > -1) {$overcoor += ($a[4] - 149);}
+                }
+#                print "\n";
+            }
+            elsif (($f[1]=~/R/)&&($z[0]=~/S/)) {
+                $coor=$f[3]; $strand="+";
+                my (@clipped)=$z[0]=~/(\d+)S/g;
+                my $cliplen=sum(@clipped);
+#                print "$f[0]\n";
+#                print "$cliplen\t";
+                if ($cliplen >= 10) {
+                    my $clipseq=substr($f[9], 0, $cliplen);
+                    $overcoor = index($leftseq, $clipseq);
+#                    print "$clipseq\t$leftseq\t$overcoor\t";
+                    if ($overcoor > -1) {$overcoor += ($a[3] - 150 + $cliplen);}
+                }
+#                print "\n";
+            }
+
+            if ($coor > 0) {
+                my $final="";
+                if ($overcoor > 0) {
+                    if ($strand eq "-") {$final="$coor\-$overcoor"."\($strand\)";}
+                    else {$final="$overcoor\-$coor"."\($strand\)";}
+                }
+                else {$final=$coor."\($strand\)";}
+                if (defined $pe{$final}) {$pe{$final}++;}
+                else {$pe{$final}=1;}
+            }
+
+        }
+    }
+    close in;
+
+    my $clip_site="";
+
+    foreach my $coor (keys %pe)
+    {
+        if ($pe{$coor} >= 2) {
+            $clip_site=$clip_site."$coor\:$pe{$coor}\;";
+        }
+    }
+
+    chop($clip_site);
+    print "$a[2]\t$a[3]\t$a[4]\t$a[5]\t$clip_site\n";
+    system("rm temp.sam temp.bam");
+#    last;
+}
+
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/pickUniqIntervalPos.pl Mon Apr 25 13:08:56 2016 -0400
@@ -0,0 +1,36 @@
+#!/share/bin/perl
+use Bio::Seq;
+use List::Util qw(sum);
+
+die "perl $0 <sam> <fragment_size>\n" if @ARGV<2;
+open in,$ARGV[0];
+my %pe;
+while(<in>)
+{
+    chomp;
+    my @f=split/\t/,$_,12;
+    ## read number 1 or 2
+    my ($rnum)=$f[1]=~/(\d)$/;
+
+    ## XT:A:*
+    my ($xt)=$f[11]=~/XT:A:(.)/;
+
+    my $strand="+";
+
+    ## parse CIGAR
+    if(($f[1]=~/R/)&&($f[8] > $ARGV[1])&&($f[8] <= 10000))
+    {
+        # CIGAR
+        my (@cigar_m)=$f[5]=~/(\d+)M/g;
+        my (@cigar_d)=$f[5]=~/(\d+)D/g;
+        my (@cigar_s)=$f[5]=~/(\d+)S/g;
+        my (@cigar_i)=$f[5]=~/(\d+)I/g;
+        my $aln_ln=sum(@cigar_m,@cigar_d);
+
+#        print $f[2],"\t",$f[3]-1+$aln_ln,"\t",$f[3]+$f[8],"\t$f[0]/$rnum\t","\n";
+        if ($f[2] =~ /^\d{1,2}$/) {$f[2]="chr$f[2]";}
+        print $f[2],"\t",$f[3]-6+$aln_ln,"\t",$f[7]+5,"\t$f[0]/$rnum\t","\n";
+    }
+}
+close in;
+
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/pickUniqMate.pl Mon Apr 25 13:08:56 2016 -0400
@@ -0,0 +1,94 @@
+#!/share/bin/perl
+use List::Util qw(sum);
+use Bio::Seq;
+
+die "perl $0 <mate sam with header> <uniq bed>\n" if @ARGV<2;
+
+open in,$ARGV[1];
+my %uniq;
+while(<in>)
+{
+    chomp;
+    my @f=split;
+    $uniq{$f[3]}=[@f];
+}
+close in;
+
+open in,$ARGV[0];
+my (%pe,@ref,%ref);
+while(<in>)
+{
+    chomp;
+    my @f=split/\t/,$_,12;
+    # headers
+    if(/^\@SQ/)
+    {
+        my ($sn,$ln)=/SN:(.*?)\tLN:(\d+)/;
+        push @ref,[$sn,$ln];
+        $ref{$sn}=$#ref;
+        next;
+    }
+
+    # unmapped
+    next if $f[2] eq "*";
+
+    # alignments
+    if($f[11]=~/XT:A:/)
+    {
+        my ($rnum)=$f[1]=~/(\d)$/;
+        # CIGAR
+        my (@cigar_m)=$f[5]=~/(\d+)M/g;
+        my (@cigar_d)=$f[5]=~/(\d+)D/g;
+        my (@cigar_s)=$f[5]=~/(\d+)S/g;
+        my (@cigar_i)=$f[5]=~/(\d+)I/g;
+        my $aln_ln=sum(@cigar_m,@cigar_d);
+
+        my $strand="+";
+        if($f[1]=~/r/)
+        {
+            my $seq=Bio::Seq->new(-seq=>$f[9]);
+            $f[9]=$seq->revcom->seq;
+            $strand="-";
+        }
+
+        # align to the junctions
+        if(($f[3]+$aln_ln-1)>${$ref[$ref{$f[2]}]}[1])
+        {
+            if(($f[3]+($aln_ln-1)/2)>${$ref[$ref{$f[2]}]}[1])
+            {
+                $f[2]=${$ref[$ref{$f[2]}+1]}[0];
+                $f[3]=1;
+                $aln_ln=$aln_ln-(${$ref[$ref{$f[2]}]}[1]-$f[3]+1);
+            }
+            else
+            {
+                $aln_ln=${$ref[$ref{$f[2]}]}[1]-$f[3]+1;
+            }
+        }
+
+        $pe{$f[0]}{$rnum}=$f[2].",".$strand."$f[3]".";";
+
+        # XA tag
+        if($f[11]=~/XA:Z:/)
+        {
+            my ($xa)=$f[11]=~/XA:Z:(.*);$/;
+            my @xa=split(";",$xa);
+            $pe{$f[0]}{$rnum}.=join(",",(split/,/)[0,1]).";" foreach @xa;
+        }
+    }
+}
+close in;
+
+foreach my $id (keys %pe)
+{
+    next if exists $pe{$id}{1} && exists $pe{$id}{2} && exists $uniq{$id."/1"} && exists $uniq{$id."/2"};
+    foreach my $rid (keys %{$pe{$id}})
+    {
+        my $mate_id=($rid==1)?2:1;
+        if(exists $uniq{$id."/".$mate_id})
+        {
+            ${$uniq{$id."/".$mate_id}}[4]=$pe{$id}{$rid};
+            print join("\t",@{$uniq{$id."/".$mate_id}}),"\n";
+        }
+    }
+}
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/pickUniqPairFastq.pl Mon Apr 25 13:08:56 2016 -0400
@@ -0,0 +1,45 @@
+#!/share/bin/perl
+use Bio::Seq;
+
+die "perl $0 <sam> <output prefix>\n" if @ARGV<2;
+
+open m1,">$ARGV[1].1.fastq";
+open m2,">$ARGV[1].2.fastq";
+
+open in,$ARGV[0];
+my %pe;
+while(<in>)
+{
+    chomp;
+    my @f=split/\t/,$_,12;
+    ## read number 1 or 2
+    my ($rnum)=$f[1]=~/(\d)$/;
+
+    ## XT:A:*
+    my ($xt)=$f[11]=~/XT:A:(.)/;
+
+    ## revcom the read mapped to the reverse strand
+    if($f[1]=~/r/)
+    {
+        my $seq=Bio::Seq->new(-seq=>$f[9]);
+        $f[9]=$seq->revcom->seq;
+        $f[10]=reverse $f[10];
+    }
+    if (($rnum == 1) || ($rnum == 2))
+    {
+        ${$pe{$f[0]}}[$rnum-1]=[$xt,$f[9],$f[10]];
+    }
+}
+close in;
+
+foreach my $id (keys %pe)
+{
+    my @rid=@{$pe{$id}};
+    if (($rid[0][1] ne "") && ($rid[1][1] ne "") && (($rid[0][0] eq "U" || $rid[1][0] eq "U")))
+    {
+        print m2 "@"."$id/2","\n",$rid[1][1],"\n","+$id/2","\n",$rid[1][2],"\n";
+        print m1 "@"."$id/1","\n",$rid[0][1],"\n","+$id/1","\n",$rid[0][2],"\n";
+    }
+}
+close m1;
+close m2;
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/pickUniqPairFastq_MEM.pl Mon Apr 25 13:08:56 2016 -0400
@@ -0,0 +1,60 @@
+#!/share/bin/perl
+use Bio::Seq;
+
+die "perl $0 <sam> <output prefix> <AS-XS score difference threshold>\n" if @ARGV<3;
+
+open m1,">$ARGV[1].1.fastq";
+open m2,">$ARGV[1].2.fastq";
+
+open in,$ARGV[0];
+my %pe;
+while(<in>)
+{
+    chomp;
+    my @f=split/\t/,$_,12;
+    ## read number 1 or 2
+    my ($rnum)=$f[1]=~/(\d)$/;
+
+    ## XT:A:*
+    my $xt="";
+    my @a=split(/\s+/, $_);
+    my $as=0;
+    my $xs=0;
+    for my $i (11..$#a) {
+        if ($a[$i] =~ /^AS:i:/) {
+            $a[$i] =~ s/AS:i://;
+            $as=$a[$i];
+        }
+        elsif ($a[$i] =~ /^XS:i:/) {
+            $a[$i] =~ s/XS:i://;
+            $xs=$a[$i];
+        }
+        if (($xs > 0) && ($as-$xs <= $ARGV[2])) {$xt="R";}
+        else {$xt="U";}
+    }
+
+    ## revcom the read mapped to the reverse strand
+    if($f[1]=~/r/)
+    {
+        my $seq=Bio::Seq->new(-seq=>$f[9]);
+        $f[9]=$seq->revcom->seq;
+        $f[10]=reverse $f[10];
+    }
+    if (($rnum == 1) || ($rnum == 2))
+    {
+        ${$pe{$f[0]}}[$rnum-1]=[$xt,$f[9],$f[10]];
+    }
+}
+close in;
+
+foreach my $id (keys %pe)
+{
+    my @rid=@{$pe{$id}};
+    if (($rid[0][1] ne "") && ($rid[1][1] ne "") && (($rid[0][0] eq "U" || $rid[1][0] eq "U")))
+    {
+        print m2 "@"."$id/2","\n",$rid[1][1],"\n","+$id/2","\n",$rid[1][2],"\n";
+        print m1 "@"."$id/1","\n",$rid[0][1],"\n","+$id/1","\n",$rid[0][2],"\n";
+    }
+}
+close m1;
+close m2;
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/pickUniqPos.pl Mon Apr 25 13:08:56 2016 -0400
@@ -0,0 +1,41 @@
+#!/share/bin/perl
+use Bio::Seq;
+use List::Util qw(sum);
+
+die "perl $0 <sam>\n" if @ARGV<1;
+open in,$ARGV[0];
+my %pe;
+while(<in>)
+{
+    chomp;
+    my @f=split/\t/,$_,12;
+    ## read number 1 or 2
+    my ($rnum)=$f[1]=~/(\d)$/;
+
+    ## XT:A:*
+    my ($xt)=$f[11]=~/XT:A:(.)/;
+
+    my $strand="+";
+    ## revcomp
+    if($f[1]=~/r/)
+    {
+        my $seq=Bio::Seq->new(-seq=>$f[9]);
+        $f[9]=$seq->revcom->seq;
+        $strand="-";
+    }
+
+    ## parse CIGAR
+    if($xt eq "U")
+    {
+        # CIGAR
+        my (@cigar_m)=$f[5]=~/(\d+)M/g;
+        my (@cigar_d)=$f[5]=~/(\d+)D/g;
+        my (@cigar_s)=$f[5]=~/(\d+)S/g;
+        my (@cigar_i)=$f[5]=~/(\d+)I/g;
+        my $aln_ln=sum(@cigar_m,@cigar_d);
+
+        print $f[2],"\t",$f[3]-1,"\t",$f[3]-1+$aln_ln,"\t$f[0]/$rnum\t",$f[9],"\t",$strand,"\n";
+    }
+}
+close in;
+
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/scripts/pickUniqPos_MEM.pl Mon Apr 25 13:08:56 2016 -0400
@@ -0,0 +1,56 @@
+#!/share/bin/perl
+use Bio::Seq;
+use List::Util qw(sum);
+
+die "perl $0 <sam> <AS-XS score difference threshold>\n" if @ARGV<2;
+open in,$ARGV[0];
+my %pe;
+while(<in>)
+{
+    chomp;
+    my @f=split/\t/,$_,12;
+    ## read number 1 or 2
+    my ($rnum)=$f[1]=~/(\d)$/;
+
+    ## XT:A:*
+    my $xt="";
+    my @a=split(/\s+/, $_);
+    my $as=0;
+    my $xs=0;
+    for my $i (11..$#a) {
+        if ($a[$i] =~ /^AS:i:/) {
+            $a[$i] =~ s/AS:i://;
+            $as=$a[$i];
+        }
+        elsif ($a[$i] =~ /^XS:i:/) {
+            $a[$i] =~ s/XS:i://;
+            $xs=$a[$i];
+        }
+        if (($xs > 0) && ($as-$xs <= $ARGV[1])) {$xt="R";}
+        else {$xt="U";}
+    }
+
+    my $strand="+";
+    ## revcomp
+    if($f[1]=~/r/)
+    {
+        my $seq=Bio::Seq->new(-seq=>$f[9]);
+        $f[9]=$seq->revcom->seq;
+        $strand="-";
+    }
+
+    ## parse CIGAR
+    if($xt eq "U")
+    {
+        # CIGAR
+        my (@cigar_m)=$f[5]=~/(\d+)M/g;
+        my (@cigar_d)=$f[5]=~/(\d+)D/g;
+        my (@cigar_s)=$f[5]=~/(\d+)S/g;
+        my (@cigar_i)=$f[5]=~/(\d+)I/g;
+        my $aln_ln=sum(@cigar_m,@cigar_d);
+
+        print $f[2],"\t",$f[3]-1,"\t",$f[3]-1+$aln_ln,"\t$f[0]/$rnum\t",$f[9],"\t",$strand,"\n";
+    }
+}
+close in;
+
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/scripts/refine_breakpoint.ex.pl Mon Apr 25 13:08:56 2016 -0400 @@ -0,0 +1,168 @@ +#! /usr/bin/perl + +use strict; + +my @files=<*.excision.cluster.*>; +foreach my $file (@files) { + if (($file !~ /sfcp/)&&($file !~ /refsup/)) { + my $sfcp=$file.".sfcp"; + my $title=$file.".refined.bp"; + + open (input, "<$file") or die "Can't open $file since $!\n"; + open (input1, "<$sfcp") or die "Can't open $sfcp since $!\n"; + open (output, ">>$title") or die "Can't open $title since $!\n"; + print output "Chr\tStart\tEnd\tTransposonName\t5\'_Junction\t3\'_Junction\n"; + while (my $line=<input>) { + chomp($line); + my @a=split(/\s+/, $line); + my $line1=<input1>; + chomp($line1); + my @b=split(/\t/, $line1); + my @pos=split(/\;/, $b[4]); + my $plusnext=""; my $minusnext=""; + my $plusover=0; my $minusover=0; + my $lpcoor=""; my $lmcoor=""; my $rpcoor=""; my $rmcoor=""; + my $lp=0; my $lm=0; my $rp=0; my $rm=0; + my %plus=(); my %minus=(); + foreach my $site (@pos) { + my @x=split(/\:/, $site); + my @y=split(/\(/, $x[0]); + chop($y[1]); + if (($y[0] =~ /\-/)&&($y[1] eq "+")&&($x[1] >= $plusover)) { + if ($plusover >= 2) {$plusnext="$lpcoor\-$rpcoor\:$plusover";} + $plusover=$x[1]; + my @z=split(/\-/, $y[0]); + $lpcoor=$z[0]; $lp=$x[1]; + $rpcoor=$z[1]; $rp=$x[1]; + } + elsif (($y[0] =~ /\-/)&&($y[1] eq "-")&&($x[1] >= $minusover)) { + if ($minusover >= 2) {$minusnext="$lmcoor\-$rmcoor\:$minusover";} + $minusover=$x[1]; + my @z=split(/\-/, $y[0]); + $lmcoor=$z[0]; $lm=$x[1]; + $rmcoor=$z[1]; $rm=$x[1]; + } + elsif (($y[0] !~ /\-/)&&($y[1] eq "+")) { + $plus{$y[0]}=$x[1]; + } + elsif (($y[0] !~ /\-/)&&($y[1] eq "-")) { + $minus{$y[0]}=$x[1]; + } + } + + if (($plusnext ne "")&&($minusover == 0)) { + my @m=split(/\:/, $plusnext); + if (($m[1] >= 2)&&($m[1] == $plusover)) { + my $count1=$m[1]; my $count2=$plusover; + foreach my $id (keys %plus) { + if ($m[0] =~ /$id/) {$count1 += $plus{$id};} + elsif ($rpcoor == $id) {$count2 += 
$plus{$id};} + } + if ($count1 > $count2) { + my @n=split(/\-/, $m[0]); + $lpcoor=$n[0]; $lp=$m[1]; + $rpcoor=$n[1]; $rp=$m[1]; + } + } + } + + if (($minusnext ne "")&&($plusover == 0)) { + my @m=split(/\:/, $minusnext); + if (($m[1] >= 2)&&($m[1] == $minusover)) { + my $count1=$m[1]; my $count2=$minusover; + foreach my $id (keys %minus) { + if ($m[0] =~ /$id/) {$count1 += $minus{$id};} + elsif ($lmcoor == $id) {$count2 += $minus{$id};} + } + if ($count1 > $count2) { + my @n=split(/\-/, $m[0]); + $lmcoor=$n[0]; $lm=$m[1]; + $rmcoor=$n[1]; $rm=$m[1]; + } + } + } + + if (($plusover >= 2)&&($minusover >= 2)&&(($lpcoor-$rpcoor) != ($lmcoor-$rmcoor))) { + if ($plusnext ne "") { + my @m=split(/\:/, $plusnext); + my @n=split(/\-/, $m[0]); + if ((($n[1]-$n[0]) == ($rmcoor-$lmcoor))&&($m[1] >= 2)) { + $rpcoor=$n[1]; + $lpcoor=$n[0]; + $plusover=$m[1]; + $lp=$m[1]; + $rp=$m[1]; + } + } + if ($minusnext ne "") { + my @m=split(/\:/, $minusnext); + my @n=split(/\-/, $m[0]); + if ((($n[1]-$n[0]) == ($rpcoor-$lpcoor))&&($m[1] >= 2)) { + $rmcoor=$n[1]; + $lmcoor=$n[0]; + $minusover=$m[1]; + $lm=$m[1]; + $rm=$m[1]; + } + } + } + + my $plusc=0; my $pluscoor=""; + my $minusc=0; my $minuscoor=""; + foreach my $id (keys %plus) { + if ($id eq $rpcoor) { + $rp=$plusover+$plus{$id}; + } + if ($plus{$id} > $plusc) { + $plusc=$plus{$id}; + $pluscoor=$id; + } + elsif (($plus{$id} == $plusc)&&(abs($id-$b[2]) < abs($pluscoor-$b[2]))) { + $plusc=$plus{$id}; + $pluscoor=$id; + } + } + foreach my $id (keys %minus) { + if ($id eq $lmcoor) { + $lm=$minusover+$minus{$id}; + } + if ($minus{$id} > $minusc) { + $minusc=$minus{$id}; + $minuscoor=$id; + } + elsif (($minus{$id} == $minusc)&&(abs($id-$b[1]) < abs($minuscoor-$b[1]))) { + $minusc=$minus{$id}; + $minuscoor=$id; + } + } + if ($plusover < 2) { + $lpcoor=""; + if ($plusc >= 3) {$rpcoor=$pluscoor; $rp=$plusc;} + else {$rpcoor="";} + } + if ($minusover < 2) { + $rmcoor=""; + if ($minusc >= 3) {$lmcoor=$minuscoor; $lm=$minusc;} + else {$lmcoor="";} 
+ } + + my $bp1=""; my $bp2=""; + if (($lpcoor ne "")&&($lmcoor ne "")) { + $bp1="$lpcoor\(\+\)\:$lp,$lmcoor\(\-\)\:$lm"; + } + elsif ($lpcoor ne "") {$bp1="$lpcoor\(\+\)\:$lp";} + elsif ($lmcoor ne "") {$bp1="$lmcoor\(\-\)\:$lm";} + if (($rpcoor ne "")&&($rmcoor ne "")) { + $bp2="$rpcoor\(\+\)\:$rp,$rmcoor\(\-\)\:$rm"; + } + elsif ($rpcoor ne "") {$bp2="$rpcoor\(\+\)\:$rp";} + elsif ($rmcoor ne "") {$bp2="$rmcoor\(\-\)\:$rm";} + + print output "$a[2]\t$a[3]\t$a[4]\t$a[5]\t$bp1\t$bp2\n"; + } + + close input; + close input1; + close output; + } +}
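The final `print output` statement above packs each breakpoint column as one or two `coordinate(strand):support` entries joined by a comma. The following is a minimal Python sketch of reading such a field back, assuming that layout and integer coordinates and counts; `parse_breakpoint_field` is a hypothetical helper for illustration and is not part of TEMP.

```python
import re

# Hypothetical helper (not part of TEMP). Assumes each entry looks like
# "2100430(+):4", i.e. coordinate, strand in parentheses, then a support count.
_ENTRY = re.compile(r"^(\d+)\(([+-])\):(\d+)$")


def parse_breakpoint_field(field):
    """Return a list of (coordinate, strand, support) tuples for one column."""
    entries = []
    if not field:
        # An empty column means no breakpoint was reported on that side.
        return entries
    for item in field.split(","):
        match = _ENTRY.match(item)
        if match is None:
            raise ValueError(f"unexpected breakpoint entry: {item!r}")
        coord, strand, support = match.groups()
        entries.append((int(coord), strand, int(support)))
    return entries
```

Raising on an unexpected entry, rather than skipping it, is a deliberate choice in this sketch so that format drift is noticed early.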
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/scripts/refine_breakpoint.in.pl Mon Apr 25 13:08:56 2016 -0400 @@ -0,0 +1,73 @@ +#! /usr/bin/perl + +use strict; + +#my @files=<*.bp.bed.sfcp>; +my @files=<*.clipped.reads.aln>; +for my $file (@files) { + + my $title=$file; + $title =~ s/clipped.reads.aln/insertion.refined.bp/; + my $title2=$file; + $title2 =~ s/clipped.reads.aln/insertion.bp.bed/; + + open (input, "<$file") or die "Can't open $file since $!\n"; + open (input2, "<$title2") or die "Can't open $title2 since $!\n"; + open (output, ">>$title") or die "Can't open $title since $!\n"; + while (my $line=<input>) { + chomp($line); + my @a=split(/\t/, $line); + my @b=split(/\;/, $a[4]); + my $plusmax=""; + my $minusmax=""; + my $plus=0; + my $minus=0; + my $bp=""; + + my $line2=<input2>; + my @z=split(/\t/, $line2); + my $psup=abs($z[6]); + my $msup=abs($z[7]); + my $strand=$z[4]; + my $class=$z[5]; + + for my $element (@b) { + my @c=split(/\:/, $element); + chop($c[0]); + my @d=split(/\(/, $c[0]); + if (($d[1] eq "+") && ($c[1] > $plus)) { + $plusmax=$d[0]; + $plus=$c[1]; + } + elsif (($d[1] eq "-") && ($c[1] > $minus)) { + $minusmax=$d[0]; + $minus=$c[1]; + } + } + + if ($a[1] > 0) { + $a[1] += 15; + $a[2] -= 15; + print output "$a[0]\t$a[1]\t$a[2]\t$a[3]\t$strand\t$class\t"; + if (($minus >= 1)&&($plus >= 1)&&(abs($plusmax-$minusmax) <= 25)) { + print output "$plusmax\(\+\)\t$minusmax\(\-\)\t$plus\t$minus\t"; + } + elsif (($plus >= $minus)&&($plus >= 2)&&($plusmax >= $a[1])&&($plusmax <= $a[2])) { + print output "$plusmax\(\+\)\t$plusmax\(\-\)\t$plus\t0\t"; + } + elsif (($minus >= 2)&&($minusmax >= $a[1])&&($minusmax <= $a[2])) { + print output "$minusmax\(\+\)\t$minusmax\(\-\)\t0\t$minus\t"; + } + else { + my $mid=int(($a[1] + $a[2])/2); + print output "$mid\(\+\)\t$mid\(\-\)\t0\t0\t"; + } + print output "$psup\t$msup\n"; + } + } + close input; + close input2; + close output; + system("uniq $title > temp"); + system("mv temp $title"); +}
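The decision rules in `refine_breakpoint.in.pl` above can be hard to follow in diff form. The sketch below restates them in Python as a reading aid, not as a replacement for the script. It assumes the clipped-read evidence has already been parsed into `(coordinate, strand, count)` tuples, and it omits the file handling and the `$a[1] > 0` guard; the function name and argument layout are hypothetical.

```python
def refine_breakpoint(start, end, clipped):
    """Mirror the branch order of the Perl script for one candidate interval.

    clipped: iterable of (coordinate, strand, count) tuples (assumed layout).
    Returns (plus_coordinate, minus_coordinate, plus_support, minus_support).
    """
    plus_max, plus = None, 0
    minus_max, minus = None, 0
    for coord, strand, count in clipped:
        # Keep the best-supported clipped position on each strand.
        if strand == "+" and count > plus:
            plus_max, plus = coord, count
        elif strand == "-" and count > minus:
            minus_max, minus = coord, count

    # The script shrinks the interval by 15 bp on each side before testing.
    start += 15
    end -= 15

    # 1) Both strands have evidence and agree within 25 bp.
    if plus >= 1 and minus >= 1 and abs(plus_max - minus_max) <= 25:
        return plus_max, minus_max, plus, minus
    # 2) Plus-strand evidence only counts if it lies inside the interval.
    if plus >= minus and plus >= 2 and start <= plus_max <= end:
        return plus_max, plus_max, plus, 0
    # 3) Otherwise fall back to minus-strand evidence inside the interval.
    if minus >= 2 and start <= minus_max <= end:
        return minus_max, minus_max, 0, minus
    # 4) No usable clipped reads: report the interval midpoint.
    mid = (start + end) // 2
    return mid, mid, 0, 0
```

For example, `refine_breakpoint(100, 200, [(150, "+", 3), (160, "-", 2)])` takes the first branch because the two strands agree within 25 bp.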
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/scripts/summarize_excision.pl Mon Apr 25 13:08:56 2016 -0400 @@ -0,0 +1,55 @@ +#! /usr/bin/perl + +use strict; + +my @files=<*.excision.cluster.*.refined.bp>; +foreach my $file (@files) { + my $rfsp=$file.".refsup"; + my $count=$file; + my $title=$file; + $count =~ s/.refined.bp//; + $title =~ s/excision/absence/; + $title =~ s/.cluster.rpmk//; + $title .= ".summary"; + + open (input, "<$file") or die "Can't open $file since $!\n"; + open (input1, "<$count") or die "Can't open $count since $!\n"; + open (input2, "<$rfsp") or die "Can't open $rfsp since $!\n"; + open (output, ">>$title") or die "Can't open $title since $!\n"; + my $header=<input>; + chomp($header); + print output "$header\tVariant\tReference\tFrequency\n"; + while (my $line=<input>) { + chomp($line); + my @a=split(/\t/, $line); + my $line1=<input1>; + my $line2=<input2>; + chomp($line1); + chomp($line2); + my @b=split(/\s+/, $line1); + my @c=split(/\t/, $line2); + + my $variant=$b[1]; + my @x=split(/\:/, $a[4]); + my @y=split(/\:/, $a[5]); + if ($a[4] =~ /\,/) { + my @m=split(/\,/, $x[1]); + $variant += $m[0]+$x[2]; + } + else {$variant += $x[1];} + if ($a[5] =~ /\,/) { + my @n=split(/\,/, $y[1]); + $variant += $n[0]+$y[2]; + } + else {$variant += $y[1];} + my $ratio=sprintf("%.4f", ($variant*2)/($variant*2+$c[6])); + + $line =~ s/:\d+//g; + print output "$line\t$variant\t$c[6]\t$ratio\n"; + } + + close input; + close input1; + close input2; + close output; +}
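The frequency column written by `summarize_excision.pl` above comes from a short formula: the variant count is the cluster count plus every support value embedded in the two breakpoint columns, and the ratio is `2*variant / (2*variant + reference)`. A minimal Python sketch of that arithmetic follows, assuming breakpoint fields of the form `coordinate(strand):support`; the function name and arguments are hypothetical.

```python
import re


def absence_frequency(cluster_count, bp1, bp2, reference_support):
    """Return (variant, frequency string) as the Perl script computes them.

    bp1 and bp2 are breakpoint columns such as "100(+):2,105(-):4" (assumed
    layout); reference_support corresponds to column 7 of the .refsup file.
    """
    variant = cluster_count
    for field in (bp1, bp2):
        # Add every ":<count>" found in the field, one or two per column.
        variant += sum(int(n) for n in re.findall(r":(\d+)", field))
    # Variant reads are weighted by two, exactly as in the script.
    ratio = (2 * variant) / (2 * variant + reference_support)
    # Four decimal places, matching sprintf("%.4f", ...).
    return variant, f"{ratio:.4f}"
```

As in the original script, the division is undefined when both the variant and reference counts are zero; this sketch does not add a guard for that case.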
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/temp.xml Mon Apr 25 13:08:56 2016 -0400 @@ -0,0 +1,69 @@ +<tool id="run_TEMP" name="Run TEMP" version="0.1.0"> + <description></description> + <requirements> + <requirement type="package" version="1.6.924">perl-bioperl</requirement> + <requirement type="package" version="0.7.13">bwa</requirement> + <requirement type="package" version="2.25.0">bedtools</requirement> + <requirement type="package" version="0.1.19">samtools</requirement> + <requirement type="package" version="324">ucsc-twobittofa</requirement> + </requirements> + <stdio> + <exit_code range="1:" /> + </stdio> + <command><![CDATA[ + + ln -f -s "${alignment.metadata.bam_index}" "${alignment.element_identifier}.bai" && + ln -f -s "${alignment}" "${alignment.element_identifier}.bam" && + bash $__tool_directory__/scripts/TEMP_Insertion.sh -i "$alignment" -s $__tool_directory__/scripts -r "$consensus_te_seqs" -t "$bed_te_locations" -m 3 -f "$median_insertsize" -c \${GALAXY_SLOTS:-2} && + bash $__tool_directory__/scripts/TEMP_Absence.sh -i "$alignment" -s $__tool_directory__/scripts -r "$bed_te_locations" -t "$reference2bit" -f 500 -c \${GALAXY_SLOTS:-2} && + mv "${alignment.element_identifier}.insertion.bp.bed" "$insertion_bed" && + mv "${alignment.element_identifier}.insertion.refined.bp" "$insertion_bed_refined" && + mv "${alignment.element_identifier}.insertion.refined.bp.summary" "$insertion_summary" && + mv "${alignment.element_identifier}.absence.refined.bp.summary" "$absence_summary" && + zip "$archive" *insertion* *excision* *absence* + ]]></command> + <inputs> + <param format="bam" name="alignment" type="data" label="Alignment bam file"/> + <param format="twobit" name="reference2bit" type="data" label="Reference twobit file"/> + <param format="fasta" name="consensus_te_seqs" type="data" label="Consensus TE Seqs fasta file"/> + <param format="bed" name="bed_te_locations" type="data" label="TE Locations bed file"/> + <!-- + <param format="tabular" name="te_families" 
type="data" label="TE Families"/> + <param format="gff" name="gff_te_locations" type="data" label="Reference TE insertion Locations with Family ID names GFF file"/> + --> + <param format="txt" name="median_insertsize" type="data" label="Median Insert Length"/> + </inputs> + <outputs> + <data format="bed" type="data" name="insertion_bed" label="Insertion BED file" /> + <data format="bed" type="data" name="insertion_bed_refined" label="Insertion BED file (refined)" /> + <data format="bed" type="data" name="insertion_summary" label="Insertion summary file" /> + <data format="bed" type="data" name="absence_summary" label="Absence summary file" /> + <data format="zip" type="data" name="archive" label="Compressed output files" /> + </outputs> + <tests> + <test> + <param name="alignment" value="test_chromosome.sorted.bam" ftype="bam"/> + <param name="reference2bit" value="dm3_chr2L.2bit" ftype="twobit"/> + <param name="consensus_te_seqs" value="test_concensus.fa" ftype="fasta"/> + <param name="bed_te_locations" value="test_TE_annotation.bed" ftype="bed"/> + <output name="insertion_bed" file="test_chromosome.insertion.bp.bed" ftype="bed" /> + <output name="insertion_bed_refined" file="test_chromosome.insertion.refined.bp" ftype="bed"/> + <output name="insertion_summary" file="test_chromosome.insertion.refined.bp.summary" ftype="bed"/> + <output name="absence_summary" file="test_chromosome.absence.refined.bp.summary" ftype="bed"/> + </test> + </tests> + <help> <![CDATA[ + + +TEMP is a software package for detecting transposable element (TE) insertions and absences from pooled high-throughput sequencing data. + +Current version v1.04 + +Authors: Jiali Zhuang (jiali.zhuang@umassmed.edu) and Jie Wang (jie.wangj@umassmed.edu), Weng Lab, University of Massachusetts Medical School, Worcester, MA, USA + +For TE insertion analysis, run TEMP_Insertion.sh in the scripts directory. +For TE absence analysis, run TEMP_Absence.sh in the scripts directory. + + + ]]> </help> +</tool> \ No newline at end of file
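The `<command>` block of `temp.xml` assembles the `TEMP_Insertion.sh` call from the tool inputs. The sketch below shows the same argument order as a Python list, which may be handy when running the script outside Galaxy. The flags `-i`, `-s`, `-r`, `-t`, `-m`, `-f`, and `-c` are taken from the wrapper; the function name, parameter names, and example paths are hypothetical, and the meaning of the `-m` and `-f` values is not restated here because this part of the changeset does not define it.

```python
def temp_insertion_argv(scripts_dir, bam, consensus_fa, te_bed,
                        f_value, m_value=3, slots=2):
    """Build the argument list that the Galaxy wrapper passes to bash.

    m_value defaults to 3 and slots to 2, matching the literal "-m 3" and
    the ${GALAXY_SLOTS:-2} fallback in the wrapper.
    """
    return [
        "bash", f"{scripts_dir}/TEMP_Insertion.sh",
        "-i", bam,            # sorted, indexed BAM alignment
        "-s", scripts_dir,    # directory holding the TEMP scripts
        "-r", consensus_fa,   # TE consensus sequences (FASTA)
        "-t", te_bed,         # annotated TE locations (BED)
        "-m", str(m_value),
        "-f", str(f_value),
        "-c", str(slots),
    ]
```

A list of arguments, rather than one shell string, can be handed to `subprocess.run` without extra quoting, which avoids problems with file names that contain spaces.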
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/README Mon Apr 25 13:08:56 2016 -0400 @@ -0,0 +1,7 @@ +This is a small simulated dataset for testing whether TEMP is properly installed and for demonstrating how it works. + +10 TE insertions and 5 TE excisions were generated in the chr2L:2000000-3000000 region of the Drosophila melanogaster reference genome (dm3), and paired-end Illumina reads were simulated with an insert size of 500, a read length of 90, and an error rate of 0.0005. The reads were mapped to the dm3 reference genome, and the alignments are in the file "test_chromosome.sorted.bam". + +The 10 simulated insertions and 5 excisions are listed in the file "test_chromosome.sites". +The consensus sequences for those 10 TEs are in the file "test_concensus.fa". +The annotated TE insertions in the reference genome are listed in the file "test_TE_annotation.bed".
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/test_TE_annotation.bed Mon Apr 25 13:08:56 2016 -0400 @@ -0,0 +1,115 @@ +chr2L 1301606 1302488 FBgn0001167_gypsy . - +chr2L 2094501 2094580 FBgn0000155_roo . - +chr2L 2100429 2109522 FBgn0000155_roo . - +chr2L 2112167 2118361 FBgn0003007_opus . - +chr2L 2118446 2119772 FBgn0003007_opus . - +chr2L 2159453 2159556 FBgn0000155_roo . - +chr2L 2267349 2267457 FBgn0000155_roo . + +chr2L 2294096 2299243 FBgn0000349_copia . - +chr2L 2378805 2378893 FBgn0000155_roo . + +chr2L 2530303 2530389 FBgn0000155_roo . + +chr2L 2565592 2569028 FBgn0000005_297 . + +chr2L 2565667 2565886 FBgn0000004_17.6 . + +chr2L 2565869 2566006 FBgn0063450_Tom1 . + +chr2L 2565871 2566024 FBgn0061485_rover . + +chr2L 2565920 2566024 FBgn0063917_McClintock . + +chr2L 2566158 2569026 FBgn0000004_17.6 . + +chr2L 2566674 2566848 FBgn0061485_rover . + +chr2L 2567367 2569022 FBgn0061485_rover . + +chr2L 2567598 2567816 FBgn0063917_McClintock . + +chr2L 2567665 2569027 FBgn0044355_Quasimodo . + +chr2L 2568060 2569027 FBgn0026065_Idefix . + +chr2L 2568062 2569018 FBgn0063917_McClintock . + +chr2L 2568070 2569004 FBgn0063447_accord . + +chr2L 2568121 2568988 FBgn0004082_Tirant . + +chr2L 2568137 2569006 FBgn0063432_gypsy5 . + +chr2L 2568137 2568942 FBgn0040267_Transpac . + +chr2L 2568153 2569001 FBgn0063782_accord2 . - +chr2L 2568154 2568990 FBgn0023131_ZAM . + +chr2L 2568193 2568993 FBgn0003007_opus . + +chr2L 2568251 2568697 FBgn0000006_412 . + +chr2L 2568264 2568985 FBgn0063434_gypsy3 . + +chr2L 2568264 2568985 FBgn0003490_springer . + +chr2L 2568308 2568520 FBgn0067387_gypsy10 . + +chr2L 2568308 2568517 FBgn0067384_gypsy7 . + +chr2L 2568308 2568878 FBgn0063431_gypsy6 . + +chr2L 2568308 2568703 FBgn0001167_gypsy . + +chr2L 2568313 2568828 FBgn0002697_mdg1 . + +chr2L 2568313 2568526 FBgn0000199_blood . + +chr2L 2568329 2568982 FBgnnnnnnnn_HMS-Beagle2 . + +chr2L 2568329 2568982 FBgn0001207_HMS-Beagle . 
+ +chr2L 2568378 2568648 FBgn0063897_Stalker4 . + +chr2L 2568378 2568878 FBgn0063433_gypsy4 . + +chr2L 2568384 2568646 FBgn0063435_gypsy2 . + +chr2L 2568384 2568796 FBgn0002698_mdg3 . + +chr2L 2569006 2569756 FBgn0063917_McClintock . + +chr2L 2569007 2571200 FBgn0000004_17.6 . + +chr2L 2569010 2571603 FBgn0000005_297 . + +chr2L 2569018 2569804 FBgn0061485_rover . + +chr2L 2569064 2570806 FBgn0044355_Quasimodo . + +chr2L 2569064 2569752 FBgn0026065_Idefix . + +chr2L 2569859 2571024 FBgn0061485_rover . + +chr2L 2569987 2570809 FBgn0026065_Idefix . + +chr2L 2570511 2570703 FBgn0063917_McClintock . + +chr2L 2571048 2571200 FBgn0063917_McClintock . + +chr2L 2571264 2571483 FBgn0000004_17.6 . + +chr2L 2571466 2571603 FBgn0063450_Tom1 . + +chr2L 2571468 2571592 FBgn0061485_rover . + +chr2L 2661257 2663012 FBgn0001249_I-element . + +chr2L 2713413 2713444 FBgn0063371_transib2 . - +chr2L 2772652 2776969 FBgn0000005_297 . + +chr2L 2772727 2772946 FBgn0000004_17.6 . + +chr2L 2772929 2773066 FBgn0063450_Tom1 . + +chr2L 2772931 2773084 FBgn0061485_rover . + +chr2L 2772980 2773084 FBgn0063917_McClintock . + +chr2L 2773736 2773910 FBgn0061485_rover . + +chr2L 2774429 2776968 FBgn0061485_rover . + +chr2L 2774429 2776968 FBgn0000004_17.6 . + +chr2L 2774660 2774878 FBgn0063917_McClintock . + +chr2L 2774727 2776980 FBgn0044355_Quasimodo . + +chr2L 2775122 2776985 FBgn0026065_Idefix . + +chr2L 2775124 2776969 FBgn0063917_McClintock . + +chr2L 2775132 2776531 FBgn0063447_accord . + +chr2L 2775183 2776509 FBgn0004082_Tirant . + +chr2L 2775199 2776553 FBgn0063432_gypsy5 . + +chr2L 2775199 2776494 FBgn0040267_Transpac . + +chr2L 2775215 2776321 FBgn0063782_accord2 . - +chr2L 2775216 2776513 FBgn0023131_ZAM . + +chr2L 2775255 2776055 FBgn0003007_opus . + +chr2L 2775313 2775759 FBgn0000006_412 . + +chr2L 2775326 2776047 FBgn0063434_gypsy3 . + +chr2L 2775326 2776047 FBgn0003490_springer . + +chr2L 2775370 2775579 FBgn0067384_gypsy7 . + +chr2L 2775370 2775765 FBgn0001167_gypsy . 
+ +chr2L 2775375 2775890 FBgn0002697_mdg1 . + +chr2L 2775375 2775588 FBgn0000199_blood . + +chr2L 2775391 2776044 FBgnnnnnnnn_HMS-Beagle2 . + +chr2L 2775391 2776044 FBgn0001207_HMS-Beagle . + +chr2L 2775429 2775767 FBgn0010302_Burdock . + +chr2L 2775440 2775710 FBgn0063897_Stalker4 . + +chr2L 2775440 2776515 FBgn0063433_gypsy4 . + +chr2L 2775442 2775582 FBgn0067387_gypsy10 . + +chr2L 2775446 2775858 FBgn0002698_mdg3 . + +chr2L 2776093 2776340 FBgn0000199_blood . + +chr2L 2776099 2776519 FBgnnnnnnnn_HMS-Beagle2 . + +chr2L 2776156 2776324 FBgn0063436_gtwin . + +chr2L 2776156 2776516 FBgn0063431_gypsy6 . + +chr2L 2776179 2776389 FBgn0003007_opus . + +chr2L 2776938 2777318 FBgn0063917_McClintock . + +chr2L 2776958 2777320 FBgn0000004_17.6 . + +chr2L 2776962 2777324 FBgn0061485_rover . + +chr2L 2776962 2777324 FBgn0000005_297 . + +chr2L 2776975 2777315 FBgn0044355_Quasimodo . + +chr2L 2777321 2779175 FBgn0000005_297 . + +chr2L 2777323 2778772 FBgn0000004_17.6 . + +chr2L 2777510 2778596 FBgn0061485_rover . + +chr2L 2777559 2778381 FBgn0026065_Idefix . + +chr2L 2777565 2778378 FBgn0044355_Quasimodo . + +chr2L 2778083 2778275 FBgn0063917_McClintock . + +chr2L 2778620 2778772 FBgn0063917_McClintock . + +chr2L 2778836 2779055 FBgn0000004_17.6 . + +chr2L 2779038 2779175 FBgn0063450_Tom1 . + +chr2L 2779040 2779164 FBgn0061485_rover . + +chr2L 2933353 2935475 FBgn0003122_pogo . - +chr2L 2945631 2945785 FBgn0000155_roo . + +chr2L 2963474 2963538 FBgn0000155_roo . +
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/test_chromosome.sites Mon Apr 25 13:08:56 2016 -0400 @@ -0,0 +1,15 @@ +chr2L 2966910 2966911 FBgn0000481_Doc Insertion +chr2L 2965226 2965227 FBgn0001167_gypsy Insertion +chr2L 2933354 2935475 FBgn0003122_pogo Excision +chr2L 2920517 2920518 FBgn0010302_Burdock Insertion +chr2L 2763540 2763541 FBgn0000199_blood Insertion +chr2L 2714436 2714437 FBgn0000349_copia Insertion +chr2L 2661258 2663012 FBgn0001249_I-element Excision +chr2L 2569343 2569344 FBgn0004141_HeT-A Insertion +chr2L 2412907 2412908 FBgn0003055_P-element Insertion +chr2L 2397941 2397942 FBgn0000155_roo Insertion +chr2L 2294097 2299243 FBgn0000349_copia Excision +chr2L 2131306 2131307 FBgn0001283_jockey Insertion +chr2L 2112168 2119772 FBgn0003007_opus Excision +chr2L 2100430 2109522 FBgn0000155_roo Excision +chr2L 2003871 2003872 FBgn0003122_pogo Insertion
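The `test_chromosome.sites` listing above has five whitespace-separated columns: chromosome, start, end, TE name, and event type. A quick way to check a copy of the file against the README's stated totals is to tally the last column. The following is a minimal Python sketch under that column-layout assumption; `tally_sites` is a hypothetical helper.

```python
from collections import Counter


def tally_sites(lines):
    """Count rows per event type (fifth column) in a .sites listing."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        if len(fields) < 5:
            # Skip blank or truncated rows instead of guessing their type.
            continue
        counts[fields[4]] += 1
    return counts


# Three rows copied from the listing above.
sample = [
    "chr2L 2966910 2966911 FBgn0000481_Doc Insertion",
    "chr2L 2933354 2935475 FBgn0003122_pogo Excision",
    "chr2L 2920517 2920518 FBgn0010302_Burdock Insertion",
]
print(tally_sites(sample))
```

Run over the full file, the tally should agree with the 10 insertions and 5 excisions described in the README.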
--- /dev/null Thu Jan 01 00:00:00 1970 +0000 +++ b/test-data/test_concensus.fa Mon Apr 25 13:08:56 2016 -0400 @@ -0,0 +1,1143 @@ +>FBgn0010302_Burdock +AGTTAACACAATCACAAAACACCCGAAATATAGTCGTAAGCCTCAAGTGC +TTTTCCCATCTATAGATCGAGCTTTACCTATAAGAAACTGTAACTTGTTA +AGCTTTAGAGATAAGAACTCTTGCTATACTTAAGTCAGTCGATTTTGGAA +GATTAGAAGCGTCGGTCATCGCCACGTACTTACTATTCGTCTCATTAAGT +GCAGACCGCGCAAGCCTATTGTAATTAATAAACTTACGCTAATAAATATA +TGGAAAATCTACTAAAATGATAATTGGCGCCCAAACGGATATAAAAACCT +ACGATAACTGAATAATTATAAATAAATAACAAAAGGAGGATCCGGAGACA +AAACCAGCGGCTTTGGCTAATTAACTCTAACCTAAGAAATAAAAATTTGC +TGATTACATAAAATATAATATTAATTACTAAGACCATCTACCTTAAAATT +GTTTGTTAATCACTATTATTATATTGTAAGTATAACGCTTATTGAACGAA +TTAAAAATATTATTATTATTATTATATTATAACCTATGCAAAGAGTATTG +ATAATAAAAATACATGAGTGACAGTGATAACCTTTTAGACAACCTAGTGT +CAAGCTTAAATAAATGGTCAGCGCACCAGGCAAGTAGGCAAAACAGTGCA +GAAAAAAATAATAAGTCATCAGATAATTGGTGGTCAAAAACAAAGACAAC +TAGCGAAATGGAATTTGAAGCTCAGTTAAAAGCGATCGTAGAGAGTGCTG +TTGCCGGTGCGCTCGCAGTCCAAAAACAATCATTTGAAAAGCAATTGCAG +GAGATGAATGAGCGAATCGGGAAATTAACAGTGAACACCCCAGAGGTGGA +AACTTATGTAGATGCTGAAATTAGACCAGGTGTTGTCTGTAGCGAGCCTC +TAGATATACTTAAATCTCTGCCAGATTTTGATGGCAAAAGTGAAACATAT +GTGTCGTGGAGAAAAGCGGCTCATGTCGCTTTTAAAGTTTTCAAAGATTA +CGAGGGAAGTTCAACATTTTACCAAGCTCTTGGTATTATGCGAAATAAAA +TAAAAGGTCCAGCGAATACAGTATTGGCTTCTTTTAATACTCCGTTACAT +TTCAAAGCAATGATCAGCCGTCTTGATTTCACATATTCTGACAAAAGGCC +GATCTATCTAATCGAACAAGAGCTATCAACTTTGCGACAGGGAGACATGA +CTCTTACTGAATTCTACGATGAAGTCGAGAAAAAACTGACCCTACTTACC +AACAAGACAATAATGACATTTGATAGTGCCTTGGCGATGTCACTGAATGA +AAAGTACAGGACGGACGCGTTACGTGTATTTGTAACCGGAGCTAAGAAAT +CGTTGAGCGACATTCTTTTTGCAAAAGGTCCAAAAGATTTACCAACTGCT +CTCGCTTTAGCGCAAGAGGTCGAGTCGAACCATGAGCGTTACCAATTCGC +CCTTATTTATTCTAAAAATATTGGAGACAGGGGTCAGAAAATCGAACAAA +GGCACAGCGATAAGGATAGAAACTCAATCATGCCCATGCAAACTAAAAAC +CCATATTTTAGCAAGCGTCAGGTGCATACTTATGATAACCAGGAAAGACA +AGATCCAGTCCAGTTAACAAATCCTGATGTATCCATGCGATCTAGAAGAA +CTGGAAATTTTGGACAAACTCCATTTCCGACTCAGGGAAATATTTGGCCA +TCCCAACAGCAAAATTCTTGGCCATCTCAACAACAATATTCTTGGCCATC 
+CCAACAACAAAATTCATTTCGAACACAAAATCAATTCGCATCGCAACCCC +AACAGCAAAACACAAGTCAGGCTCAGGGACATTTTGGGTATGCGCAAGCA +TCAAAAAGACCAACGAGTGGCAGTGCAAGGTTTACAGGGCCAAAACAGCA +GAGGATCAACTACTTACCTCATGAGAAAGGTCAATGTGAGGAAGATACAG +ACGGTTATCAAAAGGAGGCAGAAGCGGAGGTTGATGATTATGAGGACGAA +CTAGTGAATTACGATCATGTTCATTTTTTAGCCACAAATCCCTGCTACCG +TACATAGAAAGAGAGATAGCAGGGAGAACCATAAAACTTTTGATTGACAC +CGGGGCTTCGAAAAATTACATACAGCCCCTCCCTGAATTAAAAAACATAA +TGCCGGTACAAAATAAATTCACGGTAAAATCGCTTCATGGTTGCAACACC +GTCAAACAGAAATGCTTTATTAAGCTATTTAACACATCTGTTCAATTCTT +TATTCTTCCAAGTCTCTCTAGTTTTGACGCAATAATAGGACTTGACCTTT +TGAAACAGGGAAATGCAACGTTAGATTTTAAGAACAAAACGTTGAATATC +AACAATGAAGTGGAATCTATTCAGTTTTTGAGATGTGACAGCGTAAATTT +CGCCAACATAGAGAATATTGTGGTTCCAAATCAGATATCTAATAAATTCC +ATACAATGCTTCGAAACCGATTGGCCGTCTTTGCGGAACCGGAAGAAGCA +CTGCCGTATAATACCAACATTGTTGCCACAATACGTACTGAGGACGACCA +ACCCATTTACTCAAAACTCTATCCGTACCCCATGGGCGTATCGGATTTTG +TGAATAAGGAGACACATGCTTTGTTAAAGGACGGAATTATCAGGCCCTCG +TCGTCACCTTACAACAATCCGGTTTGGGTAGTCGATAAAAAAGGTACAGA +TGAAGAGGGAAATACTAAGAAAAGGTTGGTTATAGATTTTAGAAAACTAA +ATTTAAAAACAATCGACGACAAGTACCCTATACCAAACGTAGTATGGATC +TTGTCAAATTTGGGAAAAGCCAGATTCTTTACAACCCTTGACCTTAAATC +GGCGTTTCACCAAATTCTGCTCGCAGAAAAGGATAGAGCGAAAACTGCCT +TTTCAGTAGGAAATGGAAAATACGAGTTTTGCCGTTTGCCGTTTGGCTTG +AAAAATGCCCCAAGTATTTTTCAACGTGCTATTGATGATGTTGTTAGGGA +CCGTATAGGAAAGTCATGTTACGTTTACGTTGACGACGTAATAATATTTT +CAAACGGAATTGAGGACCACGTAAACGACGTTGCTTGGGTACTAGACAGA +CTGTCTGGGGCAAACATGAGGGTTTCTAAAGAGAAATCGTTTTTCTTCAA +GGAAAGCGTCGAGTATCTCGGATTCATGGTGTCAAGTGGAGGTATCACAA +CCAGTCCTAGCAAAGTAGAGGCTATTCAGAAATATAATCAACCTACTAAT +CTGTTTAGTGTTCGATCGTTTTTAGGGCTAGCAAGTTATTACCGCTGCTT +TATTAAGGACTTCGCCTCTATTGCTAGACCACTCACTGACATTCTGAAGG +GTGAAAACGGAAAGGTTTCCGCAAGCCAGTCTAAAAAGATACCAATTTCT +TTCGATGAAAGACAATGTTCTGCTTTTGAGAAGCTTAAAAATGTTCTTGT +CTCCGAAAATGTAATGTTATTGTATCCCGATTATAGAAAAGCCTTTGACT +TAACAACAGACGCTTCGGCTTTTGGCCTGGGGGCAGTCTTATCACAGGAT +GGCAAGCCTGTTACAATGATTTCGAGAACTTTACAGGATAGAGAACTTAA +TTTCGCAACAAATGAACGAGAACTTTTGGCCATCGTTTGGGCTTTAAAGT 
+CTCTTAGGAACTATCTATATGGTGTCAAAAACTTAAACATTTTTACAGAT +CACCAGCCGTTAACATACGCCGTGTCAGATAGGAATCCAAATGCAAAAAT +CAAGAGATGGAAGGCGTTTATAGACGAACATAATGCTAAAATTTTCTATA +AACCTGGCAAGGAGACCTATGTTGCCGATGCACTATCCAGGCAGGCTATT +CATGTCCTAGAGGACGAACCCCAGTCAGACATTGCAACAATACATAGCGA +AATTTCATTGACTTTTACAATCGAAACTATCGACAAGCCGGTTAACTGTT +TTAGAAACCAAATTGTGATAGATGAGGGCACCGCAGACTCAACTCGAACT +TTTGTTATTTTCGGAAGCAAGACAAGGCATCTAATACAGTTTCTAGACAA +AGAGACCTTAATCGGAAGAATTCGTGATGTGGTTAAGCCGGATGTAGTGA +ATGCGATACACTGCGAATTACCTGTACTAGCTTTCATTCAAAACAGTCTT +GTAAATGACTTTCCAGCAACAACCTTCCGACACACTATGAAAATGGTCAG +CGACATTTTTAATCAAACTGAGCAACGGGAAATAGTGTCTTTGGAGCACA +ACAGAGCGCATAGGGCAGCACAGGAGAATGTAAAACAAATTCTTCAATAC +TACTTTTTCCCTAAAATGTCACAAATAGCCGCTACCTTTGTTTCTAACTG +CTTGGTTTGTCAAAAAGCCAAATACGACCGCCATCCGCAAAAGCAAATCC +TCGGGAGAACACCTATTCCGTCACATGTAGGCGAGACATTGCATATTGAT +ATATTTTCTACGGGCAGGAATTACTTTTTGACATGTATTGACAAATTTTC +CAAATTCGCTATTGTGCAACCAATCGGCTCTCGAACGATAACTGATTTAG +AACCTGCAATTATGCAACTAATGAACTTTTTTCCCCATTCAAAGACAATA +TTTTGTGACAATGAACCGTCCATAAATTCCGAGTCAATCAAGTCACTTTT +GAAAAATCGTTTTAATGTTGACATAGCGAACGCACCTCCACTTCATAGTA +CCTCAAACGGACAGGTTGAAAGGTTTCACAGCACGCTTTTAGAAATAGCT +CGATGCCTGAAACTTGACAGTGGAATGAATGATACAGTCAACCTTATTCT +TCAGGCAACAATAGAATACAATAAGACGGTGCACTCAGTCACCAATAGAA +GACCGATCGACATTATTCATTCAACTCCTCCCGAATTGGCTAACGAGATA +GTAGAAATGGTTAACGAAGCTCAGGAAAAACAGCTAAGAAGAGAAAATGT +AACAAGACGAGACAGAACCTTTGAGGTGGGAGAAACCGTCATGGTAAAAC +AAAACAATCGCTTGGGAAATAAACTAACCCCACGGTATAGGGAAGAACTA +ATCGAAGCAGACCTCGGGACAACGGTCCTCATAAAAGGGAGGGTCGTTCA +TAAAGATAATCTACGCTAGGTTTAGTATTTCTTTTCCTTTTGTGACCATC +GCCAAGTTAGCAAAATACAAACGTGAAATCTGAACACTAGTAAAAGAGTT +TGCAAACATTTTTCAATTAAATATTTGTCAAATCCTTCTTATTTAATCTT +TAAACATTTTGTATTATTTCCGCTTCATCCTCTTTAGAAAATTTTAAAGG +TATGTGATGAAATGCTAGACCCGAATGATTTGAAAACTTAAAGTCCACGC +AACCACAAATATTTCCTGAAACTACCATAGAAAATAAATGCATTACCAAA +ACGGCATAATAACAGTATAGCGCACTCACTCTAATTAGATTTCAAATTCC +CGATTAAAAAAAAAATAAAACACTAATGTTATCAATACCCTTTCCTGATT +CTGTTCAACTAAAATAGGAAAATCAATACTTGCAATCAATAAGCGTTTTA 
+CTACATACTTTAATATCAAAATATCTGAATGAACTTTATTATAAAATTAT +AATTGTTATACTTAATTATTGTCAAAACTTTAGTATTAAAACTGTAACTA +CCTCTTAAGTAGATGAGAAGAGTAGAAGAGGGAATTAAGATCTATCAACG +TAGTATCTGCTAAAGACGTAAAGATGCGGCAACTATTTCTGCGCCTGGGT +ACTGAAACGACGAACTGAATAATATCTGCCATCAGACGCCAACCAGAGTG +CGTTCAACACATACGTTTTGATGGTCAACTAGTTCAACCAACATCAGCAT +CATCGTCGTCAACAAGTCGACGGTTACAATAAAGATTTTTTCCAAGTTCG +CTACGATCATCTCCAGAACCTTGTTGCGAACCCATGACATGGAGAATCAG +CAGCATTTACGAACTTCTCGGATCATCCAGACACGCAGAGCTGCCTTCCC +TTCGATGGTTTAACGCAGTACCAGGTTGGCAGTATGGGAACTTAGTGCAC +AACCAATGTTACCCGTAAGATCCGCTTTCAAATAGATTTGCCAATTGTAA +AAAGTCTGTGGACAGCCTTCGTCTTAGAAGGGGAGGAGTTAACACAATCA +CAAAACACCCGAAATATAGTCGTAAGCCTCAAGTGCTTTTCCCATCTATA +GATCGAGCTTTACCTATAAGAAACTGTAACTTGTTAAGCTTTAGAGATAA +GAACTCTTGCTATACTTAAGTCAGTCGATTTTGGAAGATTAGAAGCGTCG +GTCATCGCCACGTACTTACTATTCGTCTCATTAAGTGCAGACCGCGCAAG +CCTATTGTAATTAATAAACTTACGCTAATAAATATATGGAAAATCTACTA +AAATGATAATT +>FBgn0000349_copia +TGTTGGAATATACTATTCAACCTACAAAAATAACGTTAAACAACACTACT +TTATATTTGATATGAATGGCCACACCTTTTATGCCATAAAACATATTGTA +AGAGAATACCACTCTTTTTATTCCTTCTTTCCTTCTTGTACGTTTTTTGC +TGTGAGTAGGTCGTGGTGCTGGTGTTGCAGTTGAAATAACTTAAAATATA +AATCATAAAACTCAAACATAAACTTGACTATTTATTTATTTATTAAGAAA +GGAAATATAAATTATAAATTACAACAGGTTATGGGCCCAGTCCATGCCTA +ATAAACAATTAAATTGTGAATTAAAGATTGTGAAAATAAATTGTGAAATA +GCATTTTTTCACATTCTTGTGAAATAGCTTTTTTTTTCACATTCTTGTGA +AATTATTTCCTTCTCAGAATTTGAGTGAAAAATGGACAAGGCTAAACGTA +ATATTAAGCCGTTTGATGGCGAGAAGTACGCGATTTGGAAATTTAGAATT +AGGGCTCTTTTAGCCGAGCAAGATGTGCTTAAAGTAGTTGATGGTTTAAT +GCCTAACGAGGTAGATGACTCCTGGAAAAAGGCAGAGCGTTGTGCAAAAA +GTACAATAATAGAGTACCTAAGCGACTCGTTTTTAAATTTCGCAACAAGC +GACATTACGGCGCGTCAGATTCTTGAGAATTTGGACGCCGTTTATGAACG +AAAAAGTTTGGCGTCGCAACTGGCGCTGCGAAAACGTTTGCTTTCTCTGA +AGCTATCGAGTGAGATGTCACTATTAAGCCATTTTCATATTTTTGACGAA +CTTATAAGTGAATTGTTGGCAGCTGGTGCAAAAATAGAAGAGATGGATAA +AATTTCTCATCTACTGATCACATTGCCTTCGTGTTACGATGGAATTATTA +CAGCGATAGAGACATTATCTGAAGAAAATTTGACATTGGCGTTTGTGAAA +AATAGATTGCTGGATCAAGAAATTAAAATTAAAAATGACCACAACGATAC 
+AAGCAAGAAAGTTATGAACGCGATCGTGCACAACAATAATAACACTTATA +AAAATAATTTGTTTAAAAATCGGGTAACTAAACCAAAGAAAATATTCAAG +GGAAATTCAAAGTATAAAGTCAAGTGTCACCACTGTGGCAGAGAAGGCCA +CATTAAAAAAGATTGTTTCCATTATAAAAGAATATTAAATAATAAAAATA +AAGAAAATGAAAAACAAGTTCAAACTGCAACATCACACGGCATTGCGTTT +ATGGTAAAAGAAGTGAATAATACTTCAGTGATGGACAACTGCGGGTTTGT +CCTTGATTCTGGTGCTAGTGACCATCTTATAAATGATGAGTCGCTGTATA +CCGACAGTGTGGAGGTTGTGCCTCCACTTAAGATTGCAGTGGCCAAGCAA +GGCGAATTTATTTATGCCACTAAGCGTGGTATTGTCCGACTACGGAATGA +CCATGAGATTACACTGGAGGATGTACTCTTTTGTAAGGAAGCTGCTGGTA +ATTTGATGTCCGTAAAGCGTCTCCAAGAGGCAGGAATGTCGATCGAATTT +GACAAAAGCGGTGTAACCATTTCGAAAAATGGGTTAATGGTTGTCAAAAA +TTCAGGTATGTTAAACAATGTACCTGTGATCAATTTTCAAGCATATTCTA +TAAATGCTAAGCATAAAAATAATTTTCGTTTATGGCATGAGAGGTTTGGC +CATATAAGCGATGGCAAATTATTAGAAATAAAACGAAAGAATATGTTTAG +TGATCAAAGTCTTCTAAACAACTTAGAGTTATCATGTGAAATTTGTGAAC +CCTGTTTAAATGGTAAACAGGCAAGACTTCCTTTTAAACAATTGAAAGAT +AAGACCCATATTAAAAGACCACTTTTTGTAGTACACTCAGATGTCTGTGG +GCCTATTACTCCAGTTACTTTAGATGATAAAAATTATTTTGTGATCTTTG +TTGATCAGTTTACACATTATTGTGTAACTTATTTAATTAAATATAAATCT +GATGTGTTTAGCATGTTTCAAGATTTTGTAGCCAAGAGTGAAGCTCATTT +TAATTTAAAGGTTGTGTACTTATACATTGACAATGGTAGAGAATACTTGT +CAAATGAGATGAGACAATTTTGTGTTAAGAAAGGAATTTCTTATCACTTA +ACAGTGCCACATACACCTCAGTTAAATGGTGTTTCTGAGAGAATGATAAG +AACCATTACGGAAAAAGCTCGAACCATGGTTAGTGGTGCAAAGCTAGATA +AAAGCTTTTGGGGCGAAGCAGTATTAACTGCTACTTATTTAATCAACAGA +ATTCCTAGTAGAGCACTTGTTGATAGTTCAAAGACCCCATATGAGATGTG +GCACAATAAGAAGCCATACTTAAAACATTTGAGAGTGTTTGGTGCAACTG +TTTATGTGCATATTAAAAACAAACAAGGAAAGTTTGATGATAAATCATTT +AAAAGTATTTTTGTGGGCTATGAACCCAATGGTTTTAAGTTGTGGGATGC +TGTAAATGAAAAATTTATTGTCGCAAGAGATGTTGTTGTCGATGAAACCA +ATATGGTTAATTCTAGAGCTGTTAAATTTGAAACAGTGTTCCTGAAAGAT +AGTAAGGAAAGTGAAAATAAAAATTTTCCGAATGACAGTAGGAAAATAAT +ACAAACAGAATTCCCGAATGAGAGTAAGGAATGCGACAACATACAATTCC +TGAAAGATAGTAAGGAAAGTGAAAATAAAAATTTTCCGAATGACAGTAGG +AAAATAATACAAACAGAATTCCCGAATGAGAGTAAGGAATGCGACAACAT +ACAATTCCTGAAAGATAGTAAGGAAAGTAATAAATATTTTCTGAATGAGA +GTAAGAAAAGAAAGCGAGATGATCACCTGAATGAAAGTAAGGGATCAGGC 
+AACCCGAATGAGAGTAGGGAAAGTGAAACAGCAGAGCACTTAAAAGAAAT +TGGAATTGATAATCCAACTAAAAATGATGGCATAGAAATTATTAATAGAA +GAAGTGAGAGATTAAAGACTAAGCCTCAGATATCCTATAATGAAGAGGAT +AATAGTCTAAATAAAGTTGTTCTAAATGCTCACACTATATTTAACGATGT +CCCAAATTCATTTGATGAAATTCAATATAGGGATGATAAATCTTCTTGGG +AAGAAGCCATCAATACAGAGTTAAATGCTCATAAAATTAATAATACTTGG +ACAATTACAAAAAGGCCTGAAAACAAAAATATTGTAGATAGCAGATGGGT +ATTTTCTGTTAAATATAATGAACTTGGAAATCCAATTAGATACAAAGCTA +GATTGGTTGCACGAGGATTCACTCAAAAATACCAAATAGACTATGAAGAG +ACATTTGCTCCTGTAGCTAGAATTTCAAGTTTCCGATTTATATTGTCATT +AGTAATACAGTATAACTTGAAAGTCCATCAAATGGATGTAAAAACAGCTT +TCTTAAATGGCACGTTAAAAGAGGAAATTTATATGAGACTTCCTCAAGGT +ATATCGTGTAATAGTGACAATGTGTGTAAATTGAATAAGGCAATTTACGG +ACTCAAGCAAGCGGCTAGATGCTGGTTTGAAGTATTTGAGCAAGCATTGA +AAGAGTGTGAGTTTGTAAACTCTTCAGTTGATCGCTGTATATATATTTTA +GACAAAGGTAACATCAATGAAAACATATATGTATTATTATATGTAGATGA +TGTGGTTATAGCTACAGGAGATATGACAAGAATGAATAACTTCAAAAGGT +ATTTAATGGAAAAGTTTAGGATGACTGACCTAAATGAAATAAAACATTTT +ATTGGAATTAGGATAGAGATGCAGGAAGATAAAATCTATTTAAGCCAATC +TGCATATGTTAAAAAAATTTTAAGTAAATTTAACATGGAAAATTGTAATG +CAGTTAGTACTCCTTTACCTAGTAAAATAAATTATGAATTACTTAATTCA +GATGAAGACTGCAATACCCCATGCCGTAGCCTCATAGGATGTTTAATGTA +CATAATGCTTTGTACACGCCCAGATTTAACTACTGCAGTAAATATCTTGA +GCAGATATAGTAGCAAAAATAACTCCGAATTATGGCAGAACTTAAAAAGA +GTTCTTAGATATTTGAAGGGCACTATCGATATGAAATTGATTTTTAAAAA +GAACTTGGCATTTGAAAATAAAATTATTGGTTATGTGGATTCTGATTGGG +CTGGTAGTGAAATTGATAGAAAAAGTACAACAGGGTATTTATTCAAAATG +TTTGATTTTAATCTCATTTGTTGGAATACAAAGAGACAGAACTCAGTAGC +AGCCTCATCAACTGAAGCTGAGTATATGGCCCTATTTGAAGCCGTGAGAG +AAGCTCTATGGCTTAAATTTTTATTAACTAGTATTAACATTAAACTAGAA +AACCCCATTAAAATTTACGAAGACAATCAAGGCTGTATTAGCATAGCAAA +CAACCCCTCATGTCATAAACGAGCTAAACATATTGATATTAAATATCATT +TTGCCAGAGAGCAAGTTCAGAATAATGTGATTTGTCTTGAGTATATTCCT +ACAGAGAATCAACTGGCTGACATATTTACAAAACCGTTGCCTGCTGCGAG +ATTTGTGGAGTTACGAGACAAATTGGGTTTGCTGCAAGACGACCAATCGA +ATGCTGAATGAAATTTTTATATATATTTTTCAAATTTAAATTCCTGTAAA +CATATTTTGTTACAATGATCTGATCGGGTTTTTCTGGGTTTTCCCCGTAT +CCTCGCAGCAAATGCTGGATCAGTTAACACTTCCCAGAATGCACACCACC 
+CACATTTGATAGTTACTAATGAATATTATTGTTATGTTTTTAATTATAGA +CGTTATTTTTGAGGGGGCGTGTTGGAATATACTATTCAACCTACAAAAAT +AACGTTAAACAACACTACTTTATATTTGATATGAATGGCCACACCTTTTA +TGCCATAAAACATATTGTAAGAGAATACCACTCTTTTTATTCCTTCTTTC +CTTCTTGTACGTTTTTTGCTGTGAGTAGGTCGTGGTGCTGGTGTTGCAGT +TGAAATAACTTAAAATATAAATCATAAAACTCAAACATAAACTTGACTAT +TTATTTATTATTAAGAAAGGAAAATAAATTATAAATTACAACA +>FBgn0000481_Doc +GACATTCGGCATTCCACAGTCTTCGGGTGGAGACGTGTTTCTTTCAAGCT +ACGAATAGCAAGTTCTAAAAACTACAACAGTATAGTGAAAGTTAAACACA +AAGTGTAAAGTGCAGTTTGCACAACTAACAATTATTGACTATAGTAATTA +TTTACTAAAATAAATAATTATTCCATATTGTTCTGGTAATTGTTATATGT +GGACTTAGAACAATGAATCAAAACGACATACGTTCTCAGCGACAATGTGA +ACAAGACGAGCGCCGGCTCTCTTTACAACGCAACAATGCATACTTTTCTT +TCGTCTCACCGCAAATCGGTGATCGAGCACCCTCACCTTCAACTAACTCG +AAACTTTTGCCCTCAGCGAACGACAGACCGCGTTCTTGCTCTCCCTCTCT +GCCTGCTTCGGCTCACAAGTCGTGGAGCGAAGAGACCGCCTCTCCTACCC +CGCTCCTCTCGCAGCGCCAAACGACCGTCCCGGGTAACTGTAACACTGCA +ATAACGAGTGCAGTGACCTCACTGGCAACTGCCACAACATCAACTTCGTC +AGCGGCCCAACTAATTATCGCTGTGCCAGCTGTAAATAATTCAGCAGCAC +TGACCGTTTGCAACAACAATAATGCACGTAAAGAAGAATCAAAACAAAAG +CAGAAGTCGATTTCGACTGTGCAGACTGGCATGGATCGCTACATCCAAAT +CAAGAGAAAGCTCAGCCCTCAAAACAATAAGGCAGGTAATCAACCCAAAA +TCAATCGAACCAACAACGGCAATGAAAACTCTGCAGTAAATAATTCAAAC +CGATATGCTATCTTGGCTGATTCTGCGACCGAACAACCCAACGAAAAAAC +GGTAGGGGAACCAAAAAAGACCAGGCCTCCACCAATTTTCATACGAGAAC +AAAGTACAAATGCACTTGTAAATAAACTCGTTGCTTTGATTGGTGACAGC +AAGTTCCACATTATCCCACTTAAAAAAGGAAATATTCATGAAATAAAACT +ACAGATCCAAACAGAAGCAGACCACCGTATAGTGACTAAATACCTAAATG +ATGCTGGTAAAAACTACTACACATACCAATTAAAAAGTTGCAAAGGGCTA +CAGGTAGTACTTAAGGGCATTGAAGCAACAGTGACACCAGCTGAGATAAT +TGAGGCTCTGAAGGCCAAAAACTTTTCTGCAAAGACAGCTATTAATATTT +TAAACAAAGACAAAGTTCCGCAGCCACTATTCAAAATAGAACTCGAACCA +GAGCTCCAGGCACTAAAGAAAAACGAAGTGCACCCAATATACAATTTACA +GTACTTGCTACATCGGAGGATCACCGTGGAGGAGCCGCACAAACGTATCA +ATCCAGTTCAATGTACTAATTGCCAAGAATACGGCCACACCAAGGCATAC +TGCACCCTTAAGTCCGTATGTGTTGTCTGTAGCGAACCTCATACTACCGC +AAACTGCCCCAAAAACAAGGACGATAAGTCTGTGAAGAAATGCAGTAACT +GCGGGGAAAAACATACTGCAAACTACAGAGGCTGTGTGGTGTACAAAGAA 
+TTGAAGAGCCGCCTAAACAAACGTATTGCCACAGCACATACATACAACAA +AGTCAATTTCTACTCTCCGCAACCGATTTTTCAACCACCCCTAACTGTCC +CAAGCACTACTCCAACAATTTCTTTCGCTAGCGCCCTAAAATCCGGACTA +GAAGTGCCCGCCCCACCGACAAGAACTGCTCATTCCGAACATACACCGAC +AAACATCCAACAAACACAACAAAGTGGCATCGAAGCTATGATGCTATCCC +TACAGCAAAGCATGAAAGACTTCATGACGTTCATGCAAAATACTTTGCAA +GAGCTCATGAAAAACCAAAATATCCTGATTCAACTTCTTGTATCTTCAAA +ATCCCCATAATGGCTTCCCTACGGATATCTCTGTGGAACGCAAATGGCGT +TTCACGGCATACACAAGAGCTCACACAGTTCATTTACGAAAAAAACATCG +ACGTAATGCTACTATCAGAAACGCACCTCACAAATAAAAACAATTTTCAT +ATACCAGGATACTTGTTCTATGGTACAAATCATCCAGATGGTAAAGCTCA +TGGAGGCACTGGAATACTCATCAGAAATCGCATAAAACACCACCACTTAA +ACAATTTTGACAAAAACTACTTACAATCTACGTCCATAGCCTTACAACTC +AACAATGGTTCAACGACTCTAGCCGCAGTCTACTGCCCACCGCGCTTTCC +AATCTCTGAGGATCAATTCATGGAATTCTTTAACACACTAGGTGACAGGT +TCATCGCAGCGGGTGACTATAACGCCAAGCACACCCATTGGGGATCTCGA +CTTGTGTCGCCAAAGGGTAAGCAATTGTACAATGCGCTTACGAAGCCAGA +AAACAAGCTAGACTATGTATCCCCGGGTAAGCCTACATACTGGCCAGCAG +ACCCAAGAAAAATCCCAGACCTGATCGATTTTGCAATTACTAAACATGTC +CCCCGCAACATGGTCACCGCCGAAGCACTAGCAGATTTATCATCAGATCA +CTCACCTGTTTTTCTAAATATGCTAACTCGCCCCCACATCGTCGACCCAC +CGTATAGACTCACAAATTTTAGAACAAACTGGCCAAGGTATCAAAAGTAT +GTCTGTTCACACATAGAACTAACGACGGCATTATCTACAAAGGAGGATAT +AGACAAGTCAACGGAAACTCTTGAAAACATTTTAGTTTCGGCTGCAAAGG +CTTCAACCCCGCCAGTGACGTATGCAAAACCAAACTACATCAAAACTAAT +CGCGAAATCGAGCGGCTGGTATTAGATAAACGACGCCTACGAAGGGATTG +GCAGTCTAATAGATCACCAATTACTAAGCACATGCTTAAGATAGCCACAC +GCAGGCTTACCAATGCTCTCAAACAAGAGGAAAAAAACAGCCAACGTTCA +TATATCGAGCAACTCTCTCCCACCAGCACTAAGTACCCTCTTTGGAGAGC +TCACAGAAACCTAAAGACTCCAATAGCGCCAATTATGCCACTCCGAAGTC +CCTCTGGCACCTGGTTTCGAAGTGATGAAGAAAGAGCCAGTGCTTTCGCT +GACCATTTACAAAATGTATTCCGACCAAATCCCTCTACCAACACATTTAT +TCTCCCTCCTTTAATAGCAGCCAATCTAGATCCTCAAGAACCCTTTGAAT +TCCGACCATGTGAACTAGCAAAGGTTATCAAAGAGCAACTGAACCCAAGA +AAATCGCCTGGCTACGACCTAATAACTCCAAGAATGCTCATTGAACTCCC +AAAGTGTGCTATTCTTCACATCTGCCTGTTGTTCAACGCAATCGCCAAGC +TTGGATACTTCCCTCAAAAATGGAAAAAGTCGACCATAGTAATGATTCCA +AAGCCAGGAAAAGATAAAACGCAGCCATCATCATATAGACCGATAAGCTT 
+ACTAACATGTCTTTCAAAGCTGTTTGAAAAAATGCTACTCCTTCGGATTA +GCCCTCATCTTAGAATAAACAACACACTTCCAACACATCAATTTGGCTTT +AGAGAAAAACATGGAACCATCGAACAGGTCAACCGAATCACGTCAGAAAT +TCGTACTGCTTTTGAACATCGAGAATACTGCACAGCCATTTTTCTAGACG +TCGCGCAGGCATTTGACAGAGTGTGGCTCGATGGACTTTTGTTTAAAATA +ATCAAGCTGTTGCCCCAAAACACACATAAGCTACTGAAGTCATACCTATA +TAACAGAGTGTTTGCAATAAGATGCGATACAAGCACTTCACGCGATTGCG +CAATCGAAGCTGGAGTGCCGCAAGGCAGTGTACTGGGTCCAATCTTATAC +ACCCTGTATACGGCGGATTTCCCCATAGACTACAATCTAACAACCTCCAC +GTTCGCTGATGATACCGCGATACTCAGTCGCTCGAAATGCCCAATAAAAG +CCACGGCACTCCTATCCCGACACTTAACATCTGTAGAACGATGGCTTGCC +GACTGGAGAATTTCAATAAATGTTCAAAAATGCAAGCAGGTTACCTTTAC +CTTAAACAAACAAACATGCCCACCACTGGTCTTGAATAACATATGCATTC +CACAAGCCGACGAGGTAACATATCTGGGAGTTCATCTGGACAGGCGGCTC +ACTTGGCGCAAACATATAGAAGCCAAATCGAAACATCTTAAACTTAAAGC +AAGGAACCTCCACTGGCTCATAAATGCTCGCTCTCCACTTAGTCTGGAGT +TCAAAGCTCTTCTATACAACTCCGTCTTAAAACCTATCTGGACTTATGGC +TCCGAGCTGTGGGGCAACGCATCCAGAAGTAACATAGACATTATTCAGCG +AGCACAGTCAAGAATTCTGAGAATTATCACTGGAGCGCCGTGGTACCTTC +GAAACGAAAACATACACAGAGACCTAAAAATCAAATTAGTAATCGAAGTA +ATAGCTGAGAAAAAAACGAAGTATAACGAAAAGCTGACCACCCATACAAA +TCCCCTCGCAAGAAAACTAATCCGAGTATGCAGTCAAAGCCGGCTGCACC +GCAACGACCTCCCAGCCCAGCAATAAACTTATTAGGGCATTAATGAAAAA +AAAAAACTATCACTAAGTGAAAGTTAATTAAGTTAGATTAAGATTTGAAC +ACTTATTGTTAGTCTCTTAACACAAAGGGAAGATTCAATAAATAATAAAA +ATTAAAAAAAAAAAAAAAAAAAAAA +>FBgn0001167_gypsy +AGTTAACAACTAACAATGTATTGCTTCGTAGCAACTAAGTAGCTTTGTAT +GAACAATGCTGACGCGCCAGAATTGGGTTCAACGCTCCACGCGAAGAATG +CCTGGCAGCGGAAAGCTGACACTTCCTACCGGGAGTGTTGCTTCACGCTG +CAAGAAATGCTGAGTCGGCTTGCCGACTTGTGGCGGCGCGATGCATTGCT +CGAGGGTAAACTTAGTTTTCAATATTGTCTTCTACTCAGTTCAAATCTTG +TGTCGAAATAAACCACAGCTTGCTCCGGCTCATTGCCGTTAAACATCATT +GTTCTTATTTACAATCAAATCGCTATCGCCACAAGGCTAGTGATAATAAC +TAAGGGGGCGAAGTCAAGCCCTCCAACCTAATCTCCATAAACAGTGTCTA +AGACGAACCTCAGCGAAAGAAGGAAGATCTCTAGACCTACTGGAAATAAC +ATAACTCTGGACCTATTGGAACTTATATAATTGGCGCCCAACCAACAATC +TGAACCCACCAATCTAATTTAACACACTTTGTCAGGCGACAAACAGGGTA +GTTAAGTTAGAAAAGCATGTAAGTTTTACAAGACACTTCTTTGACGCAAT 
+CAAGAAATTTACGAGTGAAAAAAAAAAAAAAAAAAAGTTGTGTATCTGGC +CACGTAATAAGTGTGCGTTGAATTTATTCGCAAAAACATTGCATATTTTC +GGCAAAGTAAAATTTTGTTGCATACCTTATCAAAAAATAAGTGCTGCATA +CTTTTTAGAGAAACCAAATAATTTTTTATTGCATACCCGTTTTTAATAAA +ATACATTGCATACCCTCTTTTAATAAAAAATATTGCATACTTTGACGAAA +CAAATTTTCGTTGCATACCCAATAAAAGATTATTATATTGCATACCCGTT +TTTAATAAAATACATTGCATACCCTCTTTTAATAAAAAATATTGCATACG +TTGACGAAACAAATTTTCGTTGCATACCCAATAAAAGATTATTATATTGC +ATACCTTTTCTTGCCATACCATTTAGCCGATCAATTGTGCTCGGCAACAG +TATATTTGTGGTGTGCCAACCAACAACCAATGAGTTGGGCACATAACTAC +AGAAAGGTTAAGGTCGAATACGAAAGCGAGGATAGCTGGGAGGAGGAGCA +AGTAGGCCAAGCATTAGGTCGGCCGTTAGATAGTGCCACGGTAGATATTA +CCATGGACCCCAATCAGATTCAAGCTCTTATCGACAATGCTGTCAGACAG +GCATTGTCGCAACAGCAATCCCAATTTCAGACACAACTCAATTCCCTAGC +TGCGCGGGTACAGAGTTTGCAGGTGGAAGCACCGCAAATCAAGATTTACG +AAAAAGTCTCTGTTAACCCCGATGTTAGGTGCGACATTCCCCTTGACATA +ATAAAGTCTGTACCAGAGTTCTCCGGTACCCAAGACGAGTATGTGGCCTG +GAGACAATCGGCCATATACGCCTACGAGCTCTTCAAACCATACAATGGCA +GCAGTGCCCATTATCAGGCTGTTGCCATATTAAGGAATAAAATCCGTGGC +GCAGCCGGGGCTTTACTGGTCTCCCACAATACGGTATTGAACTTCGATGC +TATTTTGGCCAGACTAGACTGCACGTACTCGGACAAAACATCCTTACGCC +TGTTGAGGCAAGGATTGGAAATGGTTAGGCAAGGAGACCTACCACTAATG +CAATACTACGATGAAGTTGAAAAGAAGCTAACGCTTGTCACTAACAAAAT +CGTAATGACGCATGAACAAGAGGGTGCTGACCTGCTTAACGCTGAGGTCA +GAGCCGACGCCCTGCATGCTTTTATTTCGGGGCTCAAAAAGGCCCTCAGA +GCTGTGGTCTTCCCGGCCCAACCAAAAGACCTGCCATCTGCACTGGCTTT +AGCTAGAGAAGCAGAGGCAAGCATAGAGAGAAGCATGTTCGCTAACTCCT +ACGCCAAGGCCGTAGAGGAGCGAGCGCATTCGGGGGCAAACGGCAAGAGC +CGTTTCCAGGGGAAGCCAAATAAAGAAGAACAGGGACAGGACAGGAATCC +CCACTTCACCAAACGCCCCAAAAATAACGGACAAACCAACAAGGACACTC +AGGCGCAAGCACCCCAGCCAATGGAGGTCGATTCATCCTCCAGGTTTAGG +CAGCGTACTGAACATTATCAGAATCATCCTAACGAGTCGAACGCGTTTAA +GAGGAGAAATTCCTCAGAACGCTCAACAGGACCGAGACGACAACGTCTGA +ATAACGTTGTCCAAGAGGCCCCTAAACAAAAGGACCCCAAAGAAGAGTAT +GAAAAAACAGCAAAGGCTGCAGTCGAGGAAATCGACAGCGAAAATGAGTA +CGCTCCCAGTGACGACTCGTTGAATTTTTTAGGGGGCGCTCCCGGTTGCC +GTTCATTGAACGACGGCTGGCTGGGAGAACCTTAAAGATGCTAATCGATA +CCGACGCGGCAAAAAACTACATTAGGCCCGTAAAGGAGCTGAAAAATGTA 
+ATGCCGGTCGCCAGCCCTTTCTCGGTGAGCTCAATACACGGCTCCACCGA +AATCAAACACAAATGCTTGATGAAAGTCTTCAAGCACATCTCCCCATTTT +TTCTTTTGGATTCTCTCAATGCGTTCGACGCTATCATAGGCTTGGACCTG +TTAACACAGGCCGGGGTAAAACTCAACCTTGCAGAGGACTCCTTAGAATA +CCAGGGCATCGCTGAAAAGCTTCATTATTTCAGCTGCCCCAGTGTAAATT +TCACTGATGTAAACGATATTGTTGTACCTGACTCCGTTAAAAAGGAGTTC +AAGGACACAATAATAAGGAGGAAGAAAGCTTTCTCCACAACAAATGAAGC +TCTTCCTTTTAACACCGCTGTCACTGCCACAATTCGGACAGTTGACAATG +AACCGGTGTACTCAAGAGCGTACCCAACTCTTATGGGTGTCTCCGACTTT +GTGAACAACGAGGTCAAACAACTGCTGAAAGACGGCATTATCAGGCCCTC +AAGGTCTCCCTATAACAGCCCGACCTGGGTTGTTGACAAAAAGGGGACCG +ACGCCTTCGGGAACCCAAACAAGAGGTTGGTCATTGACTTCAGGAAGCTA +AATGAGAAAACTATTCCTGACCGGTACCCGATGCCTAGCATTCCCATGAT +TCTAGCGAATCTGGGCAAGGCAAAGTTCTTCACTACCCTTGATCTTAAGT +CAGGGTATCATCAAATTTACCTCGCGGAACACGACCGCGAGAAGACATCG +TTCTCGGTGAATGGTGGTAAATACGAGTTTTGCCGTCTACCGTTCGGCTT +GAGAAATGCAAGCAGCATTTTTCAAAGAGCCCTAGACGATGTGCTTAGAG +AGCAAATCGGGAAGATATGTTACGTCTATGTAGATGACGTCATAATTTTC +TCTGAAAACGAGTCCGACCATGTCCGCCACATCGATACAGTACTAAAATG +CCTGATCGATGCCAACATGAGAGTAAGCCAGGAGAAAACTAGATTCTTTA +AAGAGAGTGTAGAATACCTCGGCTTTATTGTCAGTAAGGACGGAACTAAA +TCCGATCCAGAGAAGGTGAAGGCCATTCAGGAGTACCCTGAACCAGACTG +CGTTTACAAGGTTAGGTCCTTCCTTGGTTTAGCCAGCTACTACAGAGTCT +TCATCAAAGACTTTGCTGCCATAGCCCGCCCGATCACCGATATCCTAAAA +GGGGAAAATGGTTCGGTGAGCAAACACATGTCTAAAAAAATTCCTGTTGA +GTTTAATGAAACTCAACGCAACGCGTTCCAAAGACTGCGAAACATACTAG +CATCCGAGGATGTCATACTCAAATACCCCGACTTTAAAAAGCCTTTTGAC +CTTACTACAGATGCTTCGGCAAGTGGTATCGGTGCAGTCCTATCCCAGGA +GGGCAGGCCAATCACCATGATATCGCGTACCCTTAAACAGCCCGAGCAGA +ACTACGCCACAAACGAAAGGGAATTGCTGGCGATTGTATGGGCCCTAGGT +AAGTTGCAGAACTTCCTGTATGGCTCTAGGGAGATTAATATATTTACCGA +CCATCAACCCCTCACTTTCGCTGTTGCCGACAGGAACACGAATGCCAAGA +TAAAGAGGTGGAAATCTTACATAGACCAGCATAATGCCAAGGTTTTCTAC +AAACCTGGCAAAGAAAATTTCGTGGCAGACGCCCTCTCTAGGCAGAATCT +GAATGCCTTACAAAACGAACCCCAATCAGACGCTGCGACCATTCACAGTG +AGCTCTCCCTGACCTACACGGTCGAGACAACAGACAAACCGTTAAATTGC +TTCAGGAACCAGATCATTCTGGAGGCAGCACGTTTTCCGCTCAAACGAAA +CCTGGTGCTCTTTCGAAGCAAATCTCGCCACTTAATCAGCTTTACTGATA 
+AAAGTTGGCTATTAAAAACACTTAAGGAGGTGGTAAACCCTGACGTCGTG +AACGCTATTCACTGCGACCTGCCCACTCTGGCAAGCTTCCAACACGACCT +CATTGCCCACTTTCCAGCCACCCAATTTCGTCACTGTAAGAATGTCGTGT +TAGACATAACCGACAAAAACGAACAGATCGAAATCGTCACTGCCGAGCAC +AACCGCGCTCACAGAGCCGCACAAGAAAACATTAAACAAGTCCTTCGGGA +TTATTACTTTCCCAAAATGGGCAGTTTAGCTAAAGAAGTAGTAGCTAATT +GTAGGGTCTGCACCCAAGCAAAGTATGACAGGCACCCGAAAAAGCAAGAG +CTCGGGGAAACGCCCATACCCAGCTATACAGGTGAGATGGTGCATATTGA +CATATTCTCAACCGACAGGAAGCTATTCCTGACGTGTATTGACAAATTTT +CTAAATATGCAATAGTGCAACCAGTGGTGTCTAGAACAATAGTGGACATC +ACAGCACCCCTGTTGCAGATCATTAACCTGTTCCCCAATATCAAAACGGT +CTATTGTGACAATGAGCCCGCATTTAACTCAGAAACTGTCACCTCAATGC +TCAAGAACAGCTTCGGCATTGACATAGTAAATGCGCCCCCACTCCACAGC +TCATCCAATGGCCAAGTTGAACGGTTCCACAGCACATTGGCAGAAATCGC +CAGGTGCCTGAAGTTGGACAAAAAAACGAATGACACAGTAGAACTAATCT +TGAGGGCGACGATAGAATATAACAAAACCGTGCACTCAGTTACTCGTGAG +AGACCAATTGAGGTGGTTCACCCAGGGGCCCACGAGCGCTGCCTAGAAAT +CAAGGCAAGATTAGTAAAGGCTCAGCAAGACAGCATCGGAAGAAACAACC +CTTCCCGACAAAACCGCGTGTTTGAGGTGGGAGAACGCGTGTTTGTAAAA +AACAACAAGAGGTTAGGAAATAAGCTAACTCCACTATGCACCGAGCAAAA +AGTGCAGGCAGACTTGGGAACGTCTGTTCTTATTAAGGGGAGGGTGGTCC +ACAAGGACAACCTCAAGTAGACATTCCCTCTACAGTTAGGTAGTAAGTTA +TGTCAAGGAAAATCCGAGCACTGTAGTATCACCTTGTCTTTAATTTCCAG +GTTCACCCTCATGATGTTCATACCCTTGGTAGTAGCGAATGCTCGGATCA +CCGACTTTTCGCATGCCAACTACATTCCTGTGTTAGATGGGGATGTGCTG +GTGTTTGAACAGCGTGACCTCTTGAAACATTCGAGTAACCTTTCCGAGTA +CGCTAGTATGATAGATGAAACACAGAAACTGTCCGAGTCCTTTCCCCACT +CACATATGCGTAAGTTGCTAGAGGTCGATACTGACCATCTTAGAACCTTG +TTGTCCGTTCTCAAAGTCCACCATAGGATAGCTAGGAGTCTAGATTTCTT +AGGTACAGCCTTAAAGGTTGTGGCGGGTACTCCCGATGCCACGGACCTCT +TTAAAATTAAGATCACAGAGGCCCAACTAGTAGAATCTAATTCCAGGCAG +ATAGCTATAAACTCCGAAACCCAGAAACAGATAAATAAGTTAACTGACAC +CATCAATAAGGTGATCAATGCCCGTAAAGGCGACTTGGTTGACACTCCAC +ACTTATATGAAGCACTACTAGCAAGAAATAGGATGCTGTCTACAGAAATT +CAAAATTTAATTCTCACTATTACTTTGGTCAAATCAAACATTATAAATCC +CACAATTCTTGATCATGCCGACTTGAAGCCTCTTGTAGAACAGGATACCC +CAATTGTCAGCTTAATAGAAGCATCTAAGATCAGGGTCCTCCAGTCCGAG +AATAGCATTCATATTTTAATTGCCTATCCTAGAGTCAAGTTCAGTTGCAA 
+GAAAGTCGCCGTCTACCCTGTATCTCACCAACACACCATCTTGCGCCTCG +ACGAAGACACTTTGGCCGAATGCGAACATGACACCTTTGCGGTCACCGGA +TGCACAGACACCACACACTTCACGTTCTGCGAGCGGTCTCGGCGCGAAAC +TTGCGTGCGCTCACTCCATGCTGGAAACGCTGCTCAATGCCACACTCAAC +CCAGCCACTTGCGAGAAATAAACCCCGTAGATGATGGCGTTGTGATTATC +AACGAAGCCGCAGCTCACGTTAGCACTGATGGCAGCCCCGAAACACTGAT +AGAGGGAACCTACCTGGTAACCTTCGAGCGAACGGCAACCATCAACGGCT +CTGAATTCGTAAATCTAAGGAAAACACTAAGCAAGCAGCCAGGCATCGTG +CGTTCACCACTACTTAACATCGTCGGCCACGACCCTGTGCTCAGTATACC +TCTGCTACACCGGATGAGTAACGAAAACCTACATTCCATCCAAAACCTTA +TGGATGACGTGGAATCTGAAGGCTCGCCCAGACTCTGGTTCGTGGCTGGT +GTGGTCCTAAACTTCGGCTTGATTGGCTCTCTCGCCCTTTATCTGGCATT +AAGGAGAAGACGAGCCTCTAGGGAGATACAGCGCACCATCGATACTTTCA +ACATGACCGAGGACGGTCATAAACTTGAGGGGGGAGTAGTTAACAACTAA +CAATGTATTGCTTCGTAGCAACTAAGTAGCTTTGTATGAACAATGCTGAC +GCGCCAGAATTGGGTTCAACGCTCCACGCGAAGAATGCCTGGCAGCGGAA +AGCTGACACTTCCTACCGGGAGTGTTGCTTCACGCTGCAAGAAATGCTGA +GTCGGCTTGCCGACTTGTGGCGGCGCGATGCATTGCTCGAGGGTAAACTT +AGTTTTCAATATTGTCTTCTACTCAGTTCAAATCTTGTGTCGAAATAAAC +CACAGCTTGCTCCGGCTCATTGCCGTTAAACATCATTGTTCTTATTTACA +ATCAAATCGCTATCGCCACAAGGCTAGTGATAATAACTAAGGGGGCGAAG +TCAAGCCCTCCAACCTAATCTCCATAAACAGTGTCTAAGACGAACCTCAG +CGAAAGAAGGAAGATCTCTAGACCTACTGGAAATAACATAACTCTGGACC +TATTGGAACTTATATAATT +>FBgn0004141_HeT-A +TAAATAAATAAAATAAATTAAACAATTAACTAAATAATTAAATAACTAAA +ATTAATAATATAATCCGTTCGCTTGCCAAAGACTCTCACGCGCATAACTA +ATTAAAATCGATTTTCAAGTTGACAAATAAATGGTTTAAAATTGTCCTCA +GGCTGCAAAGAAAAGCCGCGGCAACAATAAACATTTAGTGACACGCGAAA +AGCGAACATTTGATTAGTGTAATACTTGTGCAAACCGACAAGCTGCCGCC +ATAACAAAACGGAGACGAAGAATCATAAAGAACAAAAGCTAAATCCACCA +GCATAGCAAAAATAAATTAACAAATAAAATAAAAGCAAATTTAAATAACA +TAATAAATTAAACTTATTTAATAAACCAATTAATTTTAATTAATTCAATT +AAACGCTAAATCTACATAATACTCCACGCGCAAATTAATTGAAATCGTCT +TTCTAGTTAATAAATTAAAAGTTTAAAAATTGTCTCCGGCCGCAAAATTT +GAACCGCGACGATAAAAACATTTAATTGACAAACAAAAAGCGAACAATTA +TTCAGTGAACTATTTGTGCAAAATTGACAAGCAGACGCCATAATTAAAAG +GAGAAGAAGCCAAAAGACGAAGAGAAGAAAGCAACCAGAAGAACTCAAAG +AAGAAAAGGAGGAAAGCCCAATTAAAGAAAGCCAGGGTATTTATACCTTA 
+CACTTATCGTTTAATATAACAAAAACCCAACATGTCCATGTCCGACAACC +TTTTTTCTGACGATGAGGTACTTTCAATTTCCTCAAGCCCAGAACAGCGA +TCTTCTCCGTTCTACCTCAATATATCGCCCATGTCCCACGGATCAGACAA +TTCTCAGATTAATACAGTCATCATTAATTCGAAGAAATTGCCCTCAAATC +AAGCAGACATAAGTTTAAAAAACTCTTCTGGGGCTGCTATAAAAATTGTT +AATTCCCTTTCACACAAGAAGAAAGAGAACACAAACGTTAATAATGCCCA +AAAAGACCCCCTCTCACTCACCAATACTACTGCAAGCACTTGTGGCGCCA +AAAGCAGCATCTCAGAGGGGAAATTGTCTTCTCCTCCGTCCACCTCACAC +ACATATGAGGGGAAATTACTCACAAAACTTACTCACACACACACAGACTT +TAGAGGCGCCAAAACGAGCGATGCAATGGGAAGTTTCCCCTCTCTCTCGC +ACAGCGACAATAGCATAGAGAAAAATCTGAGTTCTTCCACCAAAATTGGA +CCAAACGCTTCTTCCCCTCCTTCTCATGCACACACTCACACTAGCAAATC +CACTGATATAAGCTTAGAAAGCCGCTCAAAACATCCCGCGCTTGCCAATA +CGGACGCACGCTCTATAAAAGCCAATGCTAATGACAATGGGGAAATTTTC +TCCTCACTTATACAAATTGACGAACGCAAGCAAGAGGAAAGGCCTTGCAC +AACTATCAACGCTTTTTGGTCTATTTTTAAACCCAAGCCGGACGTTACTA +AACTAAGTCTAAAGAGGAAACCCACCAATCCCACTAAAAACACTGGGAAA +AAATGCATCTCCCCTCATAAAAAGAGCGCTTATTTATGCCCTTCCGCTCA +GGATGATTTAAATTTAAATTTAAACCCCAAATCTAGCGCCAAGCCCACTG +TGGTGAATTTACCAGCTGCCCGCATCCTAAGCCGGCCTGCAGCCAAGCGG +GATTTATTTAAATCATCATCCTCCCGAAGCCCAGACGAGCAGCCTATGAG +TTTTTCGGAAGTGGTCGCTGGCACGGGTTCAATTTTTGCGGCACCCTGTG +TCCCGGCACCTTTAACGAAAACTCCAGGCAAGCGGACAAACGACGATCTG +GACTGCTCCAACTTTAAGACGCCCAATAAAAAATTATGCGCGACTTCCAA +CTTTGTAACTCCCAGCATTTTTCCGCCGCTCATCACTCCCGTTTTCAAGA +GCAAGGCAGCTCAATCTGTTTACGAGGAATCCAAAGCCAGAAATGGACCC +CCCCCGCCGGCCCTCGCCTGCAGCATCAATGCCTCTGCTCGCAGCGCAGC +GGCGCCACCCGGGATCGCCCCCCTACCCCCTCATAATACAGATGCAGAGC +TGCCTCCATGGAAAATCGTGCCCCAGAGCCGTAGAGCACCTCCTATACTC +GTCAATGATGTAAAGGAAATTGTACCTCTACTGGAAAAGCTGAACTACAC +AGCAGGAGTCTCCAGCTATACTACTAGGGCTATAGAAGGAAACGGGGTCA +GGATACAGGCAAAGGACATGACCGCCTATAACAAAATTAAAGAAGTCCTG +GTGGCCAACGGACTTCCTTTATTCACCAACCAGCCCAAGTCCGAGAGAGG +CTTCCGAGTCATCATCAGACATCTCCACCACTCCACACCATGCTCGTGGA +TAGTCGAGGAACTGCTGAAGCTCGGATTCCAAGCGCGATTCGTCAGAAAT +ATGACGAATCCGGCTACAGGTGGCCCCATGCGAATGTTTGAAGTGGAGAT +CGTCATGGCCAAAGACGGCAGTCATGACAAAATACTCTCACTCAAACAAA +TCGGTGGGCAAAGGGTGGACATTGAAAGGAAAAACAGGACACGGGAGCCA 
+GTCCAGTGCTACAGATGCCAAGGCTTCAGGCATGCCAAAAACTCTTGCAT +GAGGCCGCCAAGATGCATGAAATGCGCTGGCGAACACCTGTCTTCCTGTT +GCACCAAACCAAGAACCACCCCCGCCACCTGCGTAAATTGCTCTGGGCAG +CATATTAGCGCGTACAAAGGATGCCCTGCATATAAGGCGGAAAAACAAAA +GCTGGCGGCAAACAACGTTGACATAAACAAAATAAGAACAATCAAAGACG +CAACAAATAACTTTTATAAACGTCAAGGCCCCCCTCTACGCAACAACACC +CCTCGGCTACCGCACAGCTCAGCAATCCTGAGCAAATCAATTGCCGAAGC +TCGCCAGGAGGCAGCCAGAAAGTCGATGTTAAATCCATTCCGACAAAATA +TAAACGACAGAAGACCACGATTCTCCTCCCACGACACGGCCATTCAGAAG +CGTCTGAATAAATGGCGCCGAAACACCAACAAAATACCCAAAAAGGGTAG +GATAGCCTTAAAGGATAATGCAAAGCCACGACCGGCACATAGGACAAGTA +ACCCAGCGCAAAGACATCTGGAGGACTACCAGGACATGCTCCGAAGGGAA +AGGAGTGAAGAAAACGACCAGGAATCTGAGAAGGGCACCCCCAATACCAA +GCAGGTCGGCAATGACAGCCCTCCGACCACGAGCAGAGCAGCCAGAGCCA +GCTTTAAGCCAAGAATCATTGACGATACCACGCCATCGCCAAAAATCTGC +AATCCCAACTCACAAAAAGGCCTCTTGGACGACCCCACAACAAGCTTAGC +TAATAGAGTCGACAATTTAGAAAAGAAAATTGACATTTTAATGGCCTTAA +TCATACAAGGAAGAAATAACAATCTTGACATGGATACATCCAATTAATCT +TACAACTACTTATATATTCTTTAATAAATATATCCAATAGAAAAGCGCAC +GTCGGTCTGCTTTTAAAATCCTTCACCGTCATCACCTTCCTCGACGGAGC +CTAATTTATTGGAAAAATAAATCAATTATATGTTGGCACAAAAATGTAAA +CACACACTCACCTAAACGCACCCGGACGAACAAGCCTATGACAACGCACT +CCAGCTGATCTGTAAGAAACAAAAAATATGAATAGATAGATCGATATGAA +AAGGATATGTGCGGCAGAAACATGATGAGCAAAAGGCGACTCGCTGCAGC +AACTTATGCACAACGTCACTTACCTGAAATTTCTTGCCGTACGATCTCCT +GTAGTATCCCTTATCACAGCTGCAATCTACTTGCAATGCTGCACTGCAAT +AAACGTACTACAAAAGCTGCATACGTTTTGATCAGGACACCTCGTGCGGA +CGTGCTAAAAAAAATTTCCTTTCTGCTGCTCTTATTGACGCTAAAACCTT +AAAACCTACAAACAAAACAATTAAATAATAACAAATCAAATAAGACAACC +AAATAATACACTTACCTCATTGACTGCAGCTAAATCGCTGACCCACATTC +AGTGCAGCCGACAGCAGGAGACGGGCCCGCAAAAGCAAAACAAAATCGCC +AATTTTGCGATTATAAACACGAAAAATTGACAATTTTGCGATGCCGTCTC +CGCCTCCTGATGCCACTGCATTGACAAGCATCACTAGCGAGGAGCTGACA +CCACACCAAAAAGCTGTAAAATCCGTCCACAAATTGTATATTTTGCCTCA +GTGTCGTATCTGCAATGTTTTTCCGATAACCTGTAAGGAAAGAAAAATTA +ATAAGAAAATTATACAAAATTAATTAAGGACGACAGAAAATAGCAAACCA +GACAGGCAAATTAACAGATACAAATATGAGACTCCATCCTGCTGCCGACA +CACAAGTAAATCCTTCAACTCGACAACAGGAGACGGGCCTTGCAAAAGCA 
+AAACAAAATCGCCAACTTTTGCGATTATAAATACAAAAAATTGACAATTT +TGCAACGCCGTCTCCACCTCCTGTTGCCACTGCATTAATAAGGATCACCA +GCGCGGCGTGACGCCACACTAAAAGGCTGCAAAATCCGTCCACAAAATGT +ATACTTTTCCTCAGTACAATACTTTCTAATGAACTTCCGCCAACCTGCAA +TGAAAAGAAAAGAAATAGGTATATAAAACAAAACAAACAAAAGGACAACC +TAAAATTAGCAAACCAGACAGGCATACTAGTAGATGCTAATATGCAGCTC +CATCCTACTGACGACAACCACGCAACTCCTTTCTCCAAGACCGCAAATAC +TGAAACAAGGAAGCACAAGCTAATACTGGGAATTATTTATTTAAACAAAA +ATACTTATCTAATTGCCAATTCGACGACTCCAAATCCGCGGCTAACCGGC +GGCGATGGCCCATAAATAAAGGGCCTCCTAATTAATTACAAAATGTACCT +GAAAAACATAAAATTAACGCAACTATAATTAACGCAATTAATAAATCAAA +TAAATACAAGTATAATACTTACCTCCAAGCAAACGTACCTGAAAAACAAA +ACCAAAAAAAAAATTAATGCAATAAATAAATCAAATAAATACAAACATAA +TACTTACCTCCAATTTACCTCCCAGCCAATCTACCTGAAAAACATAATCT +AATACAATCTCAAAAACAAATAACAAATGTAATACTTACCAAATTTTAAT +TTTGTATTCATTTCCATGACCCCAACGCTGCAACTGTCCTCGGCAACAAT +TCCTGTTCCGGCGGCTCCATGCTGCCAATCCTGACGCACTGGCCACAAGA +CGCGGCGCTGCTGGCAATCTCTCGATGAACAACCGATCTACAATTTCCAT +GACGACTCCTCTGTCACGATGAGACAGAAGACACCACCAACGCCAGCAGC +TCCAAAACAATACAACAACGGCCGCGCGGAACCCATCTTCAGAATTCCCT +CTTCCTGACGACCGGCGAACGAGTTCTGGAATAAACAATGTATTAATTGC +AAACATCTACCGATGAGGGTAGAAGAGATACTCACCAAACGACTGCGGCG +CGGGAACAAACTAACTGCAACGCCGGCCGGACCTATTTGTTGCAAGTGGC +GCGCATCCAGCGCCTGCAACATGCCCCAGCCCAAGTACACAACTACTTAC +CTGCAACGTCGCCAGAGGCTCCCAGCGAATCGGTGCTTCCGTCCTTCTGG +CGGGGGTACCTGAAAAGAAACAAATTAAACAATATTAATCCTAAATTTCA +ATGTTTTTTGTAAAATAATTTAAATTGTTAAATGTAAACAAGCCTTGCAA +TATGTTAATGTTACCAGTCCATGCTACTGTCTAAAAGCCAAGAATACAAA +AAATACTAATTATAAACTAACTCACCACGCCCAACCCCCAAACTCACCCC +ATGCAATGTTAAACCTATAAATTCAAATAATTGTACCTATATATTGCACA +TACTGTAATCAAAGGCAAAATAAATCGTGGATGCGGAACAGAATTTACTC +TGTCTCCGTACCTCCACCAGCAAAGTTAAAAAA +>FBgn0001283_jockey +AAAAATCATTCACATGGGAGATGAGCAATCGAGTGGACGTGTTCACAGAA +GTCGCGAGATAAAACAAAAACGTAATTGTGATCCATCACAAACATCTGCG +CAGATCGTGTGCTTATCTCACAAACAAAATCTATTTTTAGTCACTGCATA +ACGGTGACGGCTTCGGTTCGCGAAACTTATCAGCAACTAGCAATTTCTAA +GCTGTGTTGTTTTTGCCCCTCGCCCTGCGCGCTGCGCAAGCGGGAGGTTG +TTACAATTTACCTTACAAGTAAACCGGTAAATCTTATCGTGTTTAGTAAA 
+TATCAATTGCATTATACGGCATAAGTATAAAGACAATTGATATAATGGAG +AATTCATTTGCTCAATCGCGACCTAGCAATGGGTGCGATAAATTTGAGAA +AATGAGGAAAGTAGCAGGTGTTGAGCCAGGAGAATTACGCTCCCAACTCC +GCGCCAGCTGTGCAGTTGTTTCCCCTAACCTGGAAGGTATGCCAACTCAA +TCTGCGGTCTCCAGCTTAATGGTGACAATCAGCAGCAACACCAATGCAAG +TGTTACCTGCACTATTTCTAACGTACAGGCCAACATGATCTGTACTCCTA +CATACACTGATTGCACAACCGTGACCACTAGCATTTGCCCAACTACGCCT +TATGACAATGGACTGCCGACACCTCTGTCATCACTGCCCAATAAGCCATC +TAAAGCGAATTGCCCCTTTCAAGCACATGATCGTACTGTCAACAGGAAAC +GAAAAGGCGTGTCTCAGCCCCCATTACCTATCCTCACCCCTTCTCCAAGC +CGTAAAACTAAAAGGCAGGCCACTATGCCACTCAATGAGGAGGCCTCTAC +CTCCACTGCAGCAGCATTAAATAACAATCGCTTCGCGCTTTTGTCCGCTG +AAGCGGAGAATATGGAGCAAGACGTGTCGGATGCTGATTCTGACATTGAA +GACTCTGCTGCCCGAGATGGTGGTGGACAATCCGCTAAATATAGCAAACC +CCCAGCCATATGCGTACCAAGTGTAAGCGATCCGGTCACCTTGGAACGGG +CTCTCAATCTGAGCACCGGCTCCTCAAACTACTACATCCGCATTTCTAGA +TTTGGTGTATCCAGAATCTATACAGCCAACCCTGATGCTTTCCGCACCGC +TGTAAAAGAACTAAATAAGTTAAATTGTCAATTCTGGCATCACCAACTTA +AAGAAGAAAAACCCTACAGAGTAGTGCTTAAAGGAATCCATGCTAATGTT +CCTAGTTCGCAGATAGAACAAGCATTTAGTGATCACGGCTATGAGGTCCT +TAATATCTATTGCCCCAGAAAGTCTGACTGGAAGAACATTCAGGTAAACG +AAGATGATAATGAAGCTACAAAAAACTTCAAAACTAGACAAAATTTGTTT +TATATTAATCTTAAACAAGGCCCGAATGTTAAAGAGTCTCTTAAGATAAC +TCGACTTGGCAGATACAGAGTCACTGTTGAGCGCGCTACACGTAGAAAAG +AACTGCTACAATGTCAAAGATGCCAAATTTTTGGACACTCTAAGAACTAT +TGCGCCCAGGATCCTATTTGTGGTAAATGTAGTGGTCCCCATATGACCGG +GTTCGCTTTGTGCATAAGTGACGTATGTCTGTGTATAAATTGTGGTGGTG +ATCATGTCTCGACAGACAAAAGCTGCCCTGTCAGAGCAGAGAAAGCCAAG +AAGCTAAAACCAAGGTCCAGGCTACCGATGACTAATAATATTGCCACACT +CAAACCTCCACAACGTTCTTCAAGCGGTTACATACCAGCTGAGGCATTAA +GAACCAACATCTCTTATGCTGATATTGCTCGACGCAACACGACTCAATCT +AGGGCTCGTGCTACTGTGCAGGCTGAAGTTATACCAACGTCGGACAATAG +CCTTAACAATAAATTTATGACGTTAGACAACTCCATTCGGGCCATCAATA +CGAGAATGGACGAACTATTTAAGCTTATACACGAAACTGTAGAGGCTAAT +AAAGCTTTCAGAGAACTGGTTCAGGTTCTAATTACACGTATTCCTAAATG +ACTCAACCAACCTTAAAAATCGGATTGTGGAACGCTCGCGGATTAACAAG +GGGCTCTGAGGAGCTTCGGATATTCCTCAGCGATCACGATATAGACGTAA +TGCTTACCACGGAAACACACATGCGAGTTGGTCAGCGCATCTATCTCCCA 
+GGGTATCTTATGTATCACGCCCACCACCCCAGTGGTAACAGTAGAGGTGG +CTCTGCAGTCATCATAAAATCTAGACTTTGTCACAGCCCTCTGACACCTA +TCTCTACTAATGACAGGCAGATAGCGAGAGTGCACCTGCAAACATCGGTT +GGGACCGTCACTGTAGCTGCTGTTTATCTACCTCCAGCAGAAAGATGGAT +AGTAGATGACTTCAAATCCATGTTTGCTGCGTTAGGCAACAAATTTATTG +CTGGTGGTGATTACAATGCCAAACATGCATGGTGGGGGAACCCAAGATCC +TGTCCTAGAGGTAAAATGTTGCAAGAAGTCATTGCACATGGGCAATACCA +AGTTCTGGCTACGGGCGAACCCACTTTCTACTCTTACAACCCTTTGTTAA +CACCATCAGCCCTTGATTTTTTTATAACCTGTGGGTACGGCATGGGCAGG +CTAGATGTACAAACTCTCCAGGAACTCTCGTCGGACCATCTTCCTATTCT +GGCTGTATTGCACGCTACGCCGTTAAAGAAACCACAACGCGTACGACTAC +TTGCCCATAATGCTGACATAAACATATTCAAAACCCATCTTGAACAGCTG +AGTGAGGTAAATATGCAAATTCTGGAGGCGGTGGACATTGATAATGCCAC +AAGCCTTTTCATGAGCAAACTAAGTGAGGCTGCTCAGCTTGCTGCACCGA +GAAATCGGCATGAAGTAGAGGCCTTCAGACCACTTCAACTTCCTTCCAGT +ATATTGGCACTGCTCAGGCTAAAACGAAGAGTTCGAAAAGAATATGCTAG +AACAGGTGATCCCCGCATGCAACAGATCCACAGTAGACTGGCCAACTGCC +TGCATAAGGCCCTTGCTCGAAGAAAGCAGGCCCAAATAGATACCTTCTTG +GATAACTTGGGTGCTGACGCGAGCACAAATTACTCACTGTGGCGTATCAC +GAAACGGTTCAAAGCTCAGCCCACCCCAAAATCAGCAATCAAAAATCCGT +CTGGTGGCTGGTGTCGCACTAGCTTGGAAAAAACTGAAGTGTTCGCTAAC +AACCTTGAGCAACGTTTTACACCCTATAACTATGCACCGGAAAGTCTCTG +TCGTCAGGTTGAAGAATACTTGGAATCGCCCTTTCAAATGAGCCTGCCTC +TGAGTGCTGTCACACTGGAAGAAGTGAAGAATTTAATAGCCAAGCTGCCA +CTTAAGAAAGCTCCTGGAGAAGATCTTCTTGATAATAGAACCATTAGACT +TCTCCCAGATCAAGCATTGCAGTTCCTTGCCTTAATATTCAACAGCGTTC +TTGATGTTGGCTACTTTCCGAAAGCTTGGAAATCGGCGAGCATAATTATG +ATCCATAAGACTGGAAAAACACCGACAGACGTTGACTCGTACAGGCCCAC +CAGCTTACTCCCATCTCTGGGTAAAATTATGGAGAGGCTGATCCTAAACA +GGCTGCTCACATGCAAGGATGTTACCAAAGCGATTCCCAAATTTCAGTTT +GGCTTCCGGTTGCAGCACGGTACTCCTGAGCAACTACATAGAGTAGTGAA +CTTTGCTCTGGAAGCTATGGAAAACAAGGAGTATGCAGTAGGTGCCTTTC +TTGATATTCAACAGGCATTTGACAGAGTCTGGCACCCTGGGCTCCTGTAC +AAAGCGAAGAGGCTGTTCCCGCCGCAGCTATATTTGGTTGTTAAAAGTTT +CCTGGAAGAACGCACATTCCACGTCTCTGTTGATGGGTACAAATCATCAA +TCAAGCCAATTGCAGCTGGAGTTCCTCAAGGAAGCGTTCTTGGCCCAACC +CTATACTCAGTTTTTGCTTCGGACATGCCTACTCACACACCAGTCACAGA +GGTAGACGAAGAAGATGTGCTCATAGCCACCTACGCTGACGATACTGCTG 
+TGCTCACGAAAAGTAAAAGTATCCTGGCTGCCACTTCTGGTCTACAGGAA +TACCTGGATGCATTCCAGCAATGGGCTGAGAACTGGAATGTGCGCATCAA +CGCTGAGAAGTGTGCCAATGTGACGTTCGCCAACCGAACAGGTAGCTGTC +CGGGTGTCAGTCTGAATGGAAGACTGATCAGACACCATCAGGCTTATAAA +TACCTTGGTATTACCCTCGATAGGAAGCTCACCTTCAGCAGGCACATCAC +AAATATTCAGCAAGCGTTCAGGACCAAGGTTGCTCGGATGTCTTGGCTCA +TTGCACCACGCAACAAACTGTCGCTTGGCTGCAAGGTCAATATTTACAAG +TCCATATTGGCCCCCTGCCTGTTCTACGGCCTGCAGGTATACGGCATTGC +TGCGAAGAGTCACCTTAATAAGATCCGGATTTTACAGGCGAAGACCTTAA +GAAGAATTTCGGGGGCTCCTTGGTATATGAGAACAAGAGACATCGAACGC +GACCTCAAGGTGCCCAAATTAGGAGACAAGCTCCAGAACATCGCCCAAAA +ATATATGGAAAGGCTTAATGTACACCCCAACAGCCTAGCAAGGAAGCTAG +GAACTGCAGCTGTGGTCAATGCTGACCCTCGGACTAGAGTCAAAAGAAGA +CTCAAGCGACACCACCCTCATGACCTCCCTAACCTGGTTTTGACCTAGAA +AGTCTTAGTTTTAAAATTCATTAGAATAATCAAATAAATAATAATTACTA +TGTTATATCAACTATTATAATTCTCCCTATCATTTTTAGATTAAAAATCT +GTTAGTCTTAAGTAACCAAGACACATTGTAAAATAAAATAATTTAAGCAG +ATCAAATTAAGTTGCCGCATGGGTAACAGTGCGTTGATCAAATAATAAAA +ACATCATAAAAAAAAAAAAA +>FBgn0003055_P-element +CATGATGAAATAACATAAGGTGGTCCCGTCGAAAGCCGAAGCTTACCGAA +GTATACACTTAAATTCAGTGCACGTTTGCTTGTTGAGAGGAAAGGTTGTG +TGCGGACGAATTTTTTTTTGAAAACATTAACCCTTACGTGGAATAAAAAA +AAATGAAATATTGCAAATTTTGCTGCAAAGCTGTGACTGGAGTAAAATTA +ATTCACGTGCCGAAGTGTGCTATTAAGAGAAAATTGTGGGAGCAGAGCCT +TGGGTGCAGCCTTGGTGAAAACTCCCAAATTTGTGATACCCACTTTAATG +ATTCGCAGTGGAAGGCTGCACCTGCAAAAGGTCAGACATTTAAAAGGAGG +CGACTCAACGCAGATGCCGTACCTAGTAAAGTGATAGAGCCTGAACCAGA +AAAGATAAAAGAAGGCTATACCAGTGGGAGTACACAAACAGAGTAAGTTT +GAATAGTAAAAAAAATCATTTATGTAAACAATAACGTGACTGTGCGTTAG +GTCCTGTTCATTGTTTAATGAAAATAAGAGCTTGAGGGAAAAAATTCGTA +CTTTGGAGTACGAAATGCGTCGTTTAGAGCAGCAGCTGAGGGAGTCTCAA +CAGTTGGAGGAGTCTCTACGCAAAATCTTCACGGACACGCAGATACGGAT +ACTGAAGAATGGTGGACAAAGAGCTACGTTCAATTCCGACGACATTTCTA +CAGCTATTTGTCTCCACACCGCAGGCCCTCGAGCGTATAACCATCTGTAC +AAAAAAGGATTTCCTTTGCCCAGTCGTACGACTTTGTACAGATGGTTATC +AGATGTGGACATAAAAAGAGGATGTTTGGATGTGGTCATAGACCTAATGG +ACAGTGATGGAGTTGATGACGCCGACAAGCTTTGCGTACTCGCTTTCGAC +GAGATGAAGGTCGCTGCTGCCTTCGAGTATGACAGCTCTGCTGATATTGT 
+TTACGAGCCAAGCGACTATGTCCAACTGGCTATTGTTCGTGGTCTAAAAA +AATCGTGGAAGCAGCCAGTTTTTTTCGATTTTAATACCCGAATGGACCCG +GATACTCTTAACAATATATTAAGGAAACTGCATAGGAAAGGATATTTAGT +AGTTGCTATTGTATCCGATTTAGGTACCGGAAACCAAAAGCTATGGACAG +AGCTCGGTATATCAGAATGTAAGTTTCGTATATTACAAAAATCAGATAAT +CCTTGAAATTCCATTTTTTAGCAAAAACCTGGTTTAGCCATCCTGCAGAT +GACCATTTAAAGATTTTCGTTTTTTCGGATACGCCACATTTAATTAAGTT +AGTCCGTAACCACTATGTGGATTCCGGATTAACAATAAATGGGAAAAAAT +TAACAAAAAAAACAATTCAGGAGGCACTTCATCTTTGCAACAAGTCCGAT +CTGTCTATCCTCTTTAAAATTAATGAAAATCACATTAATGTTCGATCGCT +CGCAAAACAGAAGGTTAAATTGGCTACCCAGCTGTTTTCGAATACCACCG +CTAGCTCGATCAGACGCTGCTATTCATTGGGGTATGACATTGAAAATGCC +ACCGAAACTGCGGACTTCTTCAAATTGATGAATGATTGGTTCGACATTTT +TAATTCTAAATTGTCCACATCCAATTGCATTGAGTGCTCGCAACCTTATG +GCAAGCAGTTGGATATACAGAATGATATTTTGAATCGAATGTCGGAAATT +ATGCGAACAGGAATTCTGGATAAACCCAAAAGGCTCCCATTTCAAAAAGG +TATCATTGTGAATAATGCTTCGCTTGATGGCTTGTATAAATATTTGCAAG +AAAACTTCAGTATGCAATACATATTAACAAGCCGTCTCAACCAAGACATT +GTGGAGCATTTTTTTGGCAGCATGCGATCGAGAGGTGGACAATTCGACCA +TCCCACTCCACTGCAGTTTAAGTATAGGTTAAGAAAATATATAATAGGTA +TGACAAATTTAAAAGAATGCGTAAACAAAAATGTAATTCCATGATTTATA +ATTGTTTAATGTTTAGCTATATGTTTCAGGAAAGTTTCAGTTGAGAATGT +AGGTAGTTATGTGCTGTCTATTGTGTTTTGTCTTTTATCTGTTTCTTTTC +ATTTTATTATTTAATCATTATCCTTTTGCTTATCCAGCCAGGAATACAGA +AATGTTAAGAAATTCGGGAAATATCGAAGAGGACAACTCTGAAAGCTGGC +TTAATTTAGATTTCAGTTCTAAAGAAAACGAAAATAAAAGTAAAGATGAT +GAGCCTGTCGATGATGAGCCTGTCGATGAGATGTTAAGCAATATAGATTT +CACCGAAATGGATGAGTTGACGGAGGATGCGATGGAATATATCGCGGGCT +ATGTCATTAAAAAATTGAGAATCAGTGACAAAGTAAAAGAAAATTTGACA +TTTACATACGTCGACGAGGTGTCTCACGGCGGACTTATTAAGCCGTCCGA +AAAATTTCAAGAGAAGTTAAAAGAGCTAGAATGTATTTTTTTGCATTATA +CAAATAATAATAATTTTGAAATTACAAATAATGTAAAGGAAAAATTAATA +TTAGCAGCGCGAAACGTCGATGTTGATAAACAAGTAAAATCTTTTTATTT +TAAAATTAGAATATATTTTAGAATTAAGTACTTCAACAAAAAAATTGAAA +TTAAAAATCAAAAACAAAAGTTAATTGGAAACTCCAAATTATTAAAAATA +AAACTTTAAAAATAATTTCGTCTAATTAATATTATGAGTTAATTCAAACC +CCACGGACATGCTAAGGGTTAATCAACAATCATATCGCTGTCTCACTCAG +ACTCAATACGACACTCAGAATACTATTCCTTTCACTCGCACTTATTGCAA 
+GCATACGTTAAGTGGATGTCTCTTGCCGACGGGACCACCTTATGTTATTT +CATCATG +>FBgn0003122_pogo +CAGTATAATTCGCTTAGCTGCATCGATAGTTAGCTGCATCGGCAAGATAT +CTGCATTATTTTTCCATTTTTTTGTGTGAATAGAAAATTTGTACGAAAAT +TCATACGTTTGCTGCATCGCAGATAACAGCCTTTTTAACTTAAGTGCATC +ATATCAGCTGTTTTTTTTGCCAATTTCAATGAATATCATCAAAGTTAGCT +GCGCCATCTATGAATCATTTTTGCATATCTAAAAGATGCAAGAATGCCAA +CTCGTTTCAGTATCTGCGCATGTCCGTTTTTGTTTTTGCTTTGATCGTGA +TTTTTGTGTTTTTGTTTCTTATGGCACAAAGTTATTAAAATGGGTAAAAC +AAAGCGTGTCGTTGGACTAACACTAAAGGAAAAGCTTCAAATAATCGAGT +TAGTGACCAACAAAGTGGACAAAAAGGAAATTTGTGCCAAGTTCAAATGC +GACAGATCCACAGTCAACCGCATTTTACAAAAAACAAATGAAATTCATGA +AGCTGTGGCCGCGTCAGGTTTAAAAAGAAAGCGTCAAAGAAAAGGAGCGC +ACGACTTAGTAGAAGAAGCCTTATACATTTGGTTCGGACAGCAGGAATCA +AAGAACGTAATTCTTGACCGGCACGTCATATTAGCAAAAGCGAAAGAATT +TTGCCAAAAATTTAACGACGCCTTTGAACCTGACGCCAGCTGGCTTTGGC +GCTGGCGCAAGCGCCACAATATAAAGTATGGCAAAATACACGGCGAAACT +GCTACAAATGATTCCGTATCAGCAAATGAGTACAAAAATGATATTTTGCC +AGGATTGCTTAAAGGTTATAACCCAGAAGACATTTTTAACGCTGACGAAA +CTGCACTCTTTTATAAAGCAATGCCGAATGCGACATTTTTTACTTGTGGA +AAGCAATTAAATGGCCAGAAATCTCAGAGAGTGAGACTTACTTTGCTGTT +TATATGCAATGCAACTGGGACATACAAAAAAACTTTTGTAATCGGCAGAT +CTAAATCGCCACGATGCTTCAAGAATGCTAATGTGCCCATTCCGTACTAT +GCAAATAAGAAGGCCTGGATGACTAAGGATCTCTGGCGAAAAATAATGAC +AGGATTTGACGAAGAAATGAAAAAGCAAAATCGAAAGATTTTACTCTTCA +TCGACAATGCAACTAGTCACACGACTGTCAAGGACTTCGAAAACATAAAA +TTGTGCTTCATGCCACCAAACGCAACGGCTCTACTTCAACCTCTGGACCA +AGGTATTATCCACTCATTCAAATTAGAGTATAGGCGTATTTTGGTCAAAC +AGCAGCTCATTGCTGTTAATTGTGGTAAATCTACTGTGGAATTTTTAAAA +TCATTATCGTTATTGGATGCTCTATATTTTGTCAACCAAGGATGGAAGAA +TGTTAAAATGTTAACTATTCAGAATTGTTTTAAAAAGGTAAGATGGGATT +ATTATTGATATGTATCTCAAATAACGAATTTATTATTTTCAGGCTGGATT +TAAGTTCAGTTTTGAAAATGAAGACACCATTGCTGAAAAAGACAAACAAT +GCGTAGAAGTTGACATTGTATCGAATATTAATTGGAATGAATATGCCAAT +GTTGATGCAGATGAGGCTTGCCATGGTCAATTAGATGATGATGAAATCGT +GCGCTCTTTAGTTCAAGATGCAAAAACCAGCGATAACGAAGAAAGCCATA +GTGATGAAGATGTGGACGATACTGAGCGTCCTACTTTTAAGGATGGGTTT +GCAGCAATTAAGGCTTTAAAGTCCATTTTTATGCGAAACAATAATGATGA 
+GTTTTTGCAAAACTTGAATTCTATGGAAGACAAGCTGTTTAATTTACATA +TAAACTCAGCTGTATTGCAAAAAAAAATTACTGACTATTTTTAAGTTAGT +TTTAAAAAGTGTTTTAATCAATTCACCATCACTTAAATTTATATGTCGAT +CTTACTTATCATTAAGAATGAAATTATCAGTTCCTTTTATGTTTAACATT +GTTATAAAGAAATAAATTCTTTATTTTTCCTTAAAAAAAAAAATTAAGTT +AGCTGCATTTTTAAGTTACCTGCATCGAGGCATTGTGCAAAGTACTCGAG +GCAGCTAAGCGAATTATACTG +>FBgn0000155_roo +TGTTCACACATGAACACGAATATATTTAAAGACTTACAATTTTGGGCTCC +GTTCATATCTTATGTAAATGAATCGAGAGCGATAAATTATATTTAGGATT +TTGTTATCTAAGGCGACATGGGTGCATTGCTCAAAAACATGTAATTTAAG +TGCACACTACATGAGTCAGTCACTTGAGATCGTTCCCCGCCTCCTAAAAT +AGTCCCTTAGTGGGAGACCACAGATAAGGTCCTCGCCGCTCAAGATAGGC +AGATGTGCCCGAGCGTGGGACCTCGATAAGGCGGGGACTATTTACGTAGG +CCTCTGCGTAGGCCATTTACTTTAAGATGCGATTCTCATGTCACCTATTT +AAACCGAAGATATTTCCAAATAAAATCAGTTTTTTTACAAAAACTCAACG +AGTAAAGTCTTCTTATTTGGGATTTTACATTTGGTCAATCGAGCCTTTAA +TCGACTCTGCAGTTTCCCCCTACCAAAGGTAAGGAACTCAGAGAAAGGCC +AGCTCCTTTAAGCATCTTACAGCTAAAGGTAGCAAAAATAAGTGACTCTT +GTTTCCCCCTACCAAAGGTAAGGAACAGAGTATAAATATAAAAAGCAAAA +GATACAAAAGAATCTTTTATGTTTTAAAACAAGCACCTTATAGTCTATAG +CTAAAGGTTGCTTTGTGTACCATTATAAATTGTGGTAAGGCGTGCTTGAG +GCCATACATCAGCAATTGTGAAATTAAAAAGTGCATAACAAAAGTGCCTT +ATAAATGCTCTAATAGCATTAAATCAGCTCATAAATAGAGTGCAGTGTAT +ATGCCATAAGAGCATAAATTAAATAAAAAGTGCCTGAAAACAGTGCCTTA +TAAATGCTCTAATAGCATTAAATCAGCTCATAAATAGAGTGCAGTGTATA +TGCCAAAAGAGCATAAATGCCGAAATAAATGGCTAAAAAACAAAAAATCT +GACTGGACTACAAAAATAATAAAACGTGCCAAAAAAAAAAAAAAAATCAT +CTTTAAACATCGACGGAGCCTTAAAGAAGAGAAGGAAGTCAAATTCAAAG +GAGCCTCTACCAGCAGCAGAAGCAGCAACAACAGCAGCAGCAGAAGCAGC +AACAGCAGTAGCAACAGCAGCAACAACAGCAGCAACAGCAGCAGCAACAA +CAACGACATCAGCTAAGTCAAAACAAGAATTTTCTGTTTATCCAAACACA +CATATATATATAAATACATATAAAATACATATACACGTACTATATATATT +AAGAAATTACAAAAAATTTTCAAAATGATGTCAGAAAAGACTATTCAATT +CCTTAAGAAGCAGTCCGAAATTATTTTGGAAATTAGAAAGTTGGAAGTAA +AACCAACATTAACAGATGTAGAAATTCTAAAATTAAATGAGCTTCAAAAA +TGTTTCATTGCTAATCATAGCAATTTGTTAAAGATCGGCGTTGTCGATCA +TGAATATTTTAACGCGAAGCAGTATGATTTAATAATGATGGTGTTAGAAA +AAATTAAAAATAAAAATGAAAAAATTAAGGGCGAGTCGGTAGAAAACACT 
+TTCCCTAAATCAAACACTGTCCCTAAATCAAACCCTCCCCCTACATTAAA +CCTTGAAATGCGTGGTCACCCTGAAAAAGAGGGTATAGCACAAAACAACG +CTTTAAAAGTAGAGCAGGCATTTCGTAATAATGTTGGCCAATTTCGAGTA +TATCTAGAAGATACGTCTAAACTAATAGACAGTAGTCCAGATTTCCTTAA +AATAAGGAAAAATAAAATTGAATTTTTATGGCATAAAATAGATAACCTGA +TTGAACAGGTGAATAGTCGTTTTGAGAGTTCGCTATTCGAAGAAGAAATT +AGCGAACTTGAATTTGACAAACAAAATATTCTTACAGCCATTAATAGTCG +ACTCAGTGGCACAATAAATAAAGCTGAAATGTCGACGGTTGTTAAGGCGG +AGGAGTTACCAACCCTGCCTAAAATACAGATTCCCACCTTCTTTGGTGAT +TCCAAAGAATGGGATCTTTTTAATGAACTCTTTACAGAGCTCATACATGT +GAGAGAGGATCTCAGTCCTTCTCTCAAATTTAATTATCTAAAGTCAGCAT +TAAAAGGAGAAGCCAGAAATGTGGTTACTCATTTACTGCTCGGCTCTGGA +GAAAATTATGAAGCCACTTGGGAGTTTTTGACCAAGCGATATGAGAATAA +AAGAAACATATTCTCAGATCATATGAATAGGCTTATGGATATGCCAAATT +TAAATTTAGAATCCAATAAGCAAATAAAGACATTTATTGACACGATTAAC +GAGTCAATTTATATTATAAAATTAAAGGCACAATTACCAGAAGATGTGGA +TGCAATTTTCGCTCACATAATTCTTCGGAAATTCAATAAAGAATCACTCA +ATTTATATGAAAGCCATGTTAAAAAGACAAAAGAAATACAGGCACTTTCT +GATGTCATGGACTTTTTAGAGCAAAGGCTCAATTCTATATCATCATTCTC +ACAGGAAGTAAAACCTGTAAAGAAAATGATTAATAATAACAAGAATAAAA +ATTATAGTGACAATTGTGCATATTGCAAACTACCAGGGCATTATTTAATT +CAATGCCATAAATTTAAAATAATGAATCCAGCAGAACGGTCTGACTGGGT +AAGAAAAAATGGGATTTGCCTAAGATGTCTGAGGCATCCGTTTGGTAAAA +AATGTATAAGCGAGCAGCTTTGTTCGACTTGTCGTAAACCTCACCACACG +TTACTTCACTTTGCAGGTCATAATCCAGAAAAAGTGAATACGTGTAGAAC +AACAGGTCAAGCCTTGTTGGCCACGGCCTTGATTCAAGTAAAGTCGAGGT +ATGGAGGCTTTGAACAATTAAGAGCATTGATTGATAGTGGCTCTCAAAGC +ACAATTATTTCAGAAGAGTCTGCACAGATTCTAAAATTGAAAAAATTTCG +GTCTCATACTGAAATAAGTGGAGTATCTTCCACAGGAACGTGCATCTCCA +AGCACAAAGCGGTTATTTCGATAAGAAATTCTCCGAAAAATTTAGAAATT +GAAGCAATTATTCTCCCAAAACTTATGAAGGCACTTCCAGTCAACACGAT +TAATGTTGATCAGAAAAAATGGAAGAACTTTAAATTAGCCGACCCCGATT +TTAATAAACCGGGTCGCATTGATTTAATCATTGGAGCAGACGTATATACT +CACATTCTGCAAAATGGAGTTATAAAAATAGACGGTCTCCTTGGGCAAAA +AACTGATTTCGGGTGGATAGTTTCTGGATGTAAAAAATCCAAAGGAAAAG +AAACCATTGTAGCCACAACAATAGAAATAAAAGAGTTAGATCGCTACTGG +GAAGTGGAAGAAGAAGAAAAAGATGATATCGAGTCTGAAATCTGTGAAAA +TAAATTTATCAAAACGACAAAAAAAGATTCAGATGGGCGATACATTGTGT 
+CAATTCCATTCAAGGAGGATGTCACCTTAGGAGATTCAAAGAAACAAGCG +ATAGCTCGTTACATGAATCTGGAGAAAAAACTAAAAAGAAATGAAAAACT +TAAGGTTGACTACACTAAATTCATGAATGAATACATGGATTTAGGACACA +TGATTGAAGTGAGTGATGAAGGCAAATATTTTTTACCGCACCAGGCAGTG +ATTAGAGATTCAAGCCTTACGACCAAATTGAGAGTAGTTTTTGATGCTTC +AGCAAAAACTACGAATAACAAAAGTTTGAACGACATAATGTGGGTTGGGC +CACGAGTTCAAAAAGATATTTTTGACATTATTATTAAATGGAGAAAATGG +GAATTTGTTGTTTCGGCAGACATTGAAAAGATGTACCGACAAATTAAAAT +AGATAATAATGATCAAAAATATCAATATATTTTATGGAGAAATTCTCCAA +AAGAAAAAATTAAAACATATAAATTAACCACAGTCACTTACGGAACTGCA +TCTGCACCATATTTGGCTACCAGGGTTCTGGTAGATATTGCAGATAAATG +TAAAAACCAAGTTATTAGTGCAATAATTAGGAATGATTTCTATATGGATG +ACCTAATGACTGGAGCTGATTCGGTAGAAGAAGCTAATAAATTAATAACA +TTAATTCCCCATGAATTGCAGAAAGTTGGATTCAACTTAAGGAAATGGAT +TTCCAACAATTCCAAAATATTAACCACTGTGGAGGACACAGGGGACAATA +AGGTTCTCAATATTATCGAAAATGAATGTGTTAAAACTTTAGGACTAAAA +TGGGAACCTCAAAAGGATTTATTTAAGTTCAGCGTAAATTGTAATGATGA +ATCAAAAAATATAAATAAGCGCGTTGTGTTATCAACGCTAGCAAAAATAT +TTGATCCGTTAGGATGGTTGGCACCAGTCACGGTTTCAGGAAAACTTTTT +ATTCAAAAACTTTGGATAAATAAAAGTGAATGGGATCAGGAATTATCCAT +AGAAGATAAAAATTATTGGGAAAAATATAAAGAAAATTTATTATTGTTAG +AGAATATTCGAATCCCAAGGTGGATTAATTCAAACAGTTCTTCAGTCATT +CAGATTCACGGATTTGCGGACGCCTCCGAAAAAGCATATGCTGCAGTAGT +CTATGCTAAAGTAGGACCTCATGTTAATATAATAGCTAGCAAAAGTAGAG +TCAACCCTATAAAAAATAGGAAGACAATTCCCAAACTCGAGCTGTGTGCA +GCTCACCTGCTTAGTGAATTAATCCAAAGACTAAAAGGATCAATTGACAA +TATAATGGAGATCTATGCTTGGAGTGATTCCACGATTACCTTAGCATGGA +TTAACAGTGGTCAAAGTAAGATCAAATTTATAAAAAGAAGAACGGATGAC +ATTCGGAAATTAAAAAATACTGAATGGAATCATGTTAAGTCAGAGGATAA +TCCAGCAGATTTAGCATCCAGGGGAGTGGATTCTAACCAGTTGATCAACT +GTGATTTTTGGTGGAAAGGTCCGAAATGGCTAGCAGACCCAAAAGAACTT +TGGCCTCGGCAGCAGTCTGTAGAAGAACCTGTCTTAATAAATACGGTATT +AAATGACAAAATAGATGATCCTATTTACGAATTAATAGAAAGGTATTCCA +GTATAGAAAAACTTATACGTATAATAGCATACATAAATAGATTCGTGCAG +ATGAAAACAAGAAATAAAGCCTATTCATCAATTATTTCAGTAAAGGAGAT +AAGAATAGCGGAAACAGTTGTTATTAAGAAACAACAAGAATACCAGTTTA +GGCAAGAGATAAAGTGCCTTAAAATCAAAAAGGAAATCAAGACAAATAAT +AAAATATTGTCATTGAATCCATTTTTGGACAAGGGTGGGGTTCTAAGAGT 
+TGGAGGAAGATTGCAAAATTCCAATGCAGAATTTAATGTTAAACATCCAA +TCATTTTAGAAAAATGCCACCTAACAAGCTTATTAATAAAAAATGCTCAT +AAGGAAACATTGCATGGAGGGATAAACCTAATGCGAAACTATATCCAAAG +AAAGTATTGGATTTTCGGGTTGAAAAATTCGTTGAAAAAGTATTTAAGAG +AATGTGTAACGTGTGCAAGGTATAAACAAAATACAGCTCAGCAAATAATG +GGTAACTTGCCAAAATATAGAGTGACGATGACATTCCCGTTTCTTAATAC +TGGAATAGATTACGCAGGTCCTTATTATGTTAAATGTTCAAAAAATCGTG +GCCAAAAAACATTTAAAGGATACGTTGCTGTATTCGTTTGCATGGCCACC +AAAGCCATACACTTAGAAATGGTAAGCGATCTAACTTCAGACGCATTTTT +AGCAGCACTCAGAAGATTTATTGCTAGACGGGGAAAATGTTCCAATATCT +ATTCAGACAACGGAACAAATTTTGTAGGAGCTGCAAGAAAATTAGATCAA +GAGTTATTTAATGCAATACAAGAAAATATAACGATTGCAGCGCAACTTGA +AAAGGACAGGATTGATTGGCATTTTATTCCCCCGGCAGGACCTCACTTCG +GAGGTATTTGGGAAGCTGGAGTTAAGTCAATGAAATACCATTTAAAGCGT +ATAATCGGCGACACTATTTTTACTTATGAAGAAATGTCAACTCTTTTATG +TCAAATAGAAGCATGCTTAAATTCAAGGCCATTATACACTATAGTTAGTG +AGAAGGACCAACAAGAAGTTTTAACACCAGGTCATTTTTTAATTGGAAGA +CCACCTTTAGAAATAGTCGAACCAATGGAAGATGAAAAAATCGGAAATTT +GGATAGGTGGAGACTTATCCAAAAAATAAAGAAAGATTTCTGGGTTAAGT +GGAAAAGTGAATATTTGCATACGCTCCAGCAAAGGAATAAATGGAAAAAG +GAAATTCCTAATATAGAAGAAGGGCAAATAGTTTTATTAAAGGATGAGAA +TTGTCATCCTGCAAGATGGCCTTTAGGAAAGGTGGAAAAGGTGCATAAGG +GGAATGATGATAAGGTCCGAGTGGCTAAAGTAAAGATGCAGGAAGGATAT +ATCACTAGACCCATTACTAAAATTTGTCCCTTGGAAGGAATAAAGTCTGT +TGACAAAAATGAGGCTGACCAAGAGCCAAAAAGACGAACTAGAGCGACAT +CGGGAATGTCCAAGATCGGAATCATTATGGCAATGTTGTTGTTTGTGTTA +AGTTGTCAAGTTTCTAGCGCATTACCTAAAGATATAGCACCAAGATATTC +TATAGACAAAATAAATAAAACCTCAGCAATATATCTAGACCCGCTAGGAG +ATGTTGAGATTGTGAGTACTTCTTGGAATTTGGTTATCTATTATAAAATG +GATCCATATTTTAAAATGTTAACAAAGGGTAATGCGCTTATACAAAGTAT +GAGGAAAGTTTGCGAAAGACTTCATAGCTTTGAAGAGCAATGTAGTCTAG +TCTTAGATAATATGCAAAGTCAGTTATCGGAACTTGAAGAAAACAATAAA +TTGTTTATGATGCAGTCTAGATCTAGAAGCAAGCGTGCTCCTTTCGAATT +TATGGGTTCCTTGTATCATATTTTATTTGGTATAATGGATGAAGATGATA +GAGAGCAATTAGAAGAAAATATGAAGAATTTGTTAGATAACCAGAACAAC +CTTGATAAACTAATTCAAAAACAAACATCTGTGGTTGATTCAACTTCTAA +TCTATTAAAGAGAACAACAGAAGATGTTAACTCCAATTTTAGAAGTATGC +AAATAAGAATTGAGAACATGACAGAAGTTCTTAAAGAAAATTATTATGTT 
+TATAAGGAATCAATAAAATTCTTTATGATTACGAAACAGCTACACTCATT +GATTGAAGAAGGCGAAAAAATTCAAGCAGGCATTATAAGCCTGTTGATTG +ATATTAATCACGGTAGGCTAAATACAAATATTCTCAGGCCAAATCAGCTT +AAAAAAGAAATTGCCAAAATTCAGCAGAGTCTTTCAGAGAACCTAGTAAT +TCCAGGAAAACGGTCAGGTACGGAACTTAAGGAGGTGTATACACTGTTAA +CAGCCAGGGGTTTATTCATCGACGATAAATTGATCATTAGTGCAAAAGTG +CCTCTGTTTAGCAGGCATCCATCCAAATTGTTCAGGCTTATTCCGGTGCC +AATTCGAAATGAAGATCGGATAATAATGGTGCATACAACGTCCGAATATT +TAATTTATAATTTTGAGATAGATTCCTATCACATAATGACGGAAGCCACA +TTAAATCAATGTCAGAAATGGCAACTAAATAAGAGAATATGCAAAGGAAG +TTGGCCCTGGAATTCAGCGAATGATAATGCATGTGAGATTCAGCCTCTAA +AGCCAGATAAAGCGGCGAACTGCATCTATAAAACAGTAGTCGACTCTAAA +AGTTACTGGGTAGAGTTAGAAAAGAAAAGTAGTTGGTTGTTTAAGGTTCC +TGCGAATTCAAAAGTCCGTCTGCAATGTACTGGCTCTCAAATTGAATTGT +TTGATTTGCCTCAGCAAGGAGTTTTAAGCATTGCGCCATATTGTACGGCA +AGAACCGACGATAAAATTCTAGTTGCCCACCATAACATTCAGTCCGAAAG +TGAAGAATTATTATCAACACCTTATATAGGAGAAGTTAGTGGAGTGCCGA +AGATTATTTGGGATCCGCTGAAACTATCAATATTAAATCATACTGAGGAA +TTTGAACGATTGAATAATGAAATTAAATTTATGAAAGAGAACCATCAAAA +ATTGAAAGATTTACATTTCCATCATATTTCCGGACATGCTGGATTAATTA +TTGCTTTAATACTAATGATAGTATTAATAATATATTTCATACGGAAATGT +GCTGTGCAACAAAGAATGCAAGCAATAACCTTTGCAGGTCCGTTGCCAGT +ACTATAAATATCAATAGTAAATAAACAATAAAATAATATAACAAATAAAA +ATATACAGTCCACTAATAGAAAATGTACTTCTACATAGAAAAAGCAAAAT +GTTTAAAATAAGTTAATTAAGTACAAATTGTTGAATTAAAAATAATATAA +ACCATAATTGTAATCCAATAAAATTAAAAGCCAGAAAAACTAGGCCCATT +GAAATCTTAGTTGCAAAATAAATGAACATATATCAAATAAATACAGTCCA +CTACTGTTATAAATGCAACTAATATACTAATGTACATCTCAGCTTTGCTG +GCCCTTTGGCAGAATGTTCACACATGAACACGAATATATTTAAAGACTTA +CAATTTTGGGCTCCGTTCATATCTTATGTAAATGAATCGAGAGCGATAAA +TTATATTTAGGATTTTGTTATCTAAGGCGACATGGGTGCATTGCTCAAAA +ACATGTAATTTAAGTGCACACTACATGAGTCAGTCACTTGAGATCGTTCC +CCGCCTCCTAAAATAGTCCCTTAGTGGGAGACCACAGATAAGGTCCTCGC +CGCTCAAGATAGGCAGATGTGCCCGAGCGTGGGACCTCGATAAGGCGGGG +ACTATTTACGTAGGCCTCTGCGTAGGCCATTTACTTTAAGATGCGATTCT +CATGTCACCTATTTAAACCGAAGATATTTCCAAATAAAATCAGTTTCTTA +CAAAAACTCAACGAGTAAAGTCTTCTCATTTGGGATTTTACA +>FBgn0000199_blood +TGTAGTATGTGCATATATCGAGGGTACACTGTACCTATAAGTACACAGCA 
+ACACTTAGTTGCATTGCATAAATAAATGTCTCAAGTGAGCGTGATATAAG +ATCACCCATTTATGCTTTAAGCTAAGTCAGCATCCCCACGCTGGCCGCTG +GCCATATATGCGCATAAGCTCTCTCTCTCTCTCTCTTATACATATATATA +TACGCTGCTCTTCTGCCGCTGTCGACGGCGGCGCAGTCGCAGTATTTAGG +TAAGATTAGACACTCTGTAGAGGTTAAGCGGGCAGAACCGTTTCTGCTAC +TCGAAGAGATAAGAAGAAATAAAAAGGTGCCTGACGGCTGCACCCAACTG +CAAGGAAAACACGTGTTCTCAATTGGTGGCATATATTGGTTTATTACATG +GCGACCGTGAGGCAGGAGCCTGCGATCTGAGGACTACTGAGGAAATGCTG +CTAATATTGCCGATTTGATTTGGGAATTCTAAACAGCGACAACAGGTGTG +AGAAGCAGGCCGCCCCTTACACCAGTGCGGGAGACCTAGAGACGGGACAC +TGATGAAAAAAAAAGAAACAAAAATACTGAGTGAGTAGAGTGTGGTAATG +GGCAAACGCGGATGTCAGGAAATCAAAAATAAAGGTATAGCACATATTAA +GTGGCTATGATATACAAATAAAACACCGCCCCCATGGGCAACGGCACAGA +AATTAACTGCCGAATTAGACTTTCTGAAAGAAAACCTCCAGCAAAGAAAG +CCGAATACCACAACTCACTCAGCAAAAATAGAAATAATCAATGAAGAAAT +AACTGAAAATTCAACATCACCCAAGCCGAAAAGACCCGACGTCTGCATGA +AAGACTGCCCTCGACCATTGTAAGCCGCAACAGCAATTAGCACGGCATCC +TGCGAGGGTAGGATTAGGATAAAGGATAAAGGATTCCACCGGCGCGCCGC +ACATGACAACAGCGAATGTCTACCAAGCAGACGTTCGAACACCCTGCTCC +TGTCGAGCAAAGGGATCTGCCAAGTATCAAAGAGGTAATAGAGGTAGATC +CGTCCGCGGGACCAAAGCCCTTGACCATACAAGAGTACAAGGCACGGACT +GCAGCGAGGGAGCAGCCACCTAAAAAGAAGAGGGGTGGCCGCCGGATTAA +GTTGCTCAGCGCCCGGAGGCTCAACATCGAACTACTGAAGACGGCAACTA +ATGAGGAAGACCGGCAGCGCTACAAAGAGCGCCTTGCAGCCATCAATCAA +CAACTTCGTGGTGCGAAGTAAAGCGGCGGGCTGCGTTATACGCCATAGCC +TCAACCGCCCAAATATTATATTAATGTTGTCGATGCGGTTTCCGCTGCAA +CAAAATTACTAACTTATCAGGGACCCATTTCATAACTAACACATTATACT +CAGTCCTAAACTTAAAATAAGTAATAATATTGTAAAATTGCAAATTGCAA +CCGATGTAAACTGAGTATAATGAATTCATCTATCAAGTAAAAATATGTTT +AACAACAGTTTAGACCTATTAAAATTTCGAGCTATATTTATATCTGATCG +AGATAACAATAATTGACCAATTCTCAAAGTTAAAATTCTATTTGTACTTT +TGATATACAAATAAAGACTAATTTTCCCCATATCAAAATGGGACATAAGT +CGTGGATACAACCCCACAGTTAAATTCAATGTACTTACTATTTTTGATTT +TAGTTATCCTATCAGCCTTTTTACCTTGGCCTTAAAACTTTATCAGTTTC +ACACAAGATCGTTGAAAAGACTTACATGAGTCGAGCCAATGATTTAGACA +AAATCTAATAGAAACTACACCAAAAAGGTACAAGGTCGATTACATCGCTA +AAAGGTACATACATGGAATGGCTAAACTTAACCATATCCATAAACAATAT +TAGAGATGCTTTTGATAAATCCTATAAATGTATTAATAAAACCGCGCTGA 
+TCAAAACTCAGACGCTTATTTTTCACATAAAGGTATTGATAACACAATAC +AACACATTACAAAACCTAATAGTAACAAACAAAAGCAAACTCACTGAAGA +ACATAAAGTCCAATGCTTCAAAGTTCTCAGTTCATTTGGTAAAAGACTAC +ATAATACCAGCGTTAGACACAGTATTATAATAGAAGTCCCAACAGAACTA +ACCAAAATAGCAGAATTCGACGAAAGCCAGTTAAGAGACTTGGACGAGTC +GCAGCCGTTAGAAGATTTAGATATCGAAAGCGATATCGAATCAATAGAAG +AATTAAAATTTAATACCGTACAACCAAATACAAGAAACATGGCCAACGCA +TTAGAAGCTCAGAGAGCATACGTTAAACAGGTATCTGCCACAGTACCTGA +TTTCGATGGTAAGAAACTCCATTTAAACAGGTTTGTGACAGCACTTAAGT +TGACGGATCTAACTAAAGGAGATCAAGAAACTTTAGCAGTAGAGGTCATA +AAGACCAAAATTATTGGCCCATTAAACTATAAAGTAGAACATGCGACAAC +GATACAGGCAATAATTACCATATTGCAGGCAAACGTAAAAGGCGAATCGC +CTGACGTTATAAAGGCCAAATTAATAAATGCCCAACAAAGAGGCAAGACC +GCGTCTCAGTATGTTACAGAAATAGACAGTATGCGTAAGCAGCTCGAGGC +AGCTTACATAGACGGCGGATTAGACGCCGATAATGCTGACAAATTCGCGA +CTAAAGAGTCGATATCAGCAATGACCAAAAACTGTGCCAACGAGGCACTT +AAAATGATCTTAACTGCAGGTACATTTAGTACATTCAACGACGCAATGGA +AAAATACCTACATTGCAGTACAGAAATAACCGGCAATTCAAATACAGTCT +TATTCTATAATGGGAATAATAGACGTGGTAATTATAATGCCTACTATCGT +GGTAGAGGCAGAAATAATTATAACCATAATTATAACCAGAATTATAACCA +AGGTTATAATAATAACAACAGAGGTCGCGGAGGCTACCGCGGCCACGGTA +ATAACAGAGACGGAGGTAACCGAAGGGGTAACCAAAGTCAGAATAATAAT +AACAACCGAAATGTGCGTAACGTACAATCGGAAAACAGCCAGACCCCCTT +AAGCGATCAACAGTAAAAGTGTTTAAAGTAAACCTAAATCTGAGTATTTT +CATTAAGACAAAAAACCATGAAACAAACACAGTTCTTACATTACTAATAG +ACACAGGTGCAGAAATTTCATTGCTAAAAGCCAAAGCAAAGGAATATAAT +AATATAAATTTCAGTAATATATCAAATATTACAGGTATTGGGCAAGGAAC +CATACAGTCTATAGGTACAGTAGATCTTGACATACGCATTCAGGATGTTC +TAGTGCCACATGAATTTCATGTAGTACCTGAGAATTTTCCGATACCATGC +GATGGCATAATCGGAATAGATTTTATCAAGAAATACAATTGCGTATTAGA +GTTTCAAAATAACAAAGACTGGTTCACAATAAGACCCAATAACTTCAGTA +GACAGATTAGTGTACCAATTACACATAACTTAGACTCCAACACACTCTTA +TTGCCAGCTAGATGCGAAGTAATCAGACAAGTCAAATTACTCACTAACGA +AAAAACGGTGGTAGTACCAAATCAGGAGCTGCAACCAGGTATAATAGTAG +CAAGCACCATTGCCGATAGCAAAAACGCATTGATTCGCATTATAAATACA +AATAATAAAGACGCCATAATAGATAGCGCGAAGATCAAATGCGAATCAAT +GAAAGACTATGACATTTTTACAACACCAGTAGAAAAGGAAAATAGAACTG +AAGAAATTTTAAAACAATTAAGATTCCCTAAACAATTCAATAATGAACTA 
+ACTAAGTTATGCACCGAGTTTAGCGATATTTTTGGTCTAGAAACAGAACC +AATATCGGCTAACAATTTCTACAAACAAAAACTCAGATTAGGGGAAAAAA +CACCGGTCTATATAAAAAACTATCGCATGGCAGATAGCCAAAAACCAGAA +ATCGCCAGACAGGTAAAAAAATTAATAGATGATGGAATAGTTGAACCATC +AATGTCTGAATATAATAGTCCATTACTTTTGGTTCCAAAGAAACCACTTC +CGAATTCCACGGAAAAAAGATGGCGATTAGCAGTTGACTATCGTCAAATA +AATAAGAAACTATTATCAGACAAATTTCCACTTCCAAGAATAGAAGATAT +TCTTGATCAATTAGGAAGAGCAAAGTATTTTTCATGTCTCGACCTAATGT +CTGGATTCCACCAGATAGAACTAGAAAAAAGGTATAGAGATATAACGTCA +TTTTCAACAGCCAATGGCTCATATCGCTTCACGCGATTACCATACGGACT +GAAAGTAGCACCAAACTCCTTCCAACGTAGGATGACACTTGCATTTTCTG +GTCTTGAACCATCGCAAGCATTTCTATATATGGATGACTTAGTAGTAATA +GGTTGTTCAGAAAAACATATGCTCAAAAATTTGACTAACGTATTCGAGCT +ATGTAGACGACATAATTTGAAACTACATCCAGGGAAATGTTCTTTCTTTA +TGAAAGAAGTAACATATTTGGGTCACAAATGTACCGATAAAGGTATACTC +CCAGATGACACCAAATATGAAGTTATAGAAAAATATCCTATACCAACAGA +TGCCGACAGTGCTAGGCGTTTCGTAGCCTTCTGTAATTATTACAGACGTT +TCATTAAAAATTTTTCTGATCATTCACGCCACTTAACGAGGCTTTGTAAA +AAGAATGTTCAATTCGAATGGACAGCAGAATGCAATGATGCATTCGAATA +CCTTAAAACAGAATTAATGAAACCAACATTACTACAGTACCCAGATTTCG +GTAAAGAATTTTGCATAACAACCGATGCTAGTAAACAGGCATGCGGAGCG +GTACTTACACAAGATCACAATGGTCAACAACTTCCAGTGGCATACGCTTC +AAGAATGTTCACTCAAGGTGAAAGTAATAAGTCCACTACAGAACAAGAAT +TAACGGCCATTCATTGGGCCATAAATCATTTTCGACCATACATATATGGC +AAGCATTTCATGGTAAAAAGCGATCATAGACCATTGTCATACCTATTCTC +TATGAAAAATCCAAGTTCAAAACTCACTCGTATGAGGCTGGATTTAGAAG +AGTATGACTTTACTGTAGAATATCTTAAGGGGAAAGATAACCATATTGCG +GACGCCTTGTCTCGCATAACAATAAAAGATCTGAAAACAATCAACAGAGA +AATATTAAAAGTTACCACCAGATCAAAAGCTAAACAGGAAAATTCCTGTA +AGGACGAAGCAATAGTCAAAATACAAGAGGAAAAAGAGCAAACAATAGAA +AAGCCCAAAGTCTATGAAGTTGTCAATAATAATGACACAAAGAAATATGT +TTTAATCAAAATAGATAAACACAAGTGTTTATTAAAACGAGGAAAAACAA +TTGTTTCACGCTTTGATGTTGATGACTTGTATTCTAATGAAACATTTGAT +CTAAATCAATTCTTTCAAAGGCTTATTTCAAAAGCCGGAATGCATAAAAT +AACAAAAATGCGAATATCACCAAGCGAACAGATGTTCCAATTTGTATCAC +TAAATGAATTTAAAATAAAGGGCAACCGAGTACTCGAAAAAGTAGAACTA +GCTATTCTACAAAAGGTGATAATTATAGACAAAAATGACGAAGCTCAGAT +TAAAGAAATTTTGACAAAATTCCATGATGATCCTATAGAAGGAGGCCACA 
+CTGGTATTTCGCGAACCCAGTCAAAAATCAAAAGATTTTATTATTGGCCC +CAGATGACCAAGACAATCTCAAAGTATGTAAAGACTTGTTTGAAATGTCA +ACAAGCCAAAATTACAACACATACGAAAACTCCATTAACATTGATGCCAA +CGCCAGCAACAGCATTTGATACTGTTTTAATTGATACCATTGGTCCACTA +CCGAAATCGGAAGACGGAAATGAGTATGCAGTTACAATCATATGCGATCT +AACCAAGTTTTTAGTAACTATTCCAACACCAAATAAAAGTGCTAAAACAG +TTGCAAAGGCTATATTTGAATTATTTGTACTGAAGTACGGTCCAATGAAG +ACGTTCATTACAGATCAAGGTACGGAATACAAAAATTCACTTATGAATGA +ATTATGCAAATATATGCATATAGAAAATCTAACATCTAGCGCTCACCATC +ATCAAACTTTAGGAACAATAGAAAGAAGCCACCGAACTTTTAATGAATAT +ATACGTTCATACATATCGGTTAACAAAAGTGATTGGGACATTTGGTTACC +ATATTTCACTTATTGCTTCAATACAACACCCTCAATAGTCCATGACTATT +GCCCATACGAACTAGTATTTGGCAGACTACCCAGACAATTCAAAGATTTC +AGTAAGATAAACAAAATAGACCCAATATACAACTTAGACGACTACTCTAA +AGAGCTTAAATGCAGACTAGAATTGTCGTACAACAGAGCAAGAAGAATGT +TAGAAAAAGCAAAAGCGGATAGAAAATTAAGATATGATAGGAATACAAAT +AATTTCGAATTAAAAATAGGAGATAAAGTATTACTTAGAAAAGAAACAGG +TCATAAGTTAGATAAAAGATATGAAGGTCCTTATGACGTAGTAGATATAG +GAATAAATGACAATATAACCATTAAAACAGGAAGTAAGAAACAACAAATA +GTACATAAAGATAGGCTAAAAAAGCACAAATAGAATGAAAAAAAAAAAGG +GCAATCAATGCCAAACCTTTCATAATAAAACTTAAATAACGGCCTGATCA +GCCAAAACAATATAACAAAGACATAGACATAATCGAATTTTTATTAATTC +AAAATACATACATATTTTTTCTTTATTCATTTAAAAATTCTATATCATAA +ATAATGTTAATTCATTAAAAATAATATTTAAGTAATTTTTATTTTATAAT +GGTAATATAGTTGATAGAAAATAACTTCATTTCTTTACGTTATTTTAAAA +AAGAGGGGAGGTGTAGTATGTGCATATATCGAGGGTACACTGTACCTATA +AGTACACAGCAACACTTAGTTGCATTGCATAAATAAATGTCTCAAGTGAG +CGTGATATAAGATCACCCATTTATGCTTTAAGCTAAGTCAGCATCCCCAC +GCTGGCCGCTGGCCATATATGCGCATAAGCTCTCTCTCTCTCTCTCTTAT +ACATATATATATACGCTGCTCTTCTGCCGCTGTCGACGGCGGCGCAGTCG +CAGTATTTAGGTAAGATTAGACACTCTGTAGAGGTTAAGCGGGCAGAACC +GTTTCTGCTACTCGAAGAGATAAGAAGAAATAAAAAGGTGGCCTGACGGC +TGCACCCAACTGCAAGGAAAACACGTGTTCTCAATTGGTGGCATATATTG +GTTTATTACA