changeset 1:9f2665b32c45 draft

"planemo upload for repository https://github.com/jj-umn/tools-iuc/tree/arriba/tools/arriba commit 933ae7dfba10b1b31c30a90216d76cdad6dda685"
author jjohnson
date Fri, 08 Oct 2021 11:16:21 +0000
parents 5ebf2354cc9b
children 7420753b0671
files arriba.help arriba.xml arriba_download_reference.xml test-data/Aligned.out.sam
diffstat 4 files changed, 428 insertions(+), 353 deletions(-)
--- a/arriba.help	Thu Oct 07 11:47:02 2021 +0000
+++ /dev/null	Thu Jan 01 00:00:00 1970 +0000
@@ -1,191 +0,0 @@
-% arriba -h
-[2021-10-06T19:04:33] Launching Arriba 2.1.0
-
-Arriba gene fusion detector
----------------------------
-Version: 2.1.0
-
-Arriba is a fast tool to search for aberrant transcripts such as gene fusions.
-It is based on chimeric alignments found by the STAR RNA-Seq aligner.
-
-Usage: arriba [-c Chimeric.out.sam] -x Aligned.out.bam \
-              -g annotation.gtf -a assembly.fa [-b blacklists.tsv] [-k known_fusions.tsv] \
-              [-t tags.tsv] [-p protein_domains.gff3] [-d structural_variants_from_WGS.tsv] \
-              -o fusions.tsv [-O fusions.discarded.tsv] \
-              [OPTIONS]
-
- -c FILE  File in SAM/BAM/CRAM format with chimeric alignments as generated by STAR
-          (Chimeric.out.sam). This parameter is only required, if STAR was run with the
-          parameter '--chimOutType SeparateSAMold'. When STAR was run with the parameter
-          '--chimOutType WithinBAM', it suffices to pass the parameter -x to Arriba and -c
-          can be omitted.
-
- -x FILE  File in SAM/BAM/CRAM format with main alignments as generated by STAR
-          (Aligned.out.sam). Arriba extracts candidate reads from this file.
-
- -g FILE  GTF file with gene annotation. The file may be gzip-compressed.
-
- -G GTF_FEATURES  Comma-/space-separated list of names of GTF features.
-                  Default: gene_name=gene_name|gene_id gene_id=gene_id
-                  transcript_id=transcript_id feature_exon=exon feature_CDS=CDS
-
- -a FILE  FastA file with genome sequence (assembly). The file may be gzip-compressed. An
-          index with the file extension .fai must exist only if CRAM files are processed.
-
- -b FILE  File containing blacklisted events (recurrent artifacts and transcripts
-          observed in healthy tissue).
-
- -k FILE  File containing known/recurrent fusions. Some cancer entities are often
-          characterized by fusions between the same pair of genes. In order to boost
-          sensitivity, a list of known fusions can be supplied using this parameter. The list
-          must contain two columns with the names of the fused genes, separated by tabs.
-
- -o FILE  Output file with fusions that have passed all filters.
-
- -O FILE  Output file with fusions that were discarded due to filtering.
-
- -t FILE  Tab-separated file containing fusions to annotate with tags in the 'tags' column.
-          The first two columns specify the genes; the third column specifies the tag. The
-          file may be gzip-compressed.
-
- -p FILE  File in GFF3 format containing coordinates of the protein domains of genes. The
-          protein domains retained in a fusion are listed in the column
-          'retained_protein_domains'. The file may be gzip-compressed.
-
- -d FILE  Tab-separated file with coordinates of structural variants found using
-          whole-genome sequencing data. These coordinates serve to increase sensitivity
-          towards weakly expressed fusions and to eliminate fusions with low evidence.
-
- -D MAX_GENOMIC_BREAKPOINT_DISTANCE  When a file with genomic breakpoints obtained via
-                                     whole-genome sequencing is supplied via the -d
-                                     parameter, this parameter determines how far a
-                                     genomic breakpoint may be away from a
-                                     transcriptomic breakpoint to consider it as a
-                                     related event. For events inside genes, the
-                                     distance is added to the end of the gene; for
-                                     intergenic events, the distance threshold is
-                                     applied as is. Default: 100000
-
- -s STRANDEDNESS  Whether a strand-specific protocol was used for library preparation,
-                  and if so, the type of strandedness (auto/yes/no/reverse). When
-                  unstranded data is processed, the strand can sometimes be inferred from
-                  splice-patterns. But in unclear situations, stranded data helps
-                  resolve ambiguities. Default: auto
-
- -i CONTIGS  Comma-/space-separated list of interesting contigs. Fusions between genes
-             on other contigs are ignored. Contigs can be specified with or without the
-             prefix "chr". Asterisks (*) are treated as wild-cards.
-             Default: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y AC_* NC_*
-
- -v CONTIGS  Comma-/space-separated list of viral contigs. Asterisks (*) are treated as
-             wild-cards.
-             Default: AC_* NC_*
-
- -f FILTERS  Comma-/space-separated list of filters to disable. By default all filters are
-             enabled. Valid values: homologs, low_entropy, isoforms,
-             top_expressed_viral_contigs, viral_contigs, non_coding_neighbors,
-             mismatches, duplicates, no_genomic_support, genomic_support, intronic,
-             end_to_end, relative_support, low_coverage_viral_contigs,
-             merge_adjacent, mismappers, multimappers, same_gene, long_gap,
-             internal_tandem_duplication, small_insert_size, read_through,
-             inconsistently_clipped, uninteresting_contigs, intragenic_exonic,
-             spliced, hairpin, blacklist, min_support, select_best, in_vitro,
-             short_anchor, known_fusions, no_coverage, homopolymer, many_spliced
-
- -E MAX_E-VALUE  Arriba estimates the number of fusions with a given number of supporting
-                 reads which one would expect to see by random chance. If the expected number
-                 of fusions (e-value) is higher than this threshold, the fusion is
-                 discarded by the 'relative_support' filter. Note: Increasing this
-                 threshold can dramatically increase the number of false positives and may
-                 increase the runtime of resource-intensive steps. Fractional values are
-                 possible. Default: 0.300000
-
- -S MIN_SUPPORTING_READS  The 'min_support' filter discards all fusions with fewer than
-                          this many supporting reads (split reads and discordant mates
-                          combined). Default: 2
-
- -m MAX_MISMAPPERS  When more than this fraction of supporting reads turns out to be
-                    mismappers, the 'mismappers' filter discards the fusion. Default:
-                    0.800000
-
- -L MAX_HOMOLOG_IDENTITY  Genes with more than the given fraction of sequence identity are
-                          considered homologs and removed by the 'homologs' filter.
-                          Default: 0.300000
-
- -H HOMOPOLYMER_LENGTH  The 'homopolymer' filter removes breakpoints adjacent to
-                        homopolymers of the given length or more. Default: 6
-
- -R READ_THROUGH_DISTANCE  The 'read_through' filter removes read-through fusions
-                           where the breakpoints are less than the given distance away
-                           from each other. Default: 10000
-
- -A MIN_ANCHOR_LENGTH  Alignment artifacts are often characterized by split reads coming
-                       from only one gene and no discordant mates. Moreover, the split
-                       reads only align to a short stretch in one of the genes. The
-                       'short_anchor' filter removes these fusions. This parameter sets
-                       the threshold in bp for what the filter considers short. Default: 23
-
- -M MANY_SPLICED_EVENTS  The 'many_spliced' filter recovers fusions between genes that
-                         have at least this many spliced breakpoints. Default: 4
-
- -K MAX_KMER_CONTENT  The 'low_entropy' filter removes reads with repetitive 3-mers. If
-                      the 3-mers make up more than the given fraction of the sequence, then
-                      the read is discarded. Default: 0.600000
-
- -V MAX_MISMATCH_PVALUE  The 'mismatches' filter uses a binomial model to calculate a
-                         p-value for observing a given number of mismatches in a read. If
-                         the number of mismatches is too high, the read is discarded.
-                         Default: 0.010000
-
- -F FRAGMENT_LENGTH  When paired-end data is given, the fragment length is estimated
-                     automatically and this parameter has no effect. But when single-end
-                     data is given, the mean fragment length should be specified to
-                     effectively filter fusions that arise from hairpin structures.
-                     Default: 200
-
- -U MAX_READS  Subsample fusions with more than the given number of supporting reads. This
-               improves performance without compromising sensitivity, as long as the
-               threshold is high. Counting of supporting reads beyond the threshold is
-               inaccurate, obviously. Default: 300
-
- -Q QUANTILE  Highly expressed genes are prone to produce artifacts during library
-              preparation. Genes with an expression above the given quantile are eligible
-              for filtering by the 'in_vitro' filter. Default: 0.998000
-
- -e EXONIC_FRACTION  The breakpoints of false-positive predictions of intragenic events
-                     are often both in exons. True predictions are more likely to have at
-                     least one breakpoint in an intron, because introns are larger. If the
-                     fraction of exonic sequence between two breakpoints is smaller than
-                     the given fraction, the 'intragenic_exonic' filter discards the
-                     event. Default: 0.330000
-
- -T TOP_N  Only report viral integration sites of the top N most highly expressed viral
-           contigs. Default: 5
-
- -C COVERED_FRACTION  Ignore virally associated events if the virus is not fully
-                      expressed, i.e., less than the given fraction of the viral contig is
-                      transcribed. Default: 0.150000
-
- -l MAX_ITD_LENGTH  Maximum length of internal tandem duplications. Note: Increasing
-                    this value beyond the default can impair performance and lead to many
-                    false positives. Default: 100
-
- -u  Instead of performing duplicate marking itself, Arriba relies on duplicate marking by a
-     preceding program using the BAM_FDUP flag. This makes sense when unique molecular
-     identifiers (UMI) are used.
-
- -X  To reduce the runtime and file size, by default, the columns 'fusion_transcript',
-     'peptide_sequence', and 'read_identifiers' are left empty in the file containing
-     discarded fusion candidates (see parameter -O). When this flag is set, this extra
-     information is reported in the discarded fusions file.
-
- -I  If assembly of the fusion transcript sequence from the supporting reads is incomplete
-     (denoted as '...'), fill the gaps using the assembly sequence wherever possible.
-
- -h  Print help and exit.
-
-         Code repository: https://github.com/suhrig/arriba
-    Get help/report bugs: https://github.com/suhrig/arriba/issues
-             User manual: https://arriba.readthedocs.io/
-             Please cite: https://doi.org/10.1101/gr.257246.119
-
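
The removed help text above describes the `-k` and `-t` options. Both take tab-separated files whose first two columns name a pair of regions, and `-t` files add a third tag column. The help text added to `arriba.xml` below goes further: a region may also be a coordinate with an optional strand prefix or a range. These files may contain `#` comment lines and may be gzip-compressed.

The minimal Python sketch below shows one way such a file could be read. It is not Arriba's own parser, and it rests on several assumptions:

- The names `parse_region`, `read_region_pairs`, `Coordinate` and `Range` are hypothetical.
- gzip input is detected by its magic bytes.
- Blacklist keywords such as `any` in the second column are returned as plain strings.

```python
import gzip
import re
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple, Union


@dataclass(frozen=True)
class Coordinate:
    """Hypothetical type for a 1-based position such as '+9:56743754'."""
    contig: str
    position: int
    strand: Optional[str]  # '+', '-', or None when no strand prefix is given


@dataclass(frozen=True)
class Range:
    """Hypothetical type for a range such as '9:1000000-1100000'."""
    contig: str
    start: int
    end: int
    strand: Optional[str]


# A region is a Coordinate, a Range, or a plain string (gene name or keyword).
Region = Union[Coordinate, Range, str]

# Optional strand, contig (no colon), position, optional "-END".
_REGION_RE = re.compile(r"^([+-])?([^:]+):(\d+)(?:-(\d+))?$")


def parse_region(text: str) -> Region:
    """Classify one column value as a coordinate, a range, or a gene name."""
    match = _REGION_RE.match(text)
    if match is None:
        return text  # assumed to be a gene name from the annotation
    strand, contig, start, end = match.groups()
    if end is None:
        return Coordinate(contig, int(start), strand)
    return Range(contig, int(start), int(end), strand)


def _open_text(path: str):
    """Open a plain or gzip-compressed file in text mode (gzip detected by magic bytes)."""
    with open(path, "rb") as probe:
        is_gzip = probe.read(2) == b"\x1f\x8b"
    return gzip.open(path, "rt") if is_gzip else open(path, "r")


def read_region_pairs(path: str) -> Iterator[Tuple[Region, Region, List[str]]]:
    """Yield (first region, second region, remaining columns) for each data line."""
    with _open_text(path) as handle:
        for line in handle:
            line = line.rstrip("\r\n")
            if not line or line.startswith("#"):
                continue  # skip blank lines and comments
            columns = line.split("\t")
            if len(columns) < 2:
                raise ValueError(f"expected at least two tab-separated columns: {line!r}")
            yield parse_region(columns[0]), parse_region(columns[1]), columns[2:]
```

For example, `parse_region("+9:56743754")` returns `Coordinate(contig='9', position=56743754, strand='+')`. The sketch tries the coordinate pattern first and falls back to treating the value as a gene name, following the order of region kinds listed in the help text. A gene name containing a colon would be misread, which this sketch does not handle.
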
--- a/arriba.xml	Thu Oct 07 11:47:02 2021 +0000
+++ b/arriba.xml	Fri Oct 08 11:16:21 2021 +0000
@@ -6,31 +6,95 @@
     <expand macro="requirements" />
     <expand macro="version_command" />
     <command detect_errors="exit_code"><![CDATA[
+#if str($input_params.input_source) == "use_fastq"
+    #if $input_params.left_fq.is_of_type("fastq.gz"):
+        #set read1 = 'input_1.fastq.gz'
+    #else:
+        #set read1 = 'input_1.fastq'
+    #end if
+    ln -f -s '${input_params.left_fq}' ${read1} &&
+    #if $input_params.right_fq.is_of_type("fastq.gz"):
+        #set read2 = 'input_2.fastq.gz'
+    #else:
+        #set read2 = 'input_2.fastq'
+    #end if
+    ln -f -s '${input_params.right_fq}' ${read2} &&
+    STAR 
+    --runThreadN \${GALAXY_SLOTS:-1} 
+    --genomeDir /path/to/STAR_index 
+    --genomeLoad NoSharedMemory 
+    --readFilesIn $read1 $read2
+    #if $input_params.left_fq.is_of_type("fastq.gz"):
+        --readFilesCommand zcat
+    #end if
+    --outStd BAM_Unsorted 
+    --outSAMtype BAM Unsorted 
+    --outSAMunmapped Within 
+    --outBAMcompression 0 
+    --outFilterMultimapNmax 50 
+    --peOverlapNbasesMin 10 
+    --alignSplicedMateMapLminOverLmate 0.5 
+    --alignSJstitchMismatchNmax 5 -1 5 5 
+    --chimSegmentMin 10 
+    --chimOutType WithinBAM HardClip 
+    --chimJunctionOverhangMin 10 
+    --chimScoreDropMax 30 
+    --chimScoreJunctionNonGTAG 0 
+    --chimScoreSeparation 1 
+    --chimSegmentReadGapMax 3 
+    --chimMultimapNmax 50 
+    | tee Aligned.out.bam |
      arriba 
-    -x '$input'
-    #if $chimeric
-        -c '$chimeric'
-    #endif
+    -x '/dev/stdin'
+#else
+     arriba 
+    -x '$input_params.input'
+    #if $input_params.chimeric
+        -c '$input_params.chimeric'
+    #end if
+#end if
     -a '$genome_assembly'
     -g '$gtf'
-    -b '$blacklist'
+    #if '$blacklist'
+        -b '$blacklist'
+    #end if
     #if '$protein_domains'
         -p '$protein_domains'
-    #endif
+    #end if
     #if '$known_fusions'
         -k '$known_fusions'
-    #endif
+    #end if
     #if '$tags'
         -t '$tags'
-    #endif
+    #end if
     -o fusions.tsv
     -O fusions.discarded.tsv 
     ]]></command>
     <inputs>
-        <param name="input" argument="-x" type="data" format="sam,bam,cram" label="STAR Aligned.out.sam"/>
-        <param name="chimeric" argument="-c" type="data" format="sam,bam,cram" optional="true" label="STAR Chimeric.out.sam">
-            <help><![CDATA[ only required, if STAR was run with the parameter '--chimOutType SeparateSAMold' ]]></help>
-        </param>
+        <conditional name="input_params">
+            <param name="input_source"
+                   type="select"
+                   label="Use output from an earlier STAR run or let Arriba run STAR">
+                <option value="use_star">Use output from an earlier STAR run</option>
+                <option value="use_fastq">Let Arriba control running STAR</option>
+            </param>
+            <when value="use_star">
+                <param name="input" argument="-x" type="data" format="sam,bam,cram" label="STAR Aligned.out.sam"/>
+                <param name="chimeric" argument="-c" type="data" format="sam,bam,cram" optional="true" label="STAR Chimeric.out.sam">
+                    <help><![CDATA[ only required if STAR was run with the parameter '--chimOutType SeparateSAMold' ]]></help>
+                </param>
+            </when>
+            <when value="use_fastq">
+                <param name="left_fq"
+                       type="data"
+                       format="fastqsanger,fastqsanger.gz"
+                       argument="--left_fq"
+                       label="left.fq file"/>
+                <param name="right_fq"
+                       type="data"
+                       format="fastqsanger,fastqsanger.gz"
+                       argument="--right_fq"
+                       label="right.fq file"/>
+            </when>
+        </conditional>
         <param name="genome_assembly" argument="-a" type="data" format="fasta" label="genome assembly fasta"/>
         <param name="gtf" argument="-g" type="data" format="gtf" label="GTF file with gene annotation"/>
         <param name="blacklist" argument="-b" type="data" format="tabular" label="File containing blacklisted ranges."/>
@@ -45,197 +109,217 @@
         <data name="discarded" format="tabular" label="${tool.name} on ${on_string}: fusions.discarded.tsv" from_work_dir="fusions.discarded.tsv"/>
     </outputs>
     <help><![CDATA[
-
-arriba -h
-[2021-10-06T19:04:33] Launching Arriba 2.1.0
+**Arriba**
 
-Arriba gene fusion detector
----------------------------
-Version: 2.1.0
 
-Arriba is a fast tool to search for aberrant transcripts such as gene fusions.
+Arriba_ is a fast tool to search for aberrant transcripts such as gene fusions.
 It is based on chimeric alignments found by the STAR RNA-Seq aligner.
 
-Usage: arriba [-c Chimeric.out.sam] -x Aligned.out.bam \
-              -g annotation.gtf -a assembly.fa [-b blacklists.tsv] [-k known_fusions.tsv] \
-              [-t tags.tsv] [-p protein_domains.gff3] [-d structural_variants_from_WGS.tsv] \
-              -o fusions.tsv [-O fusions.discarded.tsv] \
-              [OPTIONS]
+
+INPUTS_
+
+
+  - Alignments
+
+    Arriba takes the main output file of STAR (Aligned.out.bam) as input (parameter -x). If STAR was run with the parameter --chimOutType WithinBAM, then this file contains all the information needed by Arriba to find fusions. When STAR was run with the parameter --chimOutType SeparateSAMold, the main output file lacks chimeric alignments. Instead, STAR writes them to a separate output file named Chimeric.out.sam. In this case, the file needs to be passed to Arriba via the parameter -c in addition to the main output file Aligned.out.bam.
+
+    Arriba extracts three types of reads from the alignment file(s):
 
- -c FILE  File in SAM/BAM/CRAM format with chimeric alignments as generated by STAR
-          (Chimeric.out.sam). This parameter is only required, if STAR was run with the
-          parameter '--chimOutType SeparateSAMold'. When STAR was run with the parameter
-          '--chimOutType WithinBAM', it suffices to pass the parameter -x to Arriba and -c
-          can be omitted.
+      * Split-reads, i.e., reads composed of segments which map in a non-linear way. STAR stores such reads as supplementary alignments.
+      * Discordant mates, i.e., paired-end reads which originate from the same fragment but which align in a non-linear way.
+      * Alignments which cross the boundaries of annotated genes, because these alignments might arise from focal deletions. In RNA-Seq data, deletions of up to several hundred kb are hard to distinguish from splicing: both are represented identically as gapped alignments, because many introns are in fact of this size. STAR applies a rather arbitrary measure to decide whether a gapped alignment arises from splicing or from a genomic deletion: the parameter --alignIntronMax determines the largest gap that is still assumed to be a splicing event, and such gaps are represented as introns. Only gaps larger than this limit are classified as potential evidence for genomic deletions and are stored as chimeric alignments. Hence, most STAR-based fusion detection tools, which only consider chimeric alignments as evidence for gene fusions, are blind to focal deletions. As a workaround, these tools recommend reducing the value of --alignIntronMax. But this impairs the quality of alignment, because it narrows the scope that STAR searches to find a spliced alignment. The only way to avoid compromising alignment quality for the sake of fusion detection would be to run STAR twice, once with settings optimized for regular alignment and once for fusion detection, which would double the runtime. In contrast, Arriba does not require reducing the maximum intron size. It employs a more sensible criterion to distinguish splicing from deletions: it considers all reads that span the boundary of an annotated gene as potential evidence for deletions.
+
+    The alignment files can be in SAM, BAM, and CRAM format. They need not be sorted for Arriba to accept them, but doing so comes with benefits: Often, this reduces the file size. And more importantly, the supporting reads of a fusion can be inspected visually using a genome browser like IGV, which typically requires BAM files to be sorted by coordinate.
 
- -x FILE  File in SAM/BAM/CRAM format with main alignments as generated by STAR
-          (Aligned.out.sam). Arriba extracts candidate reads from this file.
+    Single-end and paired-end data and even mixtures are supported. Arriba automatically determines the data type on a read-by-read basis using the flag BAM_FPAIRED.
+
 
- -g FILE  GTF file with gene annotation. The file may be gzip-compressed.
+  - Assembly
+
+    Arriba takes the assembly as input (parameter -a) to find mismatches between the chimeric reads and the reference genome, as well as to find alignment artifacts and homologous genes.
 
- -G GTF_FEATURES  Comma-/space-separated list of names of GTF features.
-                  Default: gene_name=gene_name|gene_id gene_id=gene_id
-                  transcript_id=transcript_id feature_exon=exon feature_CDS=CDS
+    The script download_references.sh can be used to download the assembly. The available assemblies are listed when the script is run without parameters. The user is not restricted to these assemblies, however. Any assembly can be used as long as its coordinates are compatible with one of the supported assemblies (hg19/hs37d5/GRCh37 or hg38/GRCh38 or mm10/GRCm38).
+
+    The assembly must be provided in FastA format and may be gzip-compressed. An index with the file extension .fai must exist only if CRAM files are processed.
 
- -a FILE  FastA file with genome sequence (assembly). The file may be gzip-compressed. An
-          index with the file extension .fai must exist only if CRAM files are processed.
+  - Annotation
 
- -b FILE  File containing blacklisted events (recurrent artifacts and transcripts
-          observed in healthy tissue).
+    The gene annotation (parameter -g) is used for multiple purposes:
 
- -k FILE  File containing known/recurrent fusions. Some cancer entities are often
-          characterized by fusions between the same pair of genes. In order to boost
-          sensitivity, a list of known fusions can be supplied using this parameter. The list
-          must contain two columns with the names of the fused genes, separated by tabs.
+      * annotation of breakpoints with genes
+      * increased sensitivity for breakpoints at splice-sites
+      * calculation of transcriptomic distances
+      * determining the putative orientation of fused genes (i.e., 5' and 3' end)
+
+    GENCODE annotation is recommended over RefSeq annotation, because the former has a more comprehensive annotation of transcripts and splice-sites, which boosts sensitivity. The file must be provided in GTF format and may be gzip-compressed. It does not need to be sorted.
+
+    The script download_references.sh can be used to download the annotation. The available annotation files are listed when the script is run without parameters. The user is not restricted to these annotation files, however. Any annotation can be used as long as its coordinates are compatible with one of the supported assemblies (hg19/hs37d5/GRCh37 or hg38/GRCh38 or mm10/GRCm38).
 
- -o FILE  Output file with fusions that have passed all filters.
 
- -O FILE  Output file with fusions that were discarded due to filtering.
+  - Blacklist
 
- -t FILE  Tab-separated file containing fusions to annotate with tags in the 'tags' column.
-          The first two columns specify the genes; the third column specifies the tag. The
-          file may be gzip-compressed.
+    It is strongly advised to run Arriba with a blacklist (parameter -b). Otherwise, the false positive rate increases by an order of magnitude. For this reason, using Arriba with assemblies or organisms which are not officially supported is not recommended. At the moment, the supported assemblies are: hg19/hs37d5/GRCh37, hg38/GRCh38, and mm10/GRCm38 (as well as any other assemblies that have compatible coordinates). The blacklists are contained in the release tarballs of Arriba.
+
+    The blacklist removes recurrent alignment artifacts and transcripts which are present in healthy tissue. This helps eliminate frequently observed transcripts, such as read-through fusions between neighboring genes, circular RNAs and other non-canonically spliced transcripts. It was trained on RNA-Seq samples from the Human Protein Atlas, the Illumina Human BodyMap2, the ENCODE project, the Roadmap Epigenomics project, and the NCT MASTER cohort, a heterogeneous cohort of cancer samples, from which highly recurrent artifacts were identified.
+
+    Blacklists for all supported assemblies are shipped with the download package of Arriba. They can be found in the package as database/blacklist_*.
 
- -p FILE  File in GFF3 format containing coordinates of the protein domains of genes. The
-          protein domains retained in a fusion are listed in the column
-          'retained_protein_domains'. The file may be gzip-compressed.
-
- -d FILE  Tab-separated file with coordinates of structural variants found using
-          whole-genome sequencing data. These coordinates serve to increase sensitivity
-          towards weakly expressed fusions and to eliminate fusions with low evidence.
+    The blacklist is a tab-separated file with two columns and may optionally be gzip-compressed. Lines starting with a hash (#) are treated as comments. Each line represents a pair of regions between which events are ignored. A region can be:
+
+      * a 1-based coordinate in the format CONTIG:POSITION, optionally prefixed with the strand (example: +9:56743754). If CONTIG ends with an asterisk (*), the contig with the closest matching name is chosen.
+      * a range in the format CONTIG:START-END, optionally prefixed with a strand (example: 9:1000000-1100000).
+      * the name of a gene given in the provided annotation.
 
- -D MAX_GENOMIC_BREAKPOINT_DISTANCE  When a file with genomic breakpoints obtained via
-                                     whole-genome sequencing is supplied via the -d
-                                     parameter, this parameter determines how far a
-                                     genomic breakpoint may be away from a
-                                     transcriptomic breakpoint to consider it as a
-                                     related event. For events inside genes, the
-                                     distance is added to the end of the gene; for
-                                     intergenic events, the distance threshold is
-                                     applied as is. Default: 100000
+    In addition, special keywords are allowed for the second column:
+
+      * any: Discard all events if one of the breakpoints matches the given region.
+      * split_read_donor: Discard fusions only supported by split reads, if all of them have their anchor in the gene given in the first column. This filter is useful for highly mutable loci, which frequently trigger clipped alignments, such as the immunoglobulin loci or the T-cell receptor loci.
+      * split_read_acceptor: Discard events only supported by split reads, if all of them have their clipped segment in the given region.
+      * split_read_any: Discard events only supported by split reads, regardless of where the anchor is.
+      * discordant_mates: Discard fusions, if they are only supported by discordant mates (no split reads).
+      * low_support: Discard events which have few supporting reads relative to expression (as determined by the filter relative_support), even if there is other evidence that the fusion might be a true positive. This keyword effectively prevents recovery of speculative events by filters such as spliced or many_spliced.
+      * filter_spliced: This keyword prevents the filter spliced from being applied to a given region. It is triggered under the same circumstances as the keyword low_support, but additionally requires that the breakpoints be at splice-sites for the event to be discarded. Some breakpoints produce recurrent artifacts, but the second breakpoint is always a different one, such that the pair of breakpoints is not recurrent and cannot be blacklisted. Often, such breakpoints are at splice-sites and the filter spliced tends to recover them. This keyword prevents the filter from doing so.
+      * not_both_spliced: This keyword discards events, unless both breakpoints are at splice-sites. This is a strict blacklist criterion, which makes sense to apply to genes which are prone to produce artifacts, because they are highly expressed, for example hemoglobins, collagens, or ribosomal genes.
+      * read_through: This keyword discards events, if they could arise from read-through transcription, i.e., the supporting reads are oriented like a deletion and are at most 400 kb apart.
+
+
+  - Known fusions
+
+    Arriba can be instructed to be particularly sensitive towards events between certain gene pairs by supplying a list of gene pairs (parameter -k). A number of filters are not applied to these gene pairs. This is useful to improve the detection rate of expected or highly relevant events, such as recurrent fusions. Occasionally, this leads to false positive calls. But if high sensitivity is more important than specificity, this might be acceptable. Events which would be discarded by a filter and were recovered due to being listed in the known fusions list are usually assigned a low confidence.
+
+    Known fusions files for all supported assemblies are shipped with the download package of Arriba. They can be found in the package as database/known_fusions_*.
+
+    The file has two columns separated by a tab and may optionally be gzip-compressed. Lines starting with a hash (#) are treated as comments. Each line represents a pair of regions to which very sensitive filtering thresholds are applied. A region can be:
 
- -s STRANDEDNESS  Whether a strand-specific protocol was used for library preparation,
-                  and if so, the type of strandedness (auto/yes/no/reverse). When
-                  unstranded data is processed, the strand can sometimes be inferred from
-                  splice-patterns. But in unclear situations, stranded data helps
-                  resolve ambiguities. Default: auto
+      * a 1-based coordinate in the format CONTIG:POSITION, optionally prefixed with the strand (example: +9:56743754). If CONTIG ends with an asterisk (*), the contig with the closest matching name is chosen.
+      * a range in the format CONTIG:START-END, optionally prefixed with a strand (example: 9:1000000-1100000).
+      * the name of a gene given in the provided annotation.
+
+    The order of the given regions is important. The region given in the first column is assumed to denote the 5' end of the fusion and the region in the second column to be the 3' end. If Arriba cannot determine with confidence which gene constitutes the 5' and which the 3' end of a fusion prediction, then the order is ignored and the prediction is rescued in both cases.
 
- -i CONTIGS  Comma-/space-separated list of interesting contigs. Fusions between genes
-             on other contigs are ignored. Contigs can be specified with or without the
-             prefix "chr". Asterisks (*) are treated as wild-cards.
-             Default: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y AC_* NC_*
+
+  - Tags
+
+    Arriba can be supplied with a list of user-defined tags using the parameter -t. Whenever a fusion prediction matches the selection criteria for a tag, the column tags is populated with the respective tag. This feature is useful to annotate known oncogenic fusions, for example.
+
+    The known fusions file shipped with the download package of Arriba can be used for both known fusions and tags. It is constructed in a way that it can be passed as arguments to the parameters -k and -t alike. The former only uses the first two columns, the latter uses all three columns. If a user wants to separate filtering of known fusions and tagging of interesting fusions, different files may be used, however.
 
- -v CONTIGS  Comma-/space-separated list of viral contigs. Asterisks (*) are treated as
-             wild-cards.
-             Default: AC_* NC_*
+    The file has three columns separated by a tab and may optionally be gzip-compressed. Lines starting with a hash (#) are treated as comments. Each line represents a pair of regions to be annotated. The first two columns specify the regions to be annotated; the third column the tag that is used for annotation. Some special characters in the tag are replaced with underscores (_) in Arriba's output file. A region can be:
+
+      * a 1-based coordinate in the format CONTIG:POSITION, optionally prefixed with the strand (example: +9:56743754).
+      * a range in the format CONTIG:START-END, optionally prefixed with a strand (example: 9:1000000-1100000).
+      * the name of a gene given in the provided annotation.
 
- -f FILTERS  Comma-/space-separated list of filters to disable. By default all filters are
-             enabled. Valid values: homologs, low_entropy, isoforms,
-             top_expressed_viral_contigs, viral_contigs, non_coding_neighbors,
-             mismatches, duplicates, no_genomic_support, genomic_support, intronic,
-             end_to_end, relative_support, low_coverage_viral_contigs,
-             merge_adjacent, mismappers, multimappers, same_gene, long_gap,
-             internal_tandem_duplication, small_insert_size, read_through,
-             inconsistently_clipped, uninteresting_contigs, intragenic_exonic,
-             spliced, hairpin, blacklist, min_support, select_best, in_vitro,
-             short_anchor, known_fusions, no_coverage, homopolymer, many_spliced
+    The order of the given regions is important. The region given in the first column is assumed to denote the 5' end of the fusion and the region in the second column to be the 3' end.
+
+  - Protein domains
+
+    Protein domain annotation can be passed to Arriba via the parameter -p. The column retained_protein_domains of Arriba's output file is then populated accordingly.
+
+    Protein domain annotation files for all supported assemblies are shipped with the download package of Arriba. They can be found in the package as database/protein_domains_*.
 
- -E MAX_E-VALUE  Arriba estimates the number of fusions with a given number of supporting
-                 reads which one would expect to see by random chance. If the expected number
-                 of fusions (e-value) is higher than this threshold, the fusion is
-                 discarded by the 'relative_support' filter. Note: Increasing this
-                 threshold can dramatically increase the number of false positives and may
-                 increase the runtime of resource-intensive steps. Fractional values are
-                 possible. Default: 0.300000
+    The file must be in GFF3 format and may optionally be gzip-compressed. The ninth column must contain at least the following attributes:
+      * Name=PROTEIN_DOMAIN_NAME;
+      * gene_id=GENE_ID;
+      * gene_name=GENE_NAME
+    The attribute Name is reported in the column retained_protein_domains of Arriba's output file. Some special characters in the name are replaced with underscores (_). The attributes gene_id and gene_name are used to match the protein domains to the genes given in the gene annotation. If a match cannot be found, Arriba cannot determine the retained protein domains of the respective gene and a warning is issued. There may be many warnings if RefSeq annotation is used, because the protein domains file distributed with Arriba uses ENSEMBL gene names/IDs.
+
+  - Structural variant calls from WGS
+
+    If whole-genome sequencing (WGS) data is available, the sensitivity and specificity of Arriba can be improved by passing a list of structural variants detected from WGS to Arriba via the parameter -d. This has the following effects:
 
- -S MIN_SUPPORTING_READS  The 'min_support' filter discards all fusions with fewer than
-                          this many supporting reads (split reads and discordant mates
-                          combined). Default: 2
+    Certain filters are overruled or run with more sensitive settings when an event is confirmed by WGS data.
+    To reduce the false positive rate, Arriba does not report low-confidence events unless they can be matched with a structural variant found in the WGS data.
+    Both of these behaviors can be disabled by disabling the filters genomic_support and no_genomic_support, respectively. Providing Arriba with a list of structural variant calls then does not influence the calls, but it still has the benefit of filling the columns closest_genomic_breakpoint1 and closest_genomic_breakpoint2 with the breakpoints of the structural variant which is closest to a fusion. If the structural variant calls were obtained from whole-exome sequencing (WES) data rather than WGS data, the filter no_genomic_support should be disabled, since WES has poor coverage in most regions of the genome, such that many structural variants are missed.
+
+    Two file formats are accepted: a simple four-column format and the standard Variant Call Format (VCF). The format is detected automatically.
 
- -m MAX_MISMAPPERS  When more than this fraction of supporting reads turns out to be
-                    mismappers, the 'mismappers' filter discards the fusion. Default:
-                    0.800000
+      * In case of the simple format, the file must contain four columns separated by tabs. The first two columns contain the breakpoints of the structural variants in the format CONTIG:POSITION. The last two columns contain the orientation of the breakpoints. The accepted values are:
 
- -L MAX_HOMOLOG_IDENTITY  Genes with more than the given fraction of sequence identity are
-                          considered homologs and removed by the 'homologs' filter.
-                          Default: 0.300000
+        + downstream or +: the fusion partner is fused downstream of the breakpoint, i.e., at a coordinate higher than the breakpoint
+        + upstream or -: the fusion partner is fused at a coordinate lower than the breakpoint
+    
+        Example:
+
+        ::
 
- -H HOMOPOLYMER_LENGTH  The 'homopolymer' filter removes breakpoints adjacent to
-                        homopolymers of the given length or more. Default: 6
+          =========== =========== =========== ===========
+          5-prime     3-prime     orientation orientation
+          =========== =========== =========== ===========
+          1:54420491  6:9248349   +           -
+          20:46703288 20:46734546 -           +
+          17:61499820 20:45133874 +           +
+          3:190967119 7:77868317  -           -
+          =========== =========== =========== ===========
 
- -R READ_THROUGH_DISTANCE  The 'read_through' filter removes read-through fusions
-                           where the breakpoints are less than the given distance away
-                           from each other. Default: 10000
+
+      * In case of the Variant Call Format, the file must comply with the VCF specification for structural variants. In particular, Arriba requires that the SVTYPE field be present in the INFO column and specify one of the four values BND, DEL, DUP, INV. In addition, for all SVTYPEs other than BND, the END field must be present and specify the second breakpoint of the structural variant. Structural variants with single breakends are silently ignored.
 
- -A MIN_ANCHOR_LENGTH  Alignment artifacts are often characterized by split reads coming
-                       from only one gene and no discordant mates. Moreover, the split
-                       reads only align to a short stretch in one of the genes. The
-                       'short_anchor' filter removes these fusions. This parameter sets
-                       the threshold in bp for what the filter considers short. Default: 23
+        Arriba checks if the orientation of the structural variant matches that of a fusion detected in the RNA-Seq data. If, for example, Arriba predicts the 5' end of a gene to be retained in a fusion, then a structural variant is expected to confirm this, or else the variant is not considered to be related.
 
- -M MANY_SPLICED_EVENTS  The 'many_spliced' filter recovers fusions between genes that
-                         have at least this many spliced breakpoints. Default: 4
+    NOTE: Arriba was designed for alignments from RNA-Seq data. It should not be run on WGS data directly. Many assumptions made by Arriba about the data (statistical models, blacklist, etc.) only apply to RNA-Seq data and are not valid for DNA-Seq data. For such data, a structural variant calling algorithm should be used and the results should be passed to Arriba.
+
+
+** OUTPUTS_ **
 
- -K MAX_KMER_CONTENT  The 'low_entropy' filter removes reads with repetitive 3-mers. If
-                      the 3-mers make up more than the given fraction of the sequence, then
-                      the read is discarded. Default: 0.600000
+  - fusions.tsv
+
+    The file fusions.tsv (as specified by the parameter -o) contains fusions which pass all of Arriba's filters. It should be highly enriched for true predictions. The predictions are listed from highest to lowest confidence. The following paragraphs describe the columns in detail:
 
- -V MAX_MISMATCH_PVALUE  The 'mismatches' filter uses a binomial model to calculate a
-                         p-value for observing a given number of mismatches in a read. If
-                         the number of mismatches is too high, the read is discarded.
-                         Default: 0.010000
+      * gene1 and gene2 : gene1 contains the gene which makes up the 5' end of the transcript and gene2 the gene which makes up the 3' end. The order is predicted on the basis of the strands that the supporting reads map to, how the reads are oriented, and splice patterns. Both columns may contain the same gene, if the event is intragenic. If a breakpoint is in an intergenic region, Arriba lists the closest genes upstream and downstream from the breakpoint, separated by a comma. The numbers in parentheses after the closest genes state the distance to the genes. If no genes are annotated for a contig (e.g., for viral genomes), the column contains a dot (.).
+
+      * strand1(gene/fusion) and strand2(gene/fusion) : Each of these columns contains two values separated by a slash. The strand before the slash reflects the strand of the gene according to the gene annotation supplied to Arriba via the parameter -g. If the breakpoint is in an intergenic region, this value is a dot (.). The value after the slash reflects the strand that is transcribed. This does not necessarily match the strand of the gene, namely when the sense strand of a gene serves as the template for transcription. Occasionally, the strand that is transcribed cannot be predicted reliably. In this case, Arriba indicates the lack of information as a dot (.). Arriba uses splice-patterns of the alignments to assign a read to the appropriate originating gene. If a strand-specific library was used, Arriba also evaluates the strandedness in ambiguous situations, for example, when none of the supporting reads overlaps a splice-site.
 
- -F FRAGMENT_LENGTH  When paired-end data is given, the fragment length is estimated
-                     automatically and this parameter has no effect. But when single-end
-                     data is given, the mean fragment length should be specified to
-                     effectively filter fusions that arise from hairpin structures.
-                     Default: 200
+      * breakpoint1 and breakpoint2 : The columns contain the coordinates of the breakpoints in gene1 and gene2, respectively. If an event is not supported by any split reads but only by discordant mates, the coordinates given here are those of the discordant mates which are closest to the true but unknown breakpoint.
+
+      * site1 and site2 : These columns add information about the location of the breakpoints. Possible values are: 5' UTR, 3' UTR, UTR (overlapping with a 5' UTR as well as a 3' UTR), CDS (coding sequence), exon, intron, and intergenic. The keyword exon is used for non-coding genes or for ambiguous situations where the breakpoint overlaps with both a coding exon and a UTR. If the breakpoint coincides with an exon boundary, the additional keyword splice-site is appended.
+
+      * type : Based on the orientation of the supporting reads and the coordinates of breakpoints, the type of event can be inferred. Possible values are: translocation (between different chromosomes), duplication, inversion, and deletion. If genes are fused head-to-head or tail-to-tail, this is indicated as 5'-5' or 3'-3' respectively. Genes fused in such an orientation cannot yield a chimeric protein, since one of the genes is transcribed from the wrong strand. This type of event is equivalent to the truncation of the genes. The following types of events are flagged with an extra keyword, because they are frequent types of false positives and/or it is not clear if they are somatic or germline variants: Deletions with a size in the range of introns (<400kb) are flagged as read-through, because there is a high chance that the fusion arises from read-through transcription rather than an underlying genomic deletion. Intragenic duplications with both breakpoints at splice-sites are flagged as non-canonical-splicing, because the supporting reads might originate from circular RNAs, which are very abundant even in normal tissue, but manifest as duplications in RNA-Seq data. Internal tandem duplications are flagged as ITD. It is not always clear whether the ITDs observable in RNA-Seq data are somatic or germline variants, because ITDs are abundant in the germline and germline variants cannot be filtered effectively due to lack of a normal control.
 
- -U MAX_READS  Subsample fusions with more than the given number of supporting reads. This
-               improves performance without compromising sensitivity, as long as the
-               threshold is high. Counting of supporting reads beyond the threshold is
-               inaccurate, obviously. Default: 300
+      * split_reads1 and split_reads2 : The number of supporting split fragments with an anchor in gene1 or gene2, respectively, is given in these columns. The gene to which the longer segment of the split read aligns is defined as the anchor.
+
+      * discordant_mates : This column contains the number of pairs (fragments) of discordant mates (a.k.a. spanning reads or bridge reads) supporting the fusion.
+
+      * coverage1 and coverage2 : These two columns show the coverage near breakpoint1 and breakpoint2, respectively. The coverage is calculated as the number of fragments near the breakpoint on the side of the breakpoint that is retained in the fusion transcript. Note that the coverage calculation counts all fragments (even duplicates), whereas the columns split_reads1, split_reads2, and discordant_mates only count non-discarded reads. Fragments discarded due to being duplicates or other types of artifacts can be found in the column filters.
 
- -Q QUANTILE  Highly expressed genes are prone to produce artifacts during library
-              preparation. Genes with an expression above the given quantile are eligible
-              for filtering by the 'in_vitro' filter. Default: 0.998000
+      * confidence : Each prediction is assigned one of the confidences low, medium, or high. Several characteristics are taken into account, including: the number of supporting reads, the balance of split reads and discordant mates, the distance between the breakpoints, the type of event, whether the breakpoints are intragenic or not, and whether there are other events which corroborate the prediction, e.g. multiple isoforms or balanced translocations. See section Interpretation of results for further advice on judging the credibility of predictions.
+
+      * reading_frame : This column states whether the gene at the 3' end of the fusion is fused in-frame or out-of-frame. The value stop-codon indicates that there is a stop codon prior to the fusion junction, such that the 3' end is not translated, even if the reading frame is preserved across the junction. The prediction of the reading frame builds on the prediction of the peptide sequence. A dot (.) indicates that the peptide sequence cannot be predicted, for example, because the transcript sequence could not be determined or because the breakpoint of the 5' gene does not overlap a coding region.
+
+      * tags : When a user-defined list of tags is provided via the parameter -t, this column is populated with the provided tag whenever a fusion matches the coordinates specified for the respective tag. When multiple tags match, they are separated by a comma.
 
- -e EXONIC_FRACTION  The breakpoints of false-positive predictions of intragenic events
-                     are often both in exons. True predictions are more likely to have at
-                     least one breakpoint in an intron, because introns are larger. If the
-                     fraction of exonic sequence between two breakpoints is smaller than
-                     the given fraction, the 'intragenic_exonic' filter discards the
-                     event. Default: 0.330000
+      * retained_protein_domains : If Arriba is provided with protein domain annotation using the parameter -p, then this column is populated with protein domains retained in the fusion. Multiple protein domains are separated by a comma. Redundant protein domains are only listed once. After every domain the fraction that is retained is stated as a percentage value in parentheses. The protein domains of the 5' and 3' genes are separated by a pipe symbol (|).
+
+      * closest_genomic_breakpoint1 and closest_genomic_breakpoint2 : When a matched whole-genome sequencing sample is available, one can feed structural variant calls obtained therefrom into Arriba (see parameter -d). Arriba then considers this information during fusion calling, which improves the overall accuracy. These two columns contain the coordinates of the genomic breakpoints which are closest to the transcriptomic breakpoints given in the columns breakpoint1 and breakpoint2. The values in parentheses are the distances between transcriptomic and genomic breakpoints.
 
- -T TOP_N  Only report viral integration sites of the top N most highly expressed viral
-           contigs. Default: 5
+      * gene_id1 and gene_id2 : These two columns state the identifiers of the fused genes as given in the gene_id attribute in the GTF file.
+
+      * transcript_id1 and transcript_id2 : For both fused genes, Arriba determines the best matching isoform that is transcribed as part of the fusion. The isoform is selected by how well its annotated exons match the splice pattern of the supporting reads of a fusion.
 
- -C COVERED_FRACTION  Ignore virally associated events if the virus is not fully
-                      expressed, i.e., less than the given fraction of the viral contig is
-                      transcribed. Default: 0.150000
+      * direction1 and direction2 : These columns indicate the orientation of the fusion. A value of downstream means that the partner is fused downstream of the breakpoint, i.e. at a coordinate higher than the breakpoint. A value of upstream means the partner is fused at a coordinate lower than the breakpoint. When the prediction of the strands or of the 5' gene fails, this information gives insight into which parts of the fused genes are retained in the fusion.
+
+      * filters : This column lists the filters which removed one or more of the supporting reads. The section Internal algorithm describes all filters in detail. The number of filtered reads is given in parentheses after the name of the filter. The total number of supporting reads can be obtained by summing up the reads given in the columns split_reads1, split_reads2, discordant_mates, and filters. If a filter discarded the event as a whole (all reads), the number of filtered reads is not stated.
+
+      * fusion_transcript : This column contains the fusion transcript sequence. The sequence is assembled from the supporting reads of the most highly expressed transcript. It represents the transcript isoform that is most likely expressed according to the splice patterns of the supporting reads. The column contains a dot (.), when the sequence could not be predicted. This is the case when the strands or the 5' end of the transcript could not be predicted reliably. The breakpoint is represented as a pipe symbol (|). When non-template bases are inserted between the fused genes, these bases are represented as lowercase letters between two pipes. Reference mismatches (SNPs or SNVs) are indicated as lowercase letters, insertions as bases between brackets ([ and ]), deleted bases as one or more dashes (-), introns as three underscores (___), and ambiguous positions, such as positions with diverse reference mismatches, are represented as ?. Missing information due to insufficient coverage is denoted as an ellipsis (...). If the switch -I is used, then an attempt is made to fill missing information with the assembly sequence. A sequence stretch that was taken from the assembly sequence rather than the supporting reads is wrapped in parentheses (( and )). In addition, when -I is used, the sequence is trimmed to the boundaries of the fused transcripts. The coordinate of the fusion breakpoint relative to the start of the transcript can thus easily be inferred by counting the bases from the beginning of the fusion transcript to the breakpoint character (|). In case the full sequence could be constructed from the combined information of supporting reads and assembly sequence, the start of the fusion transcript is marked by a caret sign (^) and the end by a dollar sign ($). If the full sequence could not be constructed, these signs are missing.
 
- -l MAX_ITD_LENGTH  Maximum length of internal tandem duplications. Note: Increasing
-                    this value beyond the default can impair performance and lead to many
-                    false positives. Default: 100
+      * peptide_sequence : This column contains the fusion peptide sequence. The sequence is translated from the fusion transcript given in the column fusion_transcript and determines the reading frame of the fused genes according to the transcript isoforms given in the columns transcript_id1 and transcript_id2. Translation starts at the start of the assembled fusion transcript or when the start codon is encountered in the 5' gene. Translation ends when either the end of the assembled fusion transcript is reached or when a stop codon is encountered. If the fusion transcript contains an ellipsis (...), the sequence beyond the ellipsis is trimmed before translation, because the reading frame cannot be determined reliably. The column contains a dot (.), when the transcript sequence could not be predicted or when the precise breakpoints are unknown due to lack of split reads or when the fusion transcript does not overlap any coding exons in the 5' gene or when no start codon could be found in the 5' gene or when there is a stop codon prior to the fusion junction (in which case the column reading_frame contains the value stop-codon). The breakpoint is represented as a pipe symbol (|). If a codon spans the breakpoint, the amino acid is placed on the side of the breakpoint where two of the three bases reside. Codons resulting from non-template bases are flanked by two pipes. Amino acids are written as lowercase characters in the following situations: non-silent SNVs/SNPs, insertions, frameshifts, codons spanning the breakpoint, non-coding regions (introns/intergenic regions/UTRs), and non-template bases. Codons which cannot be translated to amino acids, such as those having invalid characters, are represented as ?.
+
+      * read_identifiers : This column contains the names of the supporting reads separated by commas.
 
- -u  Instead of performing duplicate marking itself, Arriba relies on duplicate marking by a
-     preceding program using the BAM_FDUP flag. This makes sense when unique molecular
-     identifiers (UMI) are used.
+  - fusions.discarded.tsv
+
+    The file fusions.discarded.tsv (as specified by the parameter -O) contains all events that Arriba classified as an artifact or that are also observed in healthy tissue. It has the same format as the file fusions.tsv. 
 
- -X  To reduce the runtime and file size, by default, the columns 'fusion_transcript',
-     'peptide_sequence', and 'read_identifiers' are left empty in the file containing
-     discarded fusion candidates (see parameter -O). When this flag is set, this extra
-     information is reported in the discarded fusions file.
+
+
+
 
- -I  If assembly of the fusion transcript sequence from the supporting reads is incomplete
-     (denoted as '...'), fill the gaps using the assembly sequence wherever possible.
-
- -h  Print help and exit.
+* Code repository: https://github.com/suhrig/arriba
+* Get help/report bugs: https://github.com/suhrig/arriba/issues
+* User manual: https://arriba.readthedocs.io/
+* Please cite: https://doi.org/10.1101/gr.257246.119
 
-         Code repository: https://github.com/suhrig/arriba
-    Get help/report bugs: https://github.com/suhrig/arriba/issues
-             User manual: https://arriba.readthedocs.io/
-             Please cite: https://doi.org/10.1101/gr.257246.119
+
+.. _Arriba: https://arriba.readthedocs.io/en/latest/
+.. _INPUTS: https://arriba.readthedocs.io/en/latest/input-files/
+.. _OUTPUTS: https://arriba.readthedocs.io/en/latest/output-files/
 
     ]]></help>
     <expand macro="citations" />
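The region syntax accepted in the known fusions (-k) and tags (-t) files, and the rule for reading the breakpoint position off the fusion_transcript column, can be illustrated with a small Python sketch. This is not Arriba code: `Region`, `parse_region`, and `breakpoint_offset` are hypothetical names, the contig wildcard (`*`) is kept verbatim rather than resolved, and the way the annotation characters (`^`, `(`, `[`, `-`, `___`, `...`) are counted is an assumption drawn from the help text in arriba.xml, not from Arriba's source.

```python
import re
from typing import NamedTuple, Optional, Union


class Region(NamedTuple):
    strand: Optional[str]  # '+', '-', or None when no strand prefix is given
    contig: str            # a trailing '*' wildcard is kept verbatim, not resolved
    start: int             # 1-based, inclusive
    end: int               # equals start for a single coordinate


# Optional strand prefix, contig (no ':' or whitespace), position, optional "-END".
_REGION_RE = re.compile(r"^([+-])?([^:\s]+):(\d+)(?:-(\d+))?$")


def parse_region(text: str) -> Union[Region, str]:
    """Parse one region column; anything that is not a coordinate is taken as a gene name."""
    text = text.strip()
    m = _REGION_RE.match(text)
    if m is None:
        return text  # assumed to be a gene name from the -g annotation
    strand, contig, start, end = m.groups()
    first = int(start)
    last = int(end) if end is not None else first
    if last < first:
        raise ValueError(f"range end precedes start: {text!r}")
    return Region(strand, contig, first, last)


def breakpoint_offset(fusion_transcript: str) -> Optional[int]:
    """Count transcript bases before the first '|' (assumes output produced with -I).

    Letters of either case and '?' count as one base each; the markers
    ^ ( ) [ ], deletion dashes and intron underscores are not counted.
    Returns None when there is no leading '^', no '|', or a '...' gap
    before the breakpoint, because the offset is then not reliable.
    """
    if not fusion_transcript.startswith("^") or "|" not in fusion_transcript:
        return None
    prefix = fusion_transcript.split("|", 1)[0]
    if "..." in prefix:
        return None
    return sum(1 for ch in prefix if ch.isalpha() or ch == "?")
```

For example, `parse_region("9:1000000-1100000")` yields an unstranded `Region` spanning 1000000 to 1100000, while `parse_region("BCR")` returns the gene name `"BCR"` unchanged. Requiring a leading caret keeps `breakpoint_offset` conservative: without it, the first character shown may not be the first base of the transcript.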
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/arriba_download_reference.xml	Fri Oct 08 11:16:21 2021 +0000
@@ -0,0 +1,71 @@
+<tool id="arriba_download_reference" name="Arriba Download Reference" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" python_template_version="3.5">
+    <description></description>
+    <macros>
+        <import>macros.xml</import>
+    </macros>
+    <expand macro="requirements" />
+    <expand macro="version_command" />
+    <command detect_errors="exit_code"><![CDATA[
+    echo '$arriba_reference_name' > '$arriba_reference' &&
+    mkdir -p '$arriba_reference.files_path' &&
+    cd '$arriba_reference.files_path' &&
+    BASE_DIR=$(dirname $(dirname `which arriba`)) &&
+    REF_SCRIPT=`find $BASE_DIR -name 'download_references.sh'` &&
+    $REF_SCRIPT '$arriba_reference_name'
+    ]]></command>
+    <inputs>
+        <param name="arriba_reference_name" type="select" label="Select reference">
+            <option value="GRCh37+ENSEMBL87">GRCh37+ENSEMBL87</option>
+            <option value="GRCh37+GENCODE19">GRCh37+GENCODE19</option>
+            <option value="GRCh37+RefSeq">GRCh37+RefSeq</option>
+            <option value="GRCh37viral+ENSEMBL87">GRCh37viral+ENSEMBL87</option>
+            <option value="GRCh37viral+GENCODE19">GRCh37viral+GENCODE19</option>
+            <option value="GRCh37viral+RefSeq">GRCh37viral+RefSeq</option>
+            <option value="GRCh38+ENSEMBL93">GRCh38+ENSEMBL93</option>
+            <option value="GRCh38+GENCODE28">GRCh38+GENCODE28</option>
+            <option value="GRCh38+RefSeq">GRCh38+RefSeq</option>
+            <option value="GRCh38viral+ENSEMBL93">GRCh38viral+ENSEMBL93</option>
+            <option value="GRCh38viral+GENCODE28">GRCh38viral+GENCODE28</option>
+            <option value="GRCh38viral+RefSeq">GRCh38viral+RefSeq</option>
+            <option value="GRCm38+GENCODEM25">GRCm38+GENCODEM25</option>
+            <option value="GRCm38+RefSeq">GRCm38+RefSeq</option>
+            <option value="GRCm38viral+GENCODEM25">GRCm38viral+GENCODEM25</option>
+            <option value="GRCm38viral+RefSeq">GRCm38viral+RefSeq</option>
+            <option value="hg19+ENSEMBL87">hg19+ENSEMBL87</option>
+            <option value="hg19+GENCODE19">hg19+GENCODE19</option>
+            <option value="hg19+RefSeq">hg19+RefSeq</option>
+            <option value="hg19viral+ENSEMBL87">hg19viral+ENSEMBL87</option>
+            <option value="hg19viral+GENCODE19">hg19viral+GENCODE19</option>
+            <option value="hg19viral+RefSeq">hg19viral+RefSeq</option>
+            <option value="hg38+ENSEMBL93">hg38+ENSEMBL93</option>
+            <option value="hg38+GENCODE28">hg38+GENCODE28</option>
+            <option value="hg38+RefSeq">hg38+RefSeq</option>
+            <option value="hg38viral+ENSEMBL93">hg38viral+ENSEMBL93</option>
+            <option value="hg38viral+GENCODE28">hg38viral+GENCODE28</option>
+            <option value="hg38viral+RefSeq">hg38viral+RefSeq</option>
+            <option value="hs37d5+ENSEMBL87">hs37d5+ENSEMBL87</option>
+            <option value="hs37d5+GENCODE19">hs37d5+GENCODE19</option>
+            <option value="hs37d5+RefSeq">hs37d5+RefSeq</option>
+            <option value="hs37d5viral+ENSEMBL87">hs37d5viral+ENSEMBL87</option>
+            <option value="hs37d5viral+GENCODE19">hs37d5viral+GENCODE19</option>
+            <option value="hs37d5viral+RefSeq">hs37d5viral+RefSeq</option>
+            <option value="mm10+GENCODEM25">mm10+GENCODEM25</option>
+            <option value="mm10+RefSeq">mm10+RefSeq</option>
+            <option value="mm10viral+GENCODEM25">mm10viral+GENCODEM25</option>
+            <option value="mm10viral+RefSeq">mm10viral+RefSeq</option>
+        </param>
+    </inputs>
+    <outputs>
+        <data name="arriba_reference" format="txt" label="$arriba_reference_name"/>
+    </outputs>
+    <help><![CDATA[
+** Arriba **
+
+Arriba_ is a fast tool to search for aberrant transcripts such as gene fusions.
+It is based on chimeric alignments found by the STAR RNA-Seq aligner.
+
+.. _Arriba: https://arriba.readthedocs.io/en/latest/
+
+]]></help>
+    <expand macro="citations" />
+</tool>
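The command block of arriba_download_reference.xml locates download_references.sh by taking the directory two levels above the `arriba` executable and searching it with `find`. A rough Python equivalent, which may help when checking where the script lives in a given installation, is sketched here. `find_reference_script` is a hypothetical helper, not part of Arriba or Galaxy, and the assumption that the script sits somewhere under the installation prefix comes from the shell command itself rather than from Arriba's documentation.

```python
import os
import shutil
from typing import Optional


def find_reference_script(executable: str = "arriba",
                          script_name: str = "download_references.sh",
                          path: Optional[str] = None) -> Optional[str]:
    """Mimic BASE_DIR=$(dirname $(dirname `which arriba`)) followed by find."""
    exe = shutil.which(executable, path=path)  # like `which`, symlinks are not resolved
    if exe is None:
        return None
    # Two dirname calls: <prefix>/bin/arriba -> <prefix>
    base_dir = os.path.dirname(os.path.dirname(exe))
    # Walk the prefix like `find $BASE_DIR -name ...`, stopping at the first hit.
    for root, _dirs, files in os.walk(base_dir):
        if script_name in files:
            return os.path.join(root, script_name)
    return None
```

Returning only the first match is a deliberate difference: if `find` ever matched more than one file, `$REF_SCRIPT` would hold several paths and the unquoted `$REF_SCRIPT '$arriba_reference_name'` line would not run the intended command.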
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/Aligned.out.sam	Fri Oct 08 11:16:21 2021 +0000
@@ -0,0 +1,111 @@
+@HD	VN:1.4	SO:coordinate
+@SQ	SN:1	LN:248956422
+@SQ	SN:2	LN:242193529
+@SQ	SN:3	LN:198295559
+@SQ	SN:4	LN:190214555
+@SQ	SN:5	LN:181538259
+@SQ	SN:6	LN:170805979
+@SQ	SN:7	LN:159345973
+@SQ	SN:8	LN:145138636
+@SQ	SN:9	LN:138394717
+@SQ	SN:10	LN:133797422
+@SQ	SN:11	LN:135086622
+@SQ	SN:12	LN:133275309
+@SQ	SN:13	LN:114364328
+@SQ	SN:14	LN:107043718
+@SQ	SN:15	LN:101991189
+@SQ	SN:16	LN:90338345
+@SQ	SN:17	LN:83257441
+@SQ	SN:18	LN:80373285
+@SQ	SN:19	LN:58617616
+@SQ	SN:20	LN:64444167
+@SQ	SN:21	LN:46709983
+@SQ	SN:22	LN:50818468
+@SQ	SN:X	LN:156040895
+@SQ	SN:Y	LN:57227415
+@SQ	SN:MT	LN:16569
+@PG	ID:STAR	PN:STAR	VN:2.7.8a	CL:STAR   --runThreadN 12   --genomeDir /panfs/roc/website/galaxy.msi.umn.edu/galaxy/tool-data/rnastar/2.7.4a/GRCh38_canon/GRCh38_canon/dataset_1367616_files   --genomeLoad NoSharedMemory   --readFilesIn /panfs/roc/galaxy/PRODUCTION/database/files/001/368/dataset_1368710.dat   /panfs/roc/galaxy/PRODUCTION/database/files/001/368/dataset_1368711.dat      --readFilesCommand zcat      --limitBAMsortRAM 122880000000   --outSAMtype BAM   SortedByCoordinate      --outSAMstrandField intronMotif   --outSAMattributes NH   HI   AS   nM   ch      --outSAMunmapped Within      --outSAMprimaryFlag OneBestScore   --outSAMmapqUnique 60   --outBAMsortingThreadN 12   --outBAMsortingBinsN 50   --outSAMattrIHstart 1   --winAnchorMultimapNmax 50   --chimSegmentMin 12   --chimOutType WithinBAM   Junctions      --chimOutJunctionFormat 1      --quantMode GeneCounts      --twopass1readsN 50000   --twopassMode Basic
+@CO	user command line: STAR --runThreadN 12 --genomeLoad NoSharedMemory --genomeDir /panfs/roc/website/galaxy.msi.umn.edu/galaxy/tool-data/rnastar/2.7.4a/GRCh38_canon/GRCh38_canon/dataset_1367616_files --readFilesIn /panfs/roc/galaxy/PRODUCTION/database/files/001/368/dataset_1368710.dat /panfs/roc/galaxy/PRODUCTION/database/files/001/368/dataset_1368711.dat --readFilesCommand zcat --outSAMtype BAM SortedByCoordinate --twopassMode Basic --twopass1readsN 50000  --quantMode GeneCounts --outSAMstrandField intronMotif --outSAMattrIHstart 1 --outSAMattributes NH HI AS nM ch --outSAMprimaryFlag OneBestScore --outSAMmapqUnique 60 --outSAMunmapped Within --chimSegmentMin 12 --outBAMsortingThreadN 12 --outBAMsortingBinsN 50 --winAnchorMultimapNmax 50 --limitBAMsortRAM 122880000000 --chimOutType WithinBAM Junctions --chimOutJunctionFormat 1
+BCR-ABL1-76	99	9	130854061	60	24S126M	=	130854103	755	CAGCCACTGGATTTAAGCAGAGTTCAAAAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTG	CCCGGGGCGGGCGJJJJJGJJJGJJJJJJJJJGJ1JCJJGJGGJJJGJJGGJJJ8GGJJGGGJJ=GGCGGGGGG=GGCCGGG8GC=GGGG=GCGGCGGGGJGG=GGGG=GGGGGGGGCGGGGCCGGGCG=GG(G=GCGCCG1CCGGCGGG	NH:i:1	HI:i:1	AS:i:274	nM:i:1	XS:A:+	NM:i:1
+BCR-ABL1-64	99	9	130854061	60	6S144M	=	130854104	756	AGAGTTCAAAAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGG	CCCGGGGGCGGGGGJJJGJJJGGJJJJJCJJJJGGJJJGJJGJGJG=GGJG=JJJJCGCCC==JGGCGGGCJG1CCCCGG8CGGGGGGGGCCGC=CGCGGJGGGGCGCGGGGGGGGCCGCGGGG=GCGGGGGGG=GGGGCGGGGGGCCGG	NH:i:1	HI:i:1	AS:i:290	nM:i:2	XS:A:+	NM:i:1
+BCR-ABL1-54	99	9	130854061	60	61S89M	=	130854061	140	CCGGGGCTCTATGGGTTTCTGAATGTCATCGTCCACTCAGCCACTGGATTTAAGCAGAGTTCAAAAGCCCTTGAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTC	CCCGGGGGGGGGGGJJJJJGJ=JJJJJJGJJJGGJJJJJJJCJJG8JJJGJJGJ=GG=JJJGGCGGCGGJGC(GGGGGCGC8CGGCGCCGGC=GGGCGGGJG1GGGGGG1CG=GGGGC=1G1CGGGGGCCGGGGCGG=CC=C=CGGGGG8	NH:i:1	HI:i:1	AS:i:219	nM:i:4	NM:i:2
+BCR-ABL1-54	147	9	130854061	60	10S140M	=	130854061	-140	AAGCAGAGTTCAAAAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACGTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTT	=GGGGGGGCCCCGCCGGG=G(GGGG=CGCGGCGCCGG=GGGGGCJJJ=GC8C1GGGGGCG8GCCGC=GGG1GCCGGJC8GCGGCGCGJGJJJG1CGJGG=CJJJGGGGJG=CJGJJJJCJCJJGGJJJJJGGJGGJJCGGGGGGGGG=CC	NH:i:1	HI:i:1	AS:i:219	nM:i:4	NM:i:2
+BCR-ABL1-48	163	9	130854061	60	3S147M	=	130854101	753	GTTCAAAAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCA	CCCGGGCGGGGGGGJ1JJJJJJJCJJJCGJJCGGJCJJGJGGJJJGGGCGJJ=GJJJG=JCG=GJGGGC8=GCG=G=GCCGGG1CG1GC=GGG8GGGGG1GCJJCJJCCGGCGCCG=CGCGGGCGG=GCGGG1CGC1CGC=CGGGCGGGG	NH:i:1	HI:i:1	AS:i:295	nM:i:1	XS:A:+	NM:i:1
+BCR-ABL1-28	163	9	130854061	60	19S131M	=	130854092	744	ACTGGATTTAAGCAGAGTTCAAAAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAGCCCCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTG	CCCGCGGGGGGGGGJ=GJCGJJJ1JCJJJJGJJJJJCCJJJJG8JJCC=CGGCGGJGGC(JGGG=GCCGCGCJ8CGGG=GGGCGGGGGCGGCCGGCCGGGCCJ=JC=CGCGCGC1G8GCCGGGGGC=GCGGGCGGGGGGGGGGGGGCGCC	NH:i:1	HI:i:1	AS:i:275	nM:i:3	XS:A:+	NM:i:2
+BCR-ABL1-2	99	9	130854061	60	62S88M	=	130854061	134	TCCGGGGCTCTATGGGTTTCTGAATGTCATCGTCCACTCAGCCACTGGATTTAAGCAGAGTTCAAAAGCCCCTCAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCT	CCCGGGGGGGGGGGJ1JJJJJJ=JGGGGJJJCGCJJJCJJJJGGCJGGGJCJGGJJGJCGJGG1GCG=CGG(G=CGGG1GGCCGGGGGGGCGGGG=GCCGJGGCGCGGGGCCCG1GGGCCGGG8GGCGGCGG=CC(G=GC1GGCCGGGCG	NH:i:1	HI:i:1	AS:i:214	nM:i:3	NM:i:2
+BCR-ABL1-2	147	9	130854061	60	16S134M	=	130854061	-134	GGATTTAAGCAGAGTTCAAAAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTAT	CGGGC=GGGGG=GGGGGGCCCCG=GG8=8CCGCGGGGGCGCGCJJCJ=CCCCC81GGGGC=GGGC8C8GGCGCGJCCGG8JCCGCC1GGCGGCJGGGJJJGGJJJJJCGGJGCJJJJJJG=JGJGJJJJGJJJGGJGCGCGGGGGGGCCC	NH:i:1	HI:i:1	AS:i:214	nM:i:3	NM:i:1
+BCR-ABL1-68	99	9	130854064	60	1S149M	=	130854089	175	AAAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGA	CC1=GC=GGGGGGJJJJGJJJJJGGJJJJJJJGJJJJJ=(GJGGG8CCGJJ=GJGGGGJGJGJ=GGGCCGCGGCG1CGCGGGGGGGCGCCGCGGCGGGGGJGCC8GCGGGGCGGC=GGGGG=GGGCCC=GCGGGGGGCGCGGGCGGCGCG	NH:i:1	HI:i:1	AS:i:291	nM:i:3	NM:i:0
+BCR-ABL1-60	99	9	130854064	60	39S111M	=	130854074	160	TCATCGTCCACTCAGCCACTGGATTTAAGCAGAGTTCAAAAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCA	=CCGGCGGGGG=GJJGJGGGCJJCJJGJCGJG(J(JJJGGCGGGJJJGCJGGG1G=JGGJJGCJCCGGJ(JJCCGCC=GCGGGCGGGGG1GGGGCGCGG(JCGCGGGGGGGGGGGGCCGGCGCGCGGGGGGGCGGGGCGG1GGGGGGCGC	NH:i:1	HI:i:1	AS:i:259	nM:i:0	ch:A:1	NM:i:0	SA:Z:22,23290375,+,39M111H,60,0;
+BCR-ABL1-18	2193	9	130854064	60	118H32M	22	23289532	0	AAGCCCTTCAGCGGCCAGTAGCATCTGACTTT	JJJGJGCJJJ=1JJJJ=JGGCG=GGCGGGCCC	NH:i:1	HI:i:1	AS:i:31	nM:i:0	ch:A:1	NM:i:0	SA:Z:22,23289579,-,43M717N75M32S,60,0;
+BCR-ABL1-4	2193	9	130854064	60	107H43M	22	23289525	0	AAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGG	CJJJ(JJGJGJJJGGJJGJCC1JJCGGJGG=GGGGGGGGGCCC	NH:i:1	HI:i:1	AS:i:42	nM:i:0	ch:A:1	NM:i:0	SA:Z:22,23289590,-,32M717N75M43S,60,0;
+BCR-ABL1-60	147	9	130854074	60	150M	=	130854064	-160	GCGGCCAGTAGCATCTGACTTTGAGCCTCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCT	GCGGCCCGCGG=GG8GGCGGGGCCGCC=GCGCGGCG1GGCGG1JCJJ8CCCG=GGGGGGCG=GCGGGG18CCGCG=GGGG1CGG8C=GGGGCGGCJGJGJGJGJJGGJGJJJGJGJJJGGJC(JJJGJJJGJCJJGCGGGGGCGGGGCC=	NH:i:1	HI:i:1	AS:i:259	nM:i:0	ch:A:1	NM:i:0
+BCR-ABL1-68	147	9	130854089	60	150M	=	130854064	-175	TGACTTTGAGCCTCAGGGTCTAAGTGAAGCCGCTCGTAGGAACTCCAAGGAAAACCTTCTCGGTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGG	G=CGGCGGGCCGCGCGGCCGG8GCCGCC(GGC=8=GGG=GGGCJ1=JJCCGGGCGGGGGGGG=GCGGGGCGCG==GGCGGGGGJGCCJJGGCCG=GCCCGGJCGGJJJJJ=GJJJJGJCJ=GCJGJGJJJC1GGJJJGGG=GG1GGGCCC	NH:i:1	HI:i:1	AS:i:291	nM:i:3	NM:i:3
+BCR-ABL1-50	99	9	130854089	60	150M	=	130854133	757	TGACTTTGAGCCTCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGG	CC1GGCGCGGGG=JGCJJJJJGJJJGJJGGJJGJJ8JGCGJJJJJJ8CJJGJJCGJGGGGJJCCG=CGGGGGCCCG=CGGCCGCGGGGGCGG=GGGGGGGCCGGG==GGGGGCCGG=GGGGCCG=GGGGGCCC=GGCGGGGCCGGCCGGG	NH:i:1	HI:i:1	AS:i:300	nM:i:0	XS:A:+	NM:i:0
+BCR-ABL1-28	83	9	130854092	60	146M563N4M	=	130854061	-744	CTTTGAGCCTCAAGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGA	GGGGGCGGGG=C1=8GGCGCGGGCGCGGGGG=GC=GGGGCG1CCCGCGGCCGGGC=GG=GGGGGCCGGGGCGGGCGJJGGGGCGGJ1JGGGGCGGJGJGGJJJCGGCJJCGJ=GJGCGGJJJJGGJJG1JJJGG1JJ=GGCGCGGG1CCC	NH:i:1	HI:i:1	AS:i:275	nM:i:3	XS:A:+	NM:i:1
+BCR-ABL1-48	83	9	130854101	60	137M563N13M	=	130854061	-753	TCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCG	GGGCGGCGCGGGG8GGGGGCGGCGGGCG1GCGCGG8GCGGCGC1G8CCGCGCGGCCGGGGGCGCCGCC1=CCCCGCCJCGGGGGGJJGJC=CCJ8JJC=JJCGCJJJGJJJJJJJJJJGJJGGGCJJJJJJJGJGJGCGGCGGGCGC=C1	NH:i:1	HI:i:1	AS:i:295	nM:i:1	XS:A:+	NM:i:0
+BCR-ABL1-76	147	9	130854103	60	135M563N15M	=	130854061	-755	AGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGG	CGGCGGGGCGGGGGGCGCCGCGCGGGGC=GGCGCCGCGCGGGCJJCJC1GGGGG=GCGGGGG=GGGGGGGGGGGGGCGGCGGGGGGJGJCGJGGJGJCJGJJJJJG8JJCJGG1JJJJJJJG8(JJJJJGJJJGJJJGGGGGGGG1GCCC	NH:i:1	HI:i:1	AS:i:274	nM:i:1	XS:A:+	NM:i:0
+BCR-ABL1-64	147	9	130854104	60	134M563N16M	=	130854061	-756	GGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTGTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGT	C=GGGGGCGGCGGGGGGC=GGGGC1CG=1=GGGCCC=GGGCGCCCCJJGCGCGGGGCGGCGCGCCGCCGCGGGGGCGC1GGGGG=GG1CGJGJJJ(CCGGJJGCGJGJGJJJGGGCGJGJJJJJJJJJGJGJGJJJCGCGGGGGGGGCCC	NH:i:1	HI:i:1	AS:i:290	nM:i:2	XS:A:+	NM:i:1
+BCR-ABL1-14	99	9	130854110	60	128M563N22M	=	130854134	737	GAGTGAAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGG	CC=GGGGGGGGGGJGJJCJGJJJGG1JJGJ=JGGGJJJJGGJJJGCJJCGJJGC=GCJGGJGGCGGGCCGGGCGGCGCGGGGGGGGGGGGCC8GGGGCGCJCGGGCCCGCG8GGGGCGGGCGGCGGGGGGCGGGGGCGGG=GGGGGGCCG	NH:i:1	HI:i:1	AS:i:302	nM:i:0	XS:A:+	NM:i:0
+BCR-ABL1-78	99	9	130854115	60	123M563N27M	=	130854164	762	AAGCCGCTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATA	CC=1G=GGGGCGGJJJJGJJGG8JJCJGJGJJ8JJJJGJCGJGJJ=JGGGCGJCJCGGG=JJJJGGG=JGC=GGGCGGGGGGCGGCG=GCCGGGGGGCGGJCGCCGCGGGGGGGCCGGGGCGCCGGG=GGGCGCGGGGCGGGCCGGGGGG	NH:i:1	HI:i:1	AS:i:302	nM:i:0	XS:A:+	NM:i:0
+BCR-ABL1-62	163	9	130854121	60	117M563N33M	=	130854179	771	CTCGTTGGAACTCCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACA	CCCCGGGGGGGGCJGGGJJGJJJJJJJ=GJJCJJGJCJGJCGCGCGGGCCJGJCGJ81JC1GGGGGCG8GGCGGGGCG1C1GGGGCGGCGCCCGGG=GC=CGCJJJJGGGGGCGGCGC=8GCCGGGGGGGG=GCG=1GGGCGGGCGG1CG	NH:i:1	HI:i:1	AS:i:300	nM:i:1	XS:A:+	NM:i:0
+BCR-ABL1-50	147	9	130854133	60	105M563N45M	=	130854089	-757	CCAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGT	CCGCCCGGGGGCGGGG1GCGGGGGGC8CGG=CGGCCC=CGGGCCJ(JJ=GGCGCGGGGCGGGGGCC8GCCCGGCGCGGGJ8GGGGCC1JJGJCGGJJJGJG8JJGJJJJCJJJGGJGGGCJGGJJJJGGGJJJ=CJCGG=GCGGGGG=CC	NH:i:1	HI:i:1	AS:i:300	nM:i:0	XS:A:+	NM:i:0
+BCR-ABL1-14	147	9	130854134	60	104M563N46M	=	130854110	-737	CAAGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTG	1CGGGGGCGGCCGG1CG=GGGGGGGCGGGGGCCGGG1CGCGCCJJJJJCG1CGGGCCGCGGGGGGGGGGGGCGGGGGGGCCGGGCGJJGG=JJ(J18GJCJGJ8JJGGJ=JJGJJGGGJJJ=JJJJCJJJJJJJJJGGGGGGG1GGGCCC	NH:i:1	HI:i:1	AS:i:302	nM:i:0	XS:A:+	NM:i:0
+BCR-ABL1-20	163	9	130854136	60	102M563N48M	=	130854183	760	AGGAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTG	CC=GGGGGGGGGGJJGJJJJJJCJJ=JJJJGGGGJCJJGCGCGJGGCCCJJJGJJJJCGGJG=GGJGGGGGJGJGCCGGGGCCG=GG=C=G=GGCCGGGCGCCCC=JG11GGCCGCCCGCGCC8CGGGGCC1CGCGGG=GG=CCC1GGCG	NH:i:1	HI:i:1	AS:i:302	nM:i:0	XS:A:+	NM:i:0
+BCR-ABL1-26	163	9	130854138	60	100M563N50M	=	130854180	755	GAAAACCTTCTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGCAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAA	C1CGGGCGGGGGGJJJ=JJGGCGGGJJGJJGGJGGJJGJCJGJJ=GGCJJGJJGGCGCGCGGG=JGGG8GGGGCGGGGGCGCCCGGGCGGGCCCCCG=GGGCJ(JCJ=GGCCGGGGGGGGGGGGGCCGCGGGCCGGGGGCGCGGGGGGGG	NH:i:1	HI:i:1	AS:i:298	nM:i:2	XS:A:+	NM:i:1
+BCR-ABL1-34	99	9	130854147	60	91M563N59M	=	130854187	753	CTCGCTGGACCCAGTGAAAATGACCCCAACCTTTTCGTTGCCCTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACC	CCCGGGGGGGGGGJJJJJGJGJJGG=JJJJJJGJJJGJGJC1JGGGJGGJJGGGJJGG=GCGJGGJCGGGGGGCG=GGCGG1CGGGG=CGGCCGGGCGGGJ88=CGCG=GGGGGC=GGCGGGGG1GGCCGGGG1GGGCGGGGCCCGCGGC	NH:i:1	HI:i:1	AS:i:300	nM:i:1	XS:A:+	NM:i:1
+BCR-ABL1-80	99	9	130854163	60	75M563N75M	=	130854214	764	AAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAACTGGCCAAGGCT	CC=GGGGGGGGGGJJJJCJCJJJJJJGGGJJJCJGCCJJJJJ=JCGJJJ=J8JGJGJ=J=JGG=CJCCGG1GG=CGGG=8GG=GCGCCGGCGCGGGG8GGJG1CGCGCCGCCGCGG=GGGGCGCGGGGGG=G==GGCC(GGCGGGGCGCC	NH:i:1	HI:i:1	AS:i:300	nM:i:1	XS:A:+	NM:i:1
+BCR-ABL1-8	163	9	130854163	60	75M563N75M	=	130854210	760	AAAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCT	=C=GG(1GGGGGGJGGJGCCJJCJJJJJJ=GJGGJJGJ=JJGCGCGJJJJJGGJ1GGJJG8C8GGG=GCCG8GGGG1=CGG88CG=GG8GG=GGGCGG8GGCJ18CJ=CGGCGGGG1CC=GCCGCGGG=GGGCGGCGCC8GCCGCGCCGG	NH:i:1	HI:i:1	AS:i:302	nM:i:0	XS:A:+	NM:i:0
+BCR-ABL1-78	147	9	130854164	60	74M563N76M	=	130854115	-762	AAATGACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTG	GGGGGCG8GGGGGGGCCGCGG=GCCGCGCGCGGGGGGGC=GCGJ88JJ=CGCGGGG=CGGGGG(GG=G(CCGCGGCJGJGGGCCGGCCJJGGJJJGJJGG1G(JJJGCGJJG=J=GJJJGJJJJJ=CGJJJGJGGJJGGCGGCGGGGCCC	NH:i:1	HI:i:1	AS:i:302	nM:i:0	XS:A:+	NM:i:0
+BCR-ABL1-42	163	9	130854168	60	70M563N80M	=	130854209	754	GACCCCAACCTTTTCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTC	CC=GGGGGGGGGGGGJJJGGJJJ=JJJGGGJJGJGCJJJJ(GJGJJCCJGGGJCGJGGJJJGG=G1C8GCGGGG18GCC=GGGGGCCCGC1GGGGCGGCGG=CCJCCGGG==GGGGGGGG1G(GGG=C=GGGG88=CC=GGCGGGGCGGG	NH:i:1	HI:i:1	AS:i:302	nM:i:0	XS:A:+	NM:i:0
+BCR-ABL1-62	83	9	130854179	60	59M563N91M	=	130854121	-771	TTTCGTTGCACTGTATGATTTTGTGGCCGGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTA	GCCC1CCCCCGCGCCC=GGGGGGCCGGC8CGGC1GGC=CGGGGCCCGGC8GCC=GCGGGCGGGGGGGCCGJGGGGGGGGGJGG=GJGCGJGGJ=JJGJGJCG=JJJJGJJJJGJJJGJJGCGJCJ=JJJJJGJGGJJGGGGCGGGGGCCC	NH:i:1	HI:i:1	AS:i:300	nM:i:1	XS:A:+	NM:i:1
+BCR-ABL1-26	83	9	130854180	60	58M563N92M	=	130854138	-755	TTCGTTGTACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTAC	CGGGGGC(GGGCGGCGCCGGGCGGCGCCGGGCGCGGGGCGGC==GGCCG=1GCCGGCGGCGGGCCCGGG=GGGCGCCCCJJGCCC1GJJCJGGGJJJG8JGG=GJJ=GGJJJJCJGCCGJJJJJJGJJJJJGGJGJJGGGGGGGCGCCCC	NH:i:1	HI:i:1	AS:i:298	nM:i:2	XS:A:+	NM:i:1
+BCR-ABL1-44	163	9	130854181	60	57M563N93M	=	130854224	756	TCGTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACA	C=CGGGGGGCGGGJJJJJGJGJJJJJ1JJJJJJ1JJJGCGGG=JGJGJGGJGGCGGCGJGCJG=JJGGCJCGCCCGGGGCCCGGGC=GG=CGGGGGGGGGGGCJJJJGC=GGGGCGGCGGC=CCGG8GGGCGCGGCCCGGCG=GGCGGGC	NH:i:1	HI:i:1	AS:i:302	nM:i:0	XS:A:+	NM:i:0
+BCR-ABL1-40	99	9	130854183	60	55M563N94M1S	=	130854802	769	GTTGCACTGTATGATTTTGTGGCCCGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATG	CCCCGGGGGGCGGJJJJGGJCJJJ(GJJ=JJJJJCJJJGGGGJJJGGGJJCJJJGJCJCCJJ(CJ=CGGGJ=GGGCJCGGGGCGGGCGGGGGCGGCGGGGCCGGCGCCGGGGGGCCGCGGGGGGGG=GGGCGG=GGGGGCGGG8G=CGG1	NH:i:1	HI:i:1	AS:i:297	nM:i:1	XS:A:+	NM:i:1
+BCR-ABL1-20	83	9	130854183	60	55M563N95M	=	130854136	-760	GTTGCACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATC	GCG=GGGCC8GCCGCGGGGGGG=GGCCGGCGGGGCCGGGCGG8CGGCGGJGCGGCGGGGGGCGG=GGG1GCGGGGGGGGGC==JJGG=GGGCCJGGGGGGJJJJJGCCGJJGGJGJCJJCJJJJJGGJJJJGJJJJ1GGGGGGGGGGCCC	NH:i:1	HI:i:1	AS:i:302	nM:i:0	XS:A:+	NM:i:0
+BCR-ABL1-34	147	9	130854187	60	51M563N99M	=	130854147	-753	CACTGTATGATTTTGTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGC	CGGG8GCC=GGGCGGGGGGCGGGGG=GGGGGGGCGGCGCGGGGJJJJCCCGC1GGGGG8GGCC=GGGGGGGGGGGCCGGCCCG=GCJJJGGGGGG81J8=CGJGGGGJGJJCJGGJGGJGCCJJGJJJJJCJJJJGJGGGCGGGG=GCCC	NH:i:1	HI:i:1	AS:i:300	nM:i:1	XS:A:+	NM:i:0
+BCR-ABL1-22	99	9	130854201	60	37M563N113M	=	130854828	777	GTGGCCAGTGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTG	CC1GGGGGCGGGGGGCJJGCGJCJJJJJJJJGJJJJGJJGGG1JGCJJCJJGJJJJJGGGGJGJJCGJJGCC8CGGCGCGGCGCCG1GGCCCCGC=GG=GJCGGGGCGCG=GGGGGCGGC=CGGGGGGGGGCGGCCCGGG(G8GGGGGGG	NH:i:1	HI:i:1	AS:i:300	nM:i:0	XS:A:+	NM:i:0
+BCR-ABL1-42	83	9	130854209	60	29M563N121M	=	130854168	-754	TGGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACA	GGGCGGGCGCCGGGGGCGGGGGG=CGGGGC1CGCGGGCG=GGCG=GGCCCGGGGCGCGGCGGCCGGGGGGG(GG=GGG=GGGCCGGJJGJJ=GGGGGJJGCGJGGJJJJJJGCJGJGG(GGJJJGJJJJJJJ8GJJJCGGGGCGG1G1CC	NH:i:1	HI:i:1	AS:i:302	nM:i:0	XS:A:+	NM:i:0
+BCR-ABL1-8	83	9	130854210	60	28M563N122M	=	130854163	-760	GGAGATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACAC	G=GGCGGGGG=GGGGGGGCCGCCGCGCGGCCGGCGGGCCGGCGGCGGGGJGG=GGGGGGGC=GCGGGCCCG=JGCG=GGGJGGGJGJGGGJCGCGCGGJGGJJJJJJGJGJGJJJG1JJJJJJJJGJCGGJJJJJJCGGGGGGGGCGCCC	NH:i:1	HI:i:1	AS:i:302	nM:i:0	XS:A:+	NM:i:0
+BCR-ABL1-80	147	9	130854214	60	24M563N126M	=	130854163	-764	ATAACACTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCT	CCCCCGGCC8GCG=CGCGCGGG(GGGGGGGGC=GGG=GG=CGCJJJJJGGGGC==G8CGGG=GCGGC1GGGC(GG=G=GGCGJC8GCGJGJCGCJGJGCGJGJJJGGJJGJGJCJJJJGJGJJJGGJJJJJJGJJJJ8GGGGGGGGGCCC	NH:i:1	HI:i:1	AS:i:300	nM:i:1	XS:A:+	NM:i:0
+BCR-ABL1-30	163	9	130854218	60	20M563N130M	=	130854831	763	CACTCTAAGCATAACTAAAGGTGAAAAGCACCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTA	CCCGG1GGGGGGGJGJCJGJGJJJ(JGJJ(JGJGC=GJJJGJJGGJGGJJJJGGJGGCCCCJG8CG=GCJCGGGJ=GCGJ1CGGGGCCGCG8CG=GGCGGC(CJJC=CGGCGG(CGGGGGGCGCCGG=GC1GCGG=G1CGGCCCCG===8	NH:i:1	HI:i:1	AS:i:298	nM:i:1	XS:A:+	NM:i:1
+BCR-ABL1-52	99	9	130854220	60	18M563N132M	=	130854841	771	CTCTAAGCATAACTAAAGGTGAAAAGCTCCGGGTCCTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTACC	CCCGGGGGGGGGGGJJGCGJGG8GJJJGJJGJG=G=JJGGCJGGGGGJGGJJGJJGJGGGCJ=GJGGGCGJCGGGGGGG8CGGGGGGC=GGCGGGGCGCGJCGCGCCCCGCG11GGGGGGGGGGGGGGCCCGGGGCCGGCGGGGGCGGCG	NH:i:1	HI:i:1	AS:i:298	nM:i:1	XS:A:+	NM:i:1
+BCR-ABL1-44	83	9	130854224	60	14M563N136M	=	130854181	-756	AAGCATAACTAAAGGTGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTACCATGG	CGGGCGCGGGGCGCGGGGGCGGGGGGGGGCGGGGGGGG=G1GGGCGCGGCGCGGGGGGGCGGCGCCGCGGC==GGG=1GGG=CJJGCGJGJJJCGCJJ8GJJGGGJJGGJJJGJJGJJJGJJJJJJCJGJ=GGJJ=JGGGGGCGGCGCCC	NH:i:1	HI:i:1	AS:i:302	nM:i:0	XS:A:+	NM:i:0
+BCR-ABL1-40	147	9	130854802	60	150M	=	130854183	-769	TGAAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAA	=GCCGGGGCGGGG=GG1G8G=G(GGCGGGGGGGGC1C=GCGG=J8JJ1GGGGGGGGGGGGGGGGGC=1G8CGGCJ8GC1GGCGCCGJG1GGJGGJGJ8JJCJJJGJ8GGJJJ=JGJJG=CGG=JJJJJJJ8JJ=JJGGGGGGGGGCGC=C	NH:i:1	HI:i:1	AS:i:297	nM:i:1	XS:A:+	NM:i:0
+BCR-ABL1-16	163	9	130854804	60	150M	=	130854852	198	AAAAGCTCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCACCAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATG	CC8GGG1CCGGGCCJJGJG(JJJJJ(JGJCJGJJJJJJJJGGJGJGJJCCJJJGJGGGGCGGG=C=JGJJG=C8GGGCGGGGGC(GGCGCGCGGCG(CGG=1(C=JCGGGCCGGGGGCGCGGGCGGCGGGCGGCCGGGGGGGG8GGGCGC	NH:i:1	HI:i:1	AS:i:296	nM:i:1	NM:i:1
+BCR-ABL1-6	99	9	130854810	60	150M	=	130854867	207	TCCGGGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATGCCGCTG	CCCGGGGGGCGGGGJJGJGJ=JJJJJJJJJJGJGGJJJGGJC=GGGGJ=GGCJJJGCGJGJGJJGJGCGJGGJGJCGCGGGGGGGGGG8GC8GC8G=GGGJ=GCGGGGGCCGCCGGGGGGGG1G1=GCGC81CGGGGGCCCGGGGGCGGG	NH:i:1	HI:i:1	AS:i:298	nM:i:0	NM:i:0
+BCR-ABL1-32	99	9	130854814	60	150M	=	130854858	194	GGTCTTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATGCCGCTGAGTA	CCCGGGGGGGGGGJJGGJJJ=JJGJ8JGJ=JGJG=GGGJGCCJJJC=J8JJ=GCCJC8JC=GGGJGG(GGCJGGGG1GC=GJ=GCCGGCCGGG=GGGGGGJGCGGCGG=CGGCGCG=GGC=CGGGGCCGGGGGGCCCCGCCGGGCCGGGG	NH:i:1	HI:i:1	AS:i:298	nM:i:0	NM:i:0
+BCR-ABL1-38	163	9	130854818	60	150M	=	130854864	196	TTAGGCTATAATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATGCCGCTGAGTATCTG	CCCGCGGGGGGGG=GGJGJGJJJ1=GGCJJJCGJ=GJJJGGCJGGCGJGGJJCGJCGJGGG=GGGGJCGGGGGGJGGGGGGGGG8C=G=GCGG==CG8GCG=JCJJCCGG8CGGCGGCCGGGCCGCGGCGGG=GCGGCGG=8CGGCGCG=	NH:i:1	HI:i:1	AS:i:298	nM:i:0	NM:i:0
+BCR-ABL1-22	147	9	130854828	60	150M	=	130854201	-777	ATCACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATGCCGCTGAGTATCTGCTGAGCAGCG	GGG1C8GGG=C=GGGCGCGG=GG=GCGGGGCGGGCGCGGGCGGJ1CCJCCC8GCC=GGCGGGGGGGCGCGCGGGCCGJGCGJGJGGJGCJGGGGGCGGGJ(JCGGGJJJJJJG=1GGJJJGJJJGGJJJJJJGJ1JJGG=GGGGGCGCCC	NH:i:1	HI:i:1	AS:i:300	nM:i:0	XS:A:+	NM:i:0
+BCR-ABL1-30	83	9	130854831	60	150M	=	130854218	-763	ACAATGGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATGCCGCTGAGTATCTGCTGAGCAGCGGGA	CG8GC==GCGCCGGGGGGG8=CGGGGGC=CGGCGGGGGCG=GGGC=GCCJGGGGCGGCGCGGGGCGCGGGCCCGJGGG8(8GC8GCGCGCJGGGGGGJGGGJGJJJJJGJJJJCGGJJGJJJ=JJJJGGJ=JJJJGGGGGGGGGGGGCCC	NH:i:1	HI:i:1	AS:i:298	nM:i:1	XS:A:+	NM:i:0
+BCR-ABL1-36	99	9	130854836	60	150M	=	130854883	197	GGGGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATGCCGCTGAGTATCTGCTGAGCAGCGGGATCAAT	CCCGGGGGGGGGGJJJGGJJ=JJJJGJJJJJ=GGJJJCJ=JGGGJGGJGJJGGGGCGCG=JGCCGJGCJGCGGGJGGCGCG1GJGC=8GGCCCCGG8CCGJCCGGGCGGGCGGGGGCG1GCCGGCGGGGGCCCGGCCCG(G8GC8=CCGC	NH:i:1	HI:i:1	AS:i:298	nM:i:0	NM:i:0
+BCR-ABL1-70	163	9	130854838	60	150M	=	130854875	187	GGAATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATGCCGCTGAGTATCTGCTGAGCAGCGGGATCAATGG	CCCGGG==GGGGGJJJJJJJJJJJ=GJ8J=GJGJGJGGJCCJJJCGG1=GJGJG18JG=GGGC1GGGJ8GCCC1=CCGCGGG(GC1GCG1GCCGC8GG1GG=JJCJ1GGCGCCGCG(CGGCGGGGGG(GGGGGGCGCCGGGGCGCGCCG=	NH:i:1	HI:i:1	AS:i:298	nM:i:0	NM:i:0
+BCR-ABL1-52	147	9	130854841	60	150M	=	130854220	-771	ATGGTGTGAAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATGCCGCTGAGTATCTGCTGAGCAGCGGGATCAATGGCAG	CG=GGGGC(CGGGGGGGCCGCGCCGGGGG=GGCC8CGGCCCCCCJ=CJ=GGGGGGGGCG=GGGGCG(GC1GGGC=GJCCCGJ=88GJGGJGJGCJGGJJGGJJJGJJJJGCGJJGGJ(GJGGJCJJJCJJJJJ=(JJGGGGGGGGGCCCC	NH:i:1	HI:i:1	AS:i:298	nM:i:1	XS:A:+	NM:i:0
+BCR-ABL1-56	99	9	130854849	60	150M	=	130854892	193	AAGCCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATGCCGCTGAGTATCTGCTGAGCAGCGGGATCAATGGCAGCTTCTTGG	=CCGC=GGGGGG1JJJGJGJJJCJJCGJJJJJJJJJJCGJJJJJJGGCCJ1JGCGGCGJ8GG(CJGJGJCGGCGG1CC=CG=GCC=GGGGG=GGCGCCG1CGGGGGGG1GGGGGCGGGCGCGGCGG811G8CCGGGGCGGGGGCCG=CGC	NH:i:1	HI:i:1	AS:i:298	nM:i:0	NM:i:0
+BCR-ABL1-16	83	9	130854852	60	150M	=	130854804	-198	CCCAAACCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATGCCGCTGAGTATCTGCTGAGCAGCGGGATCAATGGCAGCTTCTTGGTGC	CGGGGGG=GGCGGGCGGGGGGGCGGGGCGGGGCCCG=CGGGGGCGGGCCCGG8GCCCGG8GCGGC=GCCCGGGGGGCGGGCGGJGGGCCGGJCGGJJGJGGGGGJJJJGJJGGJJGG=JGGJJGGJ=JGJJJJJJGJC1GGGG=GGG1C1	NH:i:1	HI:i:1	AS:i:296	nM:i:1	NM:i:0
+BCR-ABL1-32	147	9	130854858	60	150M	=	130854814	-194	CCAAAAATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATGCCGCTGAGTATCTGCTGAGCAGCGGGATCAATGGCAGCTTCTTGGTGCGTGAGA	8GCCCCGGCGCGC1GGGGCGGCCGGGGCG1(GG=GG=GCGGCGJCJJCGG8=GGGCCGCGGGCGGGGGC=GC1=GGGGGJGGCCGJJCGGGJJJCGGG8CCGGGCJGJGJJGCCGCJJJJJJJJJJGCGJJJGJJGJGGGGGCGGGG=CC	NH:i:1	HI:i:1	AS:i:298	nM:i:0	NM:i:0
+BCR-ABL1-38	83	9	130854864	60	150M	=	130854818	-196	ATGGCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATGCCGCTGAGTATCTGCTGAGCAGCGGGATCAATGGCAGCTTCTTGGTGCGTGAGAGTGAGA	GCGCCGCGGGCGGGGGGGCGGCGCG8CGCGGGG8GGCGGGCCCCG8CC=JGCGGGGGGCGGGGCGCCGC=GCCCGGJGGGCGGGCJGCCJJGJG=GGCJJJGGJCGJCGCGJJJC=JJGJCJGJGJJGJJJ=JJJ1GGGGGGGGGGGCCC	NH:i:1	HI:i:1	AS:i:298	nM:i:0	NM:i:0
+BCR-ABL1-6	147	9	130854867	60	150M	=	130854810	-207	GCCAAGGCTGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATGCCGCTGAGTATCTGCTGAGCAGCGGGATCAATGGCAGCTTCTTGGTGCGTGAGAGTGAGAGCA	CG8GCGGGCCCCGGCGCGGGGCCGGGCGG8CG=G=GCCGCGG=1CJ8JCCGCGGGGGCGGGGGGCGG=G=8GCGJG=GGGGGGJGCCGGJJGG=G=CJ8=JJJJGG=JJJJJGJJGJGJJGJJJJJCCCJJJGGJJJG1GGG1GGGGCCC	NH:i:1	HI:i:1	AS:i:298	nM:i:0	NM:i:0
+BCR-ABL1-70	83	9	130854875	60	150M	=	130854838	-187	TGGGTCCCAAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATGCCGCTGAGTATCTGCTGAGCAGCGGGATCAATGGCAGCTTCTTGGTGCGTGAGAGTGAGAGCAGTCCTGGC	CGGGCGGGCGGGCCGGGG=GGG8GCGGG8GGCGGGCGGCGGGCGGGGG=JGGGGGCGCGGGGGCGGGGCGCG1GGGGCJC8GG=JGGJJCCCJJGGGGJGJGJGGJJGGJJJJJGCJJJJGGJJGJJGJJJGGGJJJGGGGCGGGGG=CC	NH:i:1	HI:i:1	AS:i:298	nM:i:0	NM:i:0
+BCR-ABL1-36	147	9	130854883	60	150M	=	130854836	-197	AAGCAACTACATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATGCCGCTGAGTATCTGCTGAGCAGCGGGATCAATGGCAGCTTCTTGGTGCGTGAGAGTGAGAGCAGTCCTGGCCAGAGGTC	C=GGGGGGCCGGGGGGGGG=GGCCGCGGGC1GCGC1GCGGCCGJJ(CCG8GCCGGGCCGGGC=CC1CG=CGCCGGG=CGGGGGGGGCGGGGGJ==JJJJJ1CJJJGJGGJCGCJGJGJGJJJJ=GG1CJJCGJG1GC=GGGCGCGGGCCC	NH:i:1	HI:i:1	AS:i:298	nM:i:0	NM:i:0
+BCR-ABL1-56	147	9	130854892	60	150M	=	130854849	-193	CATCACGCCAGTCAACAGTCTGGAGAAACACTCCTGGTACCATGGGCCTGTGTCCCGCAATGCCGCTGAGTATCTGCTGAGCAGCGGGATCAATGGCAGCTTCTTGGTGCGTGAGAGTGAGAGCAGTCCTGGCCAGAGGTCCATCTCGCT	CCGGGCGCGCGGGCG=CCCGGCGCGGGGC=CGGCGGCCGCGGGJJJJCCGCCG(GCCCCCGGCCGGG=G8GGGGGGCC=C=CGGJGJJJGC=JGGJJJGJGJ1JJJGC=JJJG=JCJJJJJJJ=JJGGGJJJCGJJJGGGGGCGG=GCCC	NH:i:1	HI:i:1	AS:i:298	nM:i:0	NM:i:0
+BCR-ABL1-46	163	22	23285101	60	75M75S	=	23285151	5255	AACTGGAGGCAGTGCCCAACATCCCCCTGGTGCCCGATGAGGAGCTGGACGCTTTGAACATCAAGATCTCCAAGAAGTGTTTCAGAAGCTTCTCCCTGACATCCGTGGAGCTGCAGATGCTGACCAACTCGTGTGTGAAACTCCAGACTG	CCCGGGGGG=GGGJJJGGJJJGGJJJJCJJGGJJGCJGCGGGC8J8JGGJJJJJGJJC(JGCCG=GGJJGCCCGC8GCCGGGGGG=GGCGGG1GG=GC1G=CJCJJCCCGGCGG1CGG1GGGGGGGG=GGGGGCCGCGGG8GGGCGG=GG	NH:i:1	HI:i:1	AS:i:214	nM:i:2	XS:A:+	NM:i:2
+BCR-ABL1-72	163	22	23285110	60	62M2994N7M1344N81M	=	23288166	5264	CAGTGCCCAACATCCCCCTGGTGCCCGATGAGGAGCTGCACGCTTTGAAGATCAAGATCTCCAAGAAGTGTTTCAGAAGCTTCTCCCTGACATCCGTGGAGCTGCAGATGCTGACCAACTCGTGTGTGAAACTCCAGACTGTCCACAGCA	CCCCGGGGGGGGGGJGJCCCJ1GJJJJGCGGGCJJJ=C1JJGGJGG8JGC=CCGJ1JGG8GGGGGJCGJCCGGGCG=CGGGGGGCGG=GGCGGG=8CCGCGGJJJ=JGGGCGGGGGCCGCCGGGGGGGGC=CCGCG8GGGGGC1GGGGCC	NH:i:1	HI:i:1	AS:i:290	nM:i:1	XS:A:+	NM:i:1
+BCR-ABL1-46	83	22	23285151	60	21M2994N7M1344N105M717N17M	=	23285101	-5255	GCTTTGAAGATCAAGATCTCCAAGAAGTGTTTCAGAAGCTTCTCCCTGACATCCGTGGAGCTGCAGATGCTGACCAACTCGTGTGTGAAACTCCAGACTGTCCACAGCATTCCGCTGACCATCAATAAGGAAGATGATGAGTCTCCGGGG	=GGCGGGGGGG=GGGCCCGCCCGGGGGGGGGGCCGGGGCGG8CGCGGG1JGGCCGG(C=GCCCGGGGGGCGGGGGCGCGGCGGJCGGGJJGJGGGJJCGGGJJJGJJJJJJJGJJJJGGGJJJJJGGJJJJJGCJJJCGGGGGGGGGCCC	NH:i:1	HI:i:1	AS:i:214	nM:i:2	XS:A:+	NM:i:0
+BCR-ABL1-72	83	22	23288166	60	3S7M1344N105M717N35M	=	23285110	-5264	TCCAAGAAGTGTTTCAGAAGCTTCTCCCTGACATCCGTGGAGCTGCAGATGCTGACCAACTCGTGTGTGAAACTCCAGACTGTCCACAGCATTCCGCTGACCATCAATAAGGAAGATGATGAGTCTCCGGGGCTCTATGGGTTTCTGAAT	=GGGGGG==GGGGCCCC=GGGGG=GGGGCGGGCGGGGGGG=CGGCCGCCJGGCGGGGG=GGG8GGGCGGC=G=CCJGGGGGGCGJJGJJCGGGGGGJJJGCJCCGJG=JJJGJGJJCJJJJGJJJJJJJ=GCJGJGCGGG=GGGGGGCC=	NH:i:1	HI:i:1	AS:i:290	nM:i:1	XS:A:+	NM:i:0
+BCR-ABL1-4	99	22	23289525	60	97M717N53M	=	23289590	889	AGCTTCTCCCTGACATCCGTGGAGCTGCAGATGCTGACCAACTCGTGTGTGAAACTCCAGACTGTCCACAGCATTCCGCTGACCATCAATAAGGAAGATGATGAGTCTCCGGGGCTCTATGGGTTTCTGAATGTCATCGTCCACTCAGCC	C==GGGGGGGGGGJJJJ1JJJGGJJGGJGGJJGJJCJGJGJJCGGCJGCJJJJCGJGGGGJGGGGGGCCGG8JGGCGCGG=GGGGGGGGGGGGGG=GCCGJGGGCCGGGGGG1GGGGGGCGCGGCGGGGGG=GGGGGGGGGCCGCGGGCC	NH:i:1	HI:i:1	AS:i:259	nM:i:0	ch:A:1	XS:A:+	NM:i:0
+BCR-ABL1-18	99	22	23289532	60	90M717N60M	=	23289579	882	CCCTGACATCCGTGGAGCTGCAGATGCTGACCAACTCGTGTGTGAAACTCCAGACTGTCCACAGCATTCCGCTGACCATCAATAAGTAAGATGATGAGTCTCCGGGGCTCTATGGGTTTCTGAATGTCATCGTCCAATCAGCCACTGGAT	CCCGGGCGGGCGGJGJJJJJJJJJ=GCJJCJJJJJGJJJGJJGJJJCGGJJGGJCGJC=GG8GCGJGCGCG==GGGCGGGGG1CCCGCGGGGGCGGCCGGC=GCGGG=GGGGCGGGGCGCGGGGGGG=GGGCGGGG(GGGCGGGCGCCGG	NH:i:1	HI:i:1	AS:i:266	nM:i:2	ch:A:1	XS:A:+	NM:i:2
+BCR-ABL1-12	163	22	23289546	60	76M717N74M	=	23290337	868	GAGCTGCAGATGCTGACCAACTCGTGTGTGAAACTCCAGACTGTCCACAGCATTCCGCTGACCATCAATAAGGAAGATGATGAGTCTCCGGGGCTCTATGGGTTTCTGAATGTCATCGTCCACTCAGCCACTGGATTTAAGCAGAGTTCA	CCCGGGGGGGGGGJGGJJJJJCCJJCJJGJJGJJGGCJJJCJCGGGJJ=CGJGJJJJGGCGGGJJJ==GG(GGC=GGGGGGCGCGG(GGGGC1C8GCC=GG=C=CCJGGGGGG8CGGCCCGCGGGGGGGGCGGGG=GGGGGCGGGG=GGC	NH:i:1	HI:i:1	AS:i:227	nM:i:0	XS:A:+	NM:i:0
+BCR-ABL1-18	147	22	23289579	60	43M717N75M32S	=	23289532	-882	CTCCAGACTGTCCACAGCATTCCGCTGACCATCAATAAGGAAGATGATGAGTCTCCGGGGCTCTATGGGTTTCTGAATGTCATCGTCCACTCAGCCACTGGATTTAAGCAGAGTTCAAAAGCCCTTCAGCGGCCAGTAGCATCTGACTTT	=GCGGGGG(GGGGGGGGGGGCC=GGC=GGCGGCCGGGGCGGG8JJJJ=GCGGGGG1GGGCCGGGCCGGGCGGCGGGJGC8GCCGCGGCG=GJCGJJGC8GC1JGG=GJJCJC1JGJGGJJJGJGCJJJ=1JJJJ=JGGCG=GGCGGGCCC	NH:i:1	HI:i:1	AS:i:266	nM:i:2	ch:A:1	XS:A:+	NM:i:0	SA:Z:9,130854064,-,118H32M,60,0;
+BCR-ABL1-4	147	22	23289590	60	32M717N75M43S	=	23289525	-889	CCACAGCATTCCGCTGACCATCAATAAGGAAGATGATGAGTCTCCGGGGCTCTATGGGTTTCTGAATGTCATCGTCCACTCAGCCACTGGATTTAAGCAGAGTTCAAAAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGG	CGGCGGGGGCGGCGCCG=GCGCGGGG8GCG881CGG=C=GCCGJJCJCCCGGG8GGCGG=GGGCCCGGCGGCCCCGGCGGGG=GGCGCJJGCGGJG1JGJJJ8JGJJCJJJ(JJGJGJJJGGJJGJCC1JJCGGJGG=GGGGGGGGGCCC	NH:i:1	HI:i:1	AS:i:259	nM:i:0	ch:A:1	XS:A:+	NM:i:0	SA:Z:9,130854064,-,107H43M,60,0;
+BCR-ABL1-12	83	22	23290337	60	19S77M54S	=	23289546	-868	CGCTGACCATCAATAAGGAAGATGATGAGTCTCCGGGGCTCTATGGGTTTCTGAATGTCATCGTCCACTCAGCCACTGGATTTAAGCAGAGTTCAAAAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGGTCTGAGTGAAG	8CCCGGCCGGCCGGGGGCGG1CCG=GGCGGGGGC1GGGGCCGCGGGGCCJGG=CGGCGGGGCGCGCGGGCGGCGGJG==GGCGCJGCGGGCJGGGGGGGCJGJGGJJJGJGGGGCJJJGJJJGGJGJJJGJJCCJJGGG1GGGGGGG=CC	NH:i:1	HI:i:1	AS:i:227	nM:i:0	XS:A:+	NM:i:0
+BCR-ABL1-60	2145	22	23290375	60	39M111H	9	130854074	0	TCATCGTCCACTCAGCCACTGGATTTAAGCAGAGTTCAA	=CCGGCGGGGG=GJJGJGGGCJJCJJGJCGJG(J(JJJG	NH:i:1	HI:i:1	AS:i:38	nM:i:0	ch:A:1	NM:i:0	SA:Z:9,130854064,+,39S111M,60,0;
+BCR-ABL1-74	77	*	0	0	*	*	0	0	TCATTTTCACTGGGTCCAGCGAGAAGGTTTTCCTTGGAGTTCCAACGAGCGGCTTCACTCAGACCCTGAGGCTCAAAGTCAGATGCTACTGGCCGCTGAAGGGCTTTTGAACTCTGCTTAAATCCAGTGGCTGAGTGGACGATGACATTC	CC11GGGGGGGGGGCCJJJGCGJJGJJJJJGGGGGGJJJGGJG==GCJCJ=GGJJGGJJGGCJGG=GGGGGJGGJGC=GC=GGGCGGGCGGGGCCGCGGGJCGC=GGC8CGCGCGGGGGGCGCC1GGCGCC=GCCGCGGC8GCGGGCCCG	NH:i:0	HI:i:0	AS:i:155	nM:i:2	uT:A:1
+BCR-ABL1-74	141	*	0	0	*	*	0	0	CATTCCGCTGACCATCAATAAGGAAGATGATGAGTCTCCGGGGCTCTATGGGTTTCTGAATGTCATCGTCCACTCAGCCACTGGATTTAGGCAGAGTTCAAAAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGGTCTGAG	CCCGGGGGGCGCGJGGJJGGJGJJJGJGGJJGGJGJJ1=JCJJGGGJJJJGGGJGCCJGGJGG=J1JG8JGCGGGJG=GC1CGCCGGCG(GGCGGCGGGGGCJC1CCGC==CCGGGGCGGCGGGCCGGCGCGC8CCCCGGG=GGGC=GGG	NH:i:0	HI:i:0	AS:i:155	nM:i:2	uT:A:1
+BCR-ABL1-66	77	*	0	0	*	*	0	0	TCCAGCGAGAAGGTTTTCCTTGGAGTTCCAACGAGCGGCTTCACTCAGACCCTGAGGCTCAAAGTCAGATGCTACTGGCCGCTGAAGGGCTTTTGAACTCTGCTTAAATCCAGTGGCTGAGTGGACGATGACATTCAGAAACCCATAGAG	CCC=GGGGCGGGGJJJJJGJJJJ=JJJGJJ1GJJGJJJJJGJJJJJGGGGCGJJGGGJJJGGCGGGGJGCGG1JCGGG=GCCGCG=GC=G=GCCGGGGG8JGGGGGGGGGGGG=GGCGGC8GGCCGGGC=GGGGGGGGG=CGG=8GGCCG	NH:i:0	HI:i:0	AS:i:159	nM:i:0	uT:A:1
+BCR-ABL1-66	141	*	0	0	*	*	0	0	CATTCCGCTGACCATCAATAAGGAAGATGATGAGTCTCCGGGGCTCTATGGGTTTCTGAATGTCATCGTCCACTCAGCCACTGGATTTAAGCAGAGTTCAAAAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGGTCTGAG	CCCGGGGGGGGGGGGJ=JGJJJJJJJGGJJCCCJGJJ1JJJGCJGGGGJJJJ=GGGJGJGC(GGGGJGGGJG1=GGGGGGGG=G=C=GG8CC8GGGGGCCCCJCCCJGCG=GGCCGGCGGCGGCG==1GCCGGC1GGGGGCGGGGGGCGG	NH:i:0	HI:i:0	AS:i:159	nM:i:0	uT:A:1
+BCR-ABL1-58	77	*	0	0	*	*	0	0	ATGATGAGTCTCCGGGGCTCTATGGGTTTCTGAATGTCATCGTCCACTCAGCCACTGGATTTAAGCAGAGTTCAAAAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGGTCTGAGTGAAGCCGCTCGTTGGAACTCCAAGG	CCCGGCGGGGGGGGJJJJJGJJGJGJGJGJJJJJJJJJCJGJJJJGCG=8GGGJGJGGCGGJGCGJJJCJGGG=CGCCGGCCGGGCGCGGGCGCG1GGGCCCGGGGCG8GCCC=C8CGCGG=CCCGCCCCGGG=CCGGCGGGCGGGGGCG	NH:i:0	HI:i:0	AS:i:185	nM:i:3	uT:A:1
+BCR-ABL1-58	141	*	0	0	*	*	0	0	TTGGGGTCATTTTCACTGGGTCCAGCGAGAAGGTTTTCCTTGGAGTTCCAACGAGCGGCTTCACTCAGACCCTGAGGCTCAAAGTCAGATTCTACTGGCCGCTGAAGGGCTTTTGAACTCTGCTTAAATCCAGTGGCTGAGTGGACGATG	CCCGGGGGGGGGGJJJJJJGJGJJJGGJ=JJJJJJJJGC=GJJGGJJGJJGG1GCJGGGG=JGGG8C=GCCGC==GGGCGGGGGG=GGG=(G=CCGCCGGGGCJJJJGGGC8GCGCGCG8CGGCCGGGCGCGCGG8CCGG8CGGGGGGGG	NH:i:0	HI:i:0	AS:i:185	nM:i:3	uT:A:1
+BCR-ABL1-24	77	*	0	0	*	*	0	0	CGCAGACCATCAATAAGGAAGATGATGAGTCTCCGGGGCTCTATGGGTTTCTGAATGTCATCGTCCACTCAGCCACTGGATTTAAGCAGAGTTCAAAAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGGGCTGAGTGAAG	CC11GCGGGGGGGJCGJGJJCCJJJJGJJJJGJJGGJJJCJJJG8JJJ1GJ=JGGGGJJJCG=8GGCGCCGGGCCGGGCGGGGCGGGGCCGCGGCCGGG=J1GCCC1(CCGGCGGGCCGCGGGCGGGGC=GGCGCCGCC1GCGGGGGCGG	NH:i:0	HI:i:0	AS:i:154	nM:i:3	uT:A:1
+BCR-ABL1-24	141	*	0	0	*	*	0	0	TTTCACTGGGTCCAGCGAGAAGGTTTTCCTTGGAGTTCCAACGAGCGGCTTCACTCAGACCCTGAGGCTCAAAGTCAGATGCTACTGGCCGCTGAAGGGCTTTTGAACTCTGCTTAAATCCAGTGGCTGAGTGGACGATGACATTCAGAA	C=CCGGGGGGGGCJ1GGJJJJ1JJJJJGJJ=GJJG8GGJ=GJGJJGJJGGGCGJGCGGGCGGG8GG=GJJGCG1GCGGJGCCGGCGGGCCGGGCG8GGGGG8C1==CGGCCCGCGGGGC8GCGGG8GGGCGCCGCCGCGGGCGGGGGGCG	NH:i:0	HI:i:0	AS:i:154	nM:i:3	uT:A:1
+BCR-ABL1-10	77	*	0	0	*	*	0	0	AGGTTGGGGTCATTTTCACTGGGTCCAGCGAGAAGGTTTTCCTTGGAGTTCCAACGAGCGGCTTCACTCAGACCCTGAGGCTCAAAGTCAGATGCTACTGGCCGCTGAAGGGCTTTTGAACTCTGCTTAAATCCAGTGGCTGAGTGGACG	CC=GGGGGGGGGG1GJJJJJCJJJJJJJJJJJGJ=GJJJGCJJJJCJGJGCJGJJJGGJJJGGCCGGJGC=GGJ1C8GGGGGGCGCCGGGGGGCGGGCGCCCG1GGCGCGCGGGCC8GCGCGCGC8CCCGCGCGGGGGCGGGGGCGGCGG	NH:i:0	HI:i:0	AS:i:181	nM:i:2	uT:A:1
+BCR-ABL1-10	141	*	0	0	*	*	0	0	ATAAGGAAGATGATGAGTCTCCGGGGCTCTATGGGTTTCTGAATGTCATCGTCCACTCAGCCACTGGATTTAAGCAGAGTTCAAAAGCCCTTCAGCGGCCAGTAGCATCTGACTTTGAGCCTCAGGGTCTGAGTGAAGCCGCTCGTTGGA	1CCGGCGGGGGG1GGJJJGCC1JJJJCCG=JGGJJGJJJ=GGGGGJJGGGGGGC1J=CJGCGGGGCGC(CGGGGG=GGGGG(G=CGGCGGGGCCCGC=CCCCJJCC8G1GGGGCGGGGGGCGCGGGGGGGCG=GGCCGCCGCC1G=GGGG	NH:i:0	HI:i:0	AS:i:181	nM:i:2	uT:A:1