comparison arriba.help @ 0:5ebf2354cc9b draft
"planemo upload for repository https://github.com/jj-umn/tools-iuc/tree/arriba/tools/arriba commit 52c9f9825debe783339c13bd1da9a42b59747bd2"
author | jjohnson |
---|---|
date | Thu, 07 Oct 2021 11:47:02 +0000 |
parents | |
children |
-1:000000000000 | 0:5ebf2354cc9b |
---|---|
1 % arriba -h | |
2 [2021-10-06T19:04:33] Launching Arriba 2.1.0 | |
3 | |
4 Arriba gene fusion detector | |
5 --------------------------- | |
6 Version: 2.1.0 | |
7 | |
8 Arriba is a fast tool to search for aberrant transcripts such as gene fusions. | |
9 It is based on chimeric alignments found by the STAR RNA-Seq aligner. | |
10 | |
11 Usage: arriba [-c Chimeric.out.sam] -x Aligned.out.bam \ | |
12 -g annotation.gtf -a assembly.fa [-b blacklists.tsv] [-k known_fusions.tsv] \ | |
13 [-t tags.tsv] [-p protein_domains.gff3] [-d structural_variants_from_WGS.tsv] \ | |
14 -o fusions.tsv [-O fusions.discarded.tsv] \ | |
15 [OPTIONS] | |
16 | |
17 -c FILE File in SAM/BAM/CRAM format with chimeric alignments as generated by STAR | |
18 (Chimeric.out.sam). This parameter is only required, if STAR was run with the | |
19 parameter '--chimOutType SeparateSAMold'. When STAR was run with the parameter | |
20 '--chimOutType WithinBAM', it suffices to pass the parameter -x to Arriba and -c | |
21 can be omitted. | |
22 | |
23 -x FILE File in SAM/BAM/CRAM format with main alignments as generated by STAR | |
24 (Aligned.out.sam). Arriba extracts candidate reads from this file. | |
25 | |
26 -g FILE GTF file with gene annotation. The file may be gzip-compressed. | |
27 | |
28 -G GTF_FEATURES Comma-/space-separated list of names of GTF features. | |
29 Default: gene_name=gene_name|gene_id gene_id=gene_id | |
30 transcript_id=transcript_id feature_exon=exon feature_CDS=CDS | |
31 | |
32 -a FILE FastA file with genome sequence (assembly). The file may be gzip-compressed. An | |
33 index with the file extension .fai must exist only if CRAM files are processed. | |
34 | |
35 -b FILE File containing blacklisted events (recurrent artifacts and transcripts | |
36 observed in healthy tissue). | |
37 | |
38 -k FILE File containing known/recurrent fusions. Some cancer entities are often | |
39 characterized by fusions between the same pair of genes. In order to boost | |
40 sensitivity, a list of known fusions can be supplied using this parameter. The list | |
41 must contain two columns with the names of the fused genes, separated by tabs. | |
42 | |
43 -o FILE Output file with fusions that have passed all filters. | |
44 | |
45 -O FILE Output file with fusions that were discarded due to filtering. | |
46 | |
47 -t FILE Tab-separated file containing fusions to annotate with tags in the 'tags' column. | |
48 The first two columns specify the genes; the third column specifies the tag. The | |
49 file may be gzip-compressed. | |
50 | |
51 -p FILE File in GFF3 format containing coordinates of the protein domains of genes. The | |
52 protein domains retained in a fusion are listed in the column | |
53 'retained_protein_domains'. The file may be gzip-compressed. | |
54 | |
55 -d FILE Tab-separated file with coordinates of structural variants found using | |
56 whole-genome sequencing data. These coordinates serve to increase sensitivity | |
57 towards weakly expressed fusions and to eliminate fusions with low evidence. | |
58 | |
59 -D MAX_GENOMIC_BREAKPOINT_DISTANCE When a file with genomic breakpoints obtained via | |
60 whole-genome sequencing is supplied via the -d | |
61 parameter, this parameter determines how far a | |
62 genomic breakpoint may be away from a | |
63 transcriptomic breakpoint to consider it as a | |
64 related event. For events inside genes, the | |
65 distance is added to the end of the gene; for | |
66 intergenic events, the distance threshold is | |
67 applied as is. Default: 100000 | |
68 | |
69 -s STRANDEDNESS Whether a strand-specific protocol was used for library preparation, | |
70 and if so, the type of strandedness (auto/yes/no/reverse). When | |
71 unstranded data is processed, the strand can sometimes be inferred from | |
72 splice-patterns. But in unclear situations, stranded data helps | |
73 resolve ambiguities. Default: auto | |
74 | |
75 -i CONTIGS Comma-/space-separated list of interesting contigs. Fusions between genes | |
76 on other contigs are ignored. Contigs can be specified with or without the | |
77 prefix "chr". Asterisks (*) are treated as wild-cards. | |
78 Default: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 X Y AC_* NC_* | |
79 | |
80 -v CONTIGS Comma-/space-separated list of viral contigs. Asterisks (*) are treated as | |
81 wild-cards. | |
82 Default: AC_* NC_* | |
83 | |
84 -f FILTERS Comma-/space-separated list of filters to disable. By default all filters are | |
85 enabled. Valid values: homologs, low_entropy, isoforms, | |
86 top_expressed_viral_contigs, viral_contigs, non_coding_neighbors, | |
87 mismatches, duplicates, no_genomic_support, genomic_support, intronic, | |
88 end_to_end, relative_support, low_coverage_viral_contigs, | |
89 merge_adjacent, mismappers, multimappers, same_gene, long_gap, | |
90 internal_tandem_duplication, small_insert_size, read_through, | |
91 inconsistently_clipped, uninteresting_contigs, intragenic_exonic, | |
92 spliced, hairpin, blacklist, min_support, select_best, in_vitro, | |
93 short_anchor, known_fusions, no_coverage, homopolymer, many_spliced | |
94 | |
95 -E MAX_E-VALUE Arriba estimates the number of fusions with a given number of supporting | |
96 reads which one would expect to see by random chance. If the expected number | |
97 of fusions (e-value) is higher than this threshold, the fusion is | |
98 discarded by the 'relative_support' filter. Note: Increasing this | |
99 threshold can dramatically increase the number of false positives and may | |
100 increase the runtime of resource-intensive steps. Fractional values are | |
101 possible. Default: 0.300000 | |
102 | |
103 -S MIN_SUPPORTING_READS The 'min_support' filter discards all fusions with fewer than | |
104 this many supporting reads (split reads and discordant mates | |
105 combined). Default: 2 | |
106 | |
107 -m MAX_MISMAPPERS When more than this fraction of supporting reads turns out to be | |
108 mismappers, the 'mismappers' filter discards the fusion. Default: | |
109 0.800000 | |
110 | |
111 -L MAX_HOMOLOG_IDENTITY Genes with more than the given fraction of sequence identity are | |
112 considered homologs and removed by the 'homologs' filter. | |
113 Default: 0.300000 | |
114 | |
115 -H HOMOPOLYMER_LENGTH The 'homopolymer' filter removes breakpoints adjacent to | |
116 homopolymers of the given length or more. Default: 6 | |
117 | |
118 -R READ_THROUGH_DISTANCE The 'read_through' filter removes read-through fusions | |
119 where the breakpoints are less than the given distance away | |
120 from each other. Default: 10000 | |
121 | |
122 -A MIN_ANCHOR_LENGTH Alignment artifacts are often characterized by split reads coming | |
123 from only one gene and no discordant mates. Moreover, the split | |
124 reads only align to a short stretch in one of the genes. The | |
125 'short_anchor' filter removes these fusions. This parameter sets | |
126 the threshold in bp for what the filter considers short. Default: 23 | |
127 | |
128 -M MANY_SPLICED_EVENTS The 'many_spliced' filter recovers fusions between genes that | |
129 have at least this many spliced breakpoints. Default: 4 | |
130 | |
131 -K MAX_KMER_CONTENT The 'low_entropy' filter removes reads with repetitive 3-mers. If | |
132 the 3-mers make up more than the given fraction of the sequence, then | |
133 the read is discarded. Default: 0.600000 | |
134 | |
135 -V MAX_MISMATCH_PVALUE The 'mismatches' filter uses a binomial model to calculate a | |
136 p-value for observing a given number of mismatches in a read. If | |
137 the number of mismatches is too high, the read is discarded. | |
138 Default: 0.010000 | |
139 | |
140 -F FRAGMENT_LENGTH When paired-end data is given, the fragment length is estimated | |
141 automatically and this parameter has no effect. But when single-end | |
142 data is given, the mean fragment length should be specified to | |
143 effectively filter fusions that arise from hairpin structures. | |
144 Default: 200 | |
145 | |
146 -U MAX_READS Subsample fusions with more than the given number of supporting reads. This | |
147 improves performance without compromising sensitivity, as long as the | |
148 threshold is high. Counting of supporting reads beyond the threshold is | |
149 inaccurate, obviously. Default: 300 | |
150 | |
151 -Q QUANTILE Highly expressed genes are prone to produce artifacts during library | |
152 preparation. Genes with an expression above the given quantile are eligible | |
153 for filtering by the 'in_vitro' filter. Default: 0.998000 | |
154 | |
155 -e EXONIC_FRACTION The breakpoints of false-positive predictions of intragenic events | |
156 are often both in exons. True predictions are more likely to have at | |
157 least one breakpoint in an intron, because introns are larger. If the | |
158 fraction of exonic sequence between two breakpoints is smaller than | |
159 the given fraction, the 'intragenic_exonic' filter discards the | |
160 event. Default: 0.330000 | |
161 | |
162 -T TOP_N Only report viral integration sites of the top N most highly expressed viral | |
163 contigs. Default: 5 | |
164 | |
165 -C COVERED_FRACTION Ignore virally associated events if the virus is not fully | |
166 expressed, i.e., less than the given fraction of the viral contig is | |
167 transcribed. Default: 0.150000 | |
168 | |
169 -l MAX_ITD_LENGTH Maximum length of internal tandem duplications. Note: Increasing | |
170 this value beyond the default can impair performance and lead to many | |
171 false positives. Default: 100 | |
172 | |
173 -u Instead of performing duplicate marking itself, Arriba relies on duplicate marking by a | |
174 preceding program using the BAM_FDUP flag. This makes sense when unique molecular | |
175 identifiers (UMI) are used. | |
176 | |
177 -X To reduce the runtime and file size, by default, the columns 'fusion_transcript', | |
178 'peptide_sequence', and 'read_identifiers' are left empty in the file containing | |
179 discarded fusion candidates (see parameter -O). When this flag is set, this extra | |
180 information is reported in the discarded fusions file. | |
181 | |
182 -I If assembly of the fusion transcript sequence from the supporting reads is incomplete | |
183 (denoted as '...'), fill the gaps using the assembly sequence wherever possible. | |
184 | |
185 -h Print help and exit. | |
186 | |
187 Code repository: https://github.com/suhrig/arriba | |
188 Get help/report bugs: https://github.com/suhrig/arriba/issues | |
189 User manual: https://arriba.readthedocs.io/ | |
190 Please cite: https://doi.org/10.1101/gr.257246.119 | |
191 |
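
The usage line and option descriptions above can be turned into a small helper that assembles an Arriba invocation. The following is a minimal sketch, not part of Arriba or of this repository: the function names `build_arriba_command` and `write_known_fusions` are hypothetical, and the file names are the placeholders from the usage line. It uses only flags listed in the help text (`-x`, `-g`, `-a`, `-o`, `-O`, `-b`, `-k`, `-S`, `-f`) and does not execute Arriba; it only builds the argument list and writes a two-column, tab-separated gene-pair file of the kind described for `-k`.

```python
import csv
import shlex


def build_arriba_command(alignments, annotation, assembly, output,
                         discarded=None, blacklist=None, known_fusions=None,
                         min_supporting_reads=None, disabled_filters=()):
    """Assemble an argument list from flags documented in the help text.

    Hypothetical helper for illustration; it does not validate inputs.
    """
    # Arguments shown without brackets in the usage line: -x, -g, -a, -o.
    cmd = ["arriba", "-x", str(alignments), "-g", str(annotation),
           "-a", str(assembly), "-o", str(output)]
    # Optional arguments are appended only when a value was supplied.
    optional = [("-O", discarded), ("-b", blacklist),
                ("-k", known_fusions), ("-S", min_supporting_reads)]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    if disabled_filters:
        # -f accepts a comma-/space-separated list of filters to disable.
        cmd += ["-f", ",".join(disabled_filters)]
    return cmd


def write_known_fusions(path, gene_pairs):
    """Write gene pairs as two tab-separated columns (the format given for -k)."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(gene_pairs)


if __name__ == "__main__":
    command = build_arriba_command(
        "Aligned.out.bam", "annotation.gtf", "assembly.fa", "fusions.tsv",
        discarded="fusions.discarded.tsv",
        min_supporting_reads=3,
        disabled_filters=("duplicates", "homopolymer"),
    )
    # Print a shell-quoted form of the command for inspection.
    print(shlex.join(command))
```

The argument list can be passed to `subprocess.run` on a system where Arriba is installed. Building a list rather than a single string avoids shell-quoting problems with unusual file names.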