annotate EGAPXREADME.md @ 8:1680e72e27be draft default tip

planemo upload for repository https://github.com/ncbi/egapx commit bdbe05027c2c40e217a2ff0c9e0556450c443e54
author fubar
date Mon, 05 Aug 2024 03:56:41 +0000
parents c8e1543546f8
children
c8e1543546f8 planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4-dirty
fubar
# Eukaryotic Genome Annotation Pipeline - External (EGAPx)

EGAPx is the publicly accessible version of the updated NCBI [Eukaryotic Genome Annotation Pipeline](https://www.ncbi.nlm.nih.gov/genome/annotation_euk/process/).

EGAPx takes an assembly FASTA file, the taxid of the organism, and RNA-seq data. Based on the taxid, EGAPx picks protein sets and HMM models. The pipeline runs `miniprot` to align protein sequences and `STAR` to align RNA-seq reads to the assembly. The protein alignments and RNA-seq read alignments are then passed to `Gnomon` for gene prediction. In the first step of `Gnomon`, the short alignments are chained together into putative gene models. In the second step, these predictions are supplemented by _ab initio_ predictions based on HMM models. The final annotation for the input assembly is produced as a `gff` file.

We currently have protein datasets posted that are suitable for most vertebrates and arthropods:
- Chordata - Mammalia, Sauropsida, Actinopterygii (ray-finned fishes)
- Insecta - Hymenoptera, Diptera, Lepidoptera, Coleoptera, Hemiptera
- Arthropoda - Arachnida, other Arthropoda

We will be adding datasets for plants and other invertebrates in the next couple of months. Fungi, protists and nematodes are currently out of scope for EGAPx pending additional refinements.

**Warning:**
The current version is an alpha release with limited features and organism scope, intended to collect initial feedback on execution. Outputs are not yet complete and are not intended for production use. Please open a GitHub [Issue](https://github.com/ncbi/egapx/issues) if you encounter any problems with EGAPx. You can also write to cgr@nlm.nih.gov to give us your feedback or if you have any questions.

**Security Notice:**
EGAPx has dependencies in and outside of its execution path that include several thousand files from the [NCBI C++ toolkit](https://www.ncbi.nlm.nih.gov/toolkit), and more than a million total lines of code. Static Application Security Testing has shown a small number of verified buffer overrun security vulnerabilities. Users should consult with their organizational security team on risk and, if there is concern, consider mitigating options such as running in a VM or on a cloud instance.

**License:**
See the EGAPx license [here](https://github.com/ncbi/egapx/blob/main/LICENSE).

## Prerequisites

- Docker or Singularity
- AWS Batch, UGE cluster, or an r6a.4xlarge machine (32 CPUs, 256GB RAM)
- Nextflow v.23.10.1
- Python v.3.9+

Notes:
- General configuration for AWS Batch is described in the Nextflow documentation at https://www.nextflow.io/docs/latest/aws.html
- Nextflow installation is described at https://www.nextflow.io/docs/latest/getstarted.html
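
Before the first run it can help to confirm that the tools listed above are reachable. The following is a minimal sketch, not part of EGAPx: the function name `check_prerequisites` is hypothetical, the executable names `nextflow`, `docker` and `singularity` are assumed to match your installation, and the Nextflow version itself is not checked.

```python
import shutil
import sys

# Executable names are assumptions for illustration; adjust to your installation.
CONTAINER_RUNTIMES = ("docker", "singularity")


def check_prerequisites(min_python=(3, 9)):
    """Return a list of human-readable problems; an empty list means every check passed."""
    problems = []
    # The README asks for Python 3.9 or newer.
    if sys.version_info < min_python:
        problems.append("Python %d.%d or newer is required" % min_python)
    # Nextflow must be callable from the shell that launches EGAPx.
    if shutil.which("nextflow") is None:
        problems.append("nextflow was not found on PATH")
    # Either container runtime is sufficient.
    if not any(shutil.which(name) for name in CONTAINER_RUNTIMES):
        problems.append("neither docker nor singularity was found on PATH")
    return problems
```

Calling `check_prerequisites()` from the shell where you plan to launch EGAPx and printing the returned list gives a quick view of anything that is missing.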

## The workflow files

- Clone the EGAPx repo:
```
git clone https://github.com/ncbi/egapx.git
cd egapx
```

## Input data format

Input to EGAPx is in the form of a YAML file.

- The following are the _required_ key-value pairs for the input file:

```
genome: path to assembled genome in FASTA format
taxid: NCBI Taxonomy identifier of the target organism
reads: RNA-seq data
```
You can obtain the taxid from the [NCBI Taxonomy page](https://www.ncbi.nlm.nih.gov/taxonomy).

- RNA-seq data can be supplied in any one of the following ways:

```
reads: [ array of paths to reads FASTA or FASTQ files ]
reads: [ array of SRA run IDs ]
reads: [ SRA Study ID ]
reads: SRA query for reads
```
- If you are using your local reads, then the FASTA/FASTQ files should be provided in the following format:
```
reads:
  - path_to_Sample1_R1.gz
  - path_to_Sample1_R2.gz
  - path_to_Sample2_R1.gz
  - path_to_Sample2_R2.gz
```

- If you provide an SRA Study ID, all the SRA run IDs belonging to that Study ID will be included in the EGAPx run.

- The following are the _optional_ key-value pairs for the input file:

  - A protein set. A taxid-based protein set will be chosen if no protein set is provided.
```
proteins: path to proteins data in FASTA format.
```

  - HMM file used in Gnomon training. A taxid-based HMM will be chosen if no HMM file is provided.
```
hmm: path to HMM file
```
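
The required and optional keys above can be checked before a long run is launched. This is a rough sketch under stated assumptions: `check_input` is a hypothetical helper, not EGAPx's own validation, it works on a mapping that has already been parsed from YAML, and it only knows the keys described in this README.

```python
# Keys taken from this README; EGAPx itself may accept others.
REQUIRED_KEYS = ("genome", "taxid", "reads")
OPTIONAL_KEYS = ("proteins", "hmm")


def check_input(config):
    """Return a list of problems found in a parsed input mapping (empty if none)."""
    problems = []
    for key in REQUIRED_KEYS:
        if key not in config:
            problems.append(f"missing required key: {key}")
    for key in config:
        if key not in REQUIRED_KEYS + OPTIONAL_KEYS:
            problems.append(f"key not described in this README: {key}")
    # reads is either a list (paths, run IDs, a study ID) or a single query string.
    reads = config.get("reads")
    if reads is not None and not isinstance(reads, (str, list)):
        problems.append("reads must be a list (paths or SRA IDs) or a query string")
    return problems
```

For example, `check_input({"taxid": 6954, "reads": ["SRR8506572"]})` reports that `genome` is missing. With PyYAML installed, the mapping could come from `yaml.safe_load` applied to the input file.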

## Input example

- A test example YAML file `./examples/input_D_farinae_small.yaml` is included in the `egapx` folder. Here, the RNA-seq data is provided as paths to the reads FASTA files. These FASTA files contain a subsample of the reads from the complete SRA read files, which speeds up testing.

```
genome: https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/020/809/275/GCF_020809275.1_ASM2080927v1/GCF_020809275.1_ASM2080927v1_genomic.fna.gz
taxid: 6954
reads:
  - https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/EGAP/data/Dermatophagoides_farinae_small/SRR8506572.1
  - https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/EGAP/data/Dermatophagoides_farinae_small/SRR8506572.2
  - https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/EGAP/data/Dermatophagoides_farinae_small/SRR9005248.1
  - https://ftp.ncbi.nlm.nih.gov/genomes/TOOLS/EGAP/data/Dermatophagoides_farinae_small/SRR9005248.2
```

- To specify an array of NCBI SRA datasets:
```
reads:
  - SRR8506572
  - SRR9005248
```

- To specify an SRA Entrez query:
```
reads: 'txid6954[Organism] AND biomol_transcript[properties] NOT SRS024887[Accession] AND (SRR8506572[Accession] OR SRR9005248[Accession] )'
```

**Note:** Both of the above examples will have more RNA-seq data than the `input_D_farinae_small.yaml` example. To make sure the Entrez query does not produce a large number of SRA runs, please run it first at the [NCBI SRA page](https://www.ncbi.nlm.nih.gov/sra). If there are too many SRA runs, then select a few of them and list them in the input YAML.
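
As an alternative to the web page, the size of a query result can be estimated from a script. The sketch below uses the NCBI E-utilities `esearch` endpoint with `db=sra`; the function names are hypothetical, and the count returned is the number of matching SRA records, which is assumed here to be a reasonable proxy for (but not necessarily equal to) the number of runs EGAPx would use.

```python
import json
import urllib.parse
import urllib.request

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_esearch_url(query):
    """Build an esearch URL that asks only for the hit count of an SRA query."""
    params = {"db": "sra", "term": query, "retmax": 0, "retmode": "json"}
    return ESEARCH_URL + "?" + urllib.parse.urlencode(params)


def count_sra_records(query, timeout=30):
    """Return the number of SRA records matching the query (requires network access)."""
    with urllib.request.urlopen(build_esearch_url(query), timeout=timeout) as response:
        payload = json.load(response)
    # esearch reports the count as a string inside the "esearchresult" object.
    return int(payload["esearchresult"]["count"])
```

If `count_sra_records(...)` returns a large number for the string you plan to put under `reads:`, narrow the query or list a few run IDs explicitly.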

- First, test EGAPx on the example provided (`input_D_farinae_small.yaml`, a dust mite) to make sure everything works. This example usually runs in under 30 minutes, depending on resource availability. There are other examples you can try: `input_C_longicornis.yaml`, a green fly, and `input_Gavia_stellata.yaml`, a bird. These will take close to two hours. You can prepare your input YAML file following these examples.
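
When preparing input files for several assemblies, the layout of the examples can be generated rather than typed by hand. This is a minimal sketch using only the standard library; `render_input_yaml` is a hypothetical helper, and it assumes that paths and IDs are simple strings that need no YAML quoting (no `: `, ` #`, or leading special characters).

```python
def render_input_yaml(genome, taxid, reads, proteins=None, hmm=None):
    """Render an input file in the layout of the examples.

    `reads` is either a list (paths, run IDs) or a single SRA query string.
    """
    lines = [f"genome: {genome}", f"taxid: {taxid}"]
    if isinstance(reads, str):
        # A query contains brackets and spaces, so emit it as a single-quoted
        # YAML scalar; inside one, a literal quote is written as two quotes.
        escaped = reads.replace("'", "''")
        lines.append(f"reads: '{escaped}'")
    else:
        lines.append("reads:")
        lines.extend(f"  - {item}" for item in reads)
    # Optional keys are written only when a value is supplied.
    if proteins is not None:
        lines.append(f"proteins: {proteins}")
    if hmm is not None:
        lines.append(f"hmm: {hmm}")
    return "\n".join(lines) + "\n"
```

Writing the returned text to a file, for instance `my_input.yaml` (a placeholder name), gives something that can be passed to the runner script in place of the bundled examples.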

## Run EGAPx

- The `egapx` folder contains the following directories:
  - examples
  - nf
  - test
  - third_party_licenses
  - ui

- The runner script is within the ui directory (`ui/egapx.py`).

- Create a virtual environment where you can run EGAPx. The dependencies are listed in the `ui/requirements.txt` file; installing them adds PyYAML to this environment.
```
python3 -m venv /path/to/new/virtual/environment
source /path/to/new/virtual/environment/bin/activate
pip install -r ui/requirements.txt
```

- Run EGAPx for the first time to copy the config files so you can edit them:
```
python3 ui/egapx.py ./examples/input_D_farinae_small.yaml -o example_out
```
- When you run `egapx.py` for the first time, it copies the template config files to the directory `./egapx_config`.
- You will need to edit these templates to reflect the actual parameters of your setup.
  - For AWS Batch execution, set up the AWS Batch service following the Nextflow AWS documentation linked in the Prerequisites notes. Then edit the value for `process.queue` in the `./egapx_config/aws.config` file.
  - For execution on the local machine you don't need to adjust anything.
c8e1543546f8 planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4-dirty
fubar
parents:
diff changeset
155
c8e1543546f8 planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4-dirty
fubar
parents:
diff changeset
156
c8e1543546f8 planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4-dirty
fubar
parents:
diff changeset
157 - Run EGAPx with the following command for real this time.
c8e1543546f8 planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4-dirty
fubar
parents:
diff changeset
158 - For AWS Batch execution, replace temp_datapath with an existing S3 bucket.
c8e1543546f8 planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4-dirty
fubar
parents:
diff changeset
159 - For local execution, use a local path for `-w`
c8e1543546f8 planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4-dirty
fubar
parents:
diff changeset
160 ```
c8e1543546f8 planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4-dirty
fubar
parents:
diff changeset
161 python3 ui/egapx.py ./examples/input_D_farinae_small.yaml -e aws -w s3://temp_datapath/D_farinae -o example_out
c8e1543546f8 planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4-dirty
fubar
parents:
diff changeset
162 ```
c8e1543546f8 planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4-dirty
fubar
parents:
diff changeset
163
c8e1543546f8 planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4-dirty
fubar
parents:
diff changeset
164 - use `-e aws` for AWS batch using Docker image
c8e1543546f8 planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4-dirty
fubar
parents:
diff changeset
165 - use `-e docker` for using Docker image
c8e1543546f8 planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4-dirty
fubar
parents:
diff changeset
166 - use `-e singularity` for using the Singularity image
c8e1543546f8 planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4-dirty
fubar
parents:
diff changeset
167 - use `-e biowulf_cluster` for Biowulf cluster using Singularity image
c8e1543546f8 planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4-dirty
fubar
parents:
diff changeset
168 - use '-e slurm` for using SLURM in your HPC.
c8e1543546f8 planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4-dirty
fubar
parents:
diff changeset
169 - Note that for this option, you have to edit `./egapx_config/slurm.config` according to your cluster specifications.
c8e1543546f8 planemo upload for repository https://github.com/ncbi/egapx commit 8173d01b08d9a91c9ec5f6cb50af346edc8020c4-dirty
fubar
parents:
diff changeset
170 - type `python3 ui/egapx.py  -h ` for the help menu

```
$ ui/egapx.py -h


!!WARNING!!
This is an alpha release with limited features and organism scope to collect initial feedback on execution. Outputs are not yet complete and not intended for production use.

usage: egapx.py [-h] [-o OUTPUT] [-e EXECUTOR] [-c CONFIG_DIR] [-w WORKDIR] [-r REPORT] [-n] [-st]
                [-so] [-dl] [-lc LOCAL_CACHE] [-q] [-v] [-fn FUNC_NAME]
                [filename]

Main script for EGAPx

optional arguments:
  -h, --help            show this help message and exit
  -e EXECUTOR, --executor EXECUTOR
                        Nextflow executor, one of docker, singularity, aws, or local (for NCBI
                        internal use only). Uses corresponding Nextflow config file
  -c CONFIG_DIR, --config-dir CONFIG_DIR
                        Directory for executor config files, default is ./egapx_config. Can be also
                        set as env EGAPX_CONFIG_DIR
  -w WORKDIR, --workdir WORKDIR
                        Working directory for cloud executor
  -r REPORT, --report REPORT
                        Report file prefix for report (.report.html) and timeline (.timeline.html)
                        files, default is in output directory
  -n, --dry-run
  -st, --stub-run
  -so, --summary-only   Print result statistics only if available, do not compute result
  -lc LOCAL_CACHE, --local-cache LOCAL_CACHE
                        Where to store the downloaded files
  -q, --quiet
  -v, --verbose
  -fn FUNC_NAME, --func_name FUNC_NAME
                        func_name

run:
  filename              YAML file with input: section with at least genome: and reads: parameters
  -o OUTPUT, --output OUTPUT
                        Output path

download:
  -dl, --download-only  Download external files to local storage, so that future runs can be
                        isolated


```
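
When the same run is launched against several executors, it can help to assemble the command line in one place. The following is a minimal sketch, not part of EGAPx: `build_egapx_command` and `KNOWN_EXECUTORS` are hypothetical names, and the executor list is simply copied from the bullets above (the help text lists a slightly different set), so adjust it to what your installed `egapx.py` accepts. Only the `-e`, `-w`, and `-o` flags shown in this README are used.

```python
import shlex

# Executor names copied from the README bullets; this list is an assumption,
# egapx.py itself decides which values are valid.
KNOWN_EXECUTORS = ("aws", "docker", "singularity", "biowulf_cluster", "slurm")


def build_egapx_command(input_yaml, output_dir, executor=None, workdir=None):
    """Return the argument list for one ui/egapx.py run (hypothetical helper)."""
    cmd = ["python3", "ui/egapx.py", input_yaml]
    if executor is not None:
        if executor not in KNOWN_EXECUTORS:
            raise ValueError(f"unknown executor: {executor!r}")
        cmd += ["-e", executor]
    if workdir is not None:
        # An S3 URL for AWS Batch, or a local path for local execution.
        cmd += ["-w", workdir]
    cmd += ["-o", output_dir]
    return cmd


if __name__ == "__main__":
    cmd = build_egapx_command(
        "./examples/input_D_farinae_small.yaml",
        "example_out",
        executor="aws",
        workdir="s3://temp_datapath/D_farinae",
    )
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(cmd))
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting issues.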

## Test run

```
$ python3 ui/egapx.py examples/input_D_farinae_small.yaml -e aws -o example_out -w s3://temp_datapath/D_farinae

!!WARNING!!
This is an alpha release with limited features and organism scope to collect initial feedback on execution. Outputs are not yet complete and not intended for production use.

N E X T F L O W ~ version 23.10.1
Launching `/../home/user/egapx/ui/../nf/ui.nf` [golden_mercator] DSL2 - revision: c134f40af5
in egapx block
executor > awsbatch (67)
[f5/3007b8] process > egapx:setup_genome:get_genome_info [100%] 1 of 1 ✔
[32/a1bfa5] process > egapx:setup_proteins:convert_proteins [100%] 1 of 1 ✔
[96/621c4b] process > egapx:miniprot:run_miniprot [100%] 1 of 1 ✔
[6d/766c2f] process > egapx:paf2asn:run_paf2asn [100%] 1 of 1 ✔
[56/f1dd6b] process > egapx:best_aligned_prot:run_best_aligned_prot [100%] 1 of 1 ✔
[c1/ccc4a3] process > egapx:align_filter_sa:run_align_filter_sa [100%] 1 of 1 ✔
[e0/5548d0] process > egapx:run_align_sort [100%] 1 of 1 ✔
[a8/456a0e] process > egapx:star_index:build_index [100%] 1 of 1 ✔
[d5/6469a6] process > egapx:star_simplified:exec (1) [100%] 2 of 2 ✔
[64/99ab35] process > egapx:bam_strandedness:exec (2) [100%] 2 of 2 ✔
[98/a12969] process > egapx:bam_strandedness:merge [100%] 1 of 1 ✔
[78/0d7007] process > egapx:bam_bin_and_sort:calc_assembly_sizes [100%] 1 of 1 ✔
[74/bb014e] process > egapx:bam_bin_and_sort:bam_bin (2) [100%] 2 of 2 ✔
[39/3cdd00] process > egapx:bam_bin_and_sort:merge_prepare [100%] 1 of 1 ✔
[01/f64e38] process > egapx:bam_bin_and_sort:merge (1) [100%] 1 of 1 ✔
[aa/47a002] process > egapx:bam2asn:convert (1) [100%] 1 of 1 ✔
[45/6661b3] process > egapx:rnaseq_collapse:generate_jobs [100%] 1 of 1 ✔
[64/68bc37] process > egapx:rnaseq_collapse:run_rnaseq_collapse (3) [100%] 9 of 9 ✔
[18/bff1ac] process > egapx:rnaseq_collapse:run_gpx_make_outputs [100%] 1 of 1 ✔
[a4/76a4a5] process > egapx:get_hmm_params:run_get_hmm [100%] 1 of 1 ✔
[3c/b71c42] process > egapx:chainer:run_align_sort (1) [100%] 1 of 1 ✔
[e1/340b6d] process > egapx:chainer:generate_jobs [100%] 1 of 1 ✔
[c0/477d02] process > egapx:chainer:run_chainer (16) [100%] 16 of 16 ✔
[9f/27c1c8] process > egapx:chainer:run_gpx_make_outputs [100%] 1 of 1 ✔
[5c/8f65d0] process > egapx:gnomon_wnode:gpx_qsubmit [100%] 1 of 1 ✔
[34/6ab0c9] process > egapx:gnomon_wnode:annot (1) [100%] 10 of 10 ✔
[a9/e38221] process > egapx:gnomon_wnode:gpx_qdump [100%] 1 of 1 ✔
[bc/8ebca4] process > egapx:annot_builder:annot_builder_main [100%] 1 of 1 ✔
[5f/6b72c0] process > egapx:annot_builder:annot_builder_input [100%] 1 of 1 ✔
[eb/1ccdd0] process > egapx:annot_builder:annot_builder_run [100%] 1 of 1 ✔
[4d/6c33db] process > egapx:annotwriter:run_annotwriter [100%] 1 of 1 ✔
[b6/d73d18] process > export [100%] 1 of 1 ✔
Waiting for file transfers to complete (1 files)
Completed at: 27-Mar-2024 11:43:15
Duration : 27m 36s
CPU hours : 4.2
Succeeded : 67
```
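
If you capture this console output to a file, the per-process lines can be turned into records, for example to check that the task counts add up to the `Succeeded` total. The sketch below is an illustration only: the regular expression is inferred from the line layout shown in the log above, not from any Nextflow specification, and `parse_progress` is a hypothetical helper name.

```python
import re

# Matches lines such as:
#   [96/621c4b] process > egapx:miniprot:run_miniprot [100%] 1 of 1 ✔
#   [c0/477d02] process > egapx:chainer:run_chainer (16) [100%] 16 of 16 ✔
# The layout is inferred from the sample log and may differ between versions.
PROGRESS_LINE = re.compile(
    r"\[(?P<tag>[0-9a-f]{2}/[0-9a-f]{6})\]\s+process\s+>\s+"
    r"(?P<name>\S+)(?:\s+\(\d+\))?\s+"
    r"\[\s*(?P<percent>\d+)%\]\s+(?P<done>\d+)\s+of\s+(?P<total>\d+)"
)


def parse_progress(log_text):
    """Return one dict per process line found in the console output."""
    rows = []
    for line in log_text.splitlines():
        match = PROGRESS_LINE.search(line)
        if match:
            rows.append(
                {
                    "tag": match["tag"],
                    "name": match["name"],
                    "done": int(match["done"]),
                    "total": int(match["total"]),
                }
            )
    return rows


if __name__ == "__main__":
    sample = "\n".join(
        [
            "[96/621c4b] process > egapx:miniprot:run_miniprot [100%] 1 of 1 ✔",
            "[d5/6469a6] process > egapx:star_simplified:exec (1) [100%] 2 of 2 ✔",
            "[c0/477d02] process > egapx:chainer:run_chainer (16) [100%] 16 of 16 ✔",
        ]
    )
    rows = parse_progress(sample)
    print(sum(row["done"] for row in rows), "tasks in", len(rows), "processes")
```

The `tag` field is the same short identifier that locates a task's work subdirectory.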

## Output

Look at the output in the output directory (`example_out`) that was supplied on the command line. The annotation file is called `accept.gff`.
```
accept.gff
annot_builder_output
nextflow.log
run.report.html
run.timeline.html
run.trace.txt
run_params.yaml
```
The `nextflow.log` file captures all the process information and the work directories of the processes. `run_params.yaml` lists all the parameters that were used in the EGAPx run. More information about process time and resources can be found in the other `run.*` files.
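
A quick sanity check on the annotation is to count how many features of each type it contains. The sketch below assumes that `accept.gff` follows the usual GFF layout of nine tab-separated columns with the feature type in the third column and `#` marking comment or directive lines; `count_feature_types` is a hypothetical helper, and the sample records in the demo are made up for illustration, not EGAPx output.

```python
from collections import Counter


def count_feature_types(gff_lines):
    """Count features per type (third column) in an iterable of GFF lines."""
    counts = Counter()
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, comments, and directives
        columns = line.split("\t")
        if len(columns) < 9:
            continue  # not a feature record (for example, embedded sequence)
        counts[columns[2]] += 1
    return counts


if __name__ == "__main__":
    # Made-up records for illustration only.
    sample = [
        "##gff-version 3",
        "seq1\texample\tgene\t1\t900\t.\t+\t.\tID=gene1",
        "seq1\texample\tmRNA\t1\t900\t.\t+\t.\tID=rna1;Parent=gene1",
        "seq1\texample\texon\t1\t400\t.\t+\t.\tParent=rna1",
        "seq1\texample\texon\t600\t900\t.\t+\t.\tParent=rna1",
    ]
    for feature_type, n in sorted(count_feature_types(sample).items()):
        print(feature_type, n)
```

For a real run, pass an open file handle instead, for example `count_feature_types(open("example_out/accept.gff"))`.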

## Intermediate files

In the above log, each line denotes a process that completed in the workflow. The first column (_e.g._ `[96/621c4b]`) is the abbreviated name of the subdirectory that holds the intermediate output files and logs for the process on the same line, _i.e._, `egapx:miniprot:run_miniprot`. To see the intermediate files for that process, go to the work directory path that you supplied and descend into the subdirectory whose name starts with `96/621c4b`:

```
$ aws s3 ls s3://temp_datapath/D_farinae/96/
                           PRE 06834b76c8d7ceb8c97d2ccf75cda4/
                           PRE 621c4ba4e6e87a4d869c696fe50034/
$ aws s3 ls s3://temp_datapath/D_farinae/96/621c4ba4e6e87a4d869c696fe50034/
                           PRE output/
2024-03-27 11:19:18          0
2024-03-27 11:19:28          6 .command.begin
2024-03-27 11:20:24        762 .command.err
2024-03-27 11:20:26        762 .command.log
2024-03-27 11:20:23          0 .command.out
2024-03-27 11:19:18      13103 .command.run
2024-03-27 11:19:18        129 .command.sh
2024-03-27 11:20:24        276 .command.trace
2024-03-27 11:20:25          1 .exitcode
$ aws s3 ls s3://temp_datapath/D_farinae/96/621c4ba4e6e87a4d869c696fe50034/output/
2024-03-27 11:20:24   17127134 aligns.paf
```
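
When the work directory is a local path rather than an S3 bucket, the same lookup can be scripted. The sketch below assumes the layout shown in the listing above (a two-character directory containing one subdirectory per task, named by the full hash); `find_task_dir` is a hypothetical helper, not part of EGAPx or Nextflow.

```python
from pathlib import Path


def find_task_dir(workdir, tag):
    """Expand a short tag such as '[96/621c4b]' to the full task directory."""
    prefix, _, short_hash = tag.strip("[]").partition("/")
    candidates = sorted(
        path
        for path in (Path(workdir) / prefix).glob(short_hash + "*")
        if path.is_dir()
    )
    if len(candidates) != 1:
        raise LookupError(
            f"expected exactly one directory for {tag}, found {len(candidates)}"
        )
    return candidates[0]


if __name__ == "__main__":
    import tempfile

    # Recreate the two-level layout from the listing above in a scratch folder.
    with tempfile.TemporaryDirectory() as scratch:
        (Path(scratch) / "96" / "621c4ba4e6e87a4d869c696fe50034").mkdir(parents=True)
        (Path(scratch) / "96" / "06834b76c8d7ceb8c97d2ccf75cda4").mkdir(parents=True)
        print(find_task_dir(scratch, "[96/621c4b]").name)
```

Inside the returned directory, `.command.sh`, `.command.log`, and `.exitcode` are usually the first files to inspect when a step fails.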

## Offline mode

If your cluster has no internet access, you can run EGAPx in offline mode. To do this, first pull the Singularity image, then download the necessary files from the NCBI FTP site using the `egapx.py` script, and finally pass the path of the downloaded folder in the run command. Here is an example of how to download the files and run EGAPx on the Biowulf cluster.

- Download the Singularity image:
```
rm egap*sif
singularity cache clean
singularity pull docker://ncbi/egapx:0.2-alpha
```

- Clone the repo:
```
git clone https://github.com/ncbi/egapx.git
cd egapx
```

- Download the EGAPx-related files from NCBI:
```
python3 ui/egapx.py -dl -lc ../local_cache
```

- Download SRA reads:
```
prefetch SRR8506572
prefetch SRR9005248
fasterq-dump --skip-technical --threads 6 --split-files --seq-defline ">\$ac.\$si.\$ri" --fasta -O sradir/ ./SRR8506572
fasterq-dump --skip-technical --threads 6 --split-files --seq-defline ">\$ac.\$si.\$ri" --fasta -O sradir/ ./SRR9005248
```
You should see the downloaded files inside the `sradir` folder:
```
ls sradir/
SRR8506572_1.fasta SRR8506572_2.fasta SRR9005248_1.fasta SRR9005248_2.fasta
```
Now edit the read file paths in `examples/input_D_farinae_small.yaml` so that they point to the SRA files above.
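
Typing the read paths by hand is error-prone, so a small helper can print them as a YAML list ready to paste into the input file. This is only a sketch: the function name `print_reads_block` is hypothetical, and it assumes the input YAML lists read files under a `reads:` key, so check `examples/input_D_farinae_small.yaml` for the exact layout before pasting.

```shell
# Hypothetical helper (not part of EGAPx): print a YAML `reads:` list
# containing the absolute path of every *.fasta file in a directory.
# Assumption: the EGAPx input YAML expects read files under a `reads:` key.
print_reads_block() {
    reads_dir="$1"
    echo "reads:"
    for f in "$reads_dir"/*.fasta; do
        # Skip the unexpanded pattern when the directory has no FASTA files.
        [ -e "$f" ] || continue
        echo "  - $(cd "$(dirname "$f")" && pwd)/$(basename "$f")"
    done
}
```

Calling `print_reads_block sradir` from the directory that contains `sradir` prints one list entry per downloaded FASTA file.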

- Run `egapx.py` once so that it creates the `egapx_config` directory, then append the path of the Singularity image to `egapx_config/biowulf_cluster.config`:
```
ui/egapx.py examples/input_D_farinae_small.yaml -e biowulf_cluster -w dfs_work -o dfs_out -lc ../local_cache
echo "process.container = '/path_to_/egapx_0.2-alpha.sif'" >> egapx_config/biowulf_cluster.config
```
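
Because `echo ... >>` appends a new line every time it runs, repeating this step leaves duplicate `process.container` entries in the config. The sketch below shows one way to make the step repeatable and to fill in an absolute image path. The function name `set_container` is hypothetical, and the only thing it writes is the same `process.container = '...'` line shown above.

```shell
# Hypothetical helper (not part of EGAPx): append the container setting
# to a config file only if that exact line is not already present.
set_container() {
    sif="$1"       # path to the pulled image, e.g. egapx_0.2-alpha.sif
    config="$2"    # e.g. egapx_config/biowulf_cluster.config
    [ -f "$sif" ] || { echo "image not found: $sif" >&2; return 1; }
    # Resolve the image to an absolute path so the setting does not
    # depend on the directory the run is launched from.
    abs_sif="$(cd "$(dirname "$sif")" && pwd)/$(basename "$sif")"
    line="process.container = '$abs_sif'"
    # -x matches whole lines, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$config" 2>/dev/null || echo "$line" >> "$config"
}
```

For example, `set_container egapx_0.2-alpha.sif egapx_config/biowulf_cluster.config` can be run any number of times and adds the line at most once.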

- Run `egapx.py`:
```
ui/egapx.py examples/input_D_farinae_small.yaml -e biowulf_cluster -w dfs_work -o dfs_out -lc ../local_cache
```
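
An offline run cannot fetch anything it is missing, so it may be worth checking the three local inputs from the steps above (the Singularity image, the local cache, and the read files) before submitting the job. The following is a minimal sketch; the function name `check_offline_inputs` is hypothetical, and it only checks that the files exist, not that their contents are complete.

```shell
# Hypothetical pre-flight check (not part of EGAPx) for an offline run.
# All three paths are supplied by the caller.
check_offline_inputs() {
    sif="$1"; cache="$2"; reads="$3"
    status=0
    [ -f "$sif" ] || { echo "missing Singularity image: $sif" >&2; status=1; }
    # The cache directory must exist and contain at least one entry.
    if [ ! -d "$cache" ] || [ -z "$(ls -A "$cache")" ]; then
        echo "local cache missing or empty: $cache" >&2
        status=1
    fi
    # Expand the glob into the function's own positional parameters;
    # if nothing matched, $1 is the literal pattern and does not exist.
    set -- "$reads"/*.fasta
    [ -e "$1" ] || { echo "no FASTA reads found in: $reads" >&2; status=1; }
    return "$status"
}
```

With the layout used in this example, the call would be `check_offline_inputs /path_to_/egapx_0.2-alpha.sif ../local_cache sradir`, which returns a non-zero status and names each missing input on standard error.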

## References

Buchfink B, Reuter K, Drost HG. Sensitive protein alignments at tree-of-life scale using DIAMOND. Nat Methods. 2021 Apr;18(4):366-368. doi: 10.1038/s41592-021-01101-x. Epub 2021 Apr 7. PMID: 33828273; PMCID: PMC8026399.

Danecek P, Bonfield JK, Liddle J, Marshall J, Ohan V, Pollard MO, Whitwham A, Keane T, McCarthy SA, Davies RM, Li H. Twelve years of SAMtools and BCFtools. Gigascience. 2021 Feb 16;10(2):giab008. doi: 10.1093/gigascience/giab008. PMID: 33590861; PMCID: PMC7931819.

Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013 Jan 1;29(1):15-21. doi: 10.1093/bioinformatics/bts635. Epub 2012 Oct 25. PMID: 23104886; PMCID: PMC3530905.

Li H. Protein-to-genome alignment with miniprot. Bioinformatics. 2023 Jan 1;39(1):btad014. doi: 10.1093/bioinformatics/btad014. PMID: 36648328; PMCID: PMC9869432.

Shen W, Le S, Li Y, Hu F. SeqKit: A Cross-Platform and Ultrafast Toolkit for FASTA/Q File Manipulation. PLoS One. 2016 Oct 5;11(10):e0163962. doi: 10.1371/journal.pone.0163962. PMID: 27706213; PMCID: PMC5051824.

## Contact us

Please open a GitHub [Issue](https://github.com/ncbi/egapx/issues) if you encounter any problems with EGAPx. You can also write to cgr@nlm.nih.gov with feedback or questions.