| Previous changeset 23:95be88b49f4e (2025-08-04) Next changeset 25:bc18e25d4204 (2026-01-14) |
|
Commit message:
planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/main/tools/ncbi_datasets commit 5a65a62588a36d757f96681bf72f537c12c91beb |
|
modified:
datasets_gene.xml datasets_genome.xml macros.xml |
| diff -r 95be88b49f4e -r 94e32337ba54 datasets_gene.xml --- a/datasets_gene.xml Mon Aug 04 08:51:59 2025 +0000 +++ b/datasets_gene.xml Fri Dec 26 17:17:02 2025 +0000 |
@@ -209,7 +209,7 @@
 <filter>file_choices['kingdom_cond']['include'] and "cds" in file_choices['kingdom_cond']['include']</filter>
 </data>
 <data name="threep_utr_fasta" label="NCBI Gene Datasets: 3' UTR fasta" format="fasta" from_work_dir="ncbi_dataset/data/3p_utr.fna">
- <filter>file_choices['kingdom_cond']['include'] and "5p-utr" in file_choices['kingdom_cond']['include']</filter>
+ <filter>file_choices['kingdom_cond']['include'] and "3p-utr" in file_choices['kingdom_cond']['include']</filter>
 </data>
 <data name="fivep_utr_fasta" label="NCBI Gene Datasets: 5' UTR fasta" format="fasta" from_work_dir="ncbi_dataset/data/5p_utr.fna">
 <filter>file_choices['kingdom_cond']['include'] and "5p-utr" in file_choices['kingdom_cond']['include']</filter>
@@ -513,15 +513,60 @@
 </output>
 </test> -->
 </tests>
- <help>
-<![CDATA[
-**Download Gene Datasets from NCBI**
+ <help><![CDATA[
+.. class:: infomark
+
+**What it does**
+
+Downloads gene data from NCBI using the `datasets`_ command-line tool.
+Retrieve gene sequences, transcripts, proteins, and annotation reports.
+
+**Query Options**
+
+============= ================================================================
+Method        Description
+============= ================================================================
+Gene ID       NCBI Gene ID (e.g., 672 for BRCA1)
+Symbol        Gene symbol with taxon (e.g., TP53 in human)
+Accession     RefSeq nucleotide (NM\_) or protein (NP\_/WP\_) accession
+Taxon         All genes for a taxon (large downloads)
+============= ================================================================
+
+----
+
+**Key Options**
+
+- **Ortholog retrieval**: Get orthologous genes across taxa (vertebrates/insects)
+- **Taxon filter**: Limit WP\_ accession results to specific organisms
+- **Flanking sequence**: Include nucleotides upstream/downstream (WP\_ only)
+- **FASTA filter**: Subset output to specific accessions
 
-Download a gene dataset (gene sequence, transcipt, amino acid sequences,
-nucleotide coding sequences, 5'-UTR, 3'-UTR) as well as gene and gene
-product reports. Genes can be referred by gene id, symbol, accession,
-or taxon.
-]]>
- </help>
+**Outputs (Eukaryote)**
+
+- **Gene Data Report**: Tabular metadata (ID, symbol, description, coordinates)
+- **Gene Product Report**: Detailed transcript/protein information
+- **Sequences**: Gene, RNA, protein, CDS, 5'/3' UTR FASTA files
+
+**Outputs (Prokaryote)**
+
+Prokaryotic genes (WP\_ accessions) use a different report format with:
+accession, description, EC number, gene symbol, protein info.
+
+**Examples**
+
+Download human BRCA1::
+
+ Query by: Gene ID
+ Gene ID: 672
+
+Download TP53 orthologs in rodents::
+
+ Query by: Symbol
+ Symbol: tp53
+ Ortholog: rodentia
+
+
+.. _datasets: https://www.ncbi.nlm.nih.gov/datasets/
+]]></help>
 <expand macro="citations"/>
 </tool>
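The first hunk of datasets_gene.xml corrects a copy-paste slip: the 3' UTR output was gated on "5p-utr" rather than "3p-utr". Galaxy output filters are Python expressions, so the effect can be sketched in plain Python. The helper name `output_enabled` and the sample selection are hypothetical and only illustrate the expression from the diff; they are not part of the tool.

```python
def output_enabled(include, key):
    """Mimic the <filter> expression: the selection is non-empty and contains key."""
    return bool(include) and key in include


# Hypothetical user selection: gene sequence plus 3' UTR only.
include = ["gene", "3p-utr"]

# Old filter on threep_utr_fasta tested "5p-utr", so this selection hid the output.
old_threep = output_enabled(include, "5p-utr")

# Fixed filter tests "3p-utr", so the 3' UTR output is kept.
new_threep = output_enabled(include, "3p-utr")

print(old_threep, new_threep)
```

With the old expression the 3' UTR dataset would only appear when the 5' UTR was also requested, which is presumably why the bug went unnoticed in runs that selected both.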
| diff -r 95be88b49f4e -r 94e32337ba54 datasets_genome.xml --- a/datasets_genome.xml Mon Aug 04 08:51:59 2025 +0000 +++ b/datasets_genome.xml Fri Dec 26 17:17:02 2025 +0000 |
@@ -46,7 +46,7 @@
 @RELEASED_BEFORE@
 @RELEASED_AFTER@
 #for search_term in $filters.search:
- --search '$filters.search_term'
+ --search '$search_term.search'
 #end for
 --no-progressbar
 --dehydrated
@@ -191,18 +191,18 @@
 <output name="genome_data_report">
 <assert_contents>
 <has_text text="Assembly Accession	Assembly Name	Assembly Submitter	Organism Name"/>
- <has_n_lines n="142"/>
+ <has_n_lines min="140"/>
 <has_n_columns n="4"/>
 </assert_contents>
 </output>
- <output_collection name="rna_fasta" type="list" count="2">
+ <output_collection name="rna_fasta" type="list">
 <element name="GCF_000306695.2" decompress="true">
 <assert_contents>
 <has_text text=">"/>
 </assert_contents>
 </element>
 </output_collection>
- <output_collection name="genomic_gff" type="list" count="12">
+ <output_collection name="genomic_gff" type="list">
 <element name="GCF_000306695.2">
 <assert_contents>
 <has_n_lines min="1000000"/>
@@ -218,29 +218,25 @@
 <test expect_num_outputs="2">
 <conditional name="query|subcommand">
 <param name="download_by" value="taxon"/>
- <param name="taxon_positional" value="human"/>
+ <param name="taxon_positional" value="Norway rat"/>
 </conditional>
 <section name="filters">
- <param name="chromosomes" value="21"/>
- <param name="assembly_level" value="chromosome,complete"/>
- <param name="released_before" value="01/01/2018"/>
+ <param name="chromosomes" value="MT"/>
 </section>
 <section name="file_choices">
 <param name="include" value="genome"/>
 <param name="decompress" value="true"/>
 </section>
- <output_collection name="genome_fasta" type="list:list" count="11">
- <expand macro="genome_fasta_assert" el1="GCA_000002115.2" el2="chr21" expression=">"/>
- <expand macro="genome_fasta_assert" el1="GCA_000002125.2" el2="chr21" expression=">"/>
- <expand macro="genome_fasta_assert" el1="GCA_000212995.1" el2="chr21" expression=">"/>
- <expand macro="genome_fasta_assert" el1="GCA_000306695.2" el2="chr21" expression=">"/>
- <expand macro="genome_fasta_assert" el1="GCA_000365445.1" el2="chr21" expression=">"/>
- <expand macro="genome_fasta_assert" el1="GCA_001292825.2" el2="chr21" expression=">"/>
- <expand macro="genome_fasta_assert" el1="GCA_001524155.4" el2="chr21" expression=">"/>
- <expand macro="genome_fasta_assert" el1="GCA_001712695.1" el2="chr21" expression=">"/>
- <expand macro="genome_fasta_assert" el1="GCA_022833125.2" el2="chr21" expression=">"/>
- <expand macro="genome_fasta_assert" el1="GCF_000002125.1" el2="chr21" expression=">"/>
- <expand macro="genome_fasta_assert" el1="GCF_000306695.2" el2="chr21" expression=">"/>
+ <output_collection name="genome_fasta" type="list:list" count="9">
+ <expand macro="genome_fasta_assert" el1="GCA_000001895.4" el2="chrMT" expression=">"/>
+ <expand macro="genome_fasta_assert" el1="GCA_015227675.2" el2="chrMT" expression=">"/>
+ <expand macro="genome_fasta_assert" el1="GCA_036323735.1" el2="chrMT" expression=">"/>
+ <expand macro="genome_fasta_assert" el1="GCA_041222355.1" el2="chrMT" expression=">"/>
+ <expand macro="genome_fasta_assert" el1="GCA_045687965.1" el2="chrMT" expression=">"/>
..
arCen1.1_normalized" expression=">" expression_n="25"/>
@@ -249,7 +245,7 @@
 </output_collection>
 <output name="genome_data_report">
 <assert_contents>
- <has_text text="Homo sapiens"/>
+ <has_text text="Rattus norvegicus"/>
 <has_n_columns n="4"/>
 </assert_contents>
 </output>
@@ -495,25 +491,72 @@
 </assert_contents>
 </output>
 </test>
+ <!-- test search filter -->
+ <test expect_num_outputs="1">
+ <conditional name="query|subcommand">
+ <param name="download_by" value="taxon"/>
+ <param name="taxon_positional" value="Streptococcus"/>
+ </conditional>
+ <section name="filters">
+ <repeat name="search">
+ <param name="search" value="pyogenes"/>
+ </repeat>
+ </section>
+ <section name="file_choices">
+ <param name="include" value_json="null"/>
+ </section>
+ <output name="genome_data_report">
+ <assert_contents>
+ <has_text text="pyogenes"/>
+ </assert_contents>
+ </output>
+ </test>
 </tests>
- <help>
-<![CDATA[
-**Download Genome Datasets from NCBI**
+ <help><![CDATA[
+.. class:: infomark
 
-Download a genome dataset including genome, transcript and protein sequence, annotation and a detailed data report.
-Genome datasets can be specified by NCBI Assembly or BioProject accession(s) or by taxon.
+**What it does**
+
+Downloads genome assemblies from NCBI using the `datasets`_ command-line tool.
+Retrieve genome sequences, annotations, and metadata by accession or taxon.
 
-The download is a three step process:
+**Query Options**
+
+- **By Accession**: NCBI Assembly (GCF\_/GCA\_) or BioProject accession
+- **By Taxon**: Taxonomy ID, scientific name, or common name
 
-1. A "dehydrated" zip file is downloaded which includes the metadata and the download URL)
-2. The metadata is transformed into a tabular (TSV) file
-3. The data is hydrated (the actual data is downloaded)
+**Filters**
+
+==================== ===============================================
+Filter               Description
+==================== ===============================================
+Reference only       Limit to reference/representative assemblies
+Annotated only       Include only genomes with annotations
+Assembly level       Chromosome, complete, contig, or scaffold
+Assembly source      RefSeq (GCF\_) or GenBank (GCA\_)
+Exclude atypical     Remove atypical assemblies (e.g., partial)
+MAG filter           Include/exclude metagenome-assembled genomes
+Date range           Filter by release date
+==================== ===============================================
+
+----
 
-The 3rd step can be skipped by unselecting all output types in the `Include` parameter.
-Thereby its possible to inspect the metadata prior to the actual data download. Also this
-allows to use the tool for querying data sets (and their accessions) of interest which
-can then be downloaded in a second call using the accessions.
-]]>
- </help>
+.. class:: warningmark
+
+**Note**: The "Reference only" filter returns only RefSeq (GCF\_) assemblies.
+If a taxon has only GenBank (GCA\_) assemblies, this filter will return no results
+with a misleading error message. It is a NCBI datasets bug (not a Galaxy bug).
+
+**Outputs**
+
+- **Data Report**: Tabular metadata for matching assemblies
+- **Genome FASTA**: Genomic sequences (nested collection by accession)
+- **Annotation files**: GFF3, GTF, GenBank flat files
+- **Protein/RNA/CDS**: Amino acid and nucleotide sequences
+- **Sequence Report**: Per-sequence metadata (chromosome, length, etc.)
+
+.. _datasets: https://www.ncbi.nlm.nih.gov/datasets/
+
+]]></help>
 <expand macro="citations"/>
 </tool>
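The `--search` hunk in datasets_genome.xml changes the Cheetah loop to read the loop variable (`$search_term.search`) rather than a non-existent attribute on the section (`$filters.search_term`). A rough Python equivalent of the corrected loop is sketched below, assuming each entry of the `search` repeat behaves like a mapping with a `search` key. That mapping, the helper name `build_search_args`, and the second sample term are illustrative assumptions, not Galaxy's actual objects.

```python
import shlex


def build_search_args(search_repeats):
    """Emit one --search flag per repeat entry, as the fixed loop does."""
    args = []
    for search_term in search_repeats:
        # Read the value from the loop variable, not from the enclosing section.
        args.extend(["--search", search_term["search"]])
    return args


# "pyogenes" is the value used by the new test; the second entry is made up.
repeats = [{"search": "pyogenes"}, {"search": "M1 GAS"}]
print(shlex.join(build_search_args(repeats)))
```

Building an argument list and quoting it once at the end mirrors the single-quoted `'$search_term.search'` in the template: each term stays one shell word even when it contains spaces.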
| diff -r 95be88b49f4e -r 94e32337ba54 macros.xml --- a/macros.xml Mon Aug 04 08:51:59 2025 +0000 +++ b/macros.xml Fri Dec 26 17:17:02 2025 +0000 |
@@ -1,5 +1,5 @@
 <macros>
- <token name="@TOOL_VERSION@">18.5.1</token>
+ <token name="@TOOL_VERSION@">18.13.0</token>
 <token name="@VERSION_SUFFIX@">0</token>
 <token name="@PROFILE@">23.0</token>
 <token name="@LICENSE@">MIT</token>
@@ -95,7 +95,7 @@
 <xml name="genome_includes">
 <option value="genome" selected="true">genomic sequence (genome)</option>
 <option value="rna">transcript (rna)</option>
- <option value="protein">amnio acid sequences (protein)</option>
+ <option value="protein">amino acid sequences (protein)</option>
 <option value="cds">nucleotide coding sequences (cds)</option>
 <option value="gff3">general feature file (gff3)</option>
 <option value="gtf">gene transfer format (gtf)</option>
@@ -105,7 +105,7 @@
 </xml>
 <xml name="gene_includes">
 <option value="gene">gene sequence (gene)</option>
- <option value="protein" selected="true">amnio acid sequences (protein)</option>
+ <option value="protein" selected="true">amino acid sequences (protein)</option>
 <yield/>
 </xml>
 
@@ -402,11 +402,8 @@
 </xml>
 <xml name="citations">
 <citations>
- <citation type="bibtex">@misc{NCBI,
- author = "{NCBI}",
- title = "NCBI Datasets",
- year = "2022",
- url = "https://github.com/ncbi/datasets"}
+ <citation type="doi">
+ 10.1038/s41597-024-03571-y
 </citation>
 </citations>
 </xml>