Mercurial > repos > ecology > eml_validate
diff test-data/annotemp/pivot_wider_jupytool_notebook.ipynb @ 0:ad96b20423cf draft default tip
planemo upload for repository https://github.com/galaxyecology/tools-ecology/tree/master/tools/EMLassemblyline commit 4b040fe7867d965fb88ce70cc08081367b62b063
field | value |
---|---|
author | ecology |
date | Fri, 27 Sep 2024 13:01:04 +0000 |
parents | |
children | |
--- /dev/null Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/annotemp/pivot_wider_jupytool_notebook.ipynb Fri Sep 27 13:01:04 2024 +0000
@@ -0,0 +1,85 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Pivot wider Jupytool "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This Jupyter notebook is dedicated to the pivot_wider function from the tidyr R package. \n",
+    "This script is the final part of the data preparation for the ecoregionalization Galaxy workflow. "
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 62,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "#Date : 22/05/2024\n",
+    "#Author : Seguineau Pauline & Yvan Le Bras \n",
+    "\n",
+    "#Load libraries\n",
+    "library(tidyr)\n",
+    "\n",
+    "#load file \n",
+    "\n",
+    "input_path = \"galaxy_inputs\"\n",
+    "\n",
+    "for (dir in list.dirs(input_path)){\n",
+    " for (file in list.files(dir)) {\n",
+    " file_path = file.path(dir, file)}\n",
+    "}\n",
+    "\n",
+    "file = read.table(file_path,header=T, sep = \"\\t\")\n",
+    "\n",
+    "#Run pivot_wider function\n",
+    "pivot_file = pivot_wider(data = file,\n",
+    " names_from = phylum_class_order_family_genus_specificEpithet,\n",
+    " values_from = individualCount,\n",
+    " values_fill = 0,\n",
+    " values_fn = sum)\n",
+    "\n",
+    "#Replace all occurrences >= 1 by 1 to have only presence (1) or absence (0) data\n",
+    "for(c in 3:length(pivot_file)){\n",
+    " pivot_file[c][pivot_file[c]>=1] <- 1}\n",
+    "\n",
+    "\n",
+    "write.table(pivot_file, \"outputs/pivot_file.tabular\", sep = \"\\t\", quote = F, row.names = F)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this Jupyter notebook, we used the pivot_wider function of the tidyr package to transform our data into a wider format adapted to subsequent analyses as part of the Galaxy workflow for ecoregionalization. This transformation allowed us to convert our data to a format where each taxon becomes a separate column. We also took care to fill in the missing values with zeros and to sum the individual counts in case of duplications. Then all values >= 1 are replaced by 1 to have only presence (1) or absence (0) data.\n",
+    "\n",
+    "Thus, this notebook is an essential building block of our analysis pipeline, ensuring that the data is properly formatted and ready to be explored and interpreted for ecoregionalization studies."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "R",
+   "language": "R",
+   "name": "ir"
+  },
+  "language_info": {
+   "codemirror_mode": "r",
+   "file_extension": ".r",
+   "mimetype": "text/x-r-source",
+   "name": "R",
+   "pygments_lexer": "r",
+   "version": "4.0.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
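
The transformation described in the notebook's closing cell can be illustrated on a toy table. What follows is a minimal sketch, assuming the tidyr package is installed. The data frame `occ`, the identifier columns `lat` and `long`, and the taxon labels `taxon_A` and `taxon_B` are hypothetical and are not taken from the repository's test data; only the `pivot_wider` arguments mirror the notebook.

```r
# Toy illustration of the notebook's pivot step (hypothetical data).
library(tidyr)

# Long format: one row per occurrence record. The first station holds a
# duplicated taxon_A record; the second station has no taxon_A record at all.
occ <- data.frame(
  lat  = c(-66.5, -66.5, -66.5, -67.0),
  long = c(140.0, 140.0, 140.0, 141.0),
  phylum_class_order_family_genus_specificEpithet =
    c("taxon_A", "taxon_A", "taxon_B", "taxon_B"),
  individualCount = c(2, 3, 1, 4)
)

# Same arguments as in the notebook: one column per taxon, duplicated
# records summed, missing station/taxon combinations filled with 0.
wide <- pivot_wider(
  data = occ,
  names_from = phylum_class_order_family_genus_specificEpithet,
  values_from = individualCount,
  values_fill = 0,
  values_fn = sum
)

# Collapse counts to presence (1) / absence (0) on the taxon columns only.
# Columns are selected by name here (an assumption of this sketch); the
# notebook selects them by position, from the third column onward.
taxa <- setdiff(names(wide), c("lat", "long"))
wide[taxa] <- lapply(wide[taxa], function(x) as.integer(x >= 1))

print(wide)
```

Selecting the taxon columns by name rather than by position is a choice made for this sketch: it does not depend on the identifier columns being exactly the first two, which the positional loop in the notebook implicitly assumes.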