
diff test-data/annotemp/pivot_wider_jupytool_notebook.ipynb @ 0:ad96b20423cf draft
planemo upload for repository https://github.com/galaxyecology/tools-ecology/tree/master/tools/EMLassemblyline commit 4b040fe7867d965fb88ce70cc08081367b62b063
author: ecology
date: Fri, 27 Sep 2024 13:01:04 +0000
--- /dev/null	Thu Jan 01 00:00:00 1970 +0000
+++ b/test-data/annotemp/pivot_wider_jupytool_notebook.ipynb	Fri Sep 27 13:01:04 2024 +0000
@@ -0,0 +1,85 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Pivot wider Jupytool "
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "This Jupyter notebook applies the pivot_wider function from the tidyr R package.\n",
+    "It is the final data-preparation step of the ecoregionalization Galaxy workflow."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 62,
+   "metadata": {
+    "tags": []
+   },
+   "outputs": [],
+   "source": [
+    "# Date: 22/05/2024\n",
+    "# Authors: Seguineau Pauline & Yvan Le Bras\n",
+    "\n",
+    "# Load libraries\n",
+    "library(tidyr)\n",
+    "\n",
+    "# Locate and load the input file from galaxy_inputs\n",
+    "\n",
+    "input_path <- \"galaxy_inputs\"\n",
+    "\n",
+    "# Only the last file found is kept, so a single input file is expected\n",
+    "for (dir in list.dirs(input_path)) {\n",
+    "    for (file in list.files(dir)) file_path <- file.path(dir, file)\n",
+    "}\n",
+    "\n",
+    "file <- read.table(file_path, header = TRUE, sep = \"\\t\")\n",
+    "\n",
+    "# Run pivot_wider: one column per taxon, duplicated counts summed, missing values filled with 0\n",
+    "pivot_file <- pivot_wider(data = file,\n",
+    "                          names_from = phylum_class_order_family_genus_specificEpithet,\n",
+    "                          values_from = individualCount,\n",
+    "                          values_fill = 0,\n",
+    "                          values_fn = sum)\n",
+    "\n",
+    "# Replace all counts >= 1 by 1 to keep only presence (1) / absence (0) data; columns 1 and 2 are left unchanged\n",
+    "for (col in 3:ncol(pivot_file)) {\n",
+    "    pivot_file[col][pivot_file[col] >= 1] <- 1\n",
+    "}\n",
+    "\n",
+    "write.table(pivot_file, \"outputs/pivot_file.tabular\", sep = \"\\t\", quote = FALSE, row.names = FALSE)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "In this Jupyter notebook, we used the pivot_wider function of the tidyr package to reshape our data into a wider format suited to the subsequent analyses of the Galaxy ecoregionalization workflow. This transformation converts the data so that each taxon becomes a separate column. Missing values are filled with zeros, and individual counts are summed when duplicate entries occur. Finally, all values >= 1 are replaced by 1 to obtain presence (1) / absence (0) data.\n",
+    "\n",
+    "Thus, this notebook is an essential building block of our analysis pipeline, ensuring that the data is properly formatted and ready to be explored and interpreted for ecoregionalization studies."
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "R",
+   "language": "R",
+   "name": "ir"
+  },
+  "language_info": {
+   "codemirror_mode": "r",
+   "file_extension": ".r",
+   "mimetype": "text/x-r-source",
+   "name": "R",
+   "pygments_lexer": "r",
+   "version": "4.0.3"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
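To make the reshaping in this notebook concrete, here is a minimal sketch on a small hypothetical table. The column names `lat` and `lon` are placeholders for the two leading columns the notebook leaves untouched, and all values are invented for illustration; only `phylum_class_order_family_genus_specificEpithet` and `individualCount` come from the notebook. It assumes a tidyr release that accepts a scalar `values_fill` and a bare function for `values_fn`, as the notebook itself does.

```r
library(tidyr)

# Hypothetical long-format input: two sampling points, taxonA recorded twice
# at the first point. "lat"/"lon" are placeholder names, not from the notebook.
long <- data.frame(
  lat = c(-49.5, -49.5, -49.5, -50.0),
  lon = c(70.1, 70.1, 70.1, 71.3),
  phylum_class_order_family_genus_specificEpithet = c("taxonA", "taxonA", "taxonB", "taxonB"),
  individualCount = c(2, 3, 1, 4),
  stringsAsFactors = FALSE
)

# Same call as the notebook: one column per taxon, duplicated
# (point, taxon) counts summed, absent combinations filled with 0.
counts <- pivot_wider(data = long,
                      names_from = phylum_class_order_family_genus_specificEpithet,
                      values_from = individualCount,
                      values_fill = 0,
                      values_fn = sum)

# Presence/absence step: every count >= 1 becomes 1; columns 1-2 untouched.
presence <- counts
for (col in 3:ncol(presence)) {
  presence[col][presence[col] >= 1] <- 1
}

print(counts)
print(presence)
```

Here the first point gets `taxonA = 5` (2 + 3 summed) before binarization and `1` after, while `taxonA` at the second point is filled with `0`. Without `values_fn = sum`, pivot_wider would warn that values are not uniquely identified and return list-columns for the duplicated taxon.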