makeeml: test-data/annotemp/pivot_wider_jupytool

comparison test-data/annotemp/pivot_wider_jupytool_notebook.ipynb @ 6:4cf24a95c4ff draft

planemo upload for repository https://github.com/galaxyecology/tools-ecology/tree/master/tools/EMLassemblyline commit 2d36dc964f548b5acbc43ffd78e51e6fc7dc80bb

author	ecology
date	Tue, 10 Sep 2024 12:53:27 +0000
parents
children

-:34dcb86a9351
+:4cf24a95c4ff
+{
+"cells": [
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"# Pivot wider Jupytool "
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"This Jupyter notebook is dedicated to the pivot_wider function from the tidyr R package. \n",
+"This script is the final part of the data preparation for the ecoregionalization Galaxy workflow.   "
+]
+},
+{
+"cell_type": "code",
+"execution_count": 62,
+"metadata": {
+"tags": []
+},
+"outputs": [],
+"source": [
+"#Date : 22/05/2024\n",
+"#Author : Seguineau Pauline & Yvan Le Bras \n",
+"\n",
+"#Load libraries\n",
+"library(tidyr)\n",
+"\n",
+"#load file \n",
+"\n",
+"input_path = \"galaxy_inputs\"\n",
+"\n",
+"for (dir in list.dirs(input_path)){\n",
+"    for (file in list.files(dir)) {\n",
+"        file_path = file.path(dir, file)}\n",
+"}\n",
+"\n",
+"file = read.table(file_path, header = TRUE, sep = \"\\t\")\n",
+"\n",
+"#Run pivot_wider function\n",
+"pivot_file = pivot_wider(data = file,\n",
+"                        names_from = phylum_class_order_family_genus_specificEpithet,\n",
+"                        values_from = individualCount,\n",
+"                        values_fill = 0,\n",
+"                        values_fn = sum)\n",
+"\n",
+"#Replace all occurrences >= 1 by 1 to have only presence (1) or absence (0) data\n",
+"for(c in 3:length(pivot_file)){\n",
+"    pivot_file[c][pivot_file[c]>=1] <- 1}\n",
+"\n",
+"\n",
+"write.table(pivot_file, \"outputs/pivot_file.tabular\", sep = \"\\t\", quote = FALSE, row.names = FALSE)"
+]
+},
+{
+"cell_type": "markdown",
+"metadata": {},
+"source": [
+"In this Jupyter notebook, we used the pivot_wider function of the tidyr package to transform our data into a wider format suited to subsequent analyses in the Galaxy workflow for ecoregionalization. This transformation converts the data to a format where each taxon becomes a separate column. Missing values are filled with zeros, and individual counts are summed when duplicate records exist. Finally, all values >= 1 are replaced by 1 so that only presence (1) or absence (0) data remain.\n",
+"\n",
+"Thus, this notebook is an essential building block of our analysis pipeline, ensuring that the data is properly formatted and ready to be explored and interpreted for ecoregionalization studies."
+]
+}
+],
+"metadata": {
+"kernelspec": {
+"display_name": "R",
+"language": "R",
+"name": "ir"
+},
+"language_info": {
+"codemirror_mode": "r",
+"file_extension": ".r",
+"mimetype": "text/x-r-source",
+"name": "R",
+"pygments_lexer": "r",
+"version": "4.0.3"
+}
+},
+"nbformat": 4,
+"nbformat_minor": 4
+}
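
The transformation in the notebook above can be tried on a small table. The following is a minimal sketch, not the tool's code: the data frame `occ` and its columns `station`, `depth` and `taxon` are hypothetical stand-ins for the real Galaxy input, and the presence/absence step selects taxon columns by name instead of by position as the notebook does.

```r
library(tidyr)

# Hypothetical toy occurrence table (illustrative values, not from the repository)
occ <- data.frame(
  station = c("S1", "S1", "S1", "S2"),
  depth = c(10, 10, 10, 20),
  taxon = c("A", "A", "B", "B"),
  individualCount = c(2, 3, 1, 4)
)

# One column per taxon; duplicate records are summed, missing combinations become 0
wide <- pivot_wider(
  data = occ,
  names_from = taxon,
  values_from = individualCount,
  values_fill = 0,
  values_fn = sum
)

# Convert counts to presence (1) / absence (0), touching only the taxon columns
taxa_cols <- setdiff(names(wide), c("station", "depth"))
wide[taxa_cols] <- lapply(wide[taxa_cols], function(x) as.integer(x >= 1))

print(wide)
```

Here station S1 has two records for taxon A, which are summed before being reduced to 1, and station S2 has no record for taxon A, so that cell is filled with 0.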
