Mercurial > repos > ecology > raster_template
view test-data/annotemp/pivot_wider_jupytool_notebook.ipynb @ 4:eeadb27fe7b3 draft default tip
planemo upload for repository https://github.com/galaxyecology/tools-ecology/tree/master/tools/EMLassemblyline commit 61182ba790bdeeb98750403b869051ccad1a736c
author | ecology |
---|---|
date | Thu, 16 Jan 2025 15:56:47 +0000 |
parents | 9eaf84bf856f |
children |
line wrap: on
line source
{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Pivot wider Jupytool " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This Jupyter notebook is dedicated to the pivot_wider function from the tidyr R package. \n", "This script is the final part of the data preparation for the ecoregionalization Galaxy workflow. " ] }, { "cell_type": "code", "execution_count": 62, "metadata": { "tags": [] }, "outputs": [], "source": [ "#Date : 22/05/2024\n", "#Author : Seguineau Pauline & Yvan Le Bras \n", "\n", "#Load libraries\n", "library(tidyr)\n", "\n", "#load file \n", "\n", "input_path = \"galaxy_inputs\"\n", "\n", "for (dir in list.dirs(input_path)){\n", " for (file in list.files(dir)) {\n", " file_path = file.path(dir, file)}\n", "}\n", "\n", "file = read.table(file_path,header=T, sep = \"\\t\")\n", "\n", "#Run pivot_wider function\n", "pivot_file = pivot_wider(data = file,\n", " names_from = phylum_class_order_family_genus_specificEpithet,\n", " values_from = individualCount,\n", " values_fill = 0,\n", " values_fn = sum)\n", "\n", "#Replace all occurences >= 1 by 1 to have only presence (1) or absence (0) data\n", "for(c in 3:length(pivot_file)){\n", " pivot_file[c][pivot_file[c]>=1] <- 1}\n", "\n", "\n", "write.table(pivot_file, \"outputs/pivot_file.tabular\", sep = \"\\t\", quote = F, row.names = F)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this Jupyter notebook, we used the pivot_wider function of the tidyr package to transform our data into a wider format and adapted to subsequent analyses as part of the Galaxy workflow for ecoregionalization. This transformation allowed us to convert our data to a format where each taxon becomes a separate column. We also took care to fill in the missing values with zeros and to sum the individual counts in case of duplications. Then all data >= 1 are replace by 1 to have only presence (1) or abscence (0) data.\n", "\n", "Thus, this notebook is an essential building block of our analysis pipeline, ensuring that the data is properly formatted and ready to be explored and interpreted for ecoregionalization studies." ] } ], "metadata": { "kernelspec": { "display_name": "R", "language": "R", "name": "ir" }, "language_info": { "codemirror_mode": "r", "file_extension": ".r", "mimetype": "text/x-r-source", "name": "R", "pygments_lexer": "r", "version": "4.0.3" } }, "nbformat": 4, "nbformat_minor": 4 }