view test-data/annotemp/pivot_wider_jupytool_notebook.ipynb @ 6:4f2d8a7c97fa draft default tip

planemo upload for repository https://github.com/galaxyecology/tools-ecology/tree/master/tools/EMLassemblyline commit 4b040fe7867d965fb88ce70cc08081367b62b063
author ecology
date Fri, 27 Sep 2024 12:58:10 +0000
parents 20224597fa2c
children
line wrap: on
line source

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pivot wider Jupytool "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This Jupyter notebook is dedicated to the pivot_wider function from the tidyr R package. \n",
    "This script is the final part of the data preparation for the ecoregionalization Galaxy workflow.   "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "#Date : 22/05/2024\n",
    "#Author : Seguineau Pauline & Yvan Le Bras \n",
    "\n",
    "#Load libraries\n",
    "library(tidyr)\n",
    "\n",
    "#load file \n",
    "\n",
    "input_path = \"galaxy_inputs\"\n",
    "\n",
    "for (dir in list.dirs(input_path)){\n",
    "    for (file in list.files(dir)) {\n",
    "        file_path = file.path(dir, file)}\n",
    "}\n",
    "\n",
    "file = read.table(file_path,header=T, sep = \"\\t\")\n",
    "\n",
    "#Run pivot_wider function\n",
    "pivot_file = pivot_wider(data = file,\n",
    "                        names_from = phylum_class_order_family_genus_specificEpithet,\n",
    "                        values_from = individualCount,\n",
    "                        values_fill = 0,\n",
    "                        values_fn = sum)\n",
    "\n",
    "#Replace all occurences >= 1 by 1 to have only presence (1) or absence (0) data\n",
    "for(c in 3:length(pivot_file)){\n",
    "    pivot_file[c][pivot_file[c]>=1] <- 1}\n",
    "\n",
    "\n",
    "write.table(pivot_file, \"outputs/pivot_file.tabular\", sep = \"\\t\", quote = F, row.names = F)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this Jupyter notebook, we used the pivot_wider function of the tidyr package to transform our data into a wider format and adapted to subsequent analyses as part of the Galaxy workflow for ecoregionalization. This transformation allowed us to convert our data to a format where each taxon becomes a separate column. We also took care to fill in the missing values with zeros and to sum the individual counts in case of duplications. Then all data >= 1 are replace by 1 to have only presence (1) or abscence (0) data.\n",
    "\n",
    "Thus, this notebook is an essential building block of our analysis pipeline, ensuring that the data is properly formatted and ready to be explored and interpreted for ecoregionalization studies."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "4.0.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}