Mercurial > repos > ecology > eml_validate

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pivot wider Jupytool "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This Jupyter notebook is dedicated to the pivot_wider function from the tidyr R package. \n",
    "This script is the final part of the data preparation for the ecoregionalization Galaxy workflow.   "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "#Date : 22/05/2024\n",
    "#Author : Seguineau Pauline & Yvan Le Bras \n",
    "\n",
    "#Load libraries\n",
    "library(tidyr)\n",
    "\n",
    "#load file \n",
    "\n",
    "input_path = \"galaxy_inputs\"\n",
    "\n",
    "for (dir in list.dirs(input_path)){\n",
    "    for (file in list.files(dir)) {\n",
    "        file_path = file.path(dir, file)}\n",
    "}\n",
    "\n",
    "file = read.table(file_path,header=T, sep = \"\\t\")\n",
    "\n",
    "#Run pivot_wider function\n",
    "pivot_file = pivot_wider(data = file,\n",
    "                        names_from = phylum_class_order_family_genus_specificEpithet,\n",
    "                        values_from = individualCount,\n",
    "                        values_fill = 0,\n",
    "                        values_fn = sum)\n",
    "\n",
    "#Replace all occurences >= 1 by 1 to have only presence (1) or absence (0) data\n",
    "for(c in 3:length(pivot_file)){\n",
    "    pivot_file[c][pivot_file[c]>=1] <- 1}\n",
    "\n",
    "\n",
    "write.table(pivot_file, \"outputs/pivot_file.tabular\", sep = \"\\t\", quote = F, row.names = F)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this Jupyter notebook, we used the pivot_wider function of the tidyr package to transform our data into a wider format and adapted to subsequent analyses as part of the Galaxy workflow for ecoregionalization. This transformation allowed us to convert our data to a format where each taxon becomes a separate column. We also took care to fill in the missing values with zeros and to sum the individual counts in case of duplications. Then all data >= 1 are replace by 1 to have only presence (1) or abscence (0) data.\n",
    "\n",
    "Thus, this notebook is an essential building block of our analysis pipeline, ensuring that the data is properly formatted and ready to be explored and interpreted for ecoregionalization studies."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "4.0.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
author	ecology
date	Fri, 27 Sep 2024 13:01:04 +0000
parents
children