comparison test-data/annotemp/pivot_wider_jupytool_notebook.ipynb @ 6:4cf24a95c4ff draft
planemo upload for repository https://github.com/galaxyecology/tools-ecology/tree/master/tools/EMLassemblyline commit 2d36dc964f548b5acbc43ffd78e51e6fc7dc80bb
| author | ecology | 
|---|---|
| date | Tue, 10 Sep 2024 12:53:27 +0000 | 
| parents | |
| children | 
   comparison
| 5:34dcb86a9351 | 6:4cf24a95c4ff | 
|---|---|
| 1 { | |
| 2 "cells": [ | |
| 3 { | |
| 4 "cell_type": "markdown", | |
| 5 "metadata": {}, | |
| 6 "source": [ | |
| 7 "# Pivot wider Jupytool " | |
| 8 ] | |
| 9 }, | |
| 10 { | |
| 11 "cell_type": "markdown", | |
| 12 "metadata": {}, | |
| 13 "source": [ | |
| 14 "This Jupyter notebook is dedicated to the pivot_wider function from the tidyr R package. \n", | |
| 15 "This script is the final part of the data preparation for the ecoregionalization Galaxy workflow. " | |
| 16 ] | |
| 17 }, | |
| 18 { | |
| 19 "cell_type": "code", | |
| 20 "execution_count": 62, | |
| 21 "metadata": { | |
| 22 "tags": [] | |
| 23 }, | |
| 24 "outputs": [], | |
| 25 "source": [ | |
| 26 "#Date : 22/05/2024\n", | |
| 27 "#Author : Seguineau Pauline & Yvan Le Bras \n", | |
| 28 "\n", | |
| 29 "#Load libraries\n", | |
| 30 "library(tidyr)\n", | |
| 31 "\n", | |
| 32 "#load file \n", | |
| 33 "\n", | |
| 34 "input_path = \"galaxy_inputs\"\n", | |
| 35 "\n", | |
| 36 "for (dir in list.dirs(input_path)){\n", | |
| 37 " for (file in list.files(dir)) {\n", | |
| 38 " file_path = file.path(dir, file)}\n", | |
| 39 "}\n", | |
| 40 "\n", | |
| 41 "file = read.table(file_path, header = TRUE, sep = \"\\t\")\n", | |
| 42 "\n", | |
| 43 "#Run pivot_wider function\n", | |
| 44 "pivot_file = pivot_wider(data = file,\n", | |
| 45 " names_from = phylum_class_order_family_genus_specificEpithet,\n", | |
| 46 " values_from = individualCount,\n", | |
| 47 " values_fill = 0,\n", | |
| 48 " values_fn = sum)\n", | |
| 49 "\n", | |
| 50 "#Replace all occurrences >= 1 by 1 to have only presence (1) or absence (0) data\n", | |
| 51 "for(c in 3:length(pivot_file)){\n", | |
| 52 " pivot_file[c][pivot_file[c]>=1] <- 1}\n", | |
| 53 "\n", | |
| 54 "\n", | |
| 55 "write.table(pivot_file, \"outputs/pivot_file.tabular\", sep = \"\\t\", quote = FALSE, row.names = FALSE)" | |
| 56 ] | |
| 57 }, | |
| 58 { | |
| 59 "cell_type": "markdown", | |
| 60 "metadata": {}, | |
| 61 "source": [ | |
| 62 "In this Jupyter notebook, we used the pivot_wider function of the tidyr package to reshape our data into a wider format suited to the subsequent analyses of the Galaxy ecoregionalization workflow. This transformation converts the data to a format where each taxon becomes a separate column. Missing values are filled with zeros, and individual counts are summed when records are duplicated. Finally, all values >= 1 are replaced by 1 so that only presence (1) or absence (0) data remain.\n", | |
| 63 "\n", | |
| 64 "Thus, this notebook is an essential building block of our analysis pipeline, ensuring that the data is properly formatted and ready to be explored and interpreted for ecoregionalization studies." | |
| 65 ] | |
| 66 } | |
| 67 ], | |
| 68 "metadata": { | |
| 69 "kernelspec": { | |
| 70 "display_name": "R", | |
| 71 "language": "R", | |
| 72 "name": "ir" | |
| 73 }, | |
| 74 "language_info": { | |
| 75 "codemirror_mode": "r", | |
| 76 "file_extension": ".r", | |
| 77 "mimetype": "text/x-r-source", | |
| 78 "name": "R", | |
| 79 "pygments_lexer": "r", | |
| 80 "version": "4.0.3" | |
| 81 } | |
| 82 }, | |
| 83 "nbformat": 4, | |
| 84 "nbformat_minor": 4 | |
| 85 } | 
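
The reshaping described in the notebook's closing markdown cell can be illustrated on a toy table. This is a minimal sketch, not the notebook's own code: the data frame `occ`, its `site` and `taxon` columns, and the helper variable `taxon_cols` are hypothetical names chosen for illustration; only `pivot_wider` and its `names_from`, `values_from`, `values_fill`, and `values_fn` arguments come from the notebook.

```r
# Illustrative sketch only: toy data with hypothetical column names.
library(tidyr)

# Long format: one row per record; site "A" has a duplicated taxon "sp1".
occ <- data.frame(
  site = c("A", "A", "A", "B"),
  taxon = c("sp1", "sp1", "sp2", "sp2"),
  individualCount = c(2, 3, 1, 4)
)

# One column per taxon; duplicates are summed, missing combinations become 0.
wide <- pivot_wider(
  data = occ,
  names_from = taxon,
  values_from = individualCount,
  values_fill = 0,
  values_fn = sum
)

# Turn counts into presence (1) / absence (0), selecting taxon columns by name.
taxon_cols <- setdiff(names(wide), "site")
wide[taxon_cols] <- lapply(wide[taxon_cols], function(x) as.integer(x >= 1))

print(wide)
```

Selecting the taxon columns by name, rather than by position as the notebook does with `3:length(pivot_file)`, is a design choice of this sketch; it avoids assuming how many identifier columns precede the taxa.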
