comparison test-data/annotemp/pivot_wider_jupytool_notebook.ipynb @ 6:4cf24a95c4ff draft

planemo upload for repository https://github.com/galaxyecology/tools-ecology/tree/master/tools/EMLassemblyline commit 2d36dc964f548b5acbc43ffd78e51e6fc7dc80bb
author ecology
date Tue, 10 Sep 2024 12:53:27 +0000
5:34dcb86a9351 6:4cf24a95c4ff
1 {
2 "cells": [
3 {
4 "cell_type": "markdown",
5 "metadata": {},
6 "source": [
7 "# Pivot wider Jupytool "
8 ]
9 },
10 {
11 "cell_type": "markdown",
12 "metadata": {},
13 "source": [
14 "This Jupyter notebook is dedicated to the pivot_wider function from the tidyr R package. \n",
15 "This script is the final part of the data preparation for the ecoregionalization Galaxy workflow. "
16 ]
17 },
18 {
19 "cell_type": "code",
20 "execution_count": 62,
21 "metadata": {
22 "tags": []
23 },
24 "outputs": [],
25 "source": [
26 "#Date : 22/05/2024\n",
27 "#Author : Seguineau Pauline & Yvan Le Bras \n",
28 "\n",
29 "#Load libraries\n",
30 "library(tidyr)\n",
31 "\n",
32 "#Locate the input file (the last file found under input_path is kept)\n",
33 "\n",
34 "input_path = \"galaxy_inputs\"\n",
35 "\n",
36 "for (dir in list.dirs(input_path)){\n",
37 " for (file in list.files(dir)) {\n",
38 " file_path = file.path(dir, file)}\n",
39 "}\n",
40 "\n",
41 "file = read.table(file_path, header = TRUE, sep = \"\\t\")\n",
42 "\n",
43 "#Run pivot_wider function\n",
44 "pivot_file = pivot_wider(data = file,\n",
45 " names_from = phylum_class_order_family_genus_specificEpithet,\n",
46 " values_from = individualCount,\n",
47 " values_fill = 0,\n",
48 " values_fn = sum)\n",
49 "\n",
50 "#Replace all counts >= 1 (columns 3 onward) with 1 to keep only presence (1) or absence (0) data\n",
51 "for(c in 3:length(pivot_file)){\n",
52 " pivot_file[c][pivot_file[c]>=1] <- 1}\n",
53 "\n",
54 "\n",
55 "write.table(pivot_file, \"outputs/pivot_file.tabular\", sep = \"\\t\", quote = FALSE, row.names = FALSE)"
56 ]
57 },
58 {
59 "cell_type": "markdown",
60 "metadata": {},
61 "source": [
62 "In this Jupyter notebook, we used the pivot_wider function of the tidyr package to reshape our data into a wider format suited to the subsequent analyses of the Galaxy workflow for ecoregionalization. After this transformation, each taxon becomes a separate column. Missing values are filled with zeros, and individual counts are summed when the same taxon is recorded several times for the same set of identifying columns. Finally, all values >= 1 are replaced by 1 so that only presence (1) or absence (0) data remain.\n",
63 "\n",
64 "Thus, this notebook is an essential building block of our analysis pipeline, ensuring that the data is properly formatted and ready to be explored and interpreted for ecoregionalization studies."
65 ]
66 }
67 ],
68 "metadata": {
69 "kernelspec": {
70 "display_name": "R",
71 "language": "R",
72 "name": "ir"
73 },
74 "language_info": {
75 "codemirror_mode": "r",
76 "file_extension": ".r",
77 "mimetype": "text/x-r-source",
78 "name": "R",
79 "pygments_lexer": "r",
80 "version": "4.0.3"
81 }
82 },
83 "nbformat": 4,
84 "nbformat_minor": 4
85 }
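
The reshaping described in the notebook can be tried on a small table before running it on real data. The sketch below is only an illustration: the data frame `occurrences`, its coordinate columns, and the short `taxon` column name are hypothetical stand-ins for the notebook's input file and its `phylum_class_order_family_genus_specificEpithet` column. It uses the same `pivot_wider` arguments as the notebook, but converts counts to presence/absence by column name rather than by column position, which is one possible way to avoid depending on the identifier columns being the first two.

```r
library(tidyr)

# Hypothetical toy occurrence table (not the notebook's real input):
# two sites, with one taxon recorded twice at the first site.
occurrences <- data.frame(
  decimalLatitude  = c(-66.5, -66.5, -66.5, -67.0),
  decimalLongitude = c(140.0, 140.0, 140.0, 141.0),
  taxon            = c("Porifera", "Porifera", "Cnidaria", "Cnidaria"),
  individualCount  = c(2, 3, 1, 4)
)

# One column per taxon; duplicated records are summed,
# and site/taxon combinations with no record are filled with 0.
wide <- pivot_wider(
  data        = occurrences,
  names_from  = taxon,
  values_from = individualCount,
  values_fill = 0,
  values_fn   = sum
)

# Convert counts to presence (1) / absence (0).
# Taxon columns are selected by name (assumption: every column that is
# not a coordinate column holds counts for one taxon).
id_cols    <- c("decimalLatitude", "decimalLongitude")
taxon_cols <- setdiff(names(wide), id_cols)
wide[taxon_cols] <- lapply(wide[taxon_cols], function(x) as.integer(x >= 1))

print(wide)
```

Selecting the taxon columns by name keeps the conversion correct if the input gains or loses identifier columns. The notebook's `3:length(pivot_file)` loop assumes exactly two leading identifier columns.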