comparison test-data/annotemp/pivot_wider_jupytool_notebook.ipynb @ 0:ad96b20423cf draft default tip
planemo upload for repository https://github.com/galaxyecology/tools-ecology/tree/master/tools/EMLassemblyline commit 4b040fe7867d965fb88ce70cc08081367b62b063
author: ecology
date: Fri, 27 Sep 2024 13:01:04 +0000
Compared revisions: -1:000000000000 (null revision) and 0:ad96b20423cf
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Pivot wider Jupytool"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "This Jupyter notebook is dedicated to the `pivot_wider` function from the tidyr R package.\n",
    "This script is the final part of the data preparation for the ecoregionalization Galaxy workflow."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 62,
   "metadata": {
    "tags": []
   },
   "outputs": [],
   "source": [
    "# Date : 22/05/2024\n",
    "# Author : Seguineau Pauline & Yvan Le Bras\n",
    "\n",
    "# Load libraries\n",
    "library(tidyr)\n",
    "\n",
    "# Locate the input file provided under galaxy_inputs/ (searched recursively, files only)\n",
    "input_path = \"galaxy_inputs\"\n",
    "input_files = list.files(input_path, recursive = TRUE, full.names = TRUE)\n",
    "if (length(input_files) == 0) stop(\"No input file found in \", input_path)\n",
    "# Keep the last file found; a single input dataset is expected\n",
    "file_path = input_files[length(input_files)]\n",
    "\n",
    "# Read the tab-separated input table\n",
    "file = read.table(file_path, header = TRUE, sep = \"\\t\")\n",
    "\n",
    "# Run pivot_wider: one column per taxon, missing combinations filled with 0,\n",
    "# counts summed when the same taxon is recorded several times for the same identifiers\n",
    "pivot_file = pivot_wider(data = file,\n",
    "                         names_from = phylum_class_order_family_genus_specificEpithet,\n",
    "                         values_from = individualCount,\n",
    "                         values_fill = 0,\n",
    "                         values_fn = sum)\n",
    "\n",
    "# Replace all values >= 1 by 1 to keep only presence (1) or absence (0) data.\n",
    "# Columns 1 and 2 are identifier columns; taxon columns start at column 3.\n",
    "taxa_cols = names(pivot_file)[-(1:2)]\n",
    "pivot_file[taxa_cols] = lapply(pivot_file[taxa_cols], function(x) ifelse(x >= 1, 1, x))\n",
    "\n",
    "# Write the presence/absence table as tab-separated text\n",
    "write.table(pivot_file, \"outputs/pivot_file.tabular\", sep = \"\\t\", quote = FALSE, row.names = FALSE)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In this Jupyter notebook, we used the `pivot_wider` function of the tidyr package to transform our data into a wider format suited to the subsequent analyses of the Galaxy ecoregionalization workflow. This transformation converts the data to a format in which each taxon becomes a separate column. Taxa not recorded for a given row are filled with zeros, and individual counts are summed when the same taxon is recorded several times for the same row. Finally, all values >= 1 are replaced by 1 so that the table contains only presence (1) or absence (0) data.\n",
    "\n",
    "Thus, this notebook is an essential building block of our analysis pipeline: it ensures that the data is properly formatted and ready to be explored and interpreted in ecoregionalization studies."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "4.0.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
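The reshaping performed by the notebook can be illustrated on a small table. This is a minimal sketch under stated assumptions, not part of the original tool: the identifier columns `latitude` and `longitude` and the taxon labels `taxonA`/`taxonB` are hypothetical, and, as in the notebook's loop starting at column 3, the first two columns are assumed to be identifiers. It also assumes a tidyr version that accepts a single value for `values_fill` and a single function for `values_fn`, as the notebook's own call does.

```r
library(tidyr)

# Hypothetical toy input: two identifier columns (latitude, longitude),
# the concatenated taxon name, and an individual count per record.
occ <- data.frame(
  latitude  = c(-66.1, -66.1, -66.1, -65.8),
  longitude = c(140.0, 140.0, 140.0, 139.5),
  phylum_class_order_family_genus_specificEpithet = c("taxonA", "taxonA", "taxonB", "taxonB"),
  individualCount = c(2, 3, 1, 4)
)

# Same pivot_wider call as in the notebook: one column per taxon,
# duplicated records summed, missing identifier/taxon combinations filled with 0.
counts <- pivot_wider(data = occ,
                      names_from = phylum_class_order_family_genus_specificEpithet,
                      values_from = individualCount,
                      values_fill = 0,
                      values_fn = sum)
# First location: taxonA = 2 + 3 = 5, taxonB = 1
# Second location: taxonA = 0 (filled), taxonB = 4

# Presence/absence step (assumption: columns after the first two are taxa);
# values >= 1 become 1, smaller values are left unchanged.
presence <- counts
taxa_cols <- names(presence)[-(1:2)]
presence[taxa_cols] <- lapply(presence[taxa_cols], function(x) ifelse(x >= 1, 1, x))

print(counts)
print(presence)
```

Using `ifelse(x >= 1, 1, x)` leaves values below 1 (the zeros introduced by `values_fill`) untouched, which mirrors the notebook's replacement rule. Because `sum` is called without `na.rm = TRUE`, an `NA` count in the input would propagate to the corresponding cell.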