view seaborn_pairgrid.xml @ 0:4a46d83cb08f draft default tip

planemo upload for repository https://github.com/galaxyproject/tools-iuc/tree/main/tools/seaborn commit 24dc6373560bd5e409fca84154634f5a528001c3
author iuc
date Wed, 14 May 2025 08:39:22 +0000
parents
children
line wrap: on
line source

<tool id="seaborn_pairgrid" name="seaborn pair-wise scatterplot" version="@TOOL_VERSION@+galaxy@VERSION_SUFFIX@" profile="23.0" license="MIT">
    <description>of all-vs-all column combinations</description>
    <macros>
        <import>macros.xml</import>
    </macros>

    <expand macro="edam"/>
    <expand macro="requirements"/>
    <expand macro="creator" />

    <command detect_errors="exit_code"><![CDATA[
        python3 '${run_script}'
    ]]></command>

    <configfiles>
        <configfile name="run_script"><![CDATA[
index_col = $index_col

@INIT@

from scipy.stats import gaussian_kde

def scatter_density(x, y, **kwargs):
    kwargs.pop('color')
    # Calculate the point density
    xy = np.vstack([x, y])
    z = gaussian_kde(xy)(xy)
    plt.scatter(x, y, c=z, cmap="jet", **kwargs)


g = sns.PairGrid(data)
g.map_lower(sns.regplot, scatter_kws=dict(s=4))
g.map_lower(sns.kdeplot, levels=4, color=".2")
g.map_upper(scatter_density, s=6)
g.map_diag(sns.histplot)

plt.savefig(f"{output_file}", format=output_format, dpi=300)
        ]]></configfile>
    </configfiles>
    <inputs>
        <expand macro="inputs"/>
        <param argument="index_col" type="boolean" truevalue="0" falsevalue="None" checked="false" label="Is the first column the index?" help="Specify whether the first column of the input data should be treated as the index. If selected, the first column will not be used as data for plotting." />
        <expand macro="transformation"/>
    </inputs>
    <outputs>
        <data name="output_file" format="png" label="${tool.name} on ${on_string}" />
    </outputs>
    <tests>
        <!-- Test 1: Generate a pairgrid plot with default settings -->
        <test>
            <param name="input_data" value="mtcars.txt" />
            <output name="output_file">
                <assert_contents>
                    <has_image_channels channels="4"/>
                    <has_image_height height="8250"/>
                    <has_image_width width="8250" />
                    <has_image_center_of_mass center_of_mass="4143.08, 4103.67" eps="0.1"/>
                </assert_contents>
            </output>
        </test>
    </tests>
    <help><![CDATA[
.. class:: infomark

**What it does**

This tool generates a scatterplot matrix (pair-wise scatterplots) of all column combinations in the input dataset using the Seaborn library. The scatterplot matrix provides a visual summary of the relationships between variables in the dataset, making it useful for exploratory data analysis.
The tool uses Seaborn's `PairGrid` functionality to create the matrix, with the following features:

- **Lower Triangle**: Regression plots and kernel density estimates (KDE).
- **Upper Triangle**: Scatterplots with point density coloring.
- **Diagonal**: Histograms of individual variables.

**Usage**

1. **Input**: Provide a tabular data file in one of the supported formats (TSV, CSV, TXT, or Parquet). Optionally, specify whether the first column should be treated as the index.
2. **Advanced Options**: Apply transformations to the data (e.g., log10 or log2) before plotting.
3. **Output**: The tool generates a PNG image of the scatterplot matrix, which can be downloaded or used in further analyses.

**Input**

- **Input Data Table**: Upload your data file in TSV, CSV, TXT, or Parquet format. The file should contain numerical data for plotting.
- **Index Column**: Specify whether the first column of the input data should be treated as the index. If selected, the first column will not be used for plotting.
- **Data Transformation**: Apply transformations such as log10 or log2 to numerical data before plotting.

**Output**

The tool generates a PNG file containing the scatterplot matrix. The file can be downloaded or used as input for other tools in Galaxy.

**Example Input**

Here is an example of a simple input dataset:

+------------+------------+------------+------------+
| Category   | Value1     | Value2     | Value3     |
+============+============+============+============+
| A          | 10         | 20         | 30         |
+------------+------------+------------+------------+
| B          | 15         | 25         | 35         |
+------------+------------+------------+------------+
| C          | 20         | 30         | 40         |
+------------+------------+------------+------------+

**Example Output**

The tool will generate a scatterplot matrix where:
- The lower triangle contains regression plots and KDE plots.
- The upper triangle contains scatterplots with point density coloring.
- The diagonal contains histograms of individual variables.

**Links**

- For more information about Seaborn's `PairGrid`, visit the official documentation: https://seaborn.pydata.org/generated/seaborn.PairGrid.html
- For detailed parameter descriptions, refer to the Galaxy tool documentation.
    ]]></help>
    <expand macro="citation"/>
</tool>