Mercurial > repos > mvdbeek > damidseq_consecutive_peaks
comparison consecutive_peaks.py @ 1:f3ca59e53b73 draft default tip
planemo upload for repository https://github.com/bardin-lab/damid_galaxy_tools commit c753dd4f3e1863aae7ba45dcc7efdf6937b03542-dirty
| author | mvdbeek |
|---|---|
| date | Mon, 29 Oct 2018 06:49:17 -0400 |
| parents | 7f827a8e4ec5 |
| children | |
| 0:7f827a8e4ec5 | 1:f3ca59e53b73 |
|---|---|
| 17 """Finds the two lowest consecutives peaks for a group and reports""" | 17 """Finds the two lowest consecutives peaks for a group and reports""" |
| 18 df = pd.read_csv(input_file, sep='\t', header=None) | 18 df = pd.read_csv(input_file, sep='\t', header=None) |
| 19 grouped = df.groupby(groupby_column, sort=False) | 19 grouped = df.groupby(groupby_column, sort=False) |
| 20 if add_number_of_peaks: | 20 if add_number_of_peaks: |
| 21 df[PEAKS_PER_GROUP] = grouped[groupby_column].transform(np.size) | 21 df[PEAKS_PER_GROUP] = grouped[groupby_column].transform(np.size) |
| 22 df[SHIFTED_PADJ_COLUMN] = grouped[8].shift() | 22 df[SHIFTED_PADJ_COLUMN] = grouped[padj_column].shift() |
| 23 df[CONSECUTIVE_MAX] = df[[padj_column, SHIFTED_PADJ_COLUMN]].max(axis=1) | 23 df[CONSECUTIVE_MAX] = df[[padj_column, SHIFTED_PADJ_COLUMN]].max(axis=1) |
| 24 grouped = df.groupby(groupby_column, sort=False) | 24 grouped = df.groupby(groupby_column, sort=False) |
| 25 idx = grouped[CONSECUTIVE_MAX].transform(min) # index of groupwise consecutive minimum | 25 idx = grouped[CONSECUTIVE_MAX].idxmin() # index of groupwise consecutive minimum |
| 26 new_df = df[df[CONSECUTIVE_MAX] == idx] | 26 new_df = df.loc[idx] |
| 27 new_df.sort_values(by=CONSECUTIVE_MAX) | 27 new_df.sort_values(by=CONSECUTIVE_MAX) |
| 28 new_df[padj_column].replace(new_df[CONSECUTIVE_MAX]) | 28 new_df[padj_column].replace(new_df[CONSECUTIVE_MAX]) |
| 29 new_df = new_df.drop(labels=[CONSECUTIVE_MAX, SHIFTED_PADJ_COLUMN], axis=1) | 29 new_df = new_df.drop(labels=[CONSECUTIVE_MAX, SHIFTED_PADJ_COLUMN], axis=1) |
| 30 new_df.to_csv(output_file, sep='\t', header=None, na_rep="NaN") | 30 new_df.to_csv(output_file, sep='\t', header=None, na_rep="NaN") |
| 31 | 31 |
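
The change on lines 25 and 26 swaps an equality filter against the groupwise minimum for a lookup by `idxmin()`. The following is a minimal sketch of how the two selections can differ, not the tool's actual code: the column names and the toy data are hypothetical, and it assumes that tied minima within a group are the case of interest.

```python
import pandas as pd

# Hypothetical column names and toy data, for illustration only.
GROUP, PADJ = "group", "padj"
SHIFTED, CONSECUTIVE_MAX = "shifted_padj", "consecutive_max"

df = pd.DataFrame({
    GROUP: ["a", "a", "a", "b", "b", "b"],
    PADJ: [0.5, 0.1, 0.2, 0.3, 0.3, 0.05],
})

# padj of the previous peak within the same group (NaN for the first peak).
df[SHIFTED] = df.groupby(GROUP, sort=False)[PADJ].shift()
# Larger padj of each consecutive pair; max() skips NaN by default,
# so the first peak of a group keeps its own padj here.
df[CONSECUTIVE_MAX] = df[[PADJ, SHIFTED]].max(axis=1)

grouped = df.groupby(GROUP, sort=False)

# Revision 0 style: broadcast the group minimum, then filter by equality.
# Every row that ties for the minimum is kept.
group_min = grouped[CONSECUTIVE_MAX].transform("min")
by_equality = df[df[CONSECUTIVE_MAX] == group_min]

# Revision 1 style: take the index label of the first minimum per group.
# Exactly one row per group is kept.
idx = grouped[CONSECUTIVE_MAX].idxmin()
by_idxmin = df.loc[idx]

print(by_equality)
print(by_idxmin)
```

In this toy data group `b` has three rows tied at the minimum, so the equality filter keeps all of them while `idxmin()` keeps only the first. Separately, `sort_values` and `replace` return new objects by default, so lines 27 and 28 of both revisions appear to discard their results unless they are assigned back.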
