annotate lifelines_tool/lifelineskmcph.xml @ 2:dd5e65893cb8 draft default tip
add survival and collapsed life table outputs suggested by Wolfgang
author | fubar |
---|---|
date | Thu, 10 Aug 2023 22:52:45 +0000 |
parents | 232b874046a7 |
children |
rev | line source |
---|---|
0 | 1 <tool name="lifelineskmcph" id="lifelineskmcph" version="0.01"> |
2 <!--Source in git at: https://github.com/fubar2/galaxy_tf_overlay--> | |
2 | 3 <!--Created by toolfactory@galaxy.org at 10/08/2023 21:59:53 using the Galaxy Tool Factory.--> |
0 | 4 <description>Lifelines KM and optional Cox PH models</description> |
5 <requirements> | |
6 <requirement version="1.5.3" type="package">pandas</requirement> | |
7 <requirement version="3.7.2" type="package">matplotlib</requirement> | |
8 <requirement version="0.27.7" type="package">lifelines</requirement> | |
9 </requirements> | |
10 <stdio> | |
11 <exit_code range="1:" level="fatal"/> | |
12 </stdio> | |
13 <version_command><![CDATA[echo "0.01"]]></version_command> | |
14 <command><![CDATA[python | |
15 $runme | |
16 --input_tab | |
17 $input_tab | |
18 --readme | |
19 $readme | |
20 --time | |
21 '$time' | |
22 --status | |
23 '$status' | |
24 --cphcols | |
25 '$CPHcovariatecolumnnames' | |
26 --title | |
27 '$title' | |
28 --header | |
29 '$header' | |
30 --group | |
31 '$group' | |
32 --image_type | |
33 '$image_type' | |
34 --image_dir | |
35 'image_dir']]></command> | |
36 <configfiles> | |
37 <configfile name="runme"><![CDATA[#raw | |
38 | |
39 # script for a lifelines ToolFactory KM/CPH tool for Galaxy | |
40 # km models for https://github.com/galaxyproject/tools-iuc/issues/5393 | |
41 # test as | |
42 # python plotlykm.py --input_tab rossi.tab --htmlout "testfoo" --time "week" --status "arrest" --title "test" --image_dir images --cphcol="prio,age,race,paro,mar,fin" | |
1 | 43 # Ross Lazarus July 2023 |
44 import argparse | |
0 | 45 |
46 import os | |
47 import sys | |
48 | |
49 import lifelines | |
50 | |
51 from matplotlib import pyplot as plt | |
52 | |
53 import pandas as pd | |
54 | |
55 | |
1 | 56 def trimlegend(v): |
57 """ | |
58 for int64 quintiles - must be ints - otherwise get silly legends with long float values | |
59 """ | |
60 for i, av in enumerate(v): | |
61 x = int(av) | |
62 v[i] = str(x) | |
63 return v | |
0 | 64 |
65 kmf = lifelines.KaplanMeierFitter() | |
66 cph = lifelines.CoxPHFitter() | |
67 | |
68 parser = argparse.ArgumentParser() | |
69 a = parser.add_argument | |
1 | 70 a('--input_tab', default='rossi.tab', required=True) |
0 | 71 a('--header', default='') |
72 a('--htmlout', default="test_run.html") | |
73 a('--group', default='') | |
74 a('--time', default='', required=True) | |
75 a('--status',default='', required=True) | |
76 a('--cphcols',default='') | |
77 a('--title', default='Default plot title') | |
78 a('--image_type', default='png') | |
79 a('--image_dir', default='images') | |
80 a('--readme', default='run_log.txt') | |
81 args = parser.parse_args() | |
82 sys.stdout = open(args.readme, 'w') | |
83 df = pd.read_csv(args.input_tab, sep='\t') | |
84 NCOLS = df.columns.size | |
85 NROWS = len(df.index) | |
1 | 86 QVALS = [.2, .4, .6, .8] # for partial cox ph plots |
0 | 87 defaultcols = ['col%d' % (x+1) for x in range(NCOLS)] |
88 testcols = df.columns | |
89 if len(args.header.strip()) > 0: | |
90 newcols = args.header.split(',') | |
91 if len(newcols) == NCOLS: | |
92 if (args.time in newcols) and (args.status in newcols): | |
93 df.columns = newcols | |
94 else: | |
95 sys.stderr.write('## CRITICAL USAGE ERROR (not a bug!): time %s and/or status %s not found in supplied header parameter %s' % (args.time, args.status, args.header)) | |
96 sys.exit(4) | |
97 else: | |
98 sys.stderr.write('## CRITICAL USAGE ERROR (not a bug!): Supplied header %s has %d comma delimited header names - does not match the input tabular file %d columns' % (args.header, len(newcols), NCOLS)) | |
99 sys.exit(5) | |
100 else: # no header supplied - check for a real one that matches the x and y axis column names | |
101 colsok = (args.time in testcols) and (args.status in testcols) # if they match, probably ok...should use more code and logic.. | |
102 if colsok: | |
103 df.columns = testcols # use actual header | |
104 else: | |
105 colsok = (args.time in defaultcols) and (args.status in defaultcols) | |
106 if colsok: | |
2 | 107 print('Replacing first row of data derived header %s with %s' % (testcols, defaultcols)) |
0 | 108 df.columns = defaultcols |
109 else: | |
110 sys.exit('## CRITICAL USAGE ERROR (not a bug!): time %s and status %s do not match anything in the file header, supplied header or automatic default column names %s' % (args.time, args.status, defaultcols)) | |
2 | 111 print('## Lifelines tool\nInput data header =', df.columns, 'time column =', args.time, 'status column =', args.status) |
0 | 112 os.makedirs(args.image_dir, exist_ok=True) |
113 fig, ax = plt.subplots() | |
114 if len(args.group.strip()) > 0: | |
115 names = [] | |
116 times = [] | |
117 events = [] | |
118 for name, grouped_df in df.groupby(args.group): | |
119 T = grouped_df[args.time] | |
120 E = grouped_df[args.status] | |
121 gfit = kmf.fit(T, E, label=name) | |
122 kmf.plot_survival_function(ax=ax) | |
123 names.append(str(name)) | |
124 times.append(T) | |
125 events.append(E) | |
126 ax.set_title(args.title) | |
127 fig.savefig(os.path.join(args.image_dir, 'KM_%s.%s' % (args.title, args.image_type))) | |
128 ngroup = len(names) | |
129 if ngroup == 2: # run logrank test if 2 groups | |
130 results = lifelines.statistics.logrank_test(times[0], times[1], events[0], events[1], alpha=.99) | |
131 print('Logrank test for %s - %s vs %s\n' % (args.group, names[0], names[1])) | |
132 results.print_summary() | |
133 else: | |
134 kmf.fit(df[args.time], df[args.status]) | |
135 kmf.plot_survival_function(ax=ax) | |
136 ax.set_title(args.title) | |
137 fig.savefig(os.path.join(args.image_dir, 'KM_%s.%s' % (args.title, args.image_type))) | |
1 | 138 print('#### No grouping variable, so no log rank or other Kaplan-Meier statistical output is available') |
2 | 139 survdf = lifelines.utils.survival_table_from_events(df[args.time], df[args.status]) |
140 lifedf = lifelines.utils.survival_table_from_events(df[args.time], df[args.status], collapse=True) | |
141 print("#### Survival table using time %s and event %s" % (args.time, args.status)) | |
142 with pd.option_context('display.max_rows', None, | |
143 'display.max_columns', None, | |
144 'display.precision', 3, | |
145 ): | |
146 print(survdf) | |
147 print("#### Life table using time %s and event %s" % (args.time, args.status)) | |
148 with pd.option_context('display.max_rows', None, | |
149 'display.max_columns', None, | |
150 'display.precision', 3, | |
151 ): | |
152 print(lifedf) | |
153 outpath = os.path.join(args.image_dir,'survival_table.tabular') | |
154 survdf.to_csv(outpath, sep='\t') | |
155 outpath = os.path.join(args.image_dir,'life_table.tabular') | |
156 lifedf.to_csv(outpath, sep='\t') | |
0 | 157 if len(args.cphcols) > 0: |
158 fig, ax = plt.subplots() | |
1 | 159 ax.set_title('Cox-PH model: %s' % args.title) |
0 | 160 cphcols = args.cphcols.strip().split(',') |
161 cphcols = [x.strip() for x in cphcols] | |
162 notfound = sum([(x not in df.columns) for x in cphcols]) | |
163 if notfound > 0: | |
164 sys.stderr.write('## CRITICAL USAGE ERROR (not a bug!): One or more requested Cox PH columns %s not found in supplied column header %s' % (args.cphcols, df.columns)) | |
165 sys.exit(6) | |
1 | 166 colsdf = df[cphcols] |
0 | 167 print('### Lifelines test of Proportional Hazards results with %s as covariates on %s' % (', '.join(cphcols), args.title)) |
1 | 168 cutcphcols = [args.time, args.status] + cphcols |
169 cphdf = df[cutcphcols] | |
170 ucolcounts = colsdf.nunique(axis=0) | |
0 | 171 cph.fit(cphdf, duration_col=args.time, event_col=args.status) |
172 cph.print_summary() | |
1 | 173 for i, cov in enumerate(colsdf.columns): |
2 | 174 if ucolcounts[i] > 10: # a hack - assume categories are sparse - if not imaginary quintiles will have to do |
1 | 175 v = pd.Series.tolist(cphdf[cov].quantile(QVALS)) |
176 vdt = df.dtypes[cov] | |
177 if vdt == 'int64': | |
178 v = trimlegend(v) | |
179 axp = cph.plot_partial_effects_on_outcome(cov, cmap='coolwarm', values=v) | |
180 axp.set_title('Cox-PH %s quintile partials: %s' % (cov,args.title)) | |
181 figr = axp.get_figure() | |
182 oname = os.path.join(args.image_dir,'%s_CoxPH_%s.%s' % (args.title, cov, args.image_type)) | |
183 figr.savefig(oname) | |
184 else: | |
185 v = pd.unique(cphdf[cov]) | |
186 v = [str(x) for x in v] | |
187 try: | |
188 axp = cph.plot_partial_effects_on_outcome(cov, cmap='coolwarm', values=v) | |
189 axp.set_title('Cox-PH %s partials: %s' % (cov,args.title)) | |
190 figr = axp.get_figure() | |
191 oname = os.path.join(args.image_dir,'%s_CoxPH_%s.%s' % (args.title, cov, args.image_type)) | |
192 figr.savefig(oname) | |
193 except Exception: # plotting partial effects can fail for some covariates - skip that plot | |
194 pass | |
0 | 195 cphaxes = cph.check_assumptions(cphdf, p_value_threshold=0.01, show_plots=True) |
196 for i, ax in enumerate(cphaxes): | |
197 figr = ax[0].get_figure() | |
198 titl = figr._suptitle.get_text().replace(' ','_').replace("'","") | |
199 oname = os.path.join(args.image_dir,'CPH%s.%s' % (titl, args.image_type)) | |
200 figr.savefig(oname) | |
201 | |
202 | |
203 #end raw]]></configfile> | |
204 </configfiles> | |
205 <inputs> | |
206 <param name="input_tab" type="data" optional="false" label="Tabular input file for failure time testing." help="Must have a column with a measure of time and status (0,1) at observation." format="tabular" multiple="false"/> | |
207 <param name="time" type="text" value="week" label="Name of column containing a time to observation" help="Use a column name from the file header if the data has one, or use one from the list supplied below, or use col1....colN otherwise to select the correct column"/> | |
208 <param name="status" type="text" value="arrest" label="Status at observation. Typically 1=event observed (for example, death), 0=censored" help="Use a column name from the header if the file has one, or use one from the list supplied below, or use col1....colN otherwise to select the correct column"/> | |
209 <param name="CPHcovariatecolumnnames" type="text" value="prio,age,race,paro,mar,fin" label="Optional comma delimited column names to use as covariates in the Cox Proportional Hazards model" help="Leave blank for no Cox PH model tests "/> | |
210 <param name="title" type="text" value="KM and CPH in lifelines test" label="Title for this lifelines analysis" help="Special characters will probably be escaped so do not use them"/> | |
211 <param name="header" type="text" value="" label="Optional comma delimited list of column names to use for this tabular file. Default is None when col1...coln will be used if no header row in the input data" help="The column names supplied for time, status and so on MUST match either this supplied list, or if none, the original file header if it exists, or col1...coln as the default of last resort."/> | |
212 <param name="group" type="text" value="race" label="Optional group column name for KM plot" help="If there are exactly 2 groups, a log-rank statistic will be generated as part of the Kaplan-Meier test."/> | |
213 <param name="image_type" type="select" label="Output format for all images" help=""> | |
214 <option value="png">Portable Network Graphics .png format</option> | |
215 <option value="jpg">JPEG</option> | |
216 <option value="pdf">PDF</option> | |
217 <option value="tiff">TIFF</option> | |
218 </param> | |
219 </inputs> | |
220 <outputs> | |
221 <collection name="image_dir" type="list" label="Images from $title on $input_tab.element_identifier"> | |
222 <discover_datasets pattern="__name_and_ext__" directory="image_dir" visible="false"/> | |
223 </collection> | |
224 <data name="readme" format="txt" label="Lifelines_km_cph $title on $input_tab.element_identifier" hidden="false"/> | |
225 </outputs> | |
226 <tests> | |
227 <test> | |
228 <output_collection name="image_dir"/> | |
229 <output name="readme" value="readme_sample" compare="sim_size" delta="1000"/> | |
230 <param name="input_tab" value="input_tab_sample"/> | |
231 <param name="time" value="week"/> | |
232 <param name="status" value="arrest"/> | |
233 <param name="CPHcovariatecolumnnames" value="prio,age,race,paro,mar,fin"/> | |
234 <param name="title" value="KM and CPH in lifelines test"/> | |
235 <param name="header" value=""/> | |
236 <param name="group" value="race"/> | |
237 <param name="image_type" value="png"/> | |
238 </test> | |
239 </tests> | |
240 <help><![CDATA[ | |
241 | |
242 This is a wrapper for some elementary life table analysis functions from the Lifelines package - see https://lifelines.readthedocs.io/en/latest for the full story | |
243 | |
244 | |
245 | |
246 Given a Galaxy tabular dataset with suitable indicators for time and status at observation, this tool can perform some simple life-table analyses and produce some useful plots. Kaplan-Meier analysis is always run. A Cox Proportional Hazards model is also fitted if covariate column names are provided. | |
247 | |
248 | |
249 | |
250 1. Kaplan-Meier survival analysis - see https://lifelines.readthedocs.io/en/latest/Survival%20analysis%20with%20lifelines.html | |
251 | |
252 This is always performed and a survival curve is plotted. | |
253 | |
254 If an optional "group" column is supplied, the plot shows each group separately. If there are *exactly* two groups, a log-rank test for a difference between them is performed and reported. | |
255 | |
256 | |
257 | |
258 2. A Cox Proportional Hazards model is fitted if a comma-separated list of covariate column names is supplied on the tool form. | |
259 | |
260 These columns are used as covariates in the model. | |
261 | |
262 Although not usually a real problem, some diagnostics and advice about the assumption of proportional hazards are also provided as outputs - see https://lifelines.readthedocs.io/en/latest/Survival%20Regression.html | |
263 | |
264 | |
265 | |
266 A big shout out to the lifelines authors - no R code needed - nice job, thanks! | |
267 | |
268 ]]></help> | |
269 <citations> | |
270 <citation type="doi">10.1093/bioinformatics/bts573</citation> | |
271 </citations> | |
272 </tool> | |
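The embedded script chooses column names in three steps: a header supplied on the tool form, then the file's own header row, then generated `col1...colN` names. The following is a minimal stand-alone sketch of that decision using plain lists rather than a pandas DataFrame. The function name `resolve_columns` is hypothetical and not part of the tool, and unlike the script it strips whitespace around supplied names.

```python
def resolve_columns(file_cols, header, time, status):
    """Return the column names to use, following the tool's three-step fallback.

    file_cols: names read from the first row of the tabular file
    header: optional comma-delimited names supplied on the tool form
    time, status: names of the required time and status columns
    """
    ncols = len(file_cols)
    default_cols = ['col%d' % (i + 1) for i in range(ncols)]

    if header.strip():
        # Step 1: a supplied header must match the column count and
        # contain both required names.
        # Assumption: names are stripped here; the script does not do this.
        new_cols = [c.strip() for c in header.split(',')]
        if len(new_cols) != ncols:
            raise ValueError('header has %d names but the file has %d columns'
                             % (len(new_cols), ncols))
        if time not in new_cols or status not in new_cols:
            raise ValueError('time %s and/or status %s not in supplied header'
                             % (time, status))
        return new_cols

    # Step 2: no header supplied, so try the file's own first row.
    if time in file_cols and status in file_cols:
        return list(file_cols)

    # Step 3: fall back to col1...colN.
    if time in default_cols and status in default_cols:
        return default_cols

    raise ValueError('time %s and status %s match no available column names'
                     % (time, status))
```

For example, `resolve_columns(['20', '1', '27'], '', 'col1', 'col2')` falls through to the generated names because neither `col1` nor `col2` appears in the file's first row.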
273 |
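For readers unfamiliar with what the Kaplan-Meier step estimates: at each distinct event time, the survival estimate is multiplied by the fraction of at-risk subjects that survive that time. The sketch below is an illustrative pure-Python version of that product-limit calculation, not the lifelines implementation. `km_estimate` is a hypothetical helper, and status is assumed to follow the lifelines convention of 1 = event observed, 0 = censored.

```python
from collections import Counter


def km_estimate(times, events):
    """Return [(t, S(t))] for each distinct time with at least one observed event."""
    # Number of observed events at each time.
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    # Every subject leaves the risk set at their own time, event or censored.
    leaving = Counter(times)

    at_risk = len(times)
    surv = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d > 0:
            # Subjects censored at t still count as at risk at t.
            surv *= 1.0 - d / at_risk
            curve.append((t, surv))
        at_risk -= leaving[t]
    return curve


# Four subjects: one censored at time 2, the other three have events.
for t, s in km_estimate([1, 2, 2, 3], [1, 0, 1, 1]):
    print(t, round(s, 3))
```

Censored subjects never trigger a drop in the curve; they only shrink the risk set for later event times, which is why correct status coding matters for the tool's input.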