view lifelines_tool/test-data/readme_sample @ 1:232b874046a7 draft

Uploaded
author fubar
date Thu, 10 Aug 2023 07:15:22 +0000
parents dd49a7040643
children dd5e65893cb8

## Lifelines tool starting.
Using data header = Index(['Unnamed: 0', 'week', 'arrest', 'fin', 'age', 'race', 'wexp', 'mar', 'paro', 'prio'], dtype='object'), time column = week, status column = arrest
Logrank test for race - 0 vs 1

<lifelines.StatisticalResult: logrank_test>
               t_0 = -1
 null_distribution = chi squared
degrees_of_freedom = 1
             alpha = 0.99
         test_name = logrank_test

---
 test_statistic    p  -log2(p)
           0.58 0.45      1.16
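
The `p` and `-log2(p)` columns follow from the test statistic and the chi-squared null distribution with 1 degree of freedom reported above. The following is a minimal sketch using only the Python standard library. The helper name `chi2_sf_1df` is hypothetical (not part of lifelines), and the input is the rounded statistic printed in the table, so the result should only be expected to agree to the displayed precision.

```python
import math


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    # For 1 df, P(X > x) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))


test_statistic = 0.58  # rounded value copied from the logrank table
p_value = chi2_sf_1df(test_statistic)
neg_log2_p = -math.log2(p_value)

print(f"p = {p_value:.2f}, -log2(p) = {neg_log2_p:.2f}")
```

A larger `-log2(p)` means a smaller p-value; a value near 1 corresponds to p near 0.5, which gives no evidence of a difference between the two `race` groups.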
### Lifelines KM and CPH results, including the test of proportional hazards, with prio, age, race, paro, mar, fin as covariates
<lifelines.CoxPHFitter: fitted with 432 total observations, 318 right-censored observations>
             duration col = 'week'
                event col = 'arrest'
      baseline estimation = breslow
   number of observations = 432
number of events observed = 114
   partial log-likelihood = -659.00
         time fit was run = 2023-08-10 05:49:04 UTC

---
            coef  exp(coef)   se(coef)   coef lower 95%   coef upper 95%  exp(coef) lower 95%  exp(coef) upper 95%
covariate                                                                                                         
prio        0.10       1.10       0.03             0.04             0.15                 1.04                 1.16
age        -0.06       0.94       0.02            -0.10            -0.02                 0.90                 0.98
race        0.32       1.38       0.31            -0.28             0.92                 0.75                 2.52
paro       -0.09       0.91       0.20            -0.47             0.29                 0.62                 1.34
mar        -0.48       0.62       0.38            -1.22             0.25                 0.30                 1.29
fin        -0.38       0.68       0.19            -0.75            -0.00                 0.47                 1.00

            cmp to     z      p   -log2(p)
covariate                                 
prio          0.00  3.53 <0.005      11.26
age           0.00 -2.95 <0.005       8.28
race          0.00  1.04   0.30       1.75
paro          0.00 -0.46   0.65       0.63
mar           0.00 -1.28   0.20       2.32
fin           0.00 -1.98   0.05       4.40
---
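
The derived columns in the coefficient table can be approximately reproduced from `coef` and `se(coef)` alone. The sketch below assumes standard Wald statistics (z = coef / se, a normal two-sided p-value, and a 1.96 multiplier for the 95% interval). The function name `wald_summary` is hypothetical, and because it starts from the rounded values printed above, small differences in the last digit are expected.

```python
import math


def wald_summary(coef: float, se: float, z_crit: float = 1.96) -> dict:
    """Hazard ratio, Wald z, two-sided p and 95% interval from coef and se(coef)."""
    z = coef / se
    # Two-sided normal p-value: P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    lower = coef - z_crit * se
    upper = coef + z_crit * se
    return {
        "exp(coef)": math.exp(coef),
        "z": z,
        "p": p,
        "coef lower 95%": lower,
        "coef upper 95%": upper,
        "exp(coef) lower 95%": math.exp(lower),
        "exp(coef) upper 95%": math.exp(upper),
    }


# Rounded coef and se(coef) for the 'fin' row of the table.
fin = wald_summary(coef=-0.38, se=0.19)
for name, value in fin.items():
    print(f"{name:>20} = {value:.2f}")
```

Read this way, `exp(coef)` for `fin` is a hazard ratio of roughly 0.68, and its 95% interval ends very close to 1, which is consistent with the borderline p-value of 0.05.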
Concordance = 0.63
Partial AIC = 1330.00
log-likelihood ratio test = 32.77 on 6 df
-log2(p) of ll-ratio test = 16.39
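
The footer values are related by simple arithmetic. As a sketch, assuming the partial AIC is computed as -2 x (partial log-likelihood) + 2 x (number of covariates) and that the likelihood ratio statistic is referred to a chi-squared distribution with 6 degrees of freedom, the numbers above can be checked with the standard library. The helper `chi2_sf_even_df` is hypothetical and uses a closed form that is valid only for even degrees of freedom.

```python
import math


def chi2_sf_even_df(x: float, df: int) -> float:
    """Upper-tail chi-squared probability; closed form valid for even df only."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("closed form requires a positive, even df")
    half = x / 2.0
    # P(X > x) = exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)^i / i!
    return math.exp(-half) * sum(half**i / math.factorial(i) for i in range(df // 2))


partial_log_likelihood = -659.00  # from the summary header
n_covariates = 6                  # prio, age, race, paro, mar, fin
partial_aic = -2.0 * partial_log_likelihood + 2.0 * n_covariates

lr_statistic = 32.77              # log-likelihood ratio test statistic
lr_p_value = chi2_sf_even_df(lr_statistic, df=n_covariates)
lr_neg_log2_p = -math.log2(lr_p_value)

print(f"Partial AIC = {partial_aic:.2f}")
print(f"-log2(p) of ll-ratio test = {lr_neg_log2_p:.2f}")
```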


   Bootstrapping lowess lines. May take a moment...


The ``p_value_threshold`` is set at 0.01. Even under the null hypothesis of no violations, some
covariates will be below the threshold by chance. This is compounded when there are many covariates.
Similarly, when there are lots of observations, even minor deviations from the proportional hazard
assumption will be flagged.

With that in mind, it's best to use a combination of statistical tests and visual tests to determine
the most serious violations. Produce visual plots using ``check_assumptions(..., show_plots=True)``
and look for non-constant lines. See link [A] below for a full example.
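
To make the "by chance" warning concrete, here is a small worked calculation. It assumes the tests are independent, which is a simplification (the `km` and `rank` versions of the test on one covariate are strongly correlated), and the function name `prob_any_false_flag` is hypothetical.

```python
def prob_any_false_flag(threshold: float, n_tests: int) -> float:
    """Chance that at least one of n independent null tests has p < threshold."""
    return 1.0 - (1.0 - threshold) ** n_tests


# The model above has 6 covariates; larger counts show how the chance grows.
for n_tests in (1, 6, 20, 50):
    chance = prob_any_false_flag(0.01, n_tests)
    print(f"{n_tests:>2} tests: {chance:.3f}")
```

Under these assumptions, six covariates tested at a threshold of 0.01 give a little under a 6% chance of at least one spurious flag even when no assumption is violated, which is why the message recommends checking the plots as well.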

<lifelines.StatisticalResult: proportional_hazard_test>
 null_distribution = chi squared
degrees_of_freedom = 1
             model = <lifelines.CoxPHFitter: fitted with 432 total observations, 318 right-censored observations>
         test_name = proportional_hazard_test

---
           test_statistic    p  -log2(p)
age  km              6.99 0.01      6.93
     rank            7.40 0.01      7.26
fin  km              0.02 0.90      0.15
     rank            0.01 0.91      0.13
mar  km              1.64 0.20      2.32
     rank            1.80 0.18      2.48
paro km              0.06 0.81      0.31
     rank            0.07 0.79      0.34
prio km              0.92 0.34      1.57
     rank            0.88 0.35      1.52
race km              1.70 0.19      2.38
     rank            1.68 0.19      2.36
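
The table can be turned into a list of flagged covariates by converting each statistic to a p-value (chi-squared, 1 degree of freedom) and comparing it with the 0.01 threshold. This sketch assumes a covariate is flagged when the smaller of its `km` and `rank` p-values is below the threshold. The names `chi2_sf_1df`, `statistics` and `flagged` are illustrative, and the statistics are the rounded values printed above.

```python
import math

P_VALUE_THRESHOLD = 0.01

# (covariate, time transform) -> test statistic, copied from the table.
statistics = {
    ("age", "km"): 6.99, ("age", "rank"): 7.40,
    ("fin", "km"): 0.02, ("fin", "rank"): 0.01,
    ("mar", "km"): 1.64, ("mar", "rank"): 1.80,
    ("paro", "km"): 0.06, ("paro", "rank"): 0.07,
    ("prio", "km"): 0.92, ("prio", "rank"): 0.88,
    ("race", "km"): 1.70, ("race", "rank"): 1.68,
}


def chi2_sf_1df(x: float) -> float:
    """Upper-tail probability of a chi-squared variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))


p_values = {key: chi2_sf_1df(stat) for key, stat in statistics.items()}

# Smallest p-value per covariate across the two time transforms.
min_p = {}
for (covariate, _transform), p in p_values.items():
    min_p[covariate] = min(p, min_p.get(covariate, 1.0))

flagged = sorted(cov for cov, p in min_p.items() if p < P_VALUE_THRESHOLD)
print(flagged)
```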


1. Variable 'age' failed the non-proportional test: p-value is 0.0065.

   Advice 1: the functional form of the variable 'age' might be incorrect. That is, there may be
non-linear terms missing. The proportional hazard test used is very sensitive to incorrect
functional forms. See documentation in link [D] below on how to specify a functional form.

   Advice 2: try binning the variable 'age' using pd.cut, and then specify it in
`strata=['age', ...]` in the call to `.fit`. See documentation in link [B] below.

   Advice 3: try adding an interaction term with your time variable. See documentation in link [C]
below.


   Bootstrapping lowess lines. May take a moment...



---
[A]  https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html
[B]  https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html#Bin-variable-and-stratify-on-it
[C]  https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html#Introduce-time-varying-covariates
[D]  https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html#Modify-the-functional-form
[E]  https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html#Stratification