view lifelines_tool/test-data/readme_sample @ 0:dd49a7040643 draft

Initial commit
author fubar
date Wed, 09 Aug 2023 11:12:16 +0000
parents
children 232b874046a7
line wrap: on
line source

## Lifelines tool starting.
Using data header = Index(['Unnamed: 0', 'week', 'arrest', 'fin', 'age', 'race', 'wexp', 'mar',
       'paro', 'prio'],
      dtype='object') time column = week status column = arrest
Logrank test for race - 0 vs 1

<lifelines.StatisticalResult: logrank_test>
               t_0 = -1
 null_distribution = chi squared
degrees_of_freedom = 1
             alpha = 0.99
         test_name = logrank_test

---
 test_statistic    p  -log2(p)
           0.58 0.45      1.16
### Lifelines test of Proportional Hazards results with prio, age, race, paro, mar, fin as covariates on KM and CPH in lifelines test
<lifelines.CoxPHFitter: fitted with 432 total observations, 318 right-censored observations>
             duration col = 'week'
                event col = 'arrest'
      baseline estimation = breslow
   number of observations = 432
number of events observed = 114
   partial log-likelihood = -659.00
         time fit was run = 2023-08-09 07:43:37 UTC

---
            coef  exp(coef)   se(coef)   coef lower 95%   coef upper 95%  exp(coef) lower 95%  exp(coef) upper 95%
covariate                                                                                                         
prio        0.10       1.10       0.03             0.04             0.15                 1.04                 1.16
age        -0.06       0.94       0.02            -0.10            -0.02                 0.90                 0.98
race        0.32       1.38       0.31            -0.28             0.92                 0.75                 2.52
paro       -0.09       0.91       0.20            -0.47             0.29                 0.62                 1.34
mar        -0.48       0.62       0.38            -1.22             0.25                 0.30                 1.29
fin        -0.38       0.68       0.19            -0.75            -0.00                 0.47                 1.00

            cmp to     z      p   -log2(p)
covariate                                 
prio          0.00  3.53 <0.005      11.26
age           0.00 -2.95 <0.005       8.28
race          0.00  1.04   0.30       1.75
paro          0.00 -0.46   0.65       0.63
mar           0.00 -1.28   0.20       2.32
fin           0.00 -1.98   0.05       4.40
---
Concordance = 0.63
Partial AIC = 1330.00
log-likelihood ratio test = 32.77 on 6 df
-log2(p) of ll-ratio test = 16.39


   Bootstrapping lowess lines. May take a moment...


   Bootstrapping lowess lines. May take a moment...

The ``p_value_threshold`` is set at 0.01. Even under the null hypothesis of no violations, some
covariates will be below the threshold by chance. This is compounded when there are many covariates.
Similarly, when there are lots of observations, even minor deviances from the proportional hazard
assumption will be flagged.

With that in mind, it's best to use a combination of statistical tests and visual tests to determine
the most serious violations. Produce visual plots using ``check_assumptions(..., show_plots=True)``
and looking for non-constant lines. See link [A] below for a full example.

<lifelines.StatisticalResult: proportional_hazard_test>
 null_distribution = chi squared
degrees_of_freedom = 1
             model = <lifelines.CoxPHFitter: fitted with 432 total observations, 318 right-censored observations>
         test_name = proportional_hazard_test

---
           test_statistic    p  -log2(p)
age  km              6.99 0.01      6.93
     rank            7.40 0.01      7.26
fin  km              0.02 0.90      0.15
     rank            0.01 0.91      0.13
mar  km              1.64 0.20      2.32
     rank            1.80 0.18      2.48
paro km              0.06 0.81      0.31
     rank            0.07 0.79      0.34
prio km              0.92 0.34      1.57
     rank            0.88 0.35      1.52
race km              1.70 0.19      2.38
     rank            1.68 0.19      2.36


1. Variable 'age' failed the non-proportional test: p-value is 0.0065.

   Advice 1: the functional form of the variable 'age' might be incorrect. That is, there may be
non-linear terms missing. The proportional hazard test used is very sensitive to incorrect
functional forms. See documentation in link [D] below on how to specify a functional form.

   Advice 2: try binning the variable 'age' using pd.cut, and then specify it in `strata=['age',
...]` in the call in `.fit`. See documentation in link [B] below.

   Advice 3: try adding an interaction term with your time variable. See documentation in link [C]
below.


   Bootstrapping lowess lines. May take a moment...


   Bootstrapping lowess lines. May take a moment...


   Bootstrapping lowess lines. May take a moment...


   Bootstrapping lowess lines. May take a moment...


---
[A]  https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html
[B]  https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html#Bin-variable-and-stratify-on-it
[C]  https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html#Introduce-time-varying-covariates
[D]  https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html#Modify-the-functional-form
[E]  https://lifelines.readthedocs.io/en/latest/jupyter_notebooks/Proportional%20hazard%20assumption.html#Stratification