Penalized Estimation of Cumulative Effects

Andreas Bender, F Scheipl, W Hartl, A G Day, H Küchenhoff

Department of Statistics, LMU Munich

2017/12/17

1 / 30

Outline




  • Motivation

  • Exposure-Lag-Response Associations

  • Application

2 / 30

Motivation

  • Multi-center study of critical care patients from 457 ICUs (approx. 10k patients)

  • maximum follow-up of 60 days (we only consider short-term survival, $t \leq 30$)

  • Various confounders:

    • Age, Gender, BMI
    • Diagnosis, Admission Category
    • year of ICU admission
    • Apache II Score
    • ICU random effect
  • 11-day nutrition protocol

    • prescribed calories (determined at baseline t=0)
    • daily caloric intake
    • daily caloric adequacy (CA) = caloric intake/prescribed calories
3 / 30

Caloric Intake

4 / 30

Motivation

  • We are interested in how artificial nutrition (exposure) affects short term survival (outcome)

  • Difficulty:

    • effect of nutrition might have a temporal delay (e.g. nutrition today affects survival 4 days later)

    • effect of nutrition might "wear off" after some time (e.g. nutrition on day 1 likely won't affect the hazard on day 30)

    • the (delayed) effect of nutrition also depends on the amount of nutrition (caloric adequacy) provided, possibly non-linearly

    • the same amount of exposure might have a different effect depending on the follow up and exposure time

    • the effect may be cumulative (e.g., 5 consecutive days of malnutrition may be worse than only 2 consecutive days, or than 5 days of malnutrition scattered throughout the follow-up with the "correct" amount provided on the other days)

5 / 30

Terminology

We use the following terminology and notation:

  • Time-to-event $t$: The time scale on which event times are observed

  • Time of exposure $t_e$: Time at which values of the exposure are observed (it need not overlap temporally with $t$, be measured in the same units, or be in the same domain as $t$, e.g. calendar days ($t_e$) vs. 24h periods (days) since admission to ICU ($t$))

  • time-varying effects (TVE): Effects of time-constant covariates (covariates observed at the beginning of the follow-up) that can vary over time $t$

  • time-dependent covariates (TDC): Covariates whose values change over time. Value changes are recorded at exposure time $t_e$ (here synonymous with exposure)

  • Exposure value $z(t_e)$: The value of the TDC observed at exposure time $t_e$

  • Exposure history $\mathbf{z}$: The complete history of observed values of the exposure/TDC, $\mathbf{z} = (z(t_{e,1}), z(t_{e,2}), \ldots, z(t_{e,Q}))$

6 / 30

Terminology

A general cumulative effect/Exposure-Lag-Response Association (ELRA) can be defined as

$$g(\mathbf{z}, t) = \int_{t_e:\, t_e \leq t} h(t, t_e, z(t_e))\, \mathrm{d}t_e$$

  • Partial effects $h(t, t_e, z(t_e))$: The effect of the TDC recorded at exposure time $t_e$ with value $z(t_e)$ on the hazard at follow-up time $t$ (the tri-variate function $h$ is potentially non-linear in all three dimensions)

  • Cumulative effect $g(\mathbf{z}, t)$: The total (cumulated) effect of the partial effects on the log-hazard at time $t$ given exposure history $\mathbf{z}$

7 / 30

Lag-Lead-Window

The integration borders can be defined more generally, such that $g(\mathbf{z}, t) = \int_{t - t_{\mathrm{lag}} - t_{\mathrm{lead}}}^{t - t_{\mathrm{lag}}} h(t, t_e, z(t_e))\, \mathrm{d}t_e$

  • Lag time $t_{\mathrm{lag}}$: The length of the delay until the TDC recorded at exposure time $t_e$ starts to affect the hazard (often $t_{\mathrm{lag}} = 0$)
  • Lead time $t_{\mathrm{lead}}$: The duration of the effect of the TDC observed at exposure time $t_e$
  • $t_{\mathrm{lag}}$ and $t_{\mathrm{lead}}$ define the set of exposures that contribute to the cumulative effect at time $t$ as $\{z(t_e): t_e \in [t - t_{\mathrm{lag}} - t_{\mathrm{lead}},\, t - t_{\mathrm{lag}}]\}$
  • Minimal requirement: $t_e: t_e \leq t$
  • Special case $\int_0^t$ follows with $t_{\mathrm{lag}} = 0$ and $t_{\mathrm{lead}} = t$
  • Example ($t_{\mathrm{lag}} = 4$, $t_{\mathrm{lead}} = 3$):
    • The last nutrition that will enter the cumulative effect at time $t = 10$ is nutrition at $t_e \leq t - t_{\mathrm{lag}} = 10 - 4 = 6$, i.e. $z(t_e = 6)$
    • The earliest nutrition that will contribute to the cumulative effect at time $t = 10$ is nutrition at $t_e \geq t - t_{\mathrm{lag}} - t_{\mathrm{lead}} = 10 - 4 - 3 = 3$
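
The example can be checked with a minimal sketch. The function name `lag_lead_window` and the daily exposure grid (days 1 to 11) are illustrative assumptions, not part of the original analysis code.

```python
def lag_lead_window(t, exposure_times, t_lag, t_lead):
    """Return the exposure times t_e with t - t_lag - t_lead <= t_e <= t - t_lag."""
    lower = t - t_lag - t_lead   # earliest exposure time that still contributes
    upper = t - t_lag            # latest exposure time that already contributes
    return [te for te in exposure_times if lower <= te <= upper]


# Hypothetical 11-day protocol with one exposure measurement per day.
days = list(range(1, 12))
print(lag_lead_window(10, days, t_lag=4, t_lead=3))  # [3, 4, 5, 6]
```

With $t_{\mathrm{lag}} = 0$ and $t_{\mathrm{lead}} = t$ the same function returns all exposure times up to $t$, which corresponds to the special case $\int_0^t$.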
8 / 30

Lag-Lead-Window

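The integration borders can be defined even more generally, such that

$$g(\mathbf{z}, t) = \int_{t - t_{\mathrm{lag}}(t_e) - t_{\mathrm{lead}}(t_e)}^{t - t_{\mathrm{lag}}(t_e)} h(t, t_e, z(t_e))\, \mathrm{d}t_e = \int_{\mathcal{T}_e(t)} h(t, t_e, z(t_e))\, \mathrm{d}t_e$$

  • $t_{\mathrm{lag}}$ and $t_{\mathrm{lead}}$ can themselves depend on (exposure) time

  • $\mathcal{T}_e(t)$ is the set of exposure times $t_e$ relevant to the cumulative effect at time $t$

  • We call $\mathcal{T}_e(t)$ the Lag-Lead-Window or Window of effectiveness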

9 / 30

Lag-Lead Window (Example)

10 / 30

ELRAs in the literature

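Some models known from the literature follow as special cases of the general specification $g(\mathbf{z}, t) = \int_{\mathcal{T}_e(t)} h(t, t_e, z(t_e))\, \mathrm{d}t_e$ when we assume that the partial effects $h$ depend only on the latency $t - t_e$ rather than on the specific combination of $t$ and $t_e$, i.e., $h(t = 30, t_e = 3, z(t_e)) \overset{!}{=} h(t = 40, t_e = 13, z(t_e)) \overset{!}{=} \tilde{h}(t - t_e = 27, z(t_e))$

  • DLNM: Distributed Lag Non-linear Models (Gasparrini et al., 2014, 2017): $g(\mathbf{z}, t) = \int_{\mathcal{T}_e(t)} h(t - t_e, z(t_e))\, \mathrm{d}t_e$
  • WCE: Weighted Cumulative Exposure (Sylvestre and Abrahamowicz, 2009): $g(\mathbf{z}, t) = \int_{\mathcal{T}_e(t)} h(t - t_e)\, z(t_e)\, \mathrm{d}t_e$

  • Also possible within the general framework:

    • more flexible WCE: $g(\mathbf{z}, t) = \int_{\mathcal{T}_e(t)} h(t, t_e)\, z(t_e)\, \mathrm{d}t_e$
    • time-varying DLNM (TV DLNM): $g(\mathbf{z}, t) = \int_{\mathcal{T}_e(t)} h(t, t - t_e, z(t_e))\, \mathrm{d}t_e$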
11 / 30

Exposure-Lag-Response Association

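  • $g(\mathbf{z}, t)$ represents the cumulative, time-varying effect of exposure history $\mathbf{z}$ on the log-hazard at time $t$

  • we define its contribution to the model's additive predictor as

$$g(\mathbf{z}_i, t) = \int_{\mathcal{T}_e(t)} h(\tilde{t}_j, t_e, z_i(t_e))\, \mathrm{d}t_e \approx \sum_{q:\, t_{e,q} \in \mathcal{T}_e(t)} \Delta_q\, h(\tilde{t}_j, t_{e,q}, z_i(t_{e,q})) \quad \forall\, t \in (\kappa_{j-1}, \kappa_j],$$

with

  • interval midpoints $\tilde{t}_j := (\kappa_{j-1} + \kappa_j)/2,\ j = 1, \ldots, J$
  • partial effects $h(\tilde{t}_j, t_e, z_i(t_e))$
  • quadrature weights $\Delta_q = t_{e,q} - t_{e,q-1}$ for numerical integration, given by the time between two consecutive exposure measurements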
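
The quadrature sum can be sketched in a few lines. This is an illustration under stated assumptions, not the estimation code: the partial effect `h` is a made-up function (in the model it is an estimated smooth), the window uses constant lag and lead times, and the quadrature weights are passed in by the caller.

```python
import numpy as np


def cumulative_effect(h, t_mid, te, z, delta, t_lag, t_lead):
    """Approximate g(z, t) by sum_q delta_q * h(t_mid, te_q, z_q) over the lag-lead window."""
    te, z, delta = (np.asarray(a, dtype=float) for a in (te, z, delta))
    # Indicator for te_q in [t_mid - t_lag - t_lead, t_mid - t_lag]
    in_window = (te >= t_mid - t_lag - t_lead) & (te <= t_mid - t_lag)
    # Evaluate the partial effect at every exposure time
    partial = np.array([h(t_mid, te_q, z_q) for te_q, z_q in zip(te, z)])
    # Exposure times outside the window get weight zero
    return float(np.sum(delta * partial * in_window))


# Hypothetical setup: daily exposures on days 1..11, constant exposure value 2,
# unit quadrature weights, and a made-up partial effect that is linear in z.
te = np.arange(1, 12)
z = np.full(11, 2.0)
delta = np.ones(11)
g = cumulative_effect(lambda t, te_q, z_q: 0.1 * z_q, 9.5, te, z, delta, t_lag=4, t_lead=3)
print(round(g, 3))  # 0.6 (days 3, 4 and 5 fall into the window [2.5, 5.5])
```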
12 / 30

Tensor product smooths

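Low rank representation of the tri-variate smooth function

$$h(t, t_e, z(t_e)) = \sum_{\ell=1}^{L} \sum_{r=1}^{R} \sum_{m=1}^{M} \gamma_{\ell r m}\, B_m(z(t_e))\, B_r(t_e)\, B_\ell(t)$$

with

  • model matrix $X = X_t \otimes X_{t_e} \otimes X_{z(t_e)}$ and

  • penalty $S = \nu_{z(t_e)}\, I_L \otimes I_R \otimes S_{z(t_e)} + \nu_{t_e}\, I_L \otimes S_{t_e} \otimes I_M + \nu_t\, S_t \otimes I_R \otimes I_M$

Estimate parameters $\gamma$ by minimizing the penalized deviance $D(\gamma) + \sum_k \nu_k \gamma^\top S_k \gamma$ (Wood, 2011), where

  • $D(\gamma)$ is the model deviance (of the Poisson GAMM)
  • $\gamma$ contains all spline basis coefficients and random effects
  • $\nu_k$ and $S_k$, $k = 1, \ldots, K$ are the smoothing parameters and penalty matrices for the $k$-th smooth term, respectively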
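
A minimal sketch of how the tensor product penalty and model matrix can be assembled. Several things here are assumptions for illustration: the marginal penalties are second-order difference penalties, the marginal design matrices are arbitrary numeric arrays, and the product in the model matrix is read as a row-wise Kronecker product. The actual fitting software may construct these objects differently.

```python
import numpy as np


def second_diff_penalty(k):
    """Second-order difference penalty D'D for k coefficients (assumed marginal penalty)."""
    D = np.diff(np.eye(k), n=2, axis=0)   # (k - 2) x k difference matrix
    return D.T @ D


def tensor_penalty(S_t, S_te, S_z, nu_t, nu_te, nu_z):
    """S = nu_z I_L (x) I_R (x) S_z + nu_te I_L (x) S_te (x) I_M + nu_t S_t (x) I_R (x) I_M."""
    L, R, M = S_t.shape[0], S_te.shape[0], S_z.shape[0]
    I_L, I_R, I_M = np.eye(L), np.eye(R), np.eye(M)
    return (nu_z * np.kron(np.kron(I_L, I_R), S_z)
            + nu_te * np.kron(np.kron(I_L, S_te), I_M)
            + nu_t * np.kron(np.kron(S_t, I_R), I_M))


def row_tensor(X_t, X_te, X_z):
    """Row-wise Kronecker product; column l*R*M + r*M + m holds X_t[:, l] * X_te[:, r] * X_z[:, m]."""
    n = X_t.shape[0]
    return np.einsum("il,ir,im->ilrm", X_t, X_te, X_z).reshape(n, -1)


# Hypothetical basis dimensions L = 4, R = 5, M = 6 and smoothing parameters.
S = tensor_penalty(second_diff_penalty(4), second_diff_penalty(5), second_diff_penalty(6),
                   nu_t=1.0, nu_te=2.0, nu_z=3.0)
print(S.shape)  # (120, 120)
```

The column ordering of `row_tensor` matches the ordering of the Kronecker products in `tensor_penalty`, so the quadratic form $\gamma^\top S \gamma$ penalizes the coefficient vector that multiplies the returned model matrix.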
13 / 30

Exposure-Lag Response Association

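  • If we restrict the ELRA to be linear in the exposure, i.e., $h(t, t_e, z_i(t_e)) = \tilde{h}(t_e, t)\, z_i(t_e)$, we can simplify to $g(\mathbf{z}_i, t) \approx \sum_{q=1}^{Q} \tilde{\Delta}_{i,q}\, \tilde{h}(t_{e,q}, t)$ with $\tilde{\Delta}_{i,q} = z_i(t_{e,q})\, \Delta_q$ if $t_{e,q} \in \mathcal{T}_e(t)$ and $\tilde{\Delta}_{i,q} = 0$ otherwise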
14 / 30

Exposure-Lag Response Association

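  • Spline bases for the bivariate function $\tilde{h}(t_e, t)$ are set up via a tensor product B-spline basis with marginal bases $B_m(t_e),\ m = 1, \ldots, M$ and $B_k(t),\ k = 1, \ldots, K$ defined over the exposure and hazard time domains, respectively

  • $M$ and $K$ delimit the maximal complexity of the ELRA

  • $\tilde{h}(t_e, t) = \sum_{m=1}^{M} \sum_{k=1}^{K} \gamma_{m,k}\, B_m(t_e)\, B_k(t)$

  • Combining the above equations yields: $g(\mathbf{z}_i, t) \approx \sum_{m=1}^{M} \sum_{k=1}^{K} \gamma_{m,k}\, \tilde{B}_{i,m}(t_e, t)\, B_k(t)$, where $\tilde{B}_{i,m}(t_e, t) = \sum_{q=1}^{Q} \tilde{\Delta}_{i,q}\, B_m(t_{e,q})$.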
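
The combination step can be verified numerically with a small sketch. As an assumption for brevity, a polynomial basis stands in for the B-spline bases (the algebra is the same for any basis), and the coefficients and weights are random placeholders. The weights $\tilde{\Delta}_{i,q}$ are taken as given for one fixed $t$, since they depend on $t$ through the lag-lead window.

```python
import numpy as np


def poly_basis(x, n_basis):
    """Stand-in basis with columns x^0, ..., x^(n_basis - 1) (NOT a B-spline basis)."""
    x = np.atleast_1d(np.asarray(x, dtype=float))
    return np.vander(x, N=n_basis, increasing=True)


def weighted_exposure_basis(te, delta_tilde, n_basis):
    """B~_{i,m} = sum_q delta_tilde_{i,q} * B_m(t_{e,q}) for m = 1, ..., M."""
    return delta_tilde @ poly_basis(te, n_basis)        # vector of length M


def elra_linear(gamma, te, delta_tilde, t):
    """g(z_i, t) ~ sum_m sum_k gamma_{m,k} * B~_{i,m} * B_k(t)."""
    M, K = gamma.shape
    B_tilde = weighted_exposure_basis(te, delta_tilde, M)
    B_t = poly_basis(t, K)[0]                            # basis evaluated at time t
    return float(B_tilde @ gamma @ B_t)


# Hypothetical inputs: 11 exposure times scaled to (0, 1], random weights and coefficients.
rng = np.random.default_rng(1)
gamma = rng.normal(size=(3, 4))
te = np.arange(1.0, 12.0) / 11.0
delta_tilde = rng.random(11)

# Direct evaluation of sum_q delta_tilde_q * h~(te_q, t) for comparison.
direct = sum(d * float(poly_basis(q, 3)[0] @ gamma @ poly_basis(0.5, 4)[0])
             for q, d in zip(te, delta_tilde))
print(np.isclose(elra_linear(gamma, te, delta_tilde, 0.5), direct))  # True
```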

15 / 30

Simulation - DLNM

  • $\lambda(t \mid \mathbf{z}) = \lambda_0(t) \exp\left(\int h(t - t_e, z(t_e))\, \mathrm{d}t_e\right)$
  • $t \in (0, 40]$, $t_e \in [-40, 40]$, $z(t_e) \in [0, 10]$

16 / 30

Simulation (2) - TV DLNM

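$\lambda(t \mid \mathbf{z}) = \lambda_0(t) \exp\left(\int \tilde{h}(t, t - t_e, z(t_e))\, \mathrm{d}t_e\right) = \lambda_0(t) \exp\left(\int f(t)\, h(t - t_e, z(t_e))\, \mathrm{d}t_e\right)$ and $f(t) = \cos(\pi t / t_{\max})$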

17 / 30

Application

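In the application example (categorical nutrition), we estimate $\log(\lambda_i(t \mid \mathbf{x}_i, \mathbf{z}_i, \ell_i)) = f_0(t) + \sum_{p=1}^{P} f_p(x_{i,p}, t) + g(\mathbf{z}_i, t) + b_{\ell_i}$ with

  • $f_0(t_j) = \sum_{m=1}^{M} \gamma_{0m} B_m(t_j)$ represents the log baseline-hazard
  • $f_p(x_{i,p}, t_j) = \sum_{m=1}^{M} \sum_{\ell=1}^{L} \gamma_{m\ell}\, B_m(x_{i,p})\, B_\ell(t_j)$ are potentially non-linear, potentially non-linearly time-varying effects of confounders $x_{i,p}$
  • $g(\mathbf{z}_i, t) = g_{C2}(\mathbf{z}_i^{C2}, t) + g_{C3}(\mathbf{z}_i^{C3}, t)$
    • $\mathbf{z}_i^{C2}$ and $\mathbf{z}_i^{C3}$ are dummy variables that indicate whether subject $i$ received category C2 and C3 nutrition on day $t_{e,q},\ q = 1, \ldots, 11$, respectively
    • $g_{C2}(\mathbf{z}_i, t) \approx \sum_{q=1}^{Q} \tilde{\Delta}_{i,q}^{C2}\, \tilde{h}_{C2}(t_{e,q}, t)$
  • $b_{\ell_i}$ is the random effect associated with the ICU (cluster) $\ell_i$ at which subject $i$ is treated

C1 is the reference category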

18 / 30

PAMM

  • Fortunately, we can fit survival models via Poisson GLMs/GAMMs by representing them as Piece-wise exponential Additive Mixed Models (PAMMs)

  • doing so requires us to

    • divide the follow-up $(0, t_{\max}]$ into $J$ intervals with $J+1$ cut-points $0 = \kappa_0 < \ldots < \kappa_J = t_{\max}$
    • transform the data into the appropriate format (pseudo-observations in each interval):
      • interval-specific event indicators $\delta_{ij}$, where $\delta_{ij} = 1$ if subject $i$ experienced an event in interval $j$ (i.e. $t_i \in (\kappa_{j-1}, \kappa_j]$ and $T_i < C_i$) and $\delta_{ij} = 0$ otherwise
      • offsets $o_{ij} = \log(t_{ij})$, where $t_{ij} = \min(t_i - \kappa_{j-1}, \kappa_j - \kappa_{j-1})$ is the time subject $i$ spent in interval $j$
    • in the $j$th interval $(\kappa_{j-1}, \kappa_j]$ estimate a piece-wise constant hazard rate $\lambda(t) = \lambda_j\ \forall\, t \in (\kappa_{j-1}, \kappa_j]$ (more intervals lead to a better approximation)
  • See Holford 1980, Laird and Olivier 1981, Friedman 1982, Whitehead 1980
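
The data transformation can be sketched for a single subject. The function name `split_follow_up` and the dictionary layout are illustrative assumptions. The sketch only creates pseudo-observations for intervals in which the subject was still under observation.

```python
import math


def split_follow_up(t_i, event, cuts):
    """Create one pseudo-observation per interval (cuts[j-1], cuts[j]] the subject entered.

    t_i   : observed time of subject i (event or censoring time)
    event : True if the subject experienced the event at t_i, False if censored
    cuts  : increasing cut-points kappa_0 = 0 < ... < kappa_J
    """
    rows = []
    for j in range(1, len(cuts)):
        lo, hi = cuts[j - 1], cuts[j]
        if t_i <= lo:                      # subject no longer under observation
            break
        t_ij = min(t_i - lo, hi - lo)      # time spent in interval j
        delta_ij = int(bool(event) and lo < t_i <= hi)
        rows.append({"interval": j, "delta": delta_ij, "offset": math.log(t_ij)})
    return rows


# Hypothetical subject with an event at t = 2.5 and unit-length intervals.
for row in split_follow_up(2.5, True, [0, 1, 2, 3, 4]):
    print(row)
```

For this subject the sketch yields three rows: $\delta_{ij} = 0$ with offset $\log(1) = 0$ in the first two intervals, and $\delta_{ij} = 1$ with offset $\log(0.5)$ in the third. A Poisson model fitted to such rows, with the offset included, estimates the piece-wise constant hazards.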

19 / 30

Application (Results)

20 / 30

Application (Results)

Example:

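  • $\mathbf{z} = (5 \times C2,\ 6 \times C3)$
  • $\mathbf{z}^{C2} = (1,1,1,1,1,0,0,0,0,0,0)$, $\mathbf{z}^{C3} = (0,0,0,0,0,1,1,1,1,1,1)$
  • $g(\mathbf{z}, \tilde{t}_j = 18.5) = g(\mathbf{z}^{C2}, 18.5) + g(\mathbf{z}^{C3}, 18.5) \approx -0.57$

The hazard is reduced by a factor of $\exp(-0.57) \approx 0.57$ compared to a subject with $11 \times C1$ nutrition (c.p.)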

21 / 30

Application (Results)

These bivariate surfaces are difficult to interpret as

  • they must be interpreted with respect to a subject who received C1 nutrition on all 11 days of the nutrition protocol

  • partial effects $h_{C2}(t, t_e)$ and $h_{C3}(t, t_e)$ can both contribute to the cumulative effect, depending on the specific nutrition profile

  • for these reasons, we prefer to analyze and interpret estimated hazard ratios between hypothetical patients with different clinically relevant exposure histories ($\mathbf{z}_1$ and $\mathbf{z}_2$)

$$e_j = \frac{\lambda(\tilde{t}_j \mid \mathbf{z}_2)}{\lambda(\tilde{t}_j \mid \mathbf{z}_1)}$$
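
A short derivation sketch of why this ratio only involves the cumulative effects. It assumes the additive log-hazard model of the application and that the two hypothetical patients share all confounder values $\mathbf{x}$ and the same ICU $\ell$ (the c.p. condition), so that the baseline, confounder and random effect terms cancel:

```latex
e_j
  = \frac{\lambda(\tilde{t}_j \mid \mathbf{z}_2)}{\lambda(\tilde{t}_j \mid \mathbf{z}_1)}
  = \frac{\exp\left(f_0(\tilde{t}_j) + \sum_{p=1}^{P} f_p(x_p, \tilde{t}_j) + g(\mathbf{z}_2, \tilde{t}_j) + b_\ell\right)}
         {\exp\left(f_0(\tilde{t}_j) + \sum_{p=1}^{P} f_p(x_p, \tilde{t}_j) + g(\mathbf{z}_1, \tilde{t}_j) + b_\ell\right)}
  = \exp\left(g(\mathbf{z}_2, \tilde{t}_j) - g(\mathbf{z}_1, \tilde{t}_j)\right)
```

If $\mathbf{z}_1$ is the reference profile with C1 nutrition on all days, then $g(\mathbf{z}_1, \tilde{t}_j) = 0$ and $e_j = \exp(g(\mathbf{z}_2, \tilde{t}_j))$.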

22 / 30

Application (Results)

We compare the following nutrition profiles:

23 / 30

Application (Results)

Complete, mildly hypocaloric nutrition reduces risk of mortality compared to a complete, severely hypocaloric nutrition (Comparison B)

No further risk reduction when moving from mildly hypocaloric to partial or complete near target nutrition (Comparisons E, F)

Sensitivity analyses (Imputation of missing protocols, lag/lead specification, penalty structure, ...) show no substantive deviation from main results

24 / 30

Limitations (and outlook)

  • currently, $t_{\mathrm{lag}}$ and $t_{\mathrm{lead}}$ must be specified a priori. It would be nice if the lag-lead window could be selected in a data-driven way (e.g. Obermeier et al., 2015)

  • we assume that patients released from hospital survived until the end of the follow-up ($t = 30$). Sensitivity analyses with hospital discharge as a censoring event do not change the results. A competing risks model for the outcomes hospital discharge and death would be preferable

  • Modeling and interpretation of TDCs is always difficult, especially if exogeneity is unclear, e.g.

    • although nutrition is provided by hospital staff, the amount provided might depend on the patient's health status
    • more recent values provide better confounder adjustment but may also be fully indicative of the outcome (indication bias)
  • Model choice becomes difficult when all effects are potentially non-linear and/or non-linearly time-varying (boosting and double-penalty procedures are promising)

25 / 30

Links and Acknowledgments

26 / 30

References

  • Friedman, Michael. “Piecewise Exponential Models for Survival Data with Covariates.” The Annals of Statistics 10, no. 1 (1982): 101–113.
  • Gasparrini, Antonio. “Modeling Exposure–lag–response Associations with Distributed Lag Non-Linear Models.” Statistics in Medicine 33, no. 5 (February 28, 2014): 881–99. https://doi.org/10.1002/sim.5963.
  • Gasparrini, Antonio, Fabian Scheipl, Ben Armstrong, and Michael G. Kenward. “A Penalized Framework for Distributed Lag Non-Linear Models.” Biometrics, January 1, 2017. https://doi.org/10.1111/biom.12645.
  • Holford, Theodore R. “The Analysis of Rates and of Survivorship Using Log-Linear Models.” Biometrics 36, no. 2 (1980): 299–305. https://doi.org/10.2307/2529982.
  • Laird, Nan, and Donald Olivier. “Covariance Analysis of Censored Survival Data Using Log-Linear Analysis Techniques.” Journal of the American Statistical Association 76, no. 374 (1981): 231–240. https://doi.org/10.2307/2287816.
  • Marra, Giampiero, and Simon N. Wood. “Coverage Properties of Confidence Intervals for Generalized Additive Model Components.” Scandinavian Journal of Statistics 39, no. 1 (March 1, 2012): 53–74. https://doi.org/10.1111/j.1467-9469.2011.00760.x.
  • Sylvestre, Marie-Pierre, and Michal Abrahamowicz. “Flexible Modeling of the Cumulative Effects of Time-Dependent Exposures on the Hazard.” Statistics in Medicine 28, no. 27 (2009): 3437–3453. https://doi.org/10.1002/sim.3701.
27 / 30

References

  • Whitehead, John. “Fitting Cox’s Regression Model to Survival Data Using GLIM.” Journal of the Royal Statistical Society. Series C (Applied Statistics) 29, no. 3 (1980): 268–75. https://doi.org/10.2307/2346901.
  • Wood, Simon N. Generalized Additive Models: An Introduction with R. Boca Raton, FL: Chapman & Hall/CRC, 2006.
  • Wood, Simon N. “Low-Rank Scale-Invariant Tensor Product Smooths for Generalized Additive Mixed Models.” Biometrics 62, no. 4 (December 1, 2006): 1025–36. https://doi.org/10.1111/j.1541-0420.2006.00574.x.
  • Wood, Simon N. “Fast Stable Restricted Maximum Likelihood and Marginal Likelihood Estimation of Semiparametric Generalized Linear Models.” Journal of the Royal Statistical Society: Series B (Statistical Methodology) 73, no. 1 (2011): 3–36. https://doi.org/10.1111/j.1467-9868.2010.00749.x.
  • Wood, Simon N. “On P-Values for Smooth Components of an Extended Generalized Additive Model.” Biometrika 100, no. 1 (March 1, 2013): 221–28. https://doi.org/10.1093/biomet/ass048.
  • Wood, Simon N., Fabian Scheipl, and Julian J. Faraway. “Straightforward Intermediate Rank Tensor Product Smoothing in Mixed Models.” Statistics and Computing, 2012. https://doi.org/10.1007/s11222-012-9314-z.
28 / 30

References

  • Wickham, Hadley. Ggplot2: Elegant Graphics for Data Analysis. 2nd ed. 2016. New York, NY: Springer, 2016.
  • Yihui Xie (2017). xaringan: Presentation Ninja. R package version 0.4.4. https://github.com/yihui/xaringan
  • Hadley Wickham, Romain Francois, Lionel Henry and Kirill Müller (2017). dplyr: A Grammar of Data Manipulation. R package version 0.7.4. https://CRAN.R-project.org/package=dplyr
29 / 30

Caloric Adequacy

  • caloric intake = calories from EN + PN + PF

  • caloric adequacy (CA):
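    $\mathrm{CA}(\%) = \frac{\text{caloric intake}}{\text{prescribed calories}} \cdot 100$

  • discretized caloric adequacy (in 3 categories):

    • C1: $0\% \leq \mathrm{CA} < 30\%$ and no OI
    • C2:
      • $30\% \leq \mathrm{CA} < 70\%$ and no OI or
      • $0\% \leq \mathrm{CA} < 30\%$ and additional OI
    • C3:
      • $\mathrm{CA} \geq 70\%$ or
      • $30\% \leq \mathrm{CA} < 70\%$ and additional OI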
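
The discretization rules can be written as a small sketch. The function names and the boolean flag `additional_oi` are illustrative assumptions. The rules themselves follow the list above.

```python
def caloric_adequacy(caloric_intake, prescribed_calories):
    """Caloric adequacy in percent: intake / prescribed * 100."""
    return 100.0 * caloric_intake / prescribed_calories


def nutrition_category(ca_percent, additional_oi):
    """Map caloric adequacy (in %) and the OI indicator to category C1, C2 or C3."""
    if ca_percent < 0:
        raise ValueError("caloric adequacy must be non-negative")
    if ca_percent >= 70:
        return "C3"                               # CA >= 70%
    if ca_percent >= 30:
        return "C3" if additional_oi else "C2"    # 30% <= CA < 70%
    return "C2" if additional_oi else "C1"        # 0% <= CA < 30%


# Hypothetical day: 1500 kcal received, 2000 kcal prescribed, no additional OI.
ca = caloric_adequacy(1500, 2000)
print(ca, nutrition_category(ca, additional_oi=False))  # 75.0 C3
```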
30 / 30
