G-Computation or standardization for the Cox, Fine-Gray and binomial regression models for survival data

Klaus Holst & Thomas Scheike

2025-08-30

G-computation for the Cox and Fine-Gray models

Computing the standardized estimate (G-estimation) based on the Cox or Fine-Gray model : \[ \hat S(t,A=a) = n^{-1} \sum_i S(t,A=a,X_i) \] and this estimator has influence function \[ S(t,A=a,X_i) - S(t,A=a) + E( D_{A_0(t), \beta} S(t,A=a,X_i) ) \epsilon_i(t) \] where \(\epsilon_i(t)\) is the iid decomposition of \((\hat A(t) - A(t), \hat \beta- \beta)\).

These estimates have a causal interpration under the assumption of no-unmeasured confounders, and even without the causal assumptions this standardization can still be a useful summary measure.

First looking cumulative incidence via the Fine-Gray model for the two causes and making a plot of the standardized cumulative incidence for cause 1.

library(mets)
set.seed(100)

data(bmt); bmt$time <- bmt$time+runif(nrow(bmt))*0.001
dfactor(bmt) <- tcell~tcell
bmt$event <- (bmt$cause!=0)*1

fg1 <- cifregFG(Event(time,cause)~tcell+platelet+age,bmt,cause=1)
summary(survivalG(fg1,bmt,time=50))
#> G-estimator :
#>       Estimate Std.Err   2.5%  97.5%   P-value
#> risk0   0.4331 0.02749 0.3793 0.4870 6.321e-56
#> risk1   0.2727 0.05863 0.1577 0.3876 3.313e-06
#> 
#> Average Treatment effect: difference (G-estimator) :
#>     Estimate Std.Err   2.5%    97.5% P-value
#> ps0  -0.1605 0.06353 -0.285 -0.03597 0.01153
#> 
#> Average Treatment effect: ratio (G-estimator) :
#>        Estimate  Std.Err      2.5%     97.5%    P-value
#> [ps0] 0.6295004 0.139248 0.3565794 0.9024214 0.00779742
#> 
#> Average Treatment effect: 1-G (survival)-ratio (G-estimator) :
#> NULL

fg2 <- cifregFG(Event(time,cause)~tcell+platelet+age,bmt,cause=2)
summary(survivalG(fg2,bmt,time=50))
#> G-estimator :
#>       Estimate Std.Err   2.5%  97.5%   P-value
#> risk0   0.2127 0.02314 0.1674 0.2581 3.757e-20
#> risk1   0.3336 0.06799 0.2003 0.4668 9.281e-07
#> 
#> Average Treatment effect: difference (G-estimator) :
#>     Estimate Std.Err     2.5%  97.5% P-value
#> ps0   0.1208 0.07189 -0.02009 0.2617 0.09285
#> 
#> Average Treatment effect: ratio (G-estimator) :
#>       Estimate   Std.Err      2.5%    97.5%   P-value
#> [ps0] 1.567915 0.3627528 0.8569321 2.278897 0.1174496
#> 
#> Average Treatment effect: 1-G (survival)-ratio (G-estimator) :
#> NULL

cif1time <- survivalGtime(fg1,bmt)
plot(cif1time,type="risk"); 

Now looking at the survival probability

ss <- phreg(Surv(time,event)~tcell+platelet+age,bmt)
sss <- survivalG(ss,bmt,time=50)
summary(sss)
#> G-estimator :
#>       Estimate Std.Err   2.5%  97.5%    P-value
#> risk0   0.6539 0.02709 0.6008 0.7070 9.218e-129
#> risk1   0.5640 0.05971 0.4470 0.6811  3.531e-21
#> 
#> Average Treatment effect: difference (G-estimator) :
#>     Estimate Std.Err    2.5%   97.5% P-value
#> ps0 -0.08992  0.0629 -0.2132 0.03337  0.1529
#> 
#> Average Treatment effect: ratio (G-estimator) :
#>        Estimate    Std.Err      2.5%    97.5%   P-value
#> [ps0] 0.8624974 0.09446477 0.6773499 1.047645 0.1455042
#> 
#> Average Treatment effect: 1-G (survival)-ratio (G-estimator) :
#>       Estimate   Std.Err      2.5%    97.5%   P-value
#> [ps0] 1.259836 0.1894627 0.8884963 1.631176 0.1702385

Gtime <- survivalGtime(ss,bmt)
plot(Gtime)

G-computation for the binomial regression

We compare with the similar estimates using the Doubly Robust estimating equations using binregATE. The standardization from the G-computation can also be computed using a specialized function that takes less memory and is quicker (for large data).


## survival situation
sr1 <- binregATE(Event(time,event)~tcell+platelet+age,bmt,cause=1,
         time=40, treat.model=tcell~platelet+age)
summary(sr1)
#>    n events
#>  408    241
#> 
#>  408 clusters
#> coeffients:
#>              Estimate   Std.Err      2.5%     97.5% P-value
#> (Intercept)  0.676409  0.137007  0.407880  0.944939  0.0000
#> tcell1      -0.023675  0.346994 -0.703770  0.656420  0.9456
#> platelet    -0.492952  0.246158 -0.975412 -0.010492  0.0452
#> age          0.343939  0.115561  0.117444  0.570434  0.0029
#> 
#> exp(coeffients):
#>             Estimate    2.5%  97.5%
#> (Intercept)  1.96680 1.50363 2.5727
#> tcell1       0.97660 0.49472 1.9279
#> platelet     0.61082 0.37704 0.9896
#> age          1.41049 1.12462 1.7690
#> 
#> Average Treatment effects (G-formula) :
#>            Estimate   Std.Err      2.5%     97.5% P-value
#> treat0     0.521758  0.121476  0.283669  0.759846  0.0000
#> treat1     0.498083  0.323887 -0.136725  1.132891  0.1241
#> treat:1-0 -0.023675  0.346994 -0.703770  0.656420  0.9456
#> 
#> Average Treatment effects (double robust) :
#>            Estimate   Std.Err      2.5%     97.5% P-value
#> treat0     0.622969  0.027450  0.569168  0.676771  0.0000
#> treat1     0.607296  0.100270  0.410771  0.803821  0.0000
#> treat:1-0 -0.015673  0.103731 -0.218982  0.187636  0.8799

## relative risk effect 
estimate(coef=sr1$riskDR,vcov=sr1$var.riskDR,f=function(p) p[2]/p[1],null=1)
#>          Estimate Std.Err  2.5% 97.5% P-value
#> [treat1]   0.9748  0.1662 0.649 1.301  0.8797
#> 
#>  Null Hypothesis: 
#>   [treat1] = 1

## competing risks 
br1 <- binregATE(Event(time,cause)~tcell+platelet+age,bmt,cause=1,
         time=40,treat.model=tcell~platelet+age)
summary(br1)
#>    n events
#>  408    157
#> 
#>  408 clusters
#> coeffients:
#>              Estimate   Std.Err      2.5%     97.5% P-value
#> (Intercept) -0.191519  0.130883 -0.448044  0.065007  0.1434
#> tcell1      -0.712880  0.351489 -1.401786 -0.023974  0.0425
#> platelet    -0.531919  0.244495 -1.011119 -0.052718  0.0296
#> age          0.432939  0.107314  0.222607  0.643271  0.0001
#> 
#> exp(coeffients):
#>             Estimate    2.5%  97.5%
#> (Intercept)  0.82570 0.63888 1.0672
#> tcell1       0.49023 0.24616 0.9763
#> platelet     0.58748 0.36381 0.9486
#> age          1.54178 1.24933 1.9027
#> 
#> Average Treatment effects (G-formula) :
#>            Estimate   Std.Err      2.5%     97.5% P-value
#> treat0    -0.358395  0.118278 -0.590216 -0.126574  0.0024
#> treat1    -1.071275  0.331947 -1.721878 -0.420671  0.0012
#> treat:1-0 -0.712880  0.351489 -1.401786 -0.023974  0.0425
#> 
#> Average Treatment effects (double robust) :
#>            Estimate   Std.Err      2.5%     97.5% P-value
#> treat0     0.417302  0.027089  0.364208  0.470395  0.0000
#> treat1     0.153517  0.087910 -0.018785  0.325818  0.0808
#> treat:1-0 -0.263785  0.091551 -0.443222 -0.084348  0.0040

and using the specialized function

br1 <- binreg(Event(time,cause)~tcell+platelet+age,bmt,cause=1,time=40)
Gbr1 <- binregG(br1,data=bmt)
summary(Gbr1)
#> G-estimator :
#>       Estimate Std.Err   2.5%  97.5%   P-value
#> risk0   0.4177 0.02703 0.3648 0.4707 6.988e-54
#> risk1   0.2671 0.06185 0.1459 0.3883 1.571e-05
#> 
#> Average Treatment effect: difference (G-estimator) :
#>    Estimate Std.Err    2.5%   97.5% P-value
#> p1  -0.1506 0.06758 -0.2831 -0.0182  0.0258
#> 
#> Average Treatment effect: ratio (G-estimator) :
#>       Estimate   Std.Err      2.5%     97.5%    P-value
#> [p1] 0.6393758 0.1538551 0.3378254 0.9409262 0.01908221
#> 
#> Average Treatment effect: 1-G (survival)-ratio (G-estimator) :
#> NULL

## contrasting average age to +2-sd age, Avalues
Gbr2 <- binregG(br1,data=bmt,varname="age",Avalues=c(0,2))
summary(Gbr2)
#> G-estimator :
#>       Estimate Std.Err   2.5%  97.5%   P-value
#> risk0   0.3932 0.02537 0.3434 0.4429 3.738e-54
#> risk2   0.5997 0.05531 0.4913 0.7081 2.136e-27
#> 
#> Average Treatment effect: difference (G-estimator) :
#>    Estimate Std.Err   2.5%  97.5%   P-value
#> p1   0.2066 0.04996 0.1086 0.3045 3.564e-05
#> 
#> Average Treatment effect: ratio (G-estimator) :
#>      Estimate   Std.Err     2.5%    97.5%      P-value
#> [p1] 1.525375 0.1325776 1.265528 1.785223 7.408296e-05
#> 
#> Average Treatment effect: 1-G (survival)-ratio (G-estimator) :
#> NULL

SessionInfo

sessionInfo()
#> R version 4.5.1 (2025-06-13)
#> Platform: aarch64-apple-darwin24.5.0
#> Running under: macOS Sequoia 15.6.1
#> 
#> Matrix products: default
#> BLAS:   /Users/kkzh/.asdf/installs/R/4.5.1/lib/R/lib/libRblas.dylib 
#> LAPACK: /Users/kkzh/.asdf/installs/R/4.5.1/lib/R/lib/libRlapack.dylib;  LAPACK version 3.12.1
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> time zone: Europe/Copenhagen
#> tzcode source: internal
#> 
#> attached base packages:
#> [1] splines   stats     graphics  grDevices utils     datasets  methods  
#> [8] base     
#> 
#> other attached packages:
#> [1] timereg_2.0.7  survival_3.8-3 mets_1.3.7    
#> 
#> loaded via a namespace (and not attached):
#>  [1] cli_3.6.5           knitr_1.50          rlang_1.1.6        
#>  [4] xfun_0.53           KernSmooth_2.23-26  jsonlite_2.0.0     
#>  [7] future.apply_1.20.0 listenv_0.9.1       lava_1.8.1         
#> [10] htmltools_0.5.8.1   sass_0.4.10         rmarkdown_2.29     
#> [13] grid_4.5.1          evaluate_1.0.5      jquerylib_0.1.4    
#> [16] fastmap_1.2.0       numDeriv_2016.8-1.1 yaml_2.3.10        
#> [19] mvtnorm_1.3-3       lifecycle_1.0.4     compiler_4.5.1     
#> [22] codetools_0.2-20    ucminf_1.2.2        Rcpp_1.1.0         
#> [25] future_1.67.0       lattice_0.22-7      digest_0.6.37      
#> [28] R6_2.6.1            parallelly_1.45.1   parallel_4.5.1     
#> [31] Matrix_1.7-4        bslib_0.9.0         tools_4.5.1        
#> [34] globals_0.18.0      cachem_1.1.0